Christopher Meiklejohn will be leading a tutorial at Code Mesh 2016. He is a distributed systems researcher.
PurelyFunctional.tv: Very briefly, what is your tutorial about?
Christopher Meiklejohn: My tutorial explains why Conflict-Free Replicated Data Types (CRDTs) are valuable and how they can be used to build different types of systems that provide high availability and fault tolerance.
PF.tv: What do you hope people will take away from the talk?
CM: I’m hoping that people will take away two things:
- Understanding why concurrency is challenging and why CRDTs are one good solution for tackling problems such as concurrency anomalies.
- Learning about a new distributed database called Antidote, which provides causally consistent transactions built on top of CRDTs.
PF.tv: What is the current state of the art in CRDTs?
CM: There’s a significant amount of CRDT research currently coming out of our European-funded research project, SyncFree.
Most of the work is around the following:
- Building programming abstractions around CRDTs to make it easier to build applications with them.
- Building databases with highly-available transactions.
- Building more efficient representations for CRDTs with less metadata overhead.
We’ll be continuing our work for the next three years, starting in January 2017, with a new European-funded research project called LightKone. LightKone’s goal is to help people program correct applications at the edge, with shared, mutable state, using the technology that we have developed in SyncFree.
PF.tv: Are CRDTs composable? For instance, CRDT sequences exist, as do CRDT counters. Can I build a CRDT sequence of counts?
CM: Some, but not all. For instance, there is a design for a CRDT dictionary that supports composition of other CRDTs, and a CRDT lexicographical pair that supports composition. The current sequence designs, of which there are a few, do not support composition.
Composition is challenging because it requires that the CRDT that is encapsulating the other objects share a common “causal context” with the objects it encapsulates. This is the primary reason that you cannot perform arbitrary composition: some of the CRDT designs do not have a similar structure for representing this information.
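To make the idea of composition concrete, here is a minimal sketch in Python of a map whose values are counters. It is an illustration only, not code from Antidote, Riak, or SyncFree; the class names `GCounter` and `GMap` and the replica identifiers are hypothetical. It assumes the simplest possible case, a grow-only map of grow-only counters, which sidesteps removals and therefore does not need the shared causal context described above. Supporting removals is where that shared context becomes necessary.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """State-based grow-only counter: one non-negative slot per replica."""
    counts: dict = field(default_factory=dict)

    def increment(self, replica_id, amount=1):
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-replica maximum: commutative, associative, and idempotent.
        keys = self.counts.keys() | other.counts.keys()
        return GCounter(
            {k: max(self.counts.get(k, 0), other.counts.get(k, 0)) for k in keys}
        )


@dataclass
class GMap:
    """Grow-only map whose values are themselves CRDTs (here, GCounter)."""
    entries: dict = field(default_factory=dict)

    def increment(self, key, replica_id, amount=1):
        self.entries.setdefault(key, GCounter()).increment(replica_id, amount)

    def merge(self, other):
        # Merge key by key, delegating to the nested CRDT's own merge.
        merged = {}
        for key in self.entries.keys() | other.entries.keys():
            left = self.entries.get(key, GCounter())
            right = other.entries.get(key, GCounter())
            merged[key] = left.merge(right)
        return GMap(merged)

    def value(self):
        return {key: counter.value() for key, counter in self.entries.items()}


# Two replicas update the same map concurrently, then exchange state.
a, b = GMap(), GMap()
a.increment("likes", "replica_a", 2)
b.increment("likes", "replica_b", 3)
b.increment("views", "replica_b")
merged = a.merge(b)
print(merged.value())
```

Because the outer merge simply delegates to the inner merge for each key, the composite inherits the convergence properties of its parts. That delegation only works here because nothing is ever removed.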
PF.tv: What is Antidote? How is it different from Riak?
CM: Antidote is a database that is based on the same infrastructure that Riak is: a distributed hash-table with consistent hashing and hash-space partitioning. It differs from Riak in several ways:
- Antidote has been designed with CRDTs as fundamental data structures, and structures everything around that concept.
- Riak has some CRDTs, but Antidote has a different set of CRDTs with completely different implementations. Riak uses what we call state-based implementations, where the entire state of the CRDT is shipped for each operation that's performed in the cluster. Antidote uses what we call operation-based CRDTs, where only the operation is sent between the different replicas of an object when an operation is performed. Operation-based CRDTs require much less state to be transmitted, but at the cost of requiring a stronger system model.
- Riak is eventually consistent; Antidote is causally consistent, a stronger consistency criterion that ensures that events related to other events are delivered in causal order.
- Antidote has causally consistent transactions: multi-key mergeable transactions are provided for when users need to perform atomic actions. Riak currently does not support transactions.
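The difference between state-based and operation-based CRDTs mentioned in the list above can be sketched with two toy counters in Python. This is an illustration under simplifying assumptions, not how Riak or Antidote actually implement their data types; the class names `StateCounter` and `OpCounter` are hypothetical, and message delivery is simulated by direct method calls. The state-based version ships its whole state and merges with a per-replica maximum. The operation-based version ships only the operation, which is smaller but relies on the network to deliver each operation exactly once.

```python
class StateCounter:
    """State-based counter: every update ships the full per-replica state."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1
        return dict(self.counts)  # the message is a copy of the entire state

    def receive(self, remote_counts):
        # Merging with max is idempotent, so duplicates and reordering are safe.
        for replica_id, count in remote_counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)

    def value(self):
        return sum(self.counts.values())


class OpCounter:
    """Operation-based counter: every update ships only the operation."""

    def __init__(self):
        self.total = 0

    def increment(self):
        self.total += 1
        return ("increment", 1)  # the message is just the operation

    def receive(self, op):
        # Applying an operation is not idempotent: the delivery layer is
        # assumed to hand over each operation exactly once.
        name, amount = op
        if name == "increment":
            self.total += amount

    def value(self):
        return self.total


# Simulate a network that delivers the same message twice.
s1, s2 = StateCounter("a"), StateCounter("b")
state_msg = s1.increment()
s2.receive(state_msg)
s2.receive(state_msg)
print(s2.value())  # 1: the duplicate state is absorbed by the merge

o1, o2 = OpCounter(), OpCounter()
op_msg = o1.increment()
o2.receive(op_msg)
o2.receive(op_msg)
print(o2.value())  # 2: the duplicate operation is counted twice
```

The double count in the second case is one way to see the "stronger system model" trade-off: the operation-based design saves bandwidth, but correctness then depends on delivery guarantees that the state-based design does not need.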
PF.tv: What concepts do you recommend people be familiar with to maximize their experience with the talk?
PF.tv: What resources are available for people who want to study up before the talk?
CM: I'd recommend my blog, where I've given several outlines of my work and written a series of posts on the history of distributed programming.
PF.tv: Where can people follow you online?
PF.tv: Are there any projects you'd like people to be aware of? How can people help out?
CM: All of my research group's work on distributed computing is available on GitHub, open source and written in Erlang. We work very hard to break down each piece of work we do into small reusable components, so ideally some of these components could be used in your applications and daily work. That said, it's important to keep in mind that we write research code, so not everything is industry grade; some of it exists merely to prove out a prototype that we may be working on.
PF.tv: Where do you see the state of distributed systems in 10 years?
CM: I’m hoping that in ten years we have better languages and abstractions for building distributed applications. I feel most of the current designs of distributed systems focus on composing systems together around a shared consensus system, such as ZooKeeper, even though many of these applications may be able to operate just fine with weaker properties: weak ordering, eventual consistency, idempotent operations, etc. However, building correct systems with weak ordering is challenging, and currently there are not any good abstractions for doing so. This is the challenge that needs to be addressed.
PF.tv: If we could fund a Manhattan Project-style program to develop one game changer for distributed systems, and you were the director of the program, what would you focus on?
CM: Higher-level languages for building distributed applications, together with tooling that could synthesize applications from specifications written in those languages, optimized for the system each application is being deployed to.