This repository has been archived by the owner on Jul 23, 2019. It is now read-only.

Update for July 10, 2018

It's been a while since the last update, and I apologize for that. Our strategic direction has felt less clear to me over the past few weeks, and that lack of clarity combined with some difficulty in my personal life overcame my motivation to post for a while. I just wanted to turn inward and write code in relative isolation. Things are clearer and I'm feeling better, and I'd like to resume posting updates on a weekly basis and ask your forgiveness for the gap in communication.

The emergence of Eon

When we demonstrated Xray for GitHub leadership in May, there was definitely interest in Xray's potential as a high-performance collaborative text editor that runs on the desktop or in the browser, but there was way more excitement about CRDTs and their potential to impact version control. At first, this feedback caused some cognitive dissonance for me. After working so hard on Xray, it wasn't easy to hear that what I considered to be an implementation detail was the most exciting aspect of what we had built. But the more I thought about it, the more intrigued I became with the application of CRDTs to version control. The idea had been floating around in my mind since early in the development of Teletype, but now I felt encouraged to take the idea more seriously.

After a bit of indecision, we decided to dive in. We've now shifted our focus to a new project called Eon, which enables real-time, fine-grained version control. Long term, we see Eon and Xray as two components of the same overall project. Eon will be an editor-agnostic datastore for fine-grained edit history that enables real-time synchronization. It will be like Git, but it will persist and synchronize changes at the granularity of individual keystrokes. We envision Xray as Eon's native user interface and the best showcase of its capabilities. One example is the idea of "layers", which are like commits that can be freely edited at any time.

Git never would have taken off if it had been trapped inside a particular editor, and so if we really want to maximize the utility of what we're building, it makes sense to be editor-agnostic at the core. That's why we've decided to focus on delivering Eon as a standalone project. It may look like we have stopped working on Xray, but since Xray will ultimately build on top of Eon, the spirit of the overall project continues.

Since I was presenting Eon at QCon NYC, we briefly decided to pull Eon out into a separate repository, but we then concluded that this was a bad idea. For now, we will continue to develop Eon within the Xray mono-repo in order to keep the community and development focused in a single location.

Progress on Eon

Previously, Xray allowed you to invite guests into your workspace, but it was a centralized design. The workspace host owned all the files and serialized all guest requests to manipulate the file system. If the host dropped offline, the collaboration was over. With Eon, we're shooting for full decentralization. Multiple people can maintain a first-class replica of a given repository, just like Git.

To achieve that, over the past few weeks, we've been working on replicating the contents of the file system in addition to individual buffers. That means that if one person moves a directory while a collaborator adds a file inside of it, both parties will eventually converge to the same view of the world. It's proven to be a surprisingly complex problem.

We maintain a CRDT that represents the state of all the files and directories within the repository, but the only cross-platform way to detect file system changes is to scan the underlying directory structure and compare it to our in-memory representation. So far, we've focused only on directories, and we're caching inodes so we can detect when a directory is moved. We have yet to deal with files, which add the possibility of multiple hard links to the same file, but we're planning for them in our design. We also still need to deal with the fact that the file system might change in the middle of a scan, which might cause us to encounter a file or directory multiple times.
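To make the scan-and-compare idea concrete, here is a minimal sketch, not Eon's actual code. It assumes each scan has already been reduced to a map from inode to a hypothetical `Location` (parent inode plus name), and it only diffs two such maps. The names `Location`, `Change`, and `diff_scans` are invented for illustration, and the sketch ignores the mid-scan mutation problem mentioned above.

```rust
use std::collections::HashMap;

/// Where a directory lives: its parent's inode and its name (hypothetical type).
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Location {
    pub parent: u64,
    pub name: String,
}

/// A change inferred by comparing the cached index with a fresh scan.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Change {
    Created { inode: u64, at: Location },
    Removed { inode: u64, at: Location },
    Moved { inode: u64, from: Location, to: Location },
}

/// Diff the previous index against the current scan, keyed by inode.
/// Because the key is the inode, a rename or move shows up as `Moved`
/// rather than as an unrelated removal plus creation.
pub fn diff_scans(
    previous: &HashMap<u64, Location>,
    current: &HashMap<u64, Location>,
) -> Vec<Change> {
    let mut changes = Vec::new();
    for (inode, new_loc) in current {
        match previous.get(inode) {
            None => changes.push(Change::Created { inode: *inode, at: new_loc.clone() }),
            Some(old_loc) if old_loc != new_loc => changes.push(Change::Moved {
                inode: *inode,
                from: old_loc.clone(),
                to: new_loc.clone(),
            }),
            Some(_) => {} // Same inode, same location: nothing changed.
        }
    }
    for (inode, old_loc) in previous {
        if !current.contains_key(inode) {
            changes.push(Change::Removed { inode: *inode, at: old_loc.clone() });
        }
    }
    changes.sort(); // Deterministic output regardless of hash map iteration order.
    changes
}
```

Keying by inode and storing only the parent and name means that moving a directory produces a single `Moved` change for that directory, while its descendants are left untouched.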

Once we detect a local change, we update the local index and create an operation to broadcast to other replicas. We've settled on a design in which each file or directory is assigned a unique identifier and associated with one or more parent references, which describe where that file is located in the tree. Directories can only have one parent reference since they cannot be hard linked, but files can have multiple. Additionally, directories are associated with child references, each of which has a name and corresponds to a parent reference elsewhere in the tree.
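The shape of this data model can be sketched roughly as follows. This is an illustration under assumptions, not Eon's real types: `TreeEntry`, `ParentRef`, `ChildRef`, and `add_parent_ref` are hypothetical names, and identifiers are plain `u64` values for brevity.

```rust
/// Points from a file or directory to the directory that contains it.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ParentRef {
    pub ref_id: u64,
    pub parent: u64,
    pub name: String,
}

/// Held by a directory; names a child and corresponds to a `ParentRef`
/// with the same `ref_id` elsewhere in the tree.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ChildRef {
    pub ref_id: u64,
    pub name: String,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    File,
    Directory,
}

#[derive(Clone, Debug)]
pub struct TreeEntry {
    pub id: u64,
    pub kind: Kind,
    pub parent_refs: Vec<ParentRef>,
    pub child_refs: Vec<ChildRef>, // Only meaningful for directories.
}

impl TreeEntry {
    pub fn new(id: u64, kind: Kind) -> Self {
        TreeEntry { id, kind, parent_refs: Vec::new(), child_refs: Vec::new() }
    }

    /// Files may be hard linked, so they accept several parent references.
    /// Directories cannot be hard linked, so they accept at most one.
    pub fn add_parent_ref(&mut self, parent_ref: ParentRef) -> Result<(), String> {
        if self.kind == Kind::Directory && !self.parent_refs.is_empty() {
            return Err(format!("directory {} already has a parent reference", self.id));
        }
        self.parent_refs.push(parent_ref);
        Ok(())
    }
}
```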

Each parent and child reference is a simple CRDT called a last-writer-wins register. If a file is moved, we update its parent reference. If the same file is moved concurrently on another replica, we break the tie in a consistent way such that the file ends up in the same location in all replicas. Similarly, if two child references with the same name are created concurrently within a directory, only one of them will win across all replicas.
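A last-writer-wins register can be sketched in a few lines. The version below is a generic illustration rather than Eon's implementation: it assumes a Lamport-style counter and breaks ties between concurrent writes by comparing replica ids, which is one common way to get a consistent winner. `Timestamp` and `LwwRegister` are hypothetical names.

```rust
/// Logical timestamp. Deriving `Ord` compares `lamport` first and then
/// `replica_id`, so concurrent writes with equal counters still have a
/// single, deterministic winner. (The tie-break rule is an assumption.)
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Timestamp {
    pub lamport: u64,
    pub replica_id: u64,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct LwwRegister<T> {
    value: T,
    timestamp: Timestamp,
}

impl<T: Clone> LwwRegister<T> {
    pub fn new(value: T, timestamp: Timestamp) -> Self {
        LwwRegister { value, timestamp }
    }

    /// Apply a write, keeping it only if it is newer than what we hold.
    pub fn set(&mut self, value: T, timestamp: Timestamp) {
        if timestamp > self.timestamp {
            self.value = value;
            self.timestamp = timestamp;
        }
    }

    /// Merge state received from another replica. Merging is commutative
    /// and idempotent, so replicas converge regardless of delivery order.
    pub fn merge(&mut self, other: &Self) {
        self.set(other.value.clone(), other.timestamp);
    }

    pub fn get(&self) -> &T {
        &self.value
    }
}
```

If two replicas move the same file concurrently at the same logical time, each one ends up holding the write from the replica with the larger id once they have exchanged state.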

Inspired by the Btrfs file system, we're storing the state of the file system in the same copy-on-write B-tree that we use to represent the contents of buffers. Our tree is implemented generically, enabling us to reuse the same code for different kinds of items. In the case of our file system representation, each item is a member of an enumeration, which allows us to store file metadata, parent references, and child references all within the same tree. Each parent and child reference is actually represented by multiple tree items that share a reference id. We enforce a total order between all items in the tree, honoring the leftmost item for any register as the current value of that register.
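The "leftmost item wins" rule can be illustrated with an ordinary ordered map. The sketch below uses the standard library's `BTreeMap` as a stand-in for the copy-on-write B-tree, and assumes items are ordered by reference id and then by descending timestamp. `Key`, `Item`, and `current_value` are hypothetical names, and the metadata variant is keyed the same way purely for simplicity.

```rust
use std::cmp::Reverse;
use std::collections::BTreeMap;

/// Sort key: items sharing a `ref_id` are adjacent, and `Reverse` puts the
/// item with the highest timestamp first (leftmost) within that group.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Key {
    pub ref_id: u64,
    pub timestamp: Reverse<u64>,
}

/// One enumeration lets different kinds of items share a single tree.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Item {
    Metadata { is_dir: bool },
    ParentRef { parent_id: u64, name: String },
    ChildRef { child_id: u64, name: String },
}

/// The current value of a register is the leftmost item with its `ref_id`.
pub fn current_value(tree: &BTreeMap<Key, Item>, ref_id: u64) -> Option<&Item> {
    // Under `Reverse`, `u64::MAX` sorts first and `0` sorts last.
    let first = Key { ref_id, timestamp: Reverse(u64::MAX) };
    let last = Key { ref_id, timestamp: Reverse(0) };
    tree.range(first..=last).next().map(|(_, item)| item)
}
```

With this ordering, older items for the same register stay in the tree as history, and reading the current value is a single range lookup.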

We've also enhanced Xray's original B-tree to allow nodes to be persisted in an external key-value store. This will allow us to maintain a history of how the file system has evolved, and we plan to allow interactions with our tree to filter out certain nodes based on a summary of their contents. This will enable us to avoid loading portions of the tree that contain items that aren't visible in a specific version of the tree, which will keep the memory footprint small for any single version while still allowing us to load past versions of the tree if desired.
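Since this part of the design is still a plan, the following is only a speculative sketch of how summary-based filtering could avoid loading nodes. It assumes each internal node records, per child, the minimum version of any item beneath it, and it uses an in-memory `HashMap` as a stand-in for the external key-value store. `Store`, `Node`, `Summary`, and `visible_items` are hypothetical names.

```rust
use std::collections::HashMap;

pub type NodeId = u64;

/// Summary of a subtree: the earliest version in which any item below it exists.
#[derive(Clone, Copy, Debug)]
pub struct Summary {
    pub min_version: u64,
}

#[derive(Clone, Debug)]
pub enum Node {
    Internal { children: Vec<(Summary, NodeId)> },
    /// Each item carries the version in which it was inserted.
    Leaf { items: Vec<(u64, String)> },
}

/// Stand-in for an external key-value store; counts how many nodes get loaded.
#[derive(Default)]
pub struct Store {
    nodes: HashMap<NodeId, Node>,
    pub loads: usize,
}

impl Store {
    pub fn put(&mut self, id: NodeId, node: Node) {
        self.nodes.insert(id, node);
    }

    fn load(&mut self, id: NodeId) -> Option<Node> {
        self.loads += 1;
        self.nodes.get(&id).cloned()
    }
}

/// Collect the items visible at `version`, skipping any subtree whose
/// summary shows that everything in it was created later.
pub fn visible_items(store: &mut Store, root: NodeId, version: u64) -> Vec<String> {
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(id) = stack.pop() {
        match store.load(id) {
            Some(Node::Internal { children }) => {
                // Push in reverse so children are visited left to right.
                for (summary, child) in children.into_iter().rev() {
                    if summary.min_version <= version {
                        stack.push(child);
                    }
                }
            }
            Some(Node::Leaf { items }) => {
                out.extend(items.into_iter().filter(|(v, _)| *v <= version).map(|(_, p)| p));
            }
            None => {} // Missing node: ignored in this sketch.
        }
    }
    out
}
```

The `loads` counter makes the benefit visible: reading an old version touches only the nodes whose summaries overlap that version.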

In many ways, arriving at our current approach was more challenging than coming up with the CRDT for text. We spent many days doing little but thinking, without writing much code, but now we're feeling pretty good about the design. It seems simple and almost obvious, which is probably a good sign that we're on the right track.