Yet another etherpad clone
When etherpad came out in 2008 it blew my mind. In a way it was one of the few successful attempts to bring the unix philosophy to the mainstream: It did one thing (collaborative plain text editing) and did it really well. And people who used it -- not just programmers -- quickly understood that they could use this simple tool as a building block and apply it to many different situations.
The concept was so successful that it was copied by many. Google actually bought etherpad and later integrated some of its features into Google Docs. Collaborative editing was also added to editors like CKEditor or CodeMirror.
Even though the technology seems to have matured, today I often find myself struggling to find a tool for collaborative editing. I am not exactly sure why that is, but if I had to guess it would be a combination of these reasons:
- Google Docs fills the niche: There is still space for an independent tool, but that space is much smaller than it used to be.
- Bad migration: The deprecation of the original etherpad meant that many servers had to migrate to a different solution. These migrations had some issues and made some users wary, me included.
- Unreliable: Some servers lose connection a lot, resulting in a frustrating editing experience.
- Feature creep: Additional features like syntax highlighting and markdown preview theoretically take nothing away. But they make these tools feel less like building blocks and more like applications with narrow use cases.
- Too much choice: When there was just etherpad I used that. But now there are many options and I am not sure which one to pick.
Concerning the last point I tried to read up on the most important projects to see what each of them had to offer. But when I looked at their code to find the underlying concepts I instead found what seemed like an impenetrable mess of abstractions.
Whenever I fail to understand something, I like to build it from scratch. That way I can better understand the challenges and tradeoffs that lead to a design that would otherwise be unintelligible.
Long story short: I built my own plain-text pad.
The guiding principle had to be simplicity: I wanted to get 80% of functionality for 20% effort.
In terms of UI that meant having a single textarea: no advanced features like WYSIWYG editing or markdown preview. I also left out some of the features that existed in the original etherpad, e.g. chat or authorship colors.
To take this further I was also willing to cut corners in terms of collaboration: My assumption was that concurrent edits in the same place are rare and that there is really no perfect way to handle them anyway.
I read the original etherpad design document as well as the tremendously helpful posts on collaborative editing in ProseMirror and CodeMirror. From what I gathered I had to take care of the following steps:
- Capture input as diffs
- Broadcast diffs to all peers
- Apply broadcast diffs without losing local changes
- Keep cursor, selection, and scroll position
The interesting stuff happens in step (3). So this is what I spent most of my time on.
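The last step in the list is mostly bookkeeping. As a rough sketch (not the code from this project), a cursor offset can be carried across a remote change with a little arithmetic. The function name `shift_cursor` and its parameters are made up for illustration, and the handling of a cursor that sits inside the replaced text is just one possible choice.

```python
def shift_cursor(cursor: int, start: int, removed: int, inserted: int) -> int:
    """Map a cursor offset across a change that replaced `removed`
    characters at offset `start` with `inserted` new characters."""
    end = start + removed
    if cursor <= start:
        # The change starts at or after the cursor: nothing moves.
        return cursor
    if cursor >= end:
        # The change lies entirely before the cursor: shift by the size delta.
        return cursor + inserted - removed
    # Assumption: a cursor inside the replaced text keeps its relative
    # offset, clamped to the length of the new text.
    return start + min(cursor - start, inserted)


# A selection is just two offsets, so both ends are mapped the same way.
# Here "world" is selected in "hello world" while a peer replaces
# "hello" (offset 0, 5 characters) with "Hi" (2 characters).
selection = (6, 11)
new_selection = tuple(shift_cursor(pos, 0, 5, 2) for pos in selection)
print(new_selection, "Hi world"[new_selection[0]:new_selection[1]])
```

In a browser the resulting offsets would be written back to the textarea's selection properties. Scroll position would need separate handling that this sketch does not cover.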
Most tools seem to use diffs of the form "insert string a at position n". This makes it difficult to reorder changes because the position changes all the time. So instead I used diffs of the form "replace string a by string b". This meant that changes were much more independent and could be applied in arbitrary order -- as long as these changes were in different places. Since my assumption was that conflicts were rare, I didn't think too much about these exceptions and used a dumb merge algorithm for them.
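To make the idea of position-free diffs concrete, here is a minimal sketch. It is not the code from this project, and it rests on assumptions that are not in the text above: a change is a pair `(before, after)`, it is located by searching for `before`, and `make_change` widens the pair with unchanged context until `before` is unique in the document. All function names are hypothetical.

```python
from typing import Optional, Tuple

Change = Tuple[str, str]  # (before, after): "replace `before` by `after`"


def count_occurrences(doc: str, needle: str) -> int:
    """Count occurrences of `needle` in `doc`, including overlapping ones."""
    count, i = 0, doc.find(needle)
    while i != -1:
        count += 1
        i = doc.find(needle, i + 1)
    return count


def make_change(old: str, new: str) -> Change:
    """Describe the edit old -> new as a position-free replacement."""
    # Trim the common prefix ...
    start = 0
    while start < min(len(old), len(new)) and old[start] == new[start]:
        start += 1
    # ... and the common suffix, without running into the prefix.
    end_old, end_new = len(old), len(new)
    while (end_old > start and end_new > start
           and old[end_old - 1] == new[end_new - 1]):
        end_old -= 1
        end_new -= 1
    # Assumption: add unchanged context until `before` is unique in `old`.
    # This terminates because the whole document occurs exactly once.
    while count_occurrences(old, old[start:end_old]) != 1:
        if start > 0:
            start -= 1
        if end_old < len(old):
            end_old += 1
            end_new += 1
    return old[start:end_old], new[start:end_new]


def apply_change(doc: str, change: Change) -> Optional[str]:
    """Apply a change, or return None if `before` is missing or ambiguous."""
    before, after = change
    if count_occurrences(doc, before) != 1:
        return None  # conflict
    i = doc.find(before)
    return doc[:i] + after + doc[i + len(before):]


base = "hello world"
a = make_change(base, "hello brave world")
b = make_change(base, "Hello world")

# Changes in different places give the same result in either order.
ab = apply_change(apply_change(base, a), b)
ba = apply_change(apply_change(base, b), a)
print(ab == ba, repr(ab))
```

Two changes that touch the same place are a different story: after one of them has been applied, the other's `before` string is gone and `apply_change` returns `None`. That is the conflict case the rest of this post deals with.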
But unfortunately this approach was doomed: Once there was a conflict it only got worse and the documents diverged. I spent quite some time tweaking the merge algorithm but nothing helped. I had to find a way to make the documents converge. The simplest way to do that is to have a single source of truth. There were different options on the table:
- Implement a dedicated server
- Let peers elect a leader
- Blockchain: Every change references a previous change; the longest chain wins
None of these options really sounded appealing. But then I realized that I already had a single source of truth: The order in which the messages were broadcast. As long as all peers interpreted the changes the same way they would always have the same state.
Now whenever changes arrive from the server the clients perform the following steps:
- roll back local changes
- apply remote changes
- re-apply local changes
There can still be conflicts in steps (2) and (3). But since all peers use the same mechanism they get the same result. I actually decided to simply drop a change on conflict. As this happens quickly, users can react and re-apply their change manually. The big lesson here is that convergence is more important than the quality of the merge.
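The following is a small simulation of that procedure, meant as a sketch rather than the project's actual code. It assumes that a change is a `(before, after)` pair that only applies if `before` occurs exactly once, that the server echoes every change back to its sender in the same order as to everyone else, and that a peer's own changes come back in the order they were sent. `Peer`, `apply_change`, and `simulate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Change = Tuple[str, str]  # (before, after)


def apply_change(doc: str, change: Change) -> Optional[str]:
    """Replace the single occurrence of `before`; None signals a conflict."""
    before, after = change
    i = doc.find(before)
    if i == -1 or doc.find(before, i + 1) != -1:
        return None
    return doc[:i] + after + doc[i + len(before):]


@dataclass
class Peer:
    # Result of all changes received from the server so far, in server order.
    confirmed: str
    # Local changes that the server has not echoed back yet.
    pending: List[Change] = field(default_factory=list)

    def edit(self, change: Change) -> Change:
        self.pending.append(change)
        return change  # the caller would send this to the server

    def receive(self, change: Change, own: bool = False) -> None:
        if own:
            # Assumption: own changes are echoed in the order they were sent.
            self.pending.pop(0)
        # Step 2: apply the change to the state without local changes.
        merged = apply_change(self.confirmed, change)
        if merged is not None:  # on conflict the change is simply dropped
            self.confirmed = merged

    def view(self) -> str:
        """The text the user sees."""
        doc = self.confirmed          # step 1: start without local changes
        for change in self.pending:   # step 3: re-apply local changes
            merged = apply_change(doc, change)
            if merged is not None:    # conflicting local changes are skipped
                doc = merged
        return doc


def simulate(base: str, edit_a: Change, edit_b: Change) -> Tuple[str, str]:
    a, b = Peer(base), Peer(base)
    # Both peers edit concurrently; the list order is the broadcast order.
    log = [("a", a.edit(edit_a)), ("b", b.edit(edit_b))]
    for sender, change in log:
        a.receive(change, own=(sender == "a"))
        b.receive(change, own=(sender == "b"))
    return a.view(), b.view()


print(simulate("hello world", ("hello", "Hello"), ("world", "World")))
print(simulate("hello world", ("world", "there"), ("world", "planet")))
```

In the first run the edits are in different places and both survive. In the second run both peers replace the same word, so the change that the server broadcasts second is dropped everywhere and both peers still end up with the same text.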
I created bots for testing that helped me find some bugs. But since many of my simplifications relied on assumptions about real-world usage I had to test with real people.
When I was confident enough I asked some coworkers if we could use my pad for a real meeting. And what can I say? During the meeting I completely forgot that this was my new tool. It just worked.
But what about the bigger picture? Today collaborative editing is both everywhere and nowhere at the same time: The technology is mature but it is hard to find a reliable server. This project shows that you can build a working implementation with very little code. I hope that this can help move the needle towards "everywhere".