Causal Trees
235 points by sno6 1 year ago | 29 comments- mweidner 1 year agoAnother name for Causal Trees is "RGA" (Replicated Growable Array). They are ~identical algorithms that were published concurrently. E.g., Automerge uses RGA (https://automerge.org/docs/documents/#lists).
- bytearray 1 year agoHow does the performance of Causal Trees compare to other CRDT implementations, especially in scenarios with a high frequency of concurrent updates? It seems like a promising approach for collaborative text apps, but I'm curious about its scalability and real-world performance.
- mweidner 1 year agoIn my experience, this depends a lot more on the implementation than the CRDT algorithm. If you implement Causal Trees directly (as a tree with one node per char), it will be tolerably fast but use a lot of memory + storage. If you instead group chars into "runs" of sequentially-inserted chars and only store one Causal Tree node per run, it should be quite efficient.
Yjs (a widely used text CRDT) describes these sort of opts here: https://blog.kevinjahns.de/are-crdts-suitable-for-shared-edi... For a different tree-based CRDT, I did a head-to-head comparison of implementations that use a node-per-char (Fugue Simple) vs runs (Fugue), with results in Section 5 of this paper: https://arxiv.org/abs/2305.00583
- zogrodea 1 year agoI don't have any experience with this, but the use of flat arrays (rather than unbalanced trees) in Yjs sped things up considerably according to this long and interesting blog post below.
https://josephg.com/blog/crdts-go-brrr/
"We can use a flat array to store everything, rather than an unbalanced tree. This makes everything smaller and faster for the computer to process."
- josephg 1 year agoThanks for linking my post. I really need to write a followup at some point - we’ve gotten another 2-10x speed up from when I ran those benchmarks, depending on how you measure it.
I still stand by what I wrote in that blog post. Using lists rather than trees is still a good approach. And it’s super simple to implement too. You can have a working crdt in a few dozen lines of code.
I’m happy to answer questions if anyone is curious about this stuff. Ive been working to implement and optimise CRDTs for years at this point. Increasingly I’m seeing it as a solved problem.
- practal 1 year agoI've read up on CRDTs over the last two months or so (and I've come across your very helpful posts as well, of course), because I am building a collaborative editor for Practal [0].
In particular, I've invented a new simple text format for this which I call Recursive teXt (RX) [1]. The idea is to just develop a CRDT for RX. RX is naturally structured as a tree, and it seems to make sense to model a document as an A of blocks, a block as an A of lines and blocks, and a line as an A of characters. Here "A of" stands for some sort of CRDT array based on inserting via predecessor (and successor?). Each A-object (document, block, line) is referenced by its own id and stored in a purely functional tree (similar to how Redux would do it [3], and I think Automerge does it similarly).
Would be great to get your opinion on this design choice, maybe you see some obvious (or not so obvious) problems with it. One problem seems to be one that Kleppmann points out in [2, end of section 4], when you press enter in the middle of a line, so that a line is split into two lines, you have to deal with that in a special way. Similarly with splitting/joining blocks.
[1] https://practal.com/recursivetext/
[2] https://martin.kleppmann.com/papers/list-move-papoc20.pdf
[3] https://redux.js.org/usage/structuring-reducers/normalizing-...
- mattarm 1 year agoDefinitely interested in how you achieved another 2-10x over the btree approach. I want surprised that btree was as effective as it was, but I’d be curious to know how you squeezed a bit more out of it.
- 1 year ago
- practal 1 year ago
- josephg 1 year ago
- mweidner 1 year ago
- jasonjmcghee 1 year agoThis is a really fun post. Really appreciate the time you put into it!
Quick note: on mobile, the text inputs for the clients is forcing all the text to be very tiny, and you need to manually zoom in to read it.
- alephnan 1 year ago> Don’t fret if you’re a fan of central authority though, Figma successfully uses CRDTs server-side to handle the collaborative aspects of their product, as well as Soundcloud and many others.
Why bother with CRDTs if you’re doing server-side synchronization?
MMORPGs can handle synchronizing thousands of users without problem.
- hnb2137 1 year agoCRDTs solve the problem of concurrent updates bij users.
- tomaskafka 1 year agoAnd offline edits from concurrent users
- OskarS 1 year agoI've always wondered if it's a good trick for horizontal scaling as well. Like, if you have one server serving 1,000,000 clients, using a CRDT you could trivially split that up into 10 servers serving 100,000 clients each, and then have the ten servers be peers to each other.
- OskarS 1 year ago
- tomaskafka 1 year ago
- hnb2137 1 year ago
- CodeGroyper 1 year agoI don't know what CRDT stands for, can anyone tell me?
- OskarS 1 year agoConflict-free Replicated Data Types. It's essentially a way to make Google Docs-style products which are resilient in the face of disconnects and reconnects and slow syncing. Think of it like Git, but all merge conflicts are resolved automatically and deterministically.
- 1 year ago
- OskarS 1 year ago
- andai 1 year agoWhat does SoundCloud need CRDTs for?
- refulgentis 1 year agoI grep'd Soundcloud, then clicked the <a href> wrapping it, it linked here: https://github.com/soundcloud/roshi
(tl;Dr: time-series event storage via a LWW-element-set)
- andai 1 year agoYeah I found this page and as a SoundCloud user I have no idea what this is about. So it's not some user facing collaborative music feature I didn't know about, it just gives them more consistent analytics?
- andai 1 year ago
- refulgentis 1 year ago
- euroderf 1 year agoif you like CRDTs, have a look at pijul version control.
- cma 1 year agoConflict-Free Replicated Data Type
- shoarek 1 year agoYou posted this 2 times in 3 hours. By the way, Your connection is not private.