Causal Trees

235 points by sno6 1 year ago | 29 comments
  • mweidner 1 year ago
    Another name for Causal Trees is "RGA" (Replicated Growable Array). They are ~identical algorithms that were published concurrently. E.g., Automerge uses RGA (https://automerge.org/docs/documents/#lists).
    • archagon 1 year ago
      That is true, but I think the CT paper frames the algorithm in a much clearer way than the RGA paper does. It’s a pleasure to read.
      • hinkley 1 year ago
        RGA is a terrible name. It sounds more like some sort of new alternative to B-Trees than a concurrent data structure.
    • bytearray 1 year ago
      How does the performance of Causal Trees compare to other CRDT implementations, especially in scenarios with a high frequency of concurrent updates? It seems like a promising approach for collaborative text apps, but I'm curious about its scalability and real-world performance.
      • mweidner 1 year ago
        In my experience, this depends a lot more on the implementation than the CRDT algorithm. If you implement Causal Trees directly (as a tree with one node per char), it will be tolerably fast but use a lot of memory + storage. If you instead group chars into "runs" of sequentially-inserted chars and only store one Causal Tree node per run, it should be quite efficient.

        Yjs (a widely used text CRDT) describes these sort of opts here: https://blog.kevinjahns.de/are-crdts-suitable-for-shared-edi... For a different tree-based CRDT, I did a head-to-head comparison of implementations that use a node-per-char (Fugue Simple) vs runs (Fugue), with results in Section 5 of this paper: https://arxiv.org/abs/2305.00583

        • zogrodea 1 year ago
          I don't have any experience with this, but the use of flat arrays (rather than unbalanced trees) in Yjs sped things up considerably according to this long and interesting blog post below.

          https://josephg.com/blog/crdts-go-brrr/

          "We can use a flat array to store everything, rather than an unbalanced tree. This makes everything smaller and faster for the computer to process."

          • josephg 1 year ago
            Thanks for linking my post. I really need to write a followup at some point - we’ve gotten another 2-10x speed up from when I ran those benchmarks, depending on how you measure it.

            I still stand by what I wrote in that blog post. Using lists rather than trees is still a good approach. And it’s super simple to implement too. You can have a working crdt in a few dozen lines of code.

            I’m happy to answer questions if anyone is curious about this stuff. Ive been working to implement and optimise CRDTs for years at this point. Increasingly I’m seeing it as a solved problem.

            • practal 1 year ago
              I've read up on CRDTs over the last two months or so (and I've come across your very helpful posts as well, of course), because I am building a collaborative editor for Practal [0].

              In particular, I've invented a new simple text format for this which I call Recursive teXt (RX) [1]. The idea is to just develop a CRDT for RX. RX is naturally structured as a tree, and it seems to make sense to model a document as an A of blocks, a block as an A of lines and blocks, and a line as an A of characters. Here "A of" stands for some sort of CRDT array based on inserting via predecessor (and successor?). Each A-object (document, block, line) is referenced by its own id and stored in a purely functional tree (similar to how Redux would do it [3], and I think Automerge does it similarly).

              Would be great to get your opinion on this design choice, maybe you see some obvious (or not so obvious) problems with it. One problem seems to be one that Kleppmann points out in [2, end of section 4], when you press enter in the middle of a line, so that a line is split into two lines, you have to deal with that in a special way. Similarly with splitting/joining blocks.

              [0] https://practal.com

              [1] https://practal.com/recursivetext/

              [2] https://martin.kleppmann.com/papers/list-move-papoc20.pdf

              [3] https://redux.js.org/usage/structuring-reducers/normalizing-...

              • mattarm 1 year ago
                Definitely interested in how you achieved another 2-10x over the btree approach. I want surprised that btree was as effective as it was, but I’d be curious to know how you squeezed a bit more out of it.
                • 1 year ago
            • jasonjmcghee 1 year ago
              This is a really fun post. Really appreciate the time you put into it!

              Quick note: on mobile, the text inputs for the clients is forcing all the text to be very tiny, and you need to manually zoom in to read it.

              • alephnan 1 year ago
                > Don’t fret if you’re a fan of central authority though, Figma successfully uses CRDTs server-side to handle the collaborative aspects of their product, as well as Soundcloud and many others.

                Why bother with CRDTs if you’re doing server-side synchronization?

                MMORPGs can handle synchronizing thousands of users without problem.

                • hnb2137 1 year ago
                  CRDTs solve the problem of concurrent updates bij users.
                  • tomaskafka 1 year ago
                    And offline edits from concurrent users
                    • OskarS 1 year ago
                      I've always wondered if it's a good trick for horizontal scaling as well. Like, if you have one server serving 1,000,000 clients, using a CRDT you could trivially split that up into 10 servers serving 100,000 clients each, and then have the ten servers be peers to each other.
                • CodeGroyper 1 year ago
                  I don't know what CRDT stands for, can anyone tell me?
                  • OskarS 1 year ago
                    Conflict-free Replicated Data Types. It's essentially a way to make Google Docs-style products which are resilient in the face of disconnects and reconnects and slow syncing. Think of it like Git, but all merge conflicts are resolved automatically and deterministically.
                    • 1 year ago
                    • andai 1 year ago
                      What does SoundCloud need CRDTs for?
                      • refulgentis 1 year ago
                        I grep'd Soundcloud, then clicked the <a href> wrapping it, it linked here: https://github.com/soundcloud/roshi

                        (tl;Dr: time-series event storage via a LWW-element-set)

                        • andai 1 year ago
                          Yeah I found this page and as a SoundCloud user I have no idea what this is about. So it's not some user facing collaborative music feature I didn't know about, it just gives them more consistent analytics?
                      • euroderf 1 year ago
                        if you like CRDTs, have a look at pijul version control.
                        • cma 1 year ago
                          Conflict-Free Replicated Data Type
                          • shoarek 1 year ago
                            You posted this 2 times in 3 hours. By the way, Your connection is not private.