Application of CRDTs to Solid

happybeing · July 19, 2020, 10:15am

I don’t recall CRDTs (Conflict-free Replicated Data Types) being mentioned here before, and am interested to hear if people think they could be useful for Solid.

I wrote the following in response to @NoelDeMartin’s post raising an issue with performance when an LDP container holds hundreds of documents. But I think CRDTs merit a topic of their own.

An alternative which I think would be interesting to explore for Solid is local-first using CRDTs. It’s a bit like offline-first, but peer-to-peer rather than syncing with a central server: each participant has a local copy of a CRDT-based document which they can edit, and changes are synced automatically by exchanging edits between peers. Because the documents are implemented as CRDTs, merging changes is completely automatic and guaranteed to produce the same result in every copy once each copy has received the same edits.
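To make "merging is automatic and produces the same result in every copy" concrete, here is a minimal Python sketch of one of the simplest state-based CRDTs, a last-writer-wins (LWW) map. It is an illustration under stated assumptions, not how Automerge or any other library works internally: the class `LWWMap`, its `put`/`merge`/`value` methods and the replica ids are hypothetical, and peers are assumed to exchange their whole state.

```python
from dataclasses import dataclass, field


@dataclass
class LWWMap:
    """Hypothetical state-based last-writer-wins map (not any library's API)."""

    replica_id: str
    clock: int = 0
    # key -> (timestamp, replica_id, value)
    entries: dict = field(default_factory=dict)

    def put(self, key, value):
        # Lamport-style clock: each local write gets a timestamp greater
        # than any timestamp this replica has seen so far.
        self.clock += 1
        self.entries[key] = (self.clock, self.replica_id, value)

    def merge(self, other):
        # For each key keep the entry with the greatest (timestamp, replica_id).
        # The replica id breaks ties, so every copy picks the same winner and
        # merge is commutative, associative and idempotent.
        for key, entry in other.entries.items():
            mine = self.entries.get(key)
            if mine is None or entry[:2] > mine[:2]:
                self.entries[key] = entry
        self.clock = max(self.clock, other.clock)

    def value(self):
        return {key: entry[2] for key, entry in self.entries.items()}


# Two peers edit their local copies concurrently, without a server.
alice = LWWMap("alice")
bob = LWWMap("bob")
alice.put("title", "Notes")
bob.put("title", "Meeting notes")  # concurrent write to the same key
bob.put("owner", "bob")

# Exchange state; both copies converge regardless of order.
alice.merge(bob)
bob.merge(alice)
print(alice.value())  # {'title': 'Meeting notes', 'owner': 'bob'}
print(bob.value())    # {'title': 'Meeting notes', 'owner': 'bob'}
```

Because `merge` only ever keeps the entry with the larger (timestamp, replica id) pair, the result does not depend on the order in which peers exchange state or on how often they do it. The price is that one of two concurrent writes to the same key (here Alice’s "Notes") is discarded, which is why CRDT libraries also offer richer types.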

This tech probably came a bit too late to be considered when Solid was originally designed, but it is now coming of age and could offer another way around the downsides of a server-based model.

So it’s not a quick fix, more an opportunity to rethink, and probably not immediately helpful, sorry! I’ve been looking into it recently, so I could start a topic if anyone is interested, or provide links etc.

Maidsafe have begun to switch over to using CRDTs for individual data types within the SAFE Network, and I’m looking at using them for a self-replicating filesystem that could be mounted using FUSE. These are not local-first applications at this time, but a way to ensure the network data types can be replicated for redundancy, updated arbitrarily, and remain conflict-free. I am still thinking about local-first though, because it’s a promising way to implement collaborative applications in its own right, and could well solve the kind of problems we are seeing with Solid because of its server-based model. Lots of issues just disappear when you eliminate servers.

For anyone who wants to follow up on CRDTs, here are a few links.

- CRDT.tech
- CRDTs: The Hard Parts, Martin Kleppmann’s talk. See this post for notes about this talk with timestamps.
- A highly-available move operation for replicated trees and distributed filesystems, Kleppmann et al., 2020 (move-op.pdf)
  - Code, data etc. related to the move-op paper (trvedata/move-op)
- Local-first software: You own your data, in spite of the cloud
- Rust implementation of Automerge (automerge/automerge-rs)

I particularly recommend Martin Kleppmann’s video presentation (I made notes from it which give times for different aspects, key slides and so on here).

aschrijver · July 19, 2020, 11:42am

About a week ago there was a big HN thread about "CRDTs: The Hard Parts": https://news.ycombinator.com/item?id=23802208

(And there are a number of other interesting HN discussions on the topic.)

Through that discussion I found Hypermerge, a Dat-based p2p project, and the PushPin sample app built on top of it (both also from the Automerge project). Very cool.

Vincent · July 20, 2020, 10:11am

I haven’t looked into it too deeply, but as far as I can see it’s pretty much unavoidable if you want to support offline-first and/or collaborative editing with Solid. (I don’t know why you’d need it to be peer-to-peer though, and it wouldn’t really be Solid any more if you did. Which is fine, of course. But Solid should be able to help keep resources available even if peers are offline.)

But yeah, I don’t see it being a quick fix either. I’d expect the first avenues to look into would be to start a vocabulary to represent the addition and removal of triples, and then to write some code to convert those into JS data structures that e.g. automerge can work with; that should get you a long way. Yet another item that’s somewhere on my personal todo list.
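As a rough illustration of the semantics a representation of triple additions and removals would need, here is a sketch of an observed-remove set (OR-Set) whose elements are triples. It is written in Python rather than JS, defines no RDF vocabulary and does not use automerge; `Triple`, `TripleORSet` and the prefixed-name strings are hypothetical. Each addition gets a unique tag, and a removal only cancels the tags it has already seen.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class TripleORSet:
    """Hypothetical observed-remove set (OR-Set) of RDF triples."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counter = 0
        self.adds = {}        # triple -> set of unique add tags
        self.removed = set()  # tags whose additions were removed (tombstones)

    def add(self, triple):
        # Every addition gets a globally unique tag: (replica id, counter).
        self.counter += 1
        self.adds.setdefault(triple, set()).add((self.replica_id, self.counter))

    def remove(self, triple):
        # A removal only cancels the additions this replica has observed, so a
        # concurrent re-addition elsewhere survives the merge ("add wins").
        self.removed |= self.adds.get(triple, set())

    def merge(self, other):
        for triple, tags in other.adds.items():
            self.adds.setdefault(triple, set()).update(tags)
        self.removed |= other.removed

    def triples(self):
        # A triple is present if at least one of its add tags is not removed.
        return {t for t, tags in self.adds.items() if tags - self.removed}


name = Triple("<#me>", "foaf:name", '"Alice"')
knows = Triple("<#me>", "foaf:knows", "<#bob>")

alice = TripleORSet("alice")
bob = TripleORSet("bob")
alice.add(name)
alice.add(knows)
bob.merge(alice)   # bob observes both additions

bob.remove(knows)  # bob deletes both triples...
bob.remove(name)
alice.add(name)    # ...while alice concurrently re-asserts the name

alice.merge(bob)
bob.merge(alice)
print(alice.triples() == bob.triples() == {name})  # True
```

With this design a deletion that races with a re-assertion of the same triple loses, while an uncontested deletion (here `knows`) propagates to every copy. A real implementation would also need to garbage-collect the tombstone tags in `removed`.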

happybeing · July 20, 2020, 10:42am

Thanks Ruben. I’m interested to hear views or ideas for how this can/can’t be used with Solid or to move it forward in some way. I have my own thoughts, but want to get wider inputs.

N.B. I found a link to a PDF of the paper you linked: https://hal.inria.fr/hal-00686484/document

Smag0 · July 20, 2020, 10:55am

It sounds possible to me to use @jeffz’s solid-rest (https://www.npmjs.com/package/solid-rest) to store data locally on the device before syncing it to the Pod with automerge.

I will explore what is possible with the Vuex store system here: https://github.com/scenaristeur/shighl-vuejs/projects/1#card-42534637

gsvarovsky · September 18, 2020, 12:22pm

(I’m working on a project for just this, called m-ld, which operates on triples natively! I’m going to post a message in the General Discussion about it, because I’d love some feedback.)

I agree that CRDTs are coming of age. Up to now, most collaborative editors have been implemented more-or-less from scratch, at great cost, and specialised for their document type. That’s at least partly because it’s really hard to maintain reasonable & intentional semantics when edits are concurrent.

The emerging CRDT technologies typically provide multiple data structures, usually starting with an append-only log, then lists, sets, counters and so forth, each of which preserves its basic semantics (uniqueness for sets, etc.).
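As an example of a CRDT that preserves its basic semantics under concurrency, here is a minimal sketch of a state-based PN-Counter, a standard construction from the CRDT literature: the value is always the total of all increments minus all decrements, however often and in whatever order replicas exchange state. The class and method names are hypothetical and not taken from any of the libraries mentioned in this thread.

```python
class PNCounter:
    """Hypothetical state-based PN-Counter: per-replica grow-only tallies
    for increments and for decrements."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.incs = {}  # replica id -> total increments made by that replica
        self.decs = {}  # replica id -> total decrements made by that replica

    def increment(self, n=1):
        if n < 0:
            raise ValueError("use decrement() for negative amounts")
        self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + n

    def decrement(self, n=1):
        if n < 0:
            raise ValueError("use increment() for negative amounts")
        self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + n

    def merge(self, other):
        # Taking the per-replica maximum makes repeated merges harmless
        # (idempotent), so re-delivered state is never double counted.
        for mine, theirs in ((self.incs, other.incs), (self.decs, other.decs)):
            for replica, total in theirs.items():
                mine[replica] = max(mine.get(replica, 0), total)

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())


a = PNCounter("a")
b = PNCounter("b")
a.increment(3)
b.increment(2)
b.decrement(1)

a.merge(b)
a.merge(b)  # delivering the same state twice changes nothing
b.merge(a)
print(a.value(), b.value())  # 4 4
```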

Most of them then wrap these up in a native API which is exposed to the application. These can be somewhat complicated by the need to surface conflicts, if the underlying CRDT is not able to resolve them automatically (ironically, since the ‘C’ is supposed to stand for ‘Conflict-free’).
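The conflict-surfacing case can be illustrated with a multi-value register, sketched here in Python under simple assumptions (hypothetical names, whole-state exchange, version vectors held as plain dicts). Instead of picking a winner, it keeps every value written concurrently and leaves the choice to the application; it is not the API of any particular library.

```python
def dominates(a, b):
    """True if version vector a has seen every event recorded in b."""
    return all(a.get(replica, 0) >= count for replica, count in b.items())


class MVRegister:
    """Hypothetical multi-value register: concurrent writes are all kept and
    surfaced to the application instead of one being silently dropped."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.entries = []  # list of (version_vector, value), mutually concurrent

    def assign(self, value):
        # The new write supersedes every value this replica currently holds.
        vv = {}
        for seen, _ in self.entries:
            for replica, count in seen.items():
                vv[replica] = max(vv.get(replica, 0), count)
        vv[self.replica_id] = vv.get(self.replica_id, 0) + 1
        self.entries = [(vv, value)]

    def merge(self, other):
        combined = self.entries + [e for e in other.entries if e not in self.entries]
        # Drop entries strictly dominated by another; what remains is concurrent.
        self.entries = [
            (vv, value)
            for vv, value in combined
            if not any(dominates(o, vv) and o != vv for o, _ in combined)
        ]

    def read(self):
        return sorted(value for _, value in self.entries)


alice = MVRegister("alice")
bob = MVRegister("bob")
alice.assign("Draft")
bob.merge(alice)

alice.assign("Final A")  # concurrent edits on two replicas
bob.assign("Final B")
alice.merge(bob)
bob.merge(alice)
conflict = alice.read()
print(conflict)  # ['Final A', 'Final B']: the app has to choose

alice.assign("Final")  # the user resolves the conflict on one replica
bob.merge(alice)
print(bob.read())  # ['Final']
```

Once a replica writes a new value after seeing both conflicting ones, its version vector dominates both, so they disappear on the next merge.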

Almost all of the libraries out there are on the JavaScript platform. There are pure data structures (Automerge, Yjs, Gun) and local-first replicated databases (OrbitDB, DagDB, PouchDB). A good recent review article with lots more detail is here: https://www.kn8.lt/blog/building-privacy-focused-collaborative-software/

So I agree with @Vincent that an important thing to do is to relate the syntax, and also the semantics, of the JS data structures, to RDF.

Personally I’ve gone down a different road, which is to start with RDF, and then expose what I hope is a usable API on top of it with JSON-LD, trying to make things easy for non-RDF apps.

I think an advantage of this approach is what @RubenVerborgh mentioned: RDF can carry its semantics with it. One thing I’m exploring is how to ensure that the semantics are enforced (especially in closed-world models) during concurrent edits. At the moment I’m working on specific common cases. Doing it in the general case is going to take some research! Gonna need help…

gsvarovsky · September 18, 2020, 1:28pm

Thanks! I’m also in contact with Dr. Luis-Daniel Ibáñez in Southampton who is local to me, and I believe worked with Pascal Molli before.

markjspivey · September 28, 2020, 8:28pm

saw this today:

aschrijver · September 29, 2020, 6:06am

Yes, @markjspivey, it was at the top of Hacker News for a long time. I’ll also link the comment thread: https://news.ycombinator.com/item?id=24617542

And here’s the comment thread to Martin Kleppmann’s video (also posted above): https://news.ycombinator.com/item?id=23802208

gsvarovsky · October 6, 2020, 7:43am

Also to be found in this emerging world of live synchronisation:

Braid: Synchronization for HTTP

Braid is an extension to HTTP that generalizes it from a state transfer to a state synchronization protocol.

I think there’s a synergy between Braid and the Linked Data Platform. It could lead to a standard for ‘Synchronizers’ (Braid-speak for data sharing libraries like automerge and m-ld) to interoperate with Solid. On the other hand, it may be too soon to be thinking about standardising.

aschrijver · October 6, 2020, 8:38am

Thanks @gsvarovsky! Great resource.
I cross-posted it to SocialHub in the topic "CRDT’s anyone?" and to @pukkamustard’s openEngiadina Matrix chatroom.

pukkamustard · October 19, 2020, 6:33am

Hi! We have been researching CRDTs as underlying data structure for an ActivityPub server in the context of the openEngiadina project.

We have an initial write-up of the ideas: Distributed Mutable Containers.

This should also be very applicable to a service implementing the Linked Data Platform.

There are many implementation details missing (e.g. “How to map ActivityPub to these containers?”). We intend to implement the containers in an ActivityPub server to gain further insight.

Very happy to receive comments and feedback.

aschrijver · October 26, 2021, 5:09pm

FYI, a list of CRDT-related resources I found today (via Hacker News):

https://wiki.nikitavoloboev.xyz/distributed-systems/crdt

(Btw, great personal knowledge base this fellow has)

And a recent talk given at Strange Loop (found via the openEngiadina Matrix chatroom):

aschrijver · December 12, 2021, 7:07am

Just posted this on fedi to @rosano: this great video by Martin Kleppmann of Ink & Switch about the research and development of Automerge, which applies all kinds of CRDT-based tech, is a recommended watch:

(Cross-posted to SocialHub community in this comment)
