Request for Comments: CRDTish approach to Solid

Great to see you’re looking into this. I haven’t prototyped anything yet, but I did think about it to some extent. The main conclusion I reached relates to this:

If you want such an approach, you’re going to have to go all in. That is, interoperability is only possible with other apps that also only store commands. You can periodically store snapshots when running into performance issues, but the source of truth is the command history, and any modifications done to the snapshot can and will be discarded unless they’re stored in the command log as well.
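
Here is a minimal TypeScript sketch of that idea. It assumes a hypothetical `CommandLogEntry` shape that is not part of any Solid spec or library. The snapshot is only a cache computed from the command log, so an edit applied to the snapshot alone is lost the next time the snapshot is rebuilt.

```typescript
// Hypothetical command: set one property of one resource to a value.
interface CommandLogEntry {
  id: string;
  resource: string;
  property: string;
  value: string;
}

// resource -> (property -> value)
type ResourceSnapshot = Map<string, Map<string, string>>;

// The snapshot is derived purely from the log; it is never the source of truth.
function buildSnapshot(log: readonly CommandLogEntry[]): ResourceSnapshot {
  const snapshot: ResourceSnapshot = new Map();
  for (const entry of log) {
    const properties = snapshot.get(entry.resource) ?? new Map<string, string>();
    properties.set(entry.property, entry.value);
    snapshot.set(entry.resource, properties);
  }
  return snapshot;
}

const commandLog: CommandLogEntry[] = [
  { id: "1", resource: "#recipe", property: "title", value: "Soup" },
  { id: "2", resource: "#recipe", property: "title", value: "Tomato soup" },
];

const cachedSnapshot = buildSnapshot(commandLog);
// Another app edits the snapshot directly, without appending a command...
cachedSnapshot.get("#recipe")?.set("title", "Edited outside the log");
// ...and the edit disappears once the snapshot is rebuilt from the log.
const rebuiltSnapshot = buildSnapshot(commandLog);
console.log(rebuiltSnapshot.get("#recipe")?.get("title")); // Tomato soup
```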

One other thing that’s interesting is that Resources could suffice with just Append (i.e. not Write) access.

The downside, of course, is that deleted data is always retrievable, and the approach adds significant complexity and potential failure modes.

Other terms you might be interested in researching are event sourcing and command query responsibility segregation (CQRS).

In that thread, @pukkamustard mentions Distributed Mutable Containers. I just wanted to add that the DMC spec has progressed a lot since then, and now lives at different locations:

I’m very interested in these myself, especially in combination with Domain-Driven Design (DDD). DDD maps well to closed Linked Data vocabularies (acting as bounded contexts that model a particular business domain), and it is a very good way to bring non-technical folks along through the software-design process, all the way to testable, very modular, and maintainable codebases.

(Note: event sourcing is optional. You see it used in many examples, but it adds a lot of complexity in the form of eventual consistency issues and code that is harder to test. You can always start with CQRS and extend to ES later on.)
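
To illustrate starting with CQRS and no event sourcing, here is a small TypeScript sketch. The recipe commands and class names are made up for the example. Commands go through a handler that validates them and updates the write model, and a separate read model is kept up to date for queries. No events are stored, so there is nothing to replay and no eventual consistency to test.

```typescript
// Commands express intent; they are the only way to change state.
type RecipeCommand =
  | { kind: "CreateRecipe"; id: string; title: string }
  | { kind: "RenameRecipe"; id: string; title: string };

interface RecipeRecord {
  id: string;
  title: string;
}

// Write side: validates commands and owns the authoritative state.
class RecipeCommandHandler {
  private readonly recipes = new Map<string, RecipeRecord>();
  private readonly onChange: (recipe: RecipeRecord) => void;

  constructor(onChange: (recipe: RecipeRecord) => void) {
    this.onChange = onChange;
  }

  handle(command: RecipeCommand): void {
    const exists = this.recipes.has(command.id);
    if (command.kind === "CreateRecipe" && exists) {
      throw new Error(`Recipe ${command.id} already exists`);
    }
    if (command.kind === "RenameRecipe" && !exists) {
      throw new Error(`Recipe ${command.id} not found`);
    }
    const recipe: RecipeRecord = { id: command.id, title: command.title };
    this.recipes.set(recipe.id, recipe);
    // Update the read side synchronously: plain CQRS, no event store.
    this.onChange(recipe);
  }
}

// Read side: a query-optimized view that commands never touch directly.
class RecipeListView {
  private readonly titles = new Map<string, string>();

  apply(recipe: RecipeRecord): void {
    this.titles.set(recipe.id, recipe.title);
  }

  sortedTitles(): string[] {
    return Array.from(this.titles.values()).sort();
  }
}

const recipeList = new RecipeListView();
const recipeHandler = new RecipeCommandHandler((recipe) => recipeList.apply(recipe));
recipeHandler.handle({ kind: "CreateRecipe", id: "r1", title: "Soup" });
recipeHandler.handle({ kind: "CreateRecipe", id: "r2", title: "Bread" });
recipeHandler.handle({ kind: "RenameRecipe", id: "r1", title: "Tomato soup" });
console.log(recipeList.sortedTitles().join(", ")); // Bread, Tomato soup
```

Adding event sourcing later would mean persisting the commands (or events derived from them) and rebuilding both sides from that log, which is where the extra complexity mentioned above comes in.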

In theory I agree with this, and I wish it were possible. Maybe if CRDTs become more popular, and a vocabulary for CRDTs becomes as common as schema.org is today, that could be an option.

But in practice, in the state we are in today, I don’t think that’s feasible. I could do it, sure, but it would be synonymous with my app not being interoperable. Also, I don’t think this approach is doing it half-way; if anything, I’m making it backwards compatible. An app that is aware of the operations would behave as expected. And the idea I mentioned of amending the history by adding a new operation with the diff doesn’t make it wrong.
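
Here is a minimal sketch of the "amend the history" idea. It assumes that each operation simply sets one property, and the `SetOperation` type and function names are hypothetical, not Soukai’s API. When the resource no longer matches what the operations produce, the app appends new operations carrying the difference instead of discarding the other app’s changes.

```typescript
// Hypothetical operation: set one property to a value at a given time (ms).
interface SetOperation {
  property: string;
  value: string;
  date: number;
}

type PropertyValues = Record<string, string>;

// What an operation-aware app reconstructs from the history alone.
function replayHistory(operations: readonly SetOperation[]): PropertyValues {
  const values: PropertyValues = {};
  const ordered = [...operations].sort((a, b) => a.date - b.date);
  for (const operation of ordered) {
    values[operation.property] = operation.value;
  }
  return values;
}

// If another app changed the resource without adding operations, append
// operations with the diff so the history matches the resource again.
// (Only added or changed properties are handled; removals are left out for brevity.)
function amendHistory(
  resource: PropertyValues,
  operations: readonly SetOperation[],
  now: number,
): SetOperation[] {
  const expected = replayHistory(operations);
  const amendments: SetOperation[] = Object.keys(resource)
    .filter((property) => expected[property] !== resource[property])
    .map((property) => ({ property, value: resource[property], date: now }));
  return [...operations, ...amendments];
}

const recipeHistory: SetOperation[] = [{ property: "title", value: "Soup", date: 1 }];
// Another app renamed the recipe and added a cook time, without operations.
const editedElsewhere: PropertyValues = { title: "Tomato soup", cookTime: "PT20M" };
const amended = amendHistory(editedElsewhere, recipeHistory, 2);
console.log(amended.length); // 3
```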

Having said that, I haven’t looked into this a lot, and this may come back to bite me in the future. But I think interoperability is one of the most important aspects that differentiates Solid from other technologies, and if users don’t start experiencing it, Solid won’t be any different from other solutions.

One of the reasons why I’m so adamant about this is that when @aveltens used Ramen, he told me that he was already using schema:Recipe for recipes in his POD, so that was a great experience for him (or something like that, correct me if I’m wrong xD).

I want to see more of that :).

I knew about event sourcing and, to be honest, I think it’s almost the same as what I’m doing (or maybe I don’t know it well enough to tell the difference). CQRS is one of those buzzwords I’ve heard multiple times, but I don’t really know what it means; I’ll check it out.

Thanks for the suggestions!

I suppose a possible test case is:

  • I make an edit offline that I cannot immediately sync
  • On another device, I decide to delete the item instead

In order for the deletion operation to win over the edit operation, or to detect the conflict, I think the deletion operation does need to be stored even if the properties are deleted for real?
I suppose the only “corruption” then is that a copy of (part of) the item still exists in the edit operation unless you have a garbage collection process?
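
One way to handle that test case, sketched in TypeScript with hypothetical operation types. The deletion is stored as a tombstone that only records when it happened, so the deleted properties themselves can be erased. Edits made before the deletion lose. Edits timestamped after it are surfaced as a conflict rather than resolved automatically. The edit operations still hold copies of the values, which is the "corruption" mentioned above unless they are garbage-collected.

```typescript
interface EditOperation {
  kind: "edit";
  property: string;
  value: string;
  date: number;
}

// A tombstone: it records when the item was deleted, not what it contained.
interface DeleteOperation {
  kind: "delete";
  date: number;
}

type ItemOperation = EditOperation | DeleteOperation;

type MergeResult =
  | { status: "alive"; properties: Record<string, string> }
  | { status: "deleted" }
  | { status: "conflict"; deletedAt: number; editsAfterDeletion: EditOperation[] };

function mergeItem(operations: readonly ItemOperation[]): MergeResult {
  const edits = operations
    .filter((op): op is EditOperation => op.kind === "edit")
    .sort((a, b) => a.date - b.date);
  const deletions = operations.filter((op): op is DeleteOperation => op.kind === "delete");

  if (deletions.length === 0) {
    const properties: Record<string, string> = {};
    for (const edit of edits) properties[edit.property] = edit.value;
    return { status: "alive", properties };
  }

  const deletedAt = Math.max(...deletions.map((deletion) => deletion.date));
  const editsAfterDeletion = edits.filter((edit) => edit.date > deletedAt);
  // Edits made after the deletion can't be silently dropped: ask the user.
  return editsAfterDeletion.length > 0
    ? { status: "conflict", deletedAt, editsAfterDeletion }
    : { status: "deleted" };
}

// Offline edit at t=5, deletion on another device at t=10: the deletion wins.
console.log(
  mergeItem([
    { kind: "edit", property: "title", value: "Soup", date: 1 },
    { kind: "edit", property: "title", value: "Offline edit", date: 5 },
    { kind: "delete", date: 10 },
  ]).status,
); // deleted
```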

I understand there are other CRDTs that handle deletions more elegantly, but I need to do some more reading…

I still have to think about this, but my current idea is to just show a message: “this recipe was deleted, but you’ve made changes to it, what do you want to do? DELETE IT | RESTORE IT”.

Which again, doesn’t make it a CRDT because that is a conflict :). But I think it’s important to delete data for real, if you want to give people control over their data. As I said, I’d also like to eventually add a way to squash the history, with a similar UX (having to resolve conflicts manually).
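
Squashing could look roughly like this TypeScript sketch, which uses a hypothetical `PropertyOperation` type. It keeps only the latest operation per property, so superseded values are erased for real. The catch is that a device still holding unsynced operations older than the squash can no longer be merged automatically, which is why a manual conflict-resolution step makes sense here.

```typescript
interface PropertyOperation {
  property: string;
  value: string;
  date: number;
}

// Keep only the latest operation for each property; older values are dropped,
// so data that was overwritten no longer lingers in the history.
function squashHistory(operations: readonly PropertyOperation[]): PropertyOperation[] {
  const latest = new Map<string, PropertyOperation>();
  for (const operation of operations) {
    const current = latest.get(operation.property);
    if (current === undefined || operation.date >= current.date) {
      latest.set(operation.property, operation);
    }
  }
  return Array.from(latest.values());
}

const squashed = squashHistory([
  { property: "title", value: "Soup", date: 1 },
  { property: "description", value: "Quick weeknight soup", date: 2 },
  { property: "title", value: "Tomato soup", date: 3 },
]);
console.log(squashed.length); // 2
```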

Yes, I would always expect a Solid App to find and use my existing data as far as possible. Of course, there can be differences in the amount of data used/understood. Some apps might use more or fewer terms than others, but this should not prevent them from using as much as they can understand and conforming to existing conventions (like storing my recipes in a folder I already chose, and not “inventing” a new one).

So, there was something that was bothering me: LWW (last-writer-wins) is a state-based CRDT, not an operation-based one, yet James Long’s approach uses a message database that seems to list operations.

I think I’ve now got my head around it: basically, with a state-based CRDT the messages could be deleted after the replicas have updated. Unless one wants to store history, the messages are not actually a long-term part of the CRDT. A state-based CRDT just involves merging two states to create a new one (see “CRDT Glossary • Conflict-free Replicated Data Types”).
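
For reference, a state-based LWW register is small enough to sketch in a few lines of TypeScript (names are illustrative). The merge is commutative, associative, and idempotent, so replicas can merge whole states in any order and any number of times. That is why the messages don’t need to be kept once every replica has merged them. Ties on the timestamp are broken by replica id so that all replicas pick the same winner.

```typescript
interface LwwRegister<T> {
  value: T;
  timestamp: number;
  replicaId: string;
}

// Merge two register states: the later write wins; ties are broken by replica id
// so that every replica converges on the same value.
function mergeRegisters<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  return a.replicaId >= b.replicaId ? a : b;
}

const phone: LwwRegister<string> = { value: "Soup", timestamp: 100, replicaId: "phone" };
const laptop: LwwRegister<string> = { value: "Tomato soup", timestamp: 120, replicaId: "laptop" };

// Same result regardless of merge order.
console.log(mergeRegisters(phone, laptop).value); // Tomato soup
console.log(mergeRegisters(laptop, phone).value); // Tomato soup
```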

So instead of treating the Solid pod as a message database, it should actually be possible to just make it a replica and the key issue is granularity of edits.

ETags let us detect that a document has changed since we last read it, and modified timestamps provide an ordering of edits, so just checking those already provides a crude LWW register at the level of a document.
Implementing that LWW register simply involves not replacing the document if our edit is older.
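
Here is a document-level version of that, sketched in TypeScript against an in-memory stand-in for the pod. `FakePodDocument` and its methods are hypothetical. Against a real server, assuming it honours conditional requests, you’d read the `ETag` and `Last-Modified` response headers and send the ETag back in an `If-Match` header on the PUT. The ETag detects that someone wrote between our read and our write, and the timestamp decides who wins. Keep in mind that `Last-Modified` only has one-second resolution.

```typescript
interface StoredDocument {
  body: string;
  etag: string;
  lastModified: number;
}

// In-memory stand-in for a single document in a pod (hypothetical).
class FakePodDocument {
  private current: StoredDocument = { body: "", etag: "v0", lastModified: 0 };
  private version = 0;

  read(): StoredDocument {
    return { ...this.current };
  }

  // Behaves like a conditional PUT: 412 Precondition Failed if the ETag changed.
  write(body: string, ifMatch: string, lastModified: number): 200 | 412 {
    if (ifMatch !== this.current.etag) return 412;
    this.version += 1;
    this.current = { body, etag: `v${this.version}`, lastModified };
    return 200;
  }
}

// Document-level LWW: don't replace the document if our edit is older.
function writeIfNewer(doc: FakePodDocument, body: string, editedAt: number): "written" | "skipped" {
  for (;;) {
    const remote = doc.read();
    if (remote.lastModified > editedAt) return "skipped";
    if (doc.write(body, remote.etag, editedAt) === 200) return "written";
    // 412: another client wrote in between; re-read and compare again.
  }
}

const recipeDocument = new FakePodDocument();
console.log(writeIfNewer(recipeDocument, "edit made on device B", 20)); // written
console.log(writeIfNewer(recipeDocument, "older edit synced late from device A", 10)); // skipped
console.log(recipeDocument.read().body); // edit made on device B
```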

We would prefer to do this at the level of a triple or a record, which means we need a timestamp at that lower level for LWW. However, having the timestamp at the higher level could still provide some robustness with respect to other applications that wouldn’t store the more granular timestamps.
In terms of implementation, the CRDT state-merging logic needs to be embedded either in the SPARQL Update query or in the client app, which then pushes the updated document to the pod. The latter approach seems potentially easier if ETags are available and concurrent edits are not too frequent.
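
Here is a rough TypeScript sketch of the client-side option, merging at the level of individual properties. The record shape is hypothetical. Each property carries its own timestamp, and the newer value wins per property. The document-level timestamp is just the maximum, which is what an app unaware of the granular timestamps would see.

```typescript
interface TimestampedValue {
  value: string;
  updatedAt: number;
}

type LwwRecord = Record<string, TimestampedValue>;

// Per-property LWW: the value with the newer timestamp wins.
// Equal timestamps are broken by comparing values so both sides pick the same one.
function newer(a: TimestampedValue, b: TimestampedValue): TimestampedValue {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt > b.updatedAt ? a : b;
  return a.value >= b.value ? a : b;
}

function mergeRecords(local: LwwRecord, remote: LwwRecord): LwwRecord {
  const merged: LwwRecord = { ...remote };
  for (const property of Object.keys(local)) {
    const remoteValue = remote[property];
    merged[property] = remoteValue === undefined ? local[property] : newer(local[property], remoteValue);
  }
  return merged;
}

// Coarse document-level timestamp, e.g. for apps that only compare Last-Modified.
function documentUpdatedAt(record: LwwRecord): number {
  return Math.max(0, ...Object.keys(record).map((property) => record[property].updatedAt));
}

const onDevice: LwwRecord = {
  title: { value: "Tomato soup", updatedAt: 50 },
  description: { value: "Old description", updatedAt: 10 },
};
const inPod: LwwRecord = {
  title: { value: "Soup", updatedAt: 30 },
  description: { value: "Serves four", updatedAt: 40 },
};
const mergedRecipe = mergeRecords(onDevice, inPod);
console.log(mergedRecipe.title.value, "/", mergedRecipe.description.value); // Tomato soup / Serves four
console.log(documentUpdatedAt(mergedRecipe)); // 50
```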

It seems that the existing RDF CRDT implementations don’t use LWW, so I’m still planning to do some more reading, but thought I’d share what I’ve learnt.

FYI, this was featured on HN: “Faster CRDTs: An Adventure in Optimization”.

Hi there!

It’s been a while since I started this discussion, but I finally have something to share :).

I haven’t finished the app I am working on, but I think I am done with the data layer. As I mentioned at the beginning, I have implemented this in Soukai, so it should be easy to reuse for new features and other apps.

For anyone who’s still interested in this, I’ve decided to make an alpha release, and I’d appreciate it if you gave me some feedback. You can use it here: https://umai.noeldemartin.com

Here are some things to keep in mind if you decide to check it out:

  • What I’m more interested in hearing about is what happens in the POD and the synchronization between devices (you can just open two browsers to test).
  • Keep in mind that this is still a work in progress, so expect bugs and rough edges.
  • I am aware that deleting recipes doesn’t work (they get “resurrected”).
  • The vocab is not published yet, but I do care about it so let me know if you think something can be improved.
  • The app is using a very aggressive polling (3 seconds), but this is only for testing purposes. When I release a production version, I probably won’t use polling at all.
  • I still have a lot of work to do with the UI, so please don’t pay attention to that.

If you want to give it a try, I’d recommend using Penny with the Community Server. I usually run npx community-solid-server -p 4000 and use http://localhost:4000 when asked for a login url.

Hey Noel,

I’ve tried it with CSS.
The container creation went fine with the given container name.
I filled in all the requested fields, and everything was stored except for the ingredients.
I could examine the file space-cakes$.ttl, which looks as follows…

<#it> a <https://schema.org/Recipe>;
    <https://schema.org/name> "space cakes";
    <https://schema.org/description> "butter\nsugar\nflour\ncacao\nweed".
<#it-metadata> a <https://soukai.noeldemartin.com/vocab/Metadata>;
    <https://soukai.noeldemartin.com/vocab/createdAt> "2021-10-05T18:36:27.909Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <https://soukai.noeldemartin.com/vocab/resource> <#it>;
    <https://soukai.noeldemartin.com/vocab/updatedAt> "2021-10-05T18:43:14.306Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>.
<#it-operation-ee280fae-706d-4ec0-a1a0-422607d4da2a> a <https://soukai.noeldemartin.com/vocab/Operation>;
    <https://soukai.noeldemartin.com/vocab/resource> <#it>;
    <https://soukai.noeldemartin.com/vocab/date> "2021-10-05T18:36:27.909Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <https://soukai.noeldemartin.com/vocab/property> <https://schema.org/name>;
    <https://soukai.noeldemartin.com/vocab/value> "space cakes".
<#it-operation-55432b74-b67d-471e-b391-44634dd2563b> a <https://soukai.noeldemartin.com/vocab/Operation>;
    <https://soukai.noeldemartin.com/vocab/resource> <#it>;
    <https://soukai.noeldemartin.com/vocab/date> "2021-10-05T18:43:10.459Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <https://soukai.noeldemartin.com/vocab/property> <https://schema.org/description>;
    <https://soukai.noeldemartin.com/vocab/value> "butter\nsugar\nflour\ncacao\nweed".
<#it-operation-6cec7deb-c10e-4ac5-a399-42a4db2e88e5> a <https://soukai.noeldemartin.com/vocab/Operation>;
    <https://soukai.noeldemartin.com/vocab/resource> <#it>;
    <https://soukai.noeldemartin.com/vocab/date> "2021-10-05T18:43:14.306Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <https://soukai.noeldemartin.com/vocab/property> <https://schema.org/recipeIngredient>;
    <https://soukai.noeldemartin.com/vocab/value> "butter", "flour", "sugar", "cacao", "weed";
    <https://soukai.noeldemartin.com/vocab/type> <https://soukai.noeldemartin.com/vocab/RemoveOperation>.
<#it-operation-df06b0a1-ace3-4dba-a2aa-4b2b29aa8179> a <https://soukai.noeldemartin.com/vocab/Operation>;
    <https://soukai.noeldemartin.com/vocab/resource> <#it>;
    <https://soukai.noeldemartin.com/vocab/date> "2021-10-05T18:43:14.306Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <https://soukai.noeldemartin.com/vocab/property> <https://schema.org/recipeInstructions>;
    <https://soukai.noeldemartin.com/vocab/value> <#3db73be1-12e0-4472-9d20-f0544d7a28af>, <#e40dab18-21cc-489b-9672-2f84cd1c035f>;
    <https://soukai.noeldemartin.com/vocab/type> <https://soukai.noeldemartin.com/vocab/RemoveOperation>.
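
To see how an operation-aware app might read a document like this, here is a TypeScript sketch that replays the operations above. The semantics are an assumption, not taken from the (unpublished) vocab. An operation without a `type` sets the property to its values, and a `RemoveOperation` removes the listed values. Under that assumption, replaying these four operations yields a name and a description but no ingredients or instructions, which matches the triples on `<#it>`.

```typescript
const SCHEMA = "https://schema.org/";

// Hypothetical TypeScript view of one operation from the Turtle document.
interface RecipeOperation {
  property: string;
  values: string[];
  date: string; // xsd:dateTime
  type?: "RemoveOperation";
}

// Assumed semantics: untyped operations set values; RemoveOperation removes them.
function replayRecipe(operations: readonly RecipeOperation[]): Map<string, string[]> {
  const state = new Map<string, string[]>();
  const ordered = [...operations].sort((a, b) => Date.parse(a.date) - Date.parse(b.date));
  for (const operation of ordered) {
    if (operation.type === "RemoveOperation") {
      const current = state.get(operation.property) ?? [];
      state.set(operation.property, current.filter((value) => !operation.values.includes(value)));
    } else {
      state.set(operation.property, [...operation.values]);
    }
  }
  return state;
}

const spaceCakes = replayRecipe([
  { property: `${SCHEMA}name`, values: ["space cakes"], date: "2021-10-05T18:36:27.909Z" },
  { property: `${SCHEMA}description`, values: ["butter\nsugar\nflour\ncacao\nweed"], date: "2021-10-05T18:43:10.459Z" },
  {
    property: `${SCHEMA}recipeIngredient`,
    values: ["butter", "flour", "sugar", "cacao", "weed"],
    date: "2021-10-05T18:43:14.306Z",
    type: "RemoveOperation",
  },
  {
    property: `${SCHEMA}recipeInstructions`,
    values: ["#3db73be1-12e0-4472-9d20-f0544d7a28af", "#e40dab18-21cc-489b-9672-2f84cd1c035f"],
    date: "2021-10-05T18:43:14.306Z",
    type: "RemoveOperation",
  },
]);
console.log(spaceCakes.get(`${SCHEMA}recipeIngredient`)); // []
```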

I re-edited it and the ingredients were stored, also…
[Screenshot, 2021-10-05]

Hey, thanks for checking it out :). In the first screenshot, you wrote the ingredients in the “description” field, and they do appear in the turtle document. Was that the issue? Or did you add them as ingredients but they weren’t saved?

This test method wasn’t very thorough, I guess…
Yeah, I added them as ingredients too, and they were not saved.
When I revisited the app site a second time, I guess they came out of the browser cache, and they were there.
[edit] So they must’ve been saved somewhere.

Ahhh, as I can see from the order of the ingredients, they were in fact saved, only not displayed: sugar and flour are in a different order in the description and in the ingredients list.

So it works fine.

Nice work! I’ve just given it a preliminary try since I didn’t have much time, so first things I noticed:

  • When I hit logout it says I will lose local recipes but they’re still in my Pod - however, I could still see the recipe until I refreshed the page :)
  • If I add an ingredient in Penny, then go back to Umai, I don’t see that ingredient.
  • If I then go back to Umai and add an ingredient there, I don’t see that stored in my Pod either.
  • Ah, it looks like I got disconnected from the server somehow. After reconnecting, the Umai ingredient gets added, but the ingredient I added through Penny is now gone :(
  • Ah, but that in turn is because I didn’t add the operations to add it in Penny - gotcha.
  • So then I figured: what happens if I change the ingredient listed in the operation in Penny, then modify the recipe in Umai. Well: Umai then updates the recipe itself in the Pod to list the changed ingredient (good), but Umai doesn’t update its own rendering of the recipe and therefore still lists the old ingredient.
  • That is, until I log in in a new private window, where it does list the correct recipe.
  • Also, the URL when viewing my recipe is https://umai.noeldemartin.com/recipes/premade-soup, but I can’t visit that in a private window, connect my Pod again, and see that recipe - I first have to go back to the homepage, click that recipe again, and then I get back at that URL, this time showing the recipe.

I would’ve made that shorter and less rambling, but I didn’t have the time - hope it’s still useful :)

Yeah the ingredients list has no order (I’ll probably sort them by quantities or something, I’m not sure yet). It seems like it’s still kind of flaky, I’ll have to test a lot before releasing the first production version. Thanks for trying it out!

Thanks for trying it out and all the feedback, it’s really useful :D.

I think these two are the same bug, the recipe page doesn’t update properly, I’ll look into that. If you log out from the home screen, recipes should disappear.

All of this is related to one of the concerns I had about this approach (the second one in particular, which I called “interoperability”). I will probably tackle it before the release, and I think it should be fairly straightforward (just adding new operations for inconsistencies between the operations and the resource). But I haven’t looked into it yet.

In a nutshell, there are local models (stored in IndexedDB) and remote models (stored in the Solid POD). When either of them is updated, the operations are sent to the other one. But the model itself is only reconstructed from the operations, so if something has been modified without operations, it may misbehave. It’s also possible that there are bugs :). But I’m not testing scenarios where changes happen outside of my app yet.
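
Here is a simplified TypeScript sketch of that setup. The class and method names are hypothetical, not Soukai’s API. Each replica keeps a set of operations keyed by id, synchronization sends each side the operations it’s missing, and the attributes are always rebuilt from the operations. The last lines show the failure mode described above: a change made to the attributes without an operation is overwritten on the next sync.

```typescript
interface SyncOperation {
  id: string;
  property: string;
  value: string;
  date: number;
}

class ModelReplica {
  readonly operations = new Map<string, SyncOperation>();
  // Attributes as stored next to the operations (e.g. the triples in the POD).
  attributes: Record<string, string> = {};

  apply(newOperations: readonly SyncOperation[]): void {
    for (const operation of newOperations) this.operations.set(operation.id, operation);
    this.attributes = this.rebuild();
  }

  // The model is reconstructed from the operations only.
  private rebuild(): Record<string, string> {
    const attributes: Record<string, string> = {};
    const ordered = Array.from(this.operations.values()).sort((a, b) => a.date - b.date);
    for (const operation of ordered) attributes[operation.property] = operation.value;
    return attributes;
  }

  missingFrom(other: ModelReplica): SyncOperation[] {
    return Array.from(this.operations.values()).filter((operation) => !other.operations.has(operation.id));
  }
}

function synchronize(local: ModelReplica, remote: ModelReplica): void {
  const toRemote = local.missingFrom(remote);
  const toLocal = remote.missingFrom(local);
  remote.apply(toRemote);
  local.apply(toLocal);
}

const localModel = new ModelReplica(); // e.g. IndexedDB
const remoteModel = new ModelReplica(); // e.g. the Solid POD
localModel.apply([{ id: "op-1", property: "title", value: "Soup", date: 1 }]);
remoteModel.apply([{ id: "op-2", property: "description", value: "Hot", date: 2 }]);
synchronize(localModel, remoteModel);
console.log(localModel.attributes.title, remoteModel.attributes.title); // Soup Soup

// Another app edits the POD without adding an operation...
remoteModel.attributes.title = "Edited elsewhere";
synchronize(localModel, remoteModel);
// ...and the edit is lost, because the model is rebuilt from operations.
console.log(remoteModel.attributes.title); // Soup
```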

I think this might be relevant, complementary work: we are slowly researching our way towards supporting multiple views on top of one authoritative write source. This idea resembles the CRDT, CQRS, and Event Sourcing ideas. All of them need something like an append-only container, as in the Linked Data Event Streams (LDES) spec. So that’s what we tried to realize: an LDES in a container-resource structure in LDP.

We published an LDES in LDP NPM library that allows you to do CRUD operations, abstracting away an append-only log of versions of the resource. We could also look into using it on top of your vocabulary instead of a version-based approach.
https://www.npmjs.com/package/@treecg/versionawareldesinldp
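
To make "CRUD on top of an append-only log of versions" concrete, here is a hypothetical TypeScript sketch. It is not the `@treecg/versionawareldesinldp` API, just the general idea. Every create, update, or delete appends a new version, a delete appends a marker instead of erasing anything, and reads return the latest version. Since nothing is ever overwritten, writers only need append access to the log.

```typescript
interface VersionMember {
  resourceId: string; // the logical resource this member is a version of
  timestamp: number;
  body: string | null; // null marks a deletion
}

class AppendOnlyVersionLog {
  private readonly members: VersionMember[] = [];

  // Create and update are the same operation: append a new version.
  write(resourceId: string, body: string, timestamp: number): void {
    this.members.push({ resourceId, timestamp, body });
  }

  // Deleting appends a marker; earlier versions stay in the log.
  remove(resourceId: string, timestamp: number): void {
    this.members.push({ resourceId, timestamp, body: null });
  }

  // A "view" over the log: the latest version, or null if deleted or never written.
  read(resourceId: string): string | null {
    let latest: VersionMember | undefined;
    for (const member of this.members) {
      if (member.resourceId === resourceId && (latest === undefined || member.timestamp >= latest.timestamp)) {
        latest = member;
      }
    }
    return latest === undefined ? null : latest.body;
  }

  history(resourceId: string): VersionMember[] {
    return this.members.filter((member) => member.resourceId === resourceId);
  }
}

const versionLog = new AppendOnlyVersionLog();
versionLog.write("recipe", "Soup", 1);
versionLog.write("recipe", "Tomato soup", 2);
console.log(versionLog.read("recipe")); // Tomato soup
versionLog.remove("recipe", 3);
console.log(versionLog.read("recipe"), versionLog.history("recipe").length); // null 3
```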

However, we ran into some limitations and had to apply quite a few workarounds to make this work, and we’re now going to look at whether adding features to the core Solid spec would make managing append-only logs less of a hassle. The paper discussing the limitations can be found here: https://raw.githubusercontent.com/woutslabbinck/papers/main/2022/Linked_Data_Event_Streams_in_Solid_containers.pdf

Hi all! I’m from the Braid group (braid dot org), and was part of the “Faster CRDTs” project with josephg mentioned above.

I’m pleased to meet you guys! We work on making state synchronization interoperable, by connecting with other projects and generalizing our approaches into common protocols.

Noel showed me a demo of this system in the SolidOS meeting this morning. It’s sweet! I have a suggestion on the architecture.

I’m seeing the following stack of abstractions:

   State (of the recipe)            <-
   ------------------------------      \
   CRDT history + metastate             |
   ------------------------------     notifications
   RDF                                  ^
   ------------------------------      /
   HTTP (state transfer protocol)   --

There’s a deep mismatch here inherited from HTTP. HTTP is a state transfer protocol (consider that REST stands for REpresentational State Transfer), but we are trying to do state synchronization over it. So we end up expressing the CRDT history itself as state, on top of RDF, and then each client computes its current state from that RDF state. We have recursive state, with state computed from state, and state at both the top and bottom of the stack.

Moreover, since HTTP itself doesn’t provide any subscription notification system, we have to use out-of-band Solid Notifications, which tell the client to re-fetch the state using the HTTP state transfer protocol. This requires extra roundtrips, and ends up more convoluted and messy than if we solve state synchronization in HTTP directly.

If we add CRDT+Notifications into HTTP directly, we can simplify the architecture quite a bit:

   State of the recipe
   ------------------------
   RDF
   ------------------------
   HTTP with CRDT history + Notifications  (state sync protocol)

This reduces round trips, simplifies the stack, and also makes it more general, because the synchronization features (versioning, subscriptions, patches, and CRDT/OT semantics) can interoperate beyond Solid, and work for any content-type, not just RDF.

We’re extending HTTP in this way with the Braid-HTTP protocol. I think we could add some powerful features to Solid with Braid-HTTP, and do it in a general way that interoperates with other systems too. Solid is uniquely poised to benefit from this work, because it’s built on top of HTTP, and also seeks interoperable standardized solutions. I think our projects are very complementary.

Hey @toomim, thanks for attending yesterday’s meeting and sharing your knowledge with us.

Your proposal seems very interesting, and it’d be great to see a proof of concept or demo using both Solid and Braid to understand it better, although I think I already get the gist of what you’re proposing.

I want to clarify something, though. I think we already touched on this yesterday, but I’ll repeat it for those who didn’t attend the meeting. In order to have this working, we need to have something installed on the server so that the state sync protocol can run outside of RDF and the Solid Protocol, right? If that’s the case, I think that could be a problem. The promise of Solid is that Solid Apps can work against any Solid POD implementation. If you add any other requirements, it won’t be a universal Solid App.

There have already been a couple of threads in this forum discussing similar issues:

However, that doesn’t mean this isn’t worth exploring. Eventually, if this proves to be a better solution, it could be included in the Solid Protocol itself.
