Request for Comments: CRDTish approach to Solid

I suppose a possible test case is:

  • I make an edit offline that I cannot immediately sync
  • On another device, I decide to delete the item instead

In order for the deletion operation to win over the edit operation, or to detect the conflict, I think the deletion operation does need to be stored even if the properties are deleted for real?
I suppose the only “corruption” then is that a copy of (part of) the item still exists in the edit operation unless you have a garbage collection process?
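To make the tombstone idea concrete, here is a minimal sketch. It assumes each item keeps a deletion timestamp (a "tombstone") even after its properties are erased for real. `ItemState` and `mergeItem` are hypothetical names. The point is that a deletion can only win against a concurrent edit if the deletion record itself survives the merge.

```typescript
// Hypothetical item state: properties can be erased on delete, but the
// deletion timestamp (tombstone) is kept so it can still be compared
// against edits arriving later from other devices.
interface ItemState {
  properties: Record<string, string> | null; // null once really deleted
  updatedAt: number; // timestamp of the latest property edit
  deletedAt: number | null; // tombstone; null if never deleted
}

// Merge two replicas of the same item. If the (latest) deletion happened
// at or after the latest edit, the deletion wins; otherwise the newer
// edit wins and the item keeps its properties.
function mergeItem(a: ItemState, b: ItemState): ItemState {
  const deletedAt = Math.max(a.deletedAt ?? -Infinity, b.deletedAt ?? -Infinity);
  const newest = a.updatedAt >= b.updatedAt ? a : b;
  if (deletedAt !== -Infinity && deletedAt >= newest.updatedAt) {
    return { properties: null, updatedAt: newest.updatedAt, deletedAt };
  }
  return {
    properties: newest.properties,
    updatedAt: newest.updatedAt,
    deletedAt: deletedAt === -Infinity ? null : deletedAt,
  };
}
```

Ties go to the deletion here. That is a design choice, not something the thread specifies.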

I understand there are other CRDTs that handle deletions more elegantly but need to do some more reading…

I still have to think about this, but my current idea is that I’ll just show a message “this recipe was deleted, but you’ve made changes to it, what do you want to do? DELETE IT | RESTORE IT”.
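A sketch of how such a prompt could be triggered instead of resolving automatically. It assumes the app remembers when it last synced successfully and that each local edit carries a timestamp. `detectDeleteConflict` and `LocalEdit` are hypothetical. A remote deletion conflicts with any local edit made after the last sync, because the deleting device never saw those edits.

```typescript
interface LocalEdit {
  property: string;
  value: string;
  date: number;
}

type SyncOutcome =
  | { kind: "no-conflict" } // nothing to ask: apply the remote state
  | { kind: "conflict"; pendingEdits: LocalEdit[] }; // ask the user

function detectDeleteConflict(
  remoteDeletedAt: number | null,
  lastSyncedAt: number,
  localEdits: LocalEdit[],
): SyncOutcome {
  if (remoteDeletedAt === null) return { kind: "no-conflict" };
  // Edits made after the last sync were never seen by the deleting device.
  const pendingEdits = localEdits.filter((edit) => edit.date > lastSyncedAt);
  return pendingEdits.length > 0
    ? { kind: "conflict", pendingEdits }
    : { kind: "no-conflict" };
}
```

On a conflict, "RESTORE IT" would re-apply `pendingEdits` on top of a new copy of the item, and "DELETE IT" would discard them for real.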

Which again, doesn’t make it a CRDT because that is a conflict :). But I think it’s important to delete data for real, if you want to give people control over their data. As I said, I’d also like to eventually add a way to squash the history, with a similar UX (having to resolve conflicts manually).

Yes, I would always expect a Solid App to find and use my existing data as far as possible. Of course there can be differences in the amount of data used/understood. Some apps might use more or fewer terms than others, but this should not prevent them from using as much as they can understand and conforming to existing conventions (like storing my recipes in a folder I already chose rather than “inventing” a new one).


So, there was something that was bothering me: that LWW is a state-based CRDT, not an operation-based one, yet James Long’s approach uses a message database that seems to list operations.

I think I’ve now got my head around it: with a state-based CRDT, the messages could be deleted after the replicas have updated. Unless one wants to store history, the messages are not actually a long-term part of the CRDT. A state-based CRDT just involves merging two states to create a new one (see “CRDT Glossary • Conflict-free Replicated Data Types”).
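In code, "merging two states" means a join function that is commutative, associative and idempotent, so replicas can exchange whole states in any order, any number of times, and still converge. Below is a minimal LWW register as a sketch; the `replicaId` tie-breaker is an assumption added so that equal timestamps still resolve the same way on every replica.

```typescript
// A last-writer-wins register: the value plus the metadata needed to
// decide which of two concurrent writes wins.
interface LWWRegister<T> {
  value: T;
  timestamp: number; // e.g. milliseconds since epoch
  replicaId: string; // tie-breaker for equal timestamps
}

// State-based merge (join): keep the write with the larger
// (timestamp, replicaId) pair. No message log is needed to converge.
function mergeRegister<T>(a: LWWRegister<T>, b: LWWRegister<T>): LWWRegister<T> {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.replicaId >= b.replicaId ? a : b;
}
```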

So instead of treating the Solid pod as a message database, it should actually be possible to just make it a replica and the key issue is granularity of edits.

Modification timestamps provide an ordering of edits, and an etag tells us whether the document has changed since we last read it, so checking those already provides a crude LWW register at the level of a document.
Implementing the LWW register simply involves not replacing the document if our edit is older.
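A sketch of that document-level check, assuming the pod exposes `ETag` and `Last-Modified` response headers and honours standard HTTP conditional requests (`If-Match`, `If-None-Match`). The hypothetical `planDocumentWrite` only decides what to do, so it can be tested without a server: skip the write if the remote document was modified after our edit, otherwise PUT with `If-Match` so that a change made between our read and our write is rejected (412 Precondition Failed) instead of silently overwritten.

```typescript
// Metadata read from a previous GET or HEAD on the document.
interface RemoteMetadata {
  etag: string | null;
  lastModified: string | null; // HTTP-date, e.g. "Tue, 05 Oct 2021 18:43:14 GMT"
}

type WritePlan =
  | { action: "skip"; reason: string }
  | { action: "put"; headers: Record<string, string> };

// Document-level LWW. HTTP-dates only have one-second resolution,
// which is part of why this is a crude register.
function planDocumentWrite(localEditedAt: Date, remote: RemoteMetadata | null): WritePlan {
  if (remote === null) {
    // Document does not exist yet: only create it if it is still absent.
    return { action: "put", headers: { "If-None-Match": "*" } };
  }
  if (remote.lastModified !== null) {
    const remoteModifiedAt = Date.parse(remote.lastModified);
    if (!isNaN(remoteModifiedAt) && remoteModifiedAt > localEditedAt.getTime()) {
      return { action: "skip", reason: "remote document is newer" };
    }
  }
  // If-Match protects against a concurrent write since our read.
  return remote.etag !== null
    ? { action: "put", headers: { "If-Match": remote.etag } }
    : { action: "put", headers: {} };
}
```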

We would prefer to do this at the level of a triple or a record, which means we need a timestamp at that lower level for LWW. However, having the timestamp at the higher level could still provide some robustness for other applications that don’t store the more granular timestamps.
In terms of implementation, the CRDT state-merging logic either needs to be embedded in the SPARQL update query, or in the client app, which then pushes the updated document to the pod. The latter approach seems potentially easier if etags are available and concurrent edits are not too frequent.
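A minimal sketch of the client-side option, assuming each property of a record carries its own timestamp (metadata that plain Solid documents do not store, so this shape is hypothetical). Each predicate is treated as an independent LWW register, and the merged record is what the client would then push to the pod.

```typescript
interface Stamped {
  values: string[]; // all objects of the triples for this predicate
  timestamp: number;
}

// A record as a map from predicate IRI to a timestamped set of values.
type LWWMap = Record<string, Stamped>;

// Pointwise merge: for every predicate, keep the newer entry. Ties are
// broken by comparing the serialized values so that the result does not
// depend on argument order.
function mergeRecords(local: LWWMap, remote: LWWMap): LWWMap {
  const merged: LWWMap = { ...remote };
  for (const predicate of Object.keys(local)) {
    const mine = local[predicate];
    const theirs = merged[predicate];
    const mineWins =
      theirs === undefined ||
      mine.timestamp > theirs.timestamp ||
      (mine.timestamp === theirs.timestamp &&
        JSON.stringify(mine.values) > JSON.stringify(theirs.values));
    if (mineWins) merged[predicate] = mine;
  }
  return merged;
}
```

Unlike the document-level register, a concurrent edit to a different property on another device is no longer lost.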

It seems that the existing RDF CRDT implementations don’t use LWW, so I’m still planning to do some more reading, but thought I’d share what I’ve learnt.


FYI. Featured on HN: Faster CRDTs: An Adventure in Optimization | Hacker News


Hi there!

It’s been a while since I started this discussion, but I finally have something to share :).

I haven’t finished the app I am working on, but I think I am done with the data layer. As I mentioned at the beginning, I have implemented this in Soukai, so it should be easy to reuse for new features and other apps.

For anyone who’s still interested in this, I’ve decided to make an alpha release and I’d appreciate it if you give me some feedback. You can use it here: https://umai.noeldemartin.com

Here are some things to keep in mind if you decide to check it out:

  • What I’m more interested in hearing about is what happens in the POD and the synchronization between devices (you can just open two browsers to test).
  • Keep in mind that this is still a work in progress, so expect bugs and rough edges.
  • I am aware that deleting recipes doesn’t work (they get “resurrected”).
  • The vocab is not published yet, but I do care about it so let me know if you think something can be improved.
  • The app is using a very aggressive polling (3 seconds), but this is only for testing purposes. When I release a production version, I probably won’t use polling at all.
  • I still have a lot of work to do with the UI, so please don’t pay attention to that.

If you want to give it a try, I’d recommend using Penny with the Community Server. I usually run npx community-solid-server -p 4000 and use http://localhost:4000 when asked for a login url.


hey Noel.

I’ve tried it with CSS.
the container creation went fine with the given container-name.
I’ve added all asked fields and everything was stored except for the ‘ingredients’.
I could examine the file space-cakes$.ttl as follows…

<#it> a <https://schema.org/Recipe>;
    <https://schema.org/name> "space cakes";
    <https://schema.org/description> "butter\nsugar\nflour\ncacao\nweed".
<#it-metadata> a <https://soukai.noeldemartin.com/vocab/Metadata>;
    <https://soukai.noeldemartin.com/vocab/createdAt> "2021-10-05T18:36:27.909Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <https://soukai.noeldemartin.com/vocab/resource> <#it>;
    <https://soukai.noeldemartin.com/vocab/updatedAt> "2021-10-05T18:43:14.306Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>.
<#it-operation-ee280fae-706d-4ec0-a1a0-422607d4da2a> a <https://soukai.noeldemartin.com/vocab/Operation>;
    <https://soukai.noeldemartin.com/vocab/resource> <#it>;
    <https://soukai.noeldemartin.com/vocab/date> "2021-10-05T18:36:27.909Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <https://soukai.noeldemartin.com/vocab/property> <https://schema.org/name>;
    <https://soukai.noeldemartin.com/vocab/value> "space cakes".
<#it-operation-55432b74-b67d-471e-b391-44634dd2563b> a <https://soukai.noeldemartin.com/vocab/Operation>;
    <https://soukai.noeldemartin.com/vocab/resource> <#it>;
    <https://soukai.noeldemartin.com/vocab/date> "2021-10-05T18:43:10.459Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <https://soukai.noeldemartin.com/vocab/property> <https://schema.org/description>;
    <https://soukai.noeldemartin.com/vocab/value> "butter\nsugar\nflour\ncacao\nweed".
<#it-operation-6cec7deb-c10e-4ac5-a399-42a4db2e88e5> a <https://soukai.noeldemartin.com/vocab/Operation>;
    <https://soukai.noeldemartin.com/vocab/resource> <#it>;
    <https://soukai.noeldemartin.com/vocab/date> "2021-10-05T18:43:14.306Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <https://soukai.noeldemartin.com/vocab/property> <https://schema.org/recipeIngredient>;
    <https://soukai.noeldemartin.com/vocab/value> "butter", "flour", "sugar", "cacao", "weed";
    <https://soukai.noeldemartin.com/vocab/type> <https://soukai.noeldemartin.com/vocab/RemoveOperation>.
<#it-operation-df06b0a1-ace3-4dba-a2aa-4b2b29aa8179> a <https://soukai.noeldemartin.com/vocab/Operation>;
    <https://soukai.noeldemartin.com/vocab/resource> <#it>;
    <https://soukai.noeldemartin.com/vocab/date> "2021-10-05T18:43:14.306Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <https://soukai.noeldemartin.com/vocab/property> <https://schema.org/recipeInstructions>;
    <https://soukai.noeldemartin.com/vocab/value> <#3db73be1-12e0-4472-9d20-f0544d7a28af>, <#e40dab18-21cc-489b-9672-2f84cd1c035f>;
    <https://soukai.noeldemartin.com/vocab/type> <https://soukai.noeldemartin.com/vocab/RemoveOperation>.

I re-edited it and the ingredients were stored, also…

Hey, thanks for checking it out :). In the first screenshot, you wrote the ingredients in the “description” field, and they do appear in the turtle document. Was that the issue? Or did you add them as ingredients but they weren’t saved?


this test method wasn’t very clear, I guess…
yeah, I added them as ingredients too, and they were not saved.
when revisiting the app a second time, I guess it came out of the browser cache and they were there.
[edit] so they must’ve been saved somewhere

ahhh, as I can see from the order of the ingredients, they in fact were saved, only not displayed. sugar and flour are in a different order in the description and in the ingredients list.

so it works fine.

Nice work! I’ve just given it a preliminary try since I didn’t have much time, so first things I noticed:

  • When I hit logout it says I will lose local recipes but they’re still in my Pod - however, I could still see the recipe until I refreshed the page :slight_smile:
  • If I add an ingredient in Penny, then go back to Umai, I don’t see that ingredient.
  • If I then go back to Umai and add an ingredient there, I don’t see that stored in my Pod either.
  • Ah, it looks like I got disconnected from the server somehow. After reconnecting, the Umai ingredient gets added, but the ingredient I added through Penny is now gone :frowning:
  • Ah, but that in turn is because I didn’t add the operations to add it in Penny - gotcha.
  • So then I figured: what happens if I change the ingredient listed in the operation in Penny, then modify the recipe in Umai. Well: Umai then updates the recipe itself in the Pod to list the changed ingredient (good), but Umai doesn’t update its own rendering of the recipe and therefore still lists the old ingredient.
  • That is, until I log in in a new private window, where it does list the correct recipe.
  • Also, the URL when viewing my recipe is https://umai.noeldemartin.com/recipes/premade-soup, but I can’t visit that in a private window, connect my Pod again, and see that recipe - I first have to go back to the homepage, click that recipe again, and then I get back at that URL, this time showing the recipe.

I would’ve made that shorter and less rambling but I didn’t have the time - hope it’s still useful :slight_smile:

Yeah the ingredients list has no order (I’ll probably sort them by quantities or something, I’m not sure yet). It seems like it’s still kind of flaky, I’ll have to test a lot before releasing the first production version. Thanks for trying it out!

Thanks for trying it out and all the feedback, it’s really useful :D.

I think these two are the same bug, the recipe page doesn’t update properly, I’ll look into that. If you log out from the home screen, recipes should disappear.

All of this is related to one of the concerns I had about this approach (the 2nd one in particular, which I called “interoperability”). I will probably tackle it before release, and I think it should be fairly straightforward (just adding new operations for inconsistencies between the operations and the resource). But I haven’t looked into it yet.
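One way "adding new operations for inconsistencies" could look, as a hypothetical sketch (not Soukai's API): compare the properties actually present in the resource with the state rebuilt from operations, and emit an operation recording the resource's current value wherever they disagree, so the next reconstruction matches what another app (e.g. Penny) wrote.

```typescript
type Values = Record<string, string[]>; // predicate IRI -> values

interface SetOperation {
  property: string;
  value: string[];
  date: Date;
}

// RDF values are unordered (and never duplicated), so compare sorted lists.
function sameValues(a: string[], b: string[]): boolean {
  const sortedA = [...a].sort();
  const sortedB = [...b].sort();
  return sortedA.length === sortedB.length && sortedA.every((v, i) => v === sortedB[i]);
}

function repairOperations(actual: Values, reconstructed: Values, now: Date): SetOperation[] {
  // Union of properties seen in either the resource or the operations.
  const properties = Object.keys(actual).concat(
    Object.keys(reconstructed).filter((p) => !(p in actual)),
  );
  const operations: SetOperation[] = [];
  for (const property of properties) {
    const current = actual[property] ?? []; // [] means "property was removed"
    if (!sameValues(current, reconstructed[property] ?? [])) {
      operations.push({ property, value: current, date: now });
    }
  }
  return operations;
}
```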

In a nutshell, there are local models (stored in IndexedDB) and remote models (stored in the Solid POD). When either of them is updated, the operations are sent to the other one. But the model itself is only reconstructed from the operations, so if something has been modified without operations, it may misbehave. It’s also possible that there are bugs :). But I’m not yet testing scenarios where changes happen outside of my app.
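Reconstructing a model from operations can be sketched as a fold over the operations sorted by date. The operation shape below is inferred from the Turtle document posted earlier in this thread (untyped operations appear to set a property, `RemoveOperation` removes values); the real vocabulary may have more operation types, and the names here are hypothetical.

```typescript
// Operation shape inferred from the Turtle document in this thread.
interface Operation {
  property: string;
  values: string[];
  date: string; // ISO 8601, as in the vocab's "date" property
  type?: "RemoveOperation";
}

type Model = Record<string, string[]>;

// Replay operations in date order to rebuild the model. Anything changed
// in the resource without a matching operation is invisible here.
function reconstruct(operations: Operation[]): Model {
  const model: Model = {};
  const ordered = [...operations].sort((a, b) => Date.parse(a.date) - Date.parse(b.date));
  for (const op of ordered) {
    if (op.type === "RemoveOperation") {
      const current = model[op.property] ?? [];
      model[op.property] = current.filter((v) => op.values.indexOf(v) === -1);
    } else {
      model[op.property] = [...op.values]; // untyped operation: set
    }
  }
  return model;
}
```

This is why an ingredient added directly in Penny, without a matching operation, disappears on the next sync.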

I think this might be relevant and complementary related work: we are slowly researching our way towards supporting multiple views on top of one authoritative write source. This idea resembles the CRDT, CQRS and Event Sourcing ideas. All of them need something like an append-only container, as in the Linked Data Event Streams spec. So that’s what we tried to realize: an LDES in a container-resource structure in LDP.

We published an LDES in LDP NPM library that allows you to do CRUD operations, abstracting away an append-only log of versions of the resource. We could also look into using it on top of your vocabulary instead of a version-based approach.
https://www.npmjs.com/package/@treecg/versionawareldesinldp
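To illustrate the general idea of CRUD over an append-only log, here is a hypothetical in-memory sketch. It is not the API of @treecg/versionawareldesinldp. Every write and delete appends a version, and a read returns the latest version, treating a deletion marker as absence.

```typescript
interface Version {
  resourceId: string; // the logical resource being versioned
  timestamp: number;
  body: string | null; // null marks a deletion
}

class AppendOnlyStore {
  private readonly log: Version[] = []; // only ever appended to

  write(resourceId: string, body: string, timestamp: number): void {
    this.log.push({ resourceId, timestamp, body });
  }

  delete(resourceId: string, timestamp: number): void {
    this.log.push({ resourceId, timestamp, body: null });
  }

  // Read = latest version of the resource, or null if absent or deleted.
  read(resourceId: string): string | null {
    let latest: Version | null = null;
    for (const version of this.log) {
      if (version.resourceId === resourceId && (latest === null || version.timestamp >= latest.timestamp)) {
        latest = version;
      }
    }
    return latest === null ? null : latest.body;
  }

  // The full history stays available, e.g. to build other views.
  history(resourceId: string): Version[] {
    return this.log.filter((v) => v.resourceId === resourceId);
  }
}
```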

However, we saw some limitations and had to apply quite a few workarounds to make this work, so we’re now going to look at whether adding features to the core Solid spec would make managing append-only logs less of a hassle. The paper discussing the limitations can be found here: https://raw.githubusercontent.com/woutslabbinck/papers/main/2022/Linked_Data_Event_Streams_in_Solid_containers.pdf


Hi all! I’m from the Braid group (braid dot org), and was part of the “Faster CRDTs” project with josephg mentioned above.

I’m pleased to meet you guys! We work on making state synchronization interoperable, by connecting with other projects and generalizing our approaches into common protocols.

Noel showed me a demo of this system in the SolidOS meeting this morning. It’s sweet! I have a suggestion on the architecture.

I’m seeing the following stack of abstractions:

   State (of the recipe)            <-
   ------------------------------      \
   CRDT history + metastate             |
   ------------------------------     notifications
   RDF                                  ^
   ------------------------------      /
   HTTP (state transfer protocol)   --

There’s a deep mismatch here inherited from HTTP. HTTP is a state transfer protocol (consider that REST stands for REpresentational State Transfer), but we are trying to do state synchronization over it. So we end up expressing the CRDT history itself as state, on top of RDF, and then each client computes its current state from that RDF state. We have recursive state, with state computed from state, and state at both the top and bottom of the stack.

Moreover, since HTTP itself doesn’t provide any subscription notification system, we have to use out-of-band Solid notifications, which tell the client to re-fetch the state using the HTTP state transfer protocol. This requires extra round trips, and ends up more convoluted and messy than if we solved state synchronization in HTTP directly.

If we add CRDT+Notifications into HTTP directly, we can simplify the architecture quite a bit:

   State of the recipe
   ------------------------
   RDF
   ------------------------
   HTTP with CRDT history + Notifications  (state sync protocol)

This reduces round trips, simplifies the stack, and also makes it more general, because the synchronization features (versioning, subscriptions, patches, and CRDT/OT semantics) can interoperate beyond Solid, and work for any content-type, not just RDF.
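A rough sketch of what state synchronization changes on the client: instead of re-fetching the whole state after a notification, the client receives versioned patches and applies each one once all of its parent versions are known. The `Update` shape is hypothetical and only loosely modeled on the version/parents idea; it is not the Braid-HTTP wire format. Patches are simple key/value sets, and real conflicting concurrent patches would still need CRDT or OT semantics, which this sketch leaves out.

```typescript
interface Update {
  version: string;
  parents: string[]; // versions this update was made on top of
  patch: Record<string, string>; // hypothetical patch: keys to set
}

class SyncedState {
  readonly state: Record<string, string> = {};
  private readonly seen = new Set<string>();
  private pending: Update[] = [];

  // Buffer updates whose parents have not arrived yet, then apply every
  // update that has become ready, in causal order.
  receive(update: Update): void {
    if (this.seen.has(update.version)) return; // already applied
    this.pending.push(update);
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (const candidate of this.pending) {
        if (candidate.parents.every((p) => this.seen.has(p))) {
          Object.assign(this.state, candidate.patch);
          this.seen.add(candidate.version);
          this.pending = this.pending.filter((u) => u !== candidate);
          progressed = true;
          break;
        }
      }
    }
  }
}
```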

We’re extending HTTP in this way with the Braid-HTTP protocol. I think we could add some powerful features to Solid with Braid-HTTP, and do it in a general way that interoperates with other systems too. Solid is uniquely poised to benefit from this work, because it’s built on top of HTTP, and also seeks interoperable standardized solutions. I think our projects are very complementary.


Hey @toomim, thanks for attending yesterday’s meeting and sharing your knowledge with us.

Your proposal seems very interesting, and it’d be great to see some proof of concept or demo using both Solid and Braid to understand this better; although I think I already get the gist of what you’re proposing.

I want to clarify something though. I think we already touched on this yesterday, but I’ll repeat it for those who didn’t attend the meeting. In order to have this working, we need to have something installed on the server so that the state sync protocol can run outside of RDF and the Solid Protocol, right? If that’s the case, I think that could be a problem. The promise of Solid is that Solid Apps can work against any Solid POD implementation. If you add any other requirements, it won’t be a universal Solid App.

There have already been a couple of threads in this forum discussing similar issues:

However, that doesn’t mean this isn’t worth exploring. Eventually, if this proves to be a better solution, it could be included in the Solid Protocol itself.


the state sync protocol [would] be run outside of RDF and the Solid Protocol, right?

No, I think we’d put it inside the HTTP part of the Solid Protocol, giving it the ability to communicate in terms of patches of RDF. We would extend the Solid protocol at the HTTP level.

You’re totally right that the next step would be to make a demo! We need a patch format for RDF. I’m not too familiar with RDF. What’s a good way to specify ranges of RDF data? Like, is there an equivalent of xpath or jsonpath for RDF? I want to describe an arbitrary part of RDF, within a resource at a URL, that has changed.


Ok, well what I meant with “outside the Solid Protocol” was that it cannot be implemented just using the Solid Protocol (as it stands today). So we would indeed need to implement something in the server side.

I’m very focused on doing apps that can be used today by anyone who just has a Solid Pod, that’s why the solution I came up with is based on moving RDF triples from apps to a POD. But yeah, it would be great if the protocol itself handles this for us.

Although I will also say, there is always some tension about how complex Solid servers should be. Whenever a new feature is added to the protocol, one of the concerns is whether it makes life harder for POD implementers. So depending on how feasible your proposal is to implement, there could be some pushback. Even if you provide a reference implementation, Solid Servers are implemented in a variety of languages.

Again, that doesn’t mean it’s not worth exploring. I think one way to get this moving is creating an issue in the solid/specification repository. If you browse the list of open issues, you’ll notice how many of them are ideas such as this. That way you could also get some feedback from people who are actually working on the spec (I only make apps, so I’m not too familiar with how the spec evolves).

Other than that, I think a demo would indeed go a long way.

I don’t know of any solution for that, but I’m sure someone in the community will know if such a thing exists. I suggest asking about it in the chat or opening a new thread in this forum. If you don’t get a reply, I can point you towards some people who I’m pretty sure will have an answer.


Thanks a bunch @NoelDeMartin! I get where you’re coming from, and your approach makes total sense if you aren’t looking at modifying HTTP.

So I’m concluding that we can simplify things a lot by braidifying HTTP in Solid, and the next step is to make a demo.

Towards that end, we need a patch format, and Rahul just showed me the N3 Patch in Solid, which looks super viable, and also pointed out that we can issue JSON patches to any RDF expressed in JSON-LD. These sound like great approaches.
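For reference, an N3 Patch as used by Solid is an N3 document with a `solid:InsertDeletePatch` resource that can list `solid:deletes`, `solid:inserts` and `solid:where` formulas, sent with `PATCH` and `Content-Type: text/n3`. The hypothetical helper below turns a change to one property into such a patch body. It assumes the values are plain string literals and only escapes backslashes, quotes and newlines.

```typescript
// Escape a JavaScript string as an N3/Turtle string literal.
function literal(value: string): string {
  const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
  return `"${escaped}"`;
}

// Build a Solid N3 Patch replacing the old values of one property of
// `subject` with new ones. Subject and property are IRIs (possibly
// relative to the patched document, like <#it>).
function buildN3Patch(subject: string, property: string, oldValues: string[], newValues: string[]): string {
  const triples = (values: string[]) =>
    values.map((v) => `<${subject}> <${property}> ${literal(v)}.`).join(" ");
  const clauses: string[] = [];
  if (oldValues.length > 0) clauses.push(`solid:deletes { ${triples(oldValues)} }`);
  if (newValues.length > 0) clauses.push(`solid:inserts { ${triples(newValues)} }`);
  return [
    "@prefix solid: <http://www.w3.org/ns/solid/terms#>.",
    `_:patch a solid:InsertDeletePatch${clauses.map((c) => `;\n  ${c}`).join("")}.`,
  ].join("\n");
}
```

The resulting string would be the body of a `fetch(url, { method: "PATCH", headers: { "Content-Type": "text/n3" }, body })` call against the resource.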

The next thing I’ll need is a demo app. Maybe I should take one of your apps and braidify it. I wonder if we have any simple Solid demo apps that let the user edit some data (maybe text) expressed as JSON-LD, with a server written in nodejs.