I’m a beginner, so I might just be missing something, but it seems to me that by being CRDTish it ends up being primarily a list of operations, without any guarantees against corruption from concurrent edits (changes on multiple devices) or from those edits arriving in different orders?
I’m still struggling to get my head around CRDTs (and how they differ from Operational Transformations), but based on this example (https://onlinelibrary.wiley.com/doi/pdf/10.1002/cpe.5670) I would add insert/delete counters on the main data model, keep an append-only operation broadcast log in a separate document, and then use reconstruction of the document to detect untracked changes to the main data model?
Really great to see you working on this - looking forward to following how you tackle it.
I’m not much beyond a beginner either. I started learning about this some weeks ago, so I may be missing something as well. But here’s how I understand it.
I say it is CRDTish because the Solid POD is not a CRDT node, it’s only a “dumb store”. But this same architecture could potentially be used for nodes communicating among themselves, and that would be a proper CRDT. Although that’s not a use-case I’m considering at the moment.
There cannot be any concurrent edits, because even if two operations happen at the same time, Hybrid Logical Clocks take care of making each event unique and sorted chronologically, so the latest operation would win. I’m still a bit fuzzy about the clocks, but worst-case scenario I will just use normal timestamps. That’s normally not advisable for real-time collaboration, because you can’t trust the local timestamps of different devices in a distributed system. But for my use-case, I think it’s acceptable.
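To make the idea concrete, here’s a rough sketch of how a Hybrid Logical Clock could make every event unique and totally ordered. All names and shapes here are illustrative, not any actual library’s API: each event carries a (wall time, counter, node id) triple, and comparison is lexicographic, so even events with identical wall times still sort deterministically.

```typescript
// Minimal Hybrid Logical Clock sketch (illustrative shapes, not a real library).
type Hlc = { millis: number; counter: number; node: string };

// Advance the clock for a new local event: bump the counter if wall time
// hasn't moved forward, otherwise reset it.
function hlcSend(clock: Hlc, now: number): Hlc {
    if (now > clock.millis) return { millis: now, counter: 0, node: clock.node };

    return { millis: clock.millis, counter: clock.counter + 1, node: clock.node };
}

// Lexicographic comparison: wall time, then counter, then node id as a
// tie-breaker, so any two distinct events have a definite order.
function hlcCompare(a: Hlc, b: Hlc): number {
    if (a.millis !== b.millis) return a.millis - b.millis;
    if (a.counter !== b.counter) return a.counter - b.counter;

    return a.node < b.node ? -1 : a.node > b.node ? 1 : 0;
}
```

With this, “latest operation wins” just means sorting operations by `hlcCompare` and applying the last one.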
As I understand it, the main difference is that in Operational Transformations the operations can be transformed after they have been created, usually by a centralized server. With CRDTs, the operations are immutable and there is eventual consistency (a node with the same operations will have the same end state).
I’m tracking everything in the same document because it’s easier, but technically speaking it doesn’t matter in which document the state/operations are stored. They are still linked with URLs through semantic properties (dc:subject and soukai:history in my example). The operations are effectively an append-only log, and the resource (:it in my example) is the reconstruction of the state through the operations, but I need to store it so that other applications understand the data without looking at the operations.
About untracked changes, that’s why I’m using a checksum to see if the current state is the result of all the operations or that something else changed it.
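The checksum idea could be sketched like this (toy shapes of my own, not the actual Soukai format): hash the operations in chronological order and store the digest alongside the resource; if the stored checksum no longer matches the recomputed one, something edited the state without recording an operation.

```typescript
import { createHash } from 'node:crypto';

// Illustrative operation shape; the real data model will differ.
interface Operation { date: string; property: string; value: string }

// Hash the operations in chronological order to get a deterministic digest.
function operationsChecksum(operations: Operation[]): string {
    const sorted = [...operations].sort((a, b) => a.date.localeCompare(b.date));
    const hash = createHash('sha256');

    for (const op of sorted) hash.update(`${op.date}|${op.property}|${op.value}`);

    return hash.digest('hex');
}

// An untracked change is detected when the checksum stored with the resource
// no longer matches the checksum recomputed from the operation log.
function hasUntrackedChanges(storedChecksum: string, operations: Operation[]): boolean {
    return storedChecksum !== operationsChecksum(operations);
}
```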
With a LWW approach and an appropriate timestamp, writing to the document seems to be the main spot where concurrency could corrupt the data. In that context, I was actually suggesting using WebACLs to make the document append-only, guaranteeing that deletions are not possible and that the message list is indeed grow-only. Edits to soukai:history should only ever be inserts, and deletions would add tombstones for fields in new operations? Maybe you’re already doing this.
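For what it’s worth, here’s my guess at what such an append-only ACL could look like in Web Access Control terms. The file names and the agent class are illustrative; the point is granting acl:Append but not acl:Write, so clients can add operations but not rewrite or delete existing ones.

```turtle
# Hypothetical WebACL for an operations document: readers can append new
# operations, but nobody (except perhaps the owner) gets Write, so the
# existing log cannot be rewritten or deleted.
@prefix acl: <http://www.w3.org/ns/auth/acl#> .

<#appendOnly>
    a acl:Authorization ;
    acl:accessTo <./recipe-operations.ttl> ;
    acl:agentClass acl:AuthenticatedAgent ;
    acl:mode acl:Read, acl:Append .
```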
I suppose part of the attraction of formally adopting a CRDT (i.e. ensuring that the implementation meets required conditions) is that it then provides guarantees. I’m not sure how I’d approach testing of concurrent edits otherwise, given the edge cases.
I’m quite curious to see the performance of this approach and how it scales. It seems like storing and loading operation logs for each recipe will run into similar problems as with Media Kraken, and continue stress-testing Solid server performance…
Looking forward to the next update on your RSS feed
Ok, that’s cool. So maybe it is a real CRDT after all :D.
I heard him talking about the server having a node, but I didn’t realize it was only a message buffer (or I forgot about it xD). That annotated repo looks useful, I’ll take a look.
Yes, this is already taken care of in Soukai, in theory. Every time Soukai makes an update, it deletes previous properties before adding the new ones, and if it tries to delete a property that doesn’t exist (meaning, that it was changed by someone else) it’ll throw an error.
Now, I say “in theory” because I haven’t really tested this too much, and ideally I’d like to use ETags with the If-Match header instead.
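The ETag idea could look something like this sketch (the URL, update body, and function names are made up; this is just how I’d expect conditional requests to behave, not Soukai’s actual code): send the update with an If-Match header, and treat a 412 response as “someone else wrote first, retry”.

```typescript
// Build a conditional PATCH request: the server rejects it with
// 412 Precondition Failed if the document's ETag no longer matches.
function conditionalPatch(etag: string, sparqlUpdate: string) {
    return {
        method: 'PATCH',
        headers: {
            'Content-Type': 'application/sparql-update',
            'If-Match': etag,
        },
        body: sparqlUpdate,
    };
}

// Hypothetical wrapper: returns false when the update lost the race,
// so the caller can re-fetch and retry.
async function safeUpdate(url: string, sparqlUpdate: string, etag: string): Promise<boolean> {
    const response = await fetch(url, conditionalPatch(etag, sparqlUpdate));

    if (response.status === 412) return false; // Someone else updated first.

    return response.ok;
}
```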
About tombstones, something I don’t like about CRDTs is that data is kept forever, so if I delete a resource I’ll delete it for real. I’ll see how to handle that in the UI if there are any conflicts.
Yes, I looked at Automerge, but I think it’s too complex for my use-case (I’m already second-guessing whether I should be using CRDTs at all). And I think it’d be a lot harder to implement in Solid (I’d need to store the Automerge metadata on the POD). At least for the first version, I don’t think I’ll go beyond a LWW map.
Since I’ll have the entire history, I may add something in the UI to see the history so that information is not lost and users can “fix the merge” manually. But to be honest, I’m not sure if I’ll do even that in the first version.
Indeed. One of the biggest problems in Media Kraken, for me, is the initial loading that takes ages (on mobile). Following this new approach, that initial loading will still take place, but it’ll happen in the background, so I’ll be able to use the application instantly. This is actually my biggest motivation for going offline-first; I don’t really experience connectivity issues. But I think it’s cool to take it all the way to offline-first :).
In case you’re curious, yesterday I published a video with a proof of concept. Under the hood, that’s already using a Solid POD :). The code is not published anywhere because I hard-coded a lot of things, but it’s cool to see that it works!
Great to see you’re looking into this. I haven’t prototyped anything yet, but I did think about it to some extent. The main conclusion I reached relates to this:
If you want such an approach, you’re going to have to go all in. That is, interoperability is only possible with other apps that also only store commands. You can periodically store snapshots when running into performance issues, but the source of truth is the command history, and any modifications done to the snapshot can and will be discarded unless they’re stored in the command log as well.
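The “command history is the source of truth” idea can be sketched in a few lines (toy shapes, purely for illustration): state is always rebuilt by replaying the command log, optionally on top of a snapshot, so any edit made directly to the snapshot without a corresponding command simply gets overwritten on the next replay.

```typescript
// Illustrative command shape; a real log would carry richer operations.
interface Command { date: string; property: string; value: string }

type State = Record<string, string>;

// Rebuild state by replaying commands in chronological order on top of a
// snapshot. Direct edits to the snapshot survive only until a later command
// touches the same property — the log always wins.
function replay(snapshot: State, commands: Command[]): State {
    const state = { ...snapshot };
    const sorted = [...commands].sort((a, b) => a.date.localeCompare(b.date));

    for (const command of sorted) state[command.property] = command.value;

    return state;
}
```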
One other thing that’s interesting is that Resources could suffice with just Append (i.e. not Write) access.
The downside, of course, is that deleted data is always retrievable, and there’s significantly added complexity and potential failure modes.
Other terms that you might be interested in researching are event sourcing / command query responsibility segregation (CQRS).
Very interested in these myself. In combination with Domain-Driven Design (DDD), which maps well to closed Linked Data vocabularies (acting as bounded contexts to model a particular business domain), and is a very good way to take non-technical folks along a software-design process, up to testable and very modular, maintainable codebases.
(Note, Event Sourcing is optional. You see it used in many examples, but it adds a lot of complexity in the form of eventual-consistency issues and code that is harder to test. You can always start with CQRS and extend to ES later on.)
In theory I agree with this, and I wish it were possible. Maybe if CRDTs become more popular, and a vocabulary for CRDTs becomes as common as schema.org is today, that could be an option.
But in practice, in the state we are in today, I don’t think that’s feasible. I could do it, sure, but it’d be synonymous with my app not being interoperable. Also, I don’t think following this approach is doing it half-way; if anything, I’m making it backwards compatible. An app aware of the operations would behave as expected. And what I mention about amending the history by adding a new operation with the diff doesn’t make it wrong.
Having said that, I haven’t looked into this a lot and this may come back to bite me in the future. But I think interoperability is one of the most important aspects that differentiates Solid from other technologies, and if users don’t start experiencing it, Solid won’t be any different from other solutions.
One of the reasons why I’m so adamant about this is that when @aveltens used Ramen, he told me that he was already using schema:Recipe for recipes in his POD, so that was a great experience for him (or something like that, correct me if I’m wrong xD).
I want to see more of that :).
I knew about event sourcing and, to be honest, I think it’s almost the same as what I’m doing (or maybe I don’t know it well enough to tell the difference). CQRS is one of those buzzwords I’ve heard multiple times but I don’t really know what it means; I’ll check it out.
I make an edit offline that I cannot immediately sync
On another device, I decide to delete the item instead
In order for the deletion operation to win over the edit operation, or to detect the conflict, I think the deletion operation does need to be stored even if the properties are deleted for real?
I suppose the only “corruption” then is that a copy of (part of) the item still exists in the edit operation unless you have a garbage collection process?
I understand there are other CRDTs that handle deletions more elegantly but need to do some more reading…
I still have to think about this, but my current idea is that I’ll just show a message: “This recipe was deleted, but you’ve made changes to it. What do you want to do? DELETE IT | RESTORE IT”.
Which again, doesn’t make it a CRDT because that is a conflict :). But I think it’s important to delete data for real, if you want to give people control over their data. As I said, I’d also like to eventually add a way to squash the history, with a similar UX (having to resolve conflicts manually).
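The delete-vs-edit scenario above could be detected with something like this (shapes and names are made up for the sketch): when a deletion arrives from the remote log while this device still holds unsynced edits, don’t resolve silently; surface the choice to the user instead.

```typescript
// Illustrative operation shape for the sketch.
interface Operation { date: string; type: 'edit' | 'delete' }

// Decide what to do when syncing: if the remote log records a deletion but
// this device has pending local edits, that's a real conflict, so the UI
// asks "DELETE IT | RESTORE IT" instead of picking a winner automatically.
function checkDeletion(remote: Operation[], localPending: Operation[]): 'delete' | 'conflict' | 'keep' {
    const deletion = remote.find(op => op.type === 'delete');

    if (!deletion) return 'keep';

    return localPending.length > 0 ? 'conflict' : 'delete';
}
```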
Yes, I would always expect a Solid app to find and use my existing data as far as possible. Of course, there can be differences in the amount of data used/understood. Some apps might use more or fewer terms than others, but this should not prevent them from using as much as they can understand and conforming to existing conventions (like storing my recipes in a certain folder I already chose, and not “inventing” a new one).
So, there was something that was bothering me: that LWW is a state-based CRDT, not an operation-based one, yet James Long’s approach uses a message database that seems to list operations.
I think I’ve now got my head around it: basically, with a state-based CRDT the messages could be deleted after the replicas have updated. Unless one wants to store history, the messages are not actually a long-term part of the CRDT. A state-based CRDT just involves merging two states to create a new one (see the CRDT Glossary at Conflict-free Replicated Data Types).
So instead of treating the Solid pod as a message database, it should actually be possible to just make it a replica and the key issue is granularity of edits.
Both ETags and modified timestamps provide an ordering of edits, so just checking those already provides a crude LWW register at the level of a document.
Implementation of the LWW register simply involves not replacing the document if our edit is older.
We would prefer to do this at the level of a triple or a record, which then means we need a timestamp at that lower level for LWW. However, the fact that we have the timestamp at the higher level could still provide some robustness for other applications that don’t store the more granular timestamps.
In terms of implementation, the CRDT state-merging logic either needs to be embedded in the SPARQL update query, or live in the client app, which then pushes the updated document to the pod. The latter approach potentially seems easier if ETags are available and concurrent edits are not too frequent.
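A client-side, per-triple LWW merge could be sketched like this (my own toy representation of triples, not an RDF library): for each (subject, predicate) key, keep whichever value carries the newest timestamp, then push the merged document back to the pod.

```typescript
// Illustrative per-triple LWW register: each (subject, predicate) key maps
// to a value plus the timestamp of the write that produced it.
interface TripleValue { value: string; timestamp: number }

type Doc = Map<string, TripleValue>; // key = `${subject} ${predicate}`

// Merge two replicas: for every key, the newest timestamp wins. The merge is
// commutative and idempotent, which is what makes it a (state-based) CRDT.
function mergeLww(local: Doc, remote: Doc): Doc {
    const merged = new Map(local);

    for (const [key, incoming] of remote) {
        const current = merged.get(key);

        if (!current || incoming.timestamp > current.timestamp)
            merged.set(key, incoming);
    }

    return merged;
}
```

Ties (equal timestamps) would still need a deterministic tie-breaker, such as comparing node ids, which is exactly what HLCs provide.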
It seems that the existing RDF CRDT implementations don’t use LWW, so I’m still planning to do some more reading, but thought I’d share what I’ve learnt.
It’s been a while since I started this discussion, but I finally have something to share :).
I haven’t finished the app I am working on, but I think I am done with the data layer. As I mentioned at the beginning, I have implemented this in Soukai, so it should be easy to reuse for new features and other apps.
For anyone who’s still interested in this, I’ve decided to make an alpha release and I’d appreciate it if you give me some feedback. You can use it here: https://umai.noeldemartin.com
Here are some things to keep in mind if you decide to check it out:
What I’m more interested in hearing about is what happens in the POD and the synchronization between devices (you can just open two browsers to test).
Keep in mind that this is still a work in progress, so expect bugs and rough edges.
I am aware that deleting recipes doesn’t work (they get “resurrected”).
The vocab is not published yet, but I do care about it so let me know if you think something can be improved.
The app is using very aggressive polling (every 3 seconds), but this is only for testing purposes. When I release a production version, I probably won’t use polling at all.
I still have a lot of work to do with the UI, so please don’t pay attention to that.
If you want to give it a try, I’d recommend using Penny with the Community Server. I usually run npx community-solid-server -p 4000 and use http://localhost:4000 when asked for a login url.
I’ve tried it with CSS.
The container creation went fine with the given container name.
I’ve added all the requested fields and everything was stored except for the ‘ingredients’.
I could examine the file space-cakes$.ttl as follows…
Hey, thanks for checking it out :). In the first screenshot, you wrote the ingredients in the “description” field, and they do appear in the turtle document. Was that the issue? Or did you add them as ingredients but they weren’t saved?
this test method was not verbose, I guess…
Yeah, I added them as ingredients too, and they were not saved.
When revisiting the app a second time, I guess it came out of the browser cache and they were there.
So they must’ve been saved somewhere.
Nice work! I’ve just given it a preliminary try since I didn’t have much time, so first things I noticed:
When I hit logout, it says I will lose local recipes, even though they’re still in my Pod. Also, I could still see the recipe until I refreshed the page.
If I add an ingredient in Penny, then go back to Umai, I don’t see that ingredient.
If I then go back to Umai and add an ingredient there, I don’t see that stored in my Pod either.
Ah, it looks like I got disconnected from the server somehow. After reconnecting, the Umai ingredient gets added, but the ingredient I added through Penny is now gone
Ah, but that in turn is because I didn’t add the operations to add it in Penny - gotcha.
So then I figured: what happens if I change the ingredient listed in the operation in Penny, then modify the recipe in Umai. Well: Umai then updates the recipe itself in the Pod to list the changed ingredient (good), but Umai doesn’t update its own rendering of the recipe and therefore still lists the old ingredient.
That is, until I log in in a new private window, where it does list the correct recipe.
Also, the URL when viewing my recipe is https://umai.noeldemartin.com/recipes/premade-soup, but I can’t visit that in a private window, connect my Pod again, and see that recipe - I first have to go back to the homepage, click that recipe again, and then I get back at that URL, this time showing the recipe.
I would’ve made that shorter and less rambling but I didn’t have the time - hope it’s still useful
Yeah, the ingredients list has no order (I’ll probably sort them by quantities or something, I’m not sure yet). It seems like it’s still kind of flaky; I’ll have to test a lot before releasing the first production version. Thanks for trying it out!