Hi there!
I’m working on a new Solid app, and I’ve decided to follow an offline-first approach. I’ve been doing some reading, and something I’ve come across multiple times is CRDTs (I also read about them in this forum a while ago).
I’m building a recipes manager, intended for use by a single user at a time. So the only type of synchronization I care about is storing offline changes on multiple devices and synchronizing them when they are back online, not real-time collaboration.
Learning about CRDTs has given me some ideas on how I could do this in Solid. But I don’t think my approach could be considered a CRDT, because the server won’t be running a CRDT node; it’ll just be “dumb storage”. That’s why I’m calling it CRDTish, so keep that in mind.
I wanted to share my solution here (which is a work in progress) in order to get some feedback.
My Solution
I am using Soukai, a library I built for working with Solid. You don’t need to know anything about Soukai, other than that it uses the Active Record design pattern.
Internally, this library keeps track of the changes made to each model, and they are sent to the POD upon saving. I thought a good solution to this problem would be to send operations describing the updates together with the changes.
For example, if I make a new recipe called “Ramen”, and later on I change the name to “Jun’s Ramen”, this is the information I’d have stored in the POD:
Current State: { name: "Jun's Ramen" }
History:
[T0] { name: "Ramen" }
[T1] { name: "Jun's Ramen" }
(you can find the full example using Turtle at the end of this post)
The idea is to maintain the same format for the model data so that other applications continue understanding it, whilst adding some metadata that my app would use for CRDT merging.
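To make the merging idea concrete, here is a minimal sketch of how two devices' histories could be combined and replayed into the current state. The `Operation` shape and the `mergeHistories` and `replay` helpers are hypothetical names used only for illustration (they are not part of Soukai), and the sketch assumes every operation has a unique, sortable timestamp.

```typescript
// Hypothetical shapes for illustration; these are not Soukai's actual types.
interface Operation {
  time: string; // assumed unique and lexicographically sortable, e.g. "T0" < "T1"
  changes: Record<string, string>;
}

// Union two histories, keeping one copy of each operation, ordered by time.
function mergeHistories(a: Operation[], b: Operation[]): Operation[] {
  const byTime: Record<string, Operation> = {};
  for (const op of a.concat(b)) {
    byTime[op.time] = op;
  }
  return Object.keys(byTime)
    .sort()
    .map((time) => byTime[time]);
}

// Replay operations in order; the latest write to each field wins.
function replay(history: Operation[]): Record<string, string> {
  const state: Record<string, string> = {};
  for (const op of history) {
    for (const field of Object.keys(op.changes)) {
      state[field] = op.changes[field];
    }
  }
  return state;
}

// Two devices that both know T0 but made different offline changes.
const created: Operation = {
  time: 'T0',
  changes: { name: 'Ramen', description: 'Ramen is delicious' },
};
const phone: Operation[] = [created, { time: 'T1', changes: { description: 'Ramen is life' } }];
const laptop: Operation[] = [created, { time: 'T2', changes: { name: "Jun's Ramen" } }];

const merged = mergeHistories(phone, laptop);
const mergedState = replay(merged);
console.log(mergedState); // name comes from T2, description comes from T1
```

Because the merge is a set union followed by a deterministic sort, both devices should end up with the same history regardless of which one synchronizes first.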
In addition to the changes, operations would also store the time using Hybrid Logical Clocks (check the references at the end).
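For readers unfamiliar with Hybrid Logical Clocks, the following is a rough sketch of the update rules described in the paper listed in the references. The `HybridClock` shape, the function names, and the string format produced by `serialize` are assumptions made for this example, not something Soukai defines.

```typescript
// Hypothetical clock shape: physical time, logical counter, and a device id.
interface HybridClock {
  millis: number;
  counter: number;
  node: string;
}

// Local event or send: keep the larger of our time and the wall clock.
function tick(clock: HybridClock, now: number): HybridClock {
  const millis = Math.max(clock.millis, now);
  const counter = millis === clock.millis ? clock.counter + 1 : 0;
  return { millis, counter, node: clock.node };
}

// Receive: take the largest of our time, the remote time, and the wall clock.
function receive(clock: HybridClock, remote: HybridClock, now: number): HybridClock {
  const millis = Math.max(clock.millis, remote.millis, now);
  let counter = 0;
  if (millis === clock.millis && millis === remote.millis) {
    counter = Math.max(clock.counter, remote.counter) + 1;
  } else if (millis === clock.millis) {
    counter = clock.counter + 1;
  } else if (millis === remote.millis) {
    counter = remote.counter + 1;
  }
  return { millis, counter, node: clock.node };
}

// Left-pad a number with zeros so that string order matches numeric order.
function pad(value: number, width: number): string {
  let text = String(value);
  while (text.length < width) text = '0' + text;
  return text;
}

// Serialize to a string that sorts in clock order (node id breaks ties).
function serialize(clock: HybridClock): string {
  return pad(clock.millis, 15) + ':' + pad(clock.counter, 5) + ':' + clock.node;
}

let local: HybridClock = { millis: 0, counter: 0, node: 'phone' };
local = tick(local, 1000); // wall clock moved forward: counter resets
const first = serialize(local);
local = tick(local, 1000); // same millisecond: counter increments
const second = serialize(local);
local = receive(local, { millis: 5000, counter: 2, node: 'laptop' }, 1001);
const third = serialize(local); // adopts the remote time and bumps its counter
```

Serialized this way, the timestamps can be compared as plain strings, which is what lets operations from different devices be sorted into a single history.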
This metadata would also include two checksums: one of all the known operations, in order to avoid unnecessary processing for models that are already up to date, and one of the model data, in order to detect changes made by other applications. In the latter case, my app would create a new operation with the changes since the last known state.
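A sketch of how the two checksums could be used follows. The `hash` function here is a simple FNV-1a style stand-in chosen so the example has no dependencies (a real implementation would more likely use something like SHA-256), and `modelChecksum`, `operationsChecksum`, and `externalChanges` are hypothetical helper names.

```typescript
type Fields = Record<string, string>;

// Simple 32-bit FNV-1a style hash, used only as a stand-in for a real checksum.
function hash(text: string): string {
  let value = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    value ^= text.charCodeAt(i);
    // Multiply by the FNV prime 16777619 using shifts to stay in 32-bit integers.
    value = (value + (value << 1) + (value << 4) + (value << 7) + (value << 8) + (value << 24)) >>> 0;
  }
  return value.toString(16);
}

// Checksum of the model's properties, independent of property order.
function modelChecksum(fields: Fields): string {
  const pairs = Object.keys(fields)
    .sort()
    .map((key) => key + '=' + fields[key]);
  return hash(pairs.join('\n'));
}

// Checksum of all known operations, identified by their timestamps.
function operationsChecksum(times: string[]): string {
  return hash(times.slice().sort().join('+'));
}

// If the stored checksum does not match the data, another app changed it:
// return the added or modified fields so they can be recorded as a new operation.
// (Removed properties are not handled in this sketch.)
function externalChanges(lastKnown: Fields, current: Fields, storedChecksum: string): Fields | null {
  if (modelChecksum(current) === storedChecksum) return null;
  const changes: Fields = {};
  for (const key of Object.keys(current)) {
    if (current[key] !== lastKnown[key]) changes[key] = current[key];
  }
  return changes;
}

const known: Fields = { name: 'Ramen', description: 'Ramen is life' };
const stored = modelChecksum(known);
const edited: Fields = { name: 'Ramen', description: 'Edited by another app' };

const untouched = externalChanges(known, known, stored);
const detected = externalChanges(known, edited, stored);
```

Comparing `operationsChecksum` values first lets a client skip a model entirely when both sides know the same operations, and only fall back to reading the full history when they differ.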
Concerns
These are some concerns I have with my current solution:
- Data overhead: If you look at the example that follows, you’ll notice that there is a lot of overhead. As far as I know this is a common issue with CRDTs, but given my use case I don’t think this will be a problem (and I could implement some algorithms to squash the history later on).
- Interoperability: Other applications will understand the data, given that the main resource is still the same. But if they start modifying the data as well, some things could break down. My plan for this is to detect such changes with the checksum and create new operations for them in my app, but the timestamps of those operations will be inaccurate and there could be other issues.
- Custom Vocab: I haven’t found an ontology for this, given that it’s so custom. This isn’t such a big problem as I can create my own vocab, but I’m reluctant to do this because it’s likely that only my apps (or apps using Soukai) will understand it.
- Modeling Operations: Operation resources have both semantic properties (like the time, or rdf:type) and the changes that happened to the model. I’m not sure this makes sense, because I am saying, for example, that a certain operation has properties of the model. Would it make sense to have yet another block of data, let’s call it a changeset, that has only the model properties, without any other operation metadata? I’m also just using update operations in this example, but I will need other operations like add/remove if I work with lists in the model (for example, ingredients).
- Complexity: I can’t help but wonder if I’m overthinking this. I just wanted to make an offline-first app and I ended up here, and I’m not sure how far down the rabbit hole I should go. But this looks like something that could be useful in the future if I want to tackle more complex use cases, so I’m exploring to see where this takes me.
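As one possible answer to the modeling question above, here is a sketch that separates operation metadata from a changeset and adds list operations. All the names (`Change`, `ChangesetOperation`, `RecipeModel`, `applyOperation`) are hypothetical, and the add/remove semantics are deliberately naive: they do not resolve a concurrent add and remove of the same value the way a proper set CRDT would.

```typescript
// Hypothetical shapes for illustration; not an existing vocabulary or Soukai API.
type Change =
  | { type: 'set'; property: string; value: string }
  | { type: 'add'; property: string; value: string }
  | { type: 'remove'; property: string; value: string };

interface ChangesetOperation {
  time: string; // operation metadata
  changes: Change[]; // the changeset: only what happened to the model
}

type RecipeModel = Record<string, string | string[]>;

// Treat a missing property as an empty list and a single value as a one-item list.
function asList(value: string | string[] | undefined): string[] {
  if (value === undefined) return [];
  return typeof value === 'string' ? [value] : value;
}

// Apply one operation and return a new model, leaving the input untouched.
function applyOperation(model: RecipeModel, operation: ChangesetOperation): RecipeModel {
  const next: RecipeModel = { ...model };
  for (const change of operation.changes) {
    const current = asList(next[change.property]);
    if (change.type === 'set') {
      next[change.property] = change.value;
    } else if (change.type === 'add') {
      next[change.property] =
        current.indexOf(change.value) === -1 ? current.concat(change.value) : current;
    } else {
      next[change.property] = current.filter((item) => item !== change.value);
    }
  }
  return next;
}

let recipeModel: RecipeModel = { name: 'Ramen' };
recipeModel = applyOperation(recipeModel, {
  time: 'T1',
  changes: [
    { type: 'add', property: 'ingredients', value: 'noodles' },
    { type: 'add', property: 'ingredients', value: 'pork' },
  ],
});
recipeModel = applyOperation(recipeModel, {
  time: 'T2',
  changes: [
    { type: 'remove', property: 'ingredients', value: 'pork' },
    { type: 'add', property: 'ingredients', value: 'egg' },
    { type: 'set', property: 'name', value: "Jun's Ramen" },
  ],
});
```

Keeping the changeset as its own structure means the operation itself only carries metadata such as the time, which avoids saying that an operation "has" a name or a description.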
Example
(Imagine that T0, T1 and T2 are Hybrid Logical Clock timestamps, or just timestamps)
In my app:
// at T0
const recipe = await Recipe.create({
    name: 'Ramen',
    description: 'Ramen is delicious',
});

// at T1
await recipe.update({ description: 'Ramen is life' });

// at T2
await recipe.update({
    name: "Jun's Ramen",
    description: 'Instructions: https://www.youtube.com/watch?v=9WXIrnWsaCo',
});
On the server, ramen.ttl at T0:
@prefix : <#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix schema: <http://schema.org/> .
@prefix soukai: <https://vocab.soukai.js.org/> . # This doesn't exist yet!

:it
    a schema:Recipe ;
    schema:name "Ramen" ;
    schema:description "Ramen is delicious" .

:it-metadata
    a soukai:ModelMetadata ;
    dc:subject :it ;
    soukai:created "T0" ;
    soukai:modified "T0" ;
    soukai:modelChecksum "hash(:it properties)" ;
    soukai:operationsChecksum "hash(T0)" ;
    soukai:history :it-operation-0 .

:it-operation-0
    a soukai:ModelOperation ;
    soukai:time "T0" ;
    schema:name "Ramen" ;
    schema:description "Ramen is delicious" .
On the server, ramen.ttl at T1:
@prefix : <#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix schema: <http://schema.org/> .
@prefix soukai: <https://vocab.soukai.js.org/> . # This doesn't exist yet!

:it
    a schema:Recipe ;
    schema:name "Ramen" ;
    schema:description "Ramen is life" .

:it-metadata
    a soukai:ModelMetadata ;
    dc:subject :it ;
    soukai:created "T0" ;
    soukai:modified "T1" ;
    soukai:modelChecksum "hash(:it properties)" ;
    soukai:operationsChecksum "hash(T0+T1)" ;
    soukai:history :it-operation-0, :it-operation-1 .

:it-operation-0
    a soukai:ModelOperation ;
    soukai:time "T0" ;
    schema:name "Ramen" ;
    schema:description "Ramen is delicious" .

:it-operation-1
    a soukai:ModelOperation ;
    soukai:time "T1" ;
    schema:description "Ramen is life" .
On the server, ramen.ttl at T2:
@prefix : <#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix schema: <http://schema.org/> .
@prefix soukai: <https://vocab.soukai.js.org/> . # This doesn't exist yet!

:it
    a schema:Recipe ;
    schema:name "Jun's Ramen" ;
    schema:description "Instructions: https://www.youtube.com/watch?v=9WXIrnWsaCo" .

:it-metadata
    a soukai:ModelMetadata ;
    dc:subject :it ;
    soukai:created "T0" ;
    soukai:modified "T2" ;
    soukai:modelChecksum "hash(:it properties)" ;
    soukai:operationsChecksum "hash(T0+T1+T2)" ;
    soukai:history :it-operation-0, :it-operation-1, :it-operation-2 .

:it-operation-0
    a soukai:ModelOperation ;
    soukai:time "T0" ;
    schema:name "Ramen" ;
    schema:description "Ramen is delicious" .

:it-operation-1
    a soukai:ModelOperation ;
    soukai:time "T1" ;
    schema:description "Ramen is life" .

:it-operation-2
    a soukai:ModelOperation ;
    soukai:time "T2" ;
    schema:name "Jun's Ramen" ;
    schema:description "Instructions: https://www.youtube.com/watch?v=9WXIrnWsaCo" .
References
- Conflict Resolution for Eventual Consistency
- CRDTs for Mortals
- Local-first software
- Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases
- Talking with @gsvarovsky, author of m-ld
I’ve read/watched other resources, but these are the ones I found most useful. If there’s anything I’m missing that you think I should check out, let me know!
So, what do you think? Does it make sense? Am I missing something? Am I overengineering for my use case?
All feedback is welcome!