Potential algorithm to avoid conflicting file writes

I’ve sketched out a rough algorithm to avoid overwriting the same collection of .ttl files, whether from different users or from the same user in different browser tabs: Safe Synchronous Save · Issue #151 · centerofci/data-curator2 · GitHub

I appreciate that handling this through a Solid API providing locks across multiple .ttl files / Things would be superior, but as I’m not aware that’s available yet, I was wondering if this would be useful to others in the meantime.

If there’s any bug / obvious improvement that can be made I’d be interested to hear your feedback either here or on the issue. :slight_smile:


This is a question near and dear to my heart :slight_smile: I’ve gone through a few editions/versions of attempting to solve this in the context of my own app, and have a solution that is working there. But I’ve been wondering more generally how Solid might help support this problem. I don’t have quite enough time now to address this question but wanted to say “Good question!” :slight_smile: I’ll come back to this later.


OK. I’m back :slight_smile: Some thoughts on this:

a. It looks like the use case you are mostly thinking about is making sure that two writes (uploads) of the same resource (e.g., a .ttl file) don’t overwrite each other. One possibility is the If-None-Match header. Not sure if you’ve looked into that. However, at least when I tried this with the NSS, it didn’t appear to be implemented, at least for PUT requests. I’m wondering if what could work in your situation (at least hypothetically, with Solid support) is if the web page had the last known ETag value, and then made its modification HTTP request conditional upon the resource still having that same ETag value.
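For what it’s worth, a conditional write along those lines might look roughly like this (a sketch only: `resourceUrl`, the Turtle body, and the fetch function are assumed, and server support for If-Match on PUT varies, as noted above):

```javascript
// Sketch: make a PUT conditional on the last known ETag. Requires the
// server to honour If-Match, which NSS may not implement for PUT.
async function saveIfUnchanged(resourceUrl, turtleBody, lastKnownEtag, fetchFn = fetch) {
  const response = await fetchFn(resourceUrl, {
    method: "PUT",
    headers: {
      "Content-Type": "text/turtle",
      "If-Match": lastKnownEtag, // only apply the write if the ETag still matches
    },
    body: turtleBody,
  });
  if (response.status === 412) {
    // Precondition Failed: someone else modified the resource first
    return { saved: false };
  }
  return { saved: true, etag: response.headers.get("ETag") };
}
```

On a 412 the client would re-fetch, re-apply its change, and retry with the new ETag.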

b. Perhaps obviously, the situation gets considerably more complex if one is trying to resolve merge conflicts. My means of dealing with this in my application is a custom server that does two things in this regard: 1) it serializes changes to the specific file in question in user storage, and 2) it uses file-type-based conflict resolution algorithms to make the changes. The file types in this application are specialized and designed to always allow conflicts to be resolved without user interaction. To be a little clearer: only a few of the file types in the application can actually be changed (beyond the first version of the file), and their contents are structured in such a way that conflicts simply cannot happen.
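To illustrate the first point, serializing writes per file can be as simple as keeping a promise chain per path (a toy sketch, not the actual server code; `serializedWrite` is a made-up name):

```javascript
// Sketch: serialize writes per file path so two writes to the same file
// never run concurrently, while writes to different paths still overlap.
const writeQueues = new Map();

function serializedWrite(path, writeTask) {
  const previous = writeQueues.get(path) || Promise.resolve();
  // Run this task after the previous one, whether it succeeded or failed
  const current = previous.then(writeTask, writeTask);
  // Store a non-rejecting tail so one failure doesn't poison the queue
  writeQueues.set(path, current.catch(() => {}));
  return current;
}
```

Two calls to `serializedWrite("a.ttl", …)` then execute strictly in order, even if the first is slow, while writes to other paths proceed in parallel.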

c. Even more generally, I think it would be great if we as a community had an ongoing discussion about the issue of handling conflicting file writes, conflict resolution, merges etc. in Solid. It might be argued that much of this should be handled by the client. But there might be useful primitives that can be incorporated into Solid that would greatly help the client architecture and ease application development. These primitives may be especially useful when applications are written without having an associated custom server. When you have your own custom server with the application you can take steps like I mention above to serialize changes to specific files. But having your own custom server adds considerable burden to application development.


Hi @crspybits you might be interested to know I’ve been looking closely at your point b) for a while :slight_smile:

m-ld is a tech for live-sharing an RDF graph, using CRDTs. I’ve been chatting to various people in the Solid community about it over the last year, with the intention to interest folks in integrating it with Solid. There’s a discussion of the technical details on m-ld’s discussion board here: Integration with solid-community-server · m-ld · Discussion #71 · GitHub.

With regard to point c) there are quite a few scattered mentions of this in the Solid ecosystem. I gathered them up when I was putting together an NGI proposal last year, which you can read here: Integration with the Solid Project · Issue #62 · m-ld/m-ld-spec · GitHub


This is very exciting @gsvarovsky!!! I will read through your pages in more detail. I imagine I’ll learn about this shortly in my reading, but how do the peers in your m-ld system learn about each other? Is there a registry service? It sounds like all data shared by peers in the system is of RDF type, is that right?

Have you given any presentation(s) at Solid World yet?

CRDTs only very loosely informed my work, but they interest me very much. I didn’t mention them because I didn’t want people to think I was using them :). My ChangeResolvers are somewhat a weak cousin of CRDTs.

I look forward to more conversation on this topic :slight_smile: :slight_smile:


Awesome, @crspybits

how do the peers in your m-ld system learn about each other?

They rely on whatever is used as the “remotes” implementation to tell them. For example, MQTT uses retained messages on a channel; socket.io uses a ‘room’.

all data shared by peers in the system is of RDF type

That’s correct. The main API is entirely JSON-LD, but just recently I’ve been exposing some raw RDF/JS methods too, for the RDF ecosystem.

Have you given any presentation(s) at Solid World yet?

I probably should…

My ChangeResolvers are somewhat a weak cousin of CRDTs.

On quick inspection they seem like a nice abstraction. Since they demand conflict-free merges, I guess you could say they are by definition CRDTs!


I’m thinking this is actually a specific implementation issue that arises when files are used to store data without thread safety. With trinpod, a conflicting file write isn’t possible, for two reasons. First, we use a graph db where each transaction is thread safe for a specific pod (named graph), so two users can’t commit at exactly the same time; one has to wait. Second, files are saved as versions, each version having a unique id. So if two people wrote to the same file at the same time, the person who reached the server first would write a version, and when that was done, the second person would write a version, and both versions would be available in that file’s history, with a master URI for the file node in the graph. (Nothing is ever deleted in trinpod.)

We are using a great, incredibly scalable open-source db: Blazegraph (used by Amazon Neptune, Alexa, etc.)

Thanks @gibsonf1

This post is about avoiding conflicts when writing multiple files:

Does trinpod provide anything to help with that?

On a side note, saving multiple versions of the file sounds really useful, but also like it could increase the storage requirements a lot? For example, a modification of the sync algorithm above that I didn’t mention is one where your client continuously reaffirms that it has the latest claim to write to the files (i.e. constantly updating the modified datetime), so that any new client joining would know there was a “primary” user. If all these versions of the file were saved, I think this could amount to perhaps tens of megabytes within a few days of leaving a browser tab open?
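To put rough numbers on that (all of them assumed: the heartbeat interval, the version size, and the duration are illustrative, not measured):

```javascript
// Back-of-envelope estimate: a client reaffirming its "primary" claim
// every 10 seconds, with each reaffirmation stored as a ~1 KB version.
const versionSizeBytes = 1024;  // assumed size of each stored version
const heartbeatSeconds = 10;    // assumed reaffirmation interval
const days = 3;                 // tab left open for a few days

const versionCount = (days * 24 * 60 * 60) / heartbeatSeconds;
const totalMB = (versionCount * versionSizeBytes) / (1024 * 1024);
console.log(versionCount, totalMB); // 25920 versions, ~25 MB
```

So even tiny versions accumulate to tens of megabytes in a few days under those assumptions.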

Yes, the same mechanism is used to avoid conflicts in writing anything from any number of people. We simply version everything and have a mutex lock in place for any write to a specific pod causing other parallel writes to wait until the lock opens for their write.
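If I’ve understood, that behaviour might be sketched roughly like this (illustrative only, not trinpod’s actual implementation; the class and method names are made up):

```javascript
// Illustrative sketch: one write lock per pod, and every write appends a
// new version instead of overwriting. Nothing is ever deleted.
class VersionedPodStore {
  constructor() {
    this.history = new Map();  // fileUri -> array of { version, body }
    this.podLocks = new Map(); // podId -> promise chain acting as a mutex
  }

  write(podId, fileUri, body) {
    const previous = this.podLocks.get(podId) || Promise.resolve();
    const current = previous.then(() => {
      const versions = this.history.get(fileUri) || [];
      const entry = { version: versions.length + 1, body };
      this.history.set(fileUri, versions.concat(entry)); // append, never overwrite
      return entry.version;
    });
    this.podLocks.set(podId, current.catch(() => {}));
    return current;
  }
}
```

Two concurrent writers to the same file then each get their own version number, in arrival order.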

Good to know. Thank you. Would it be possible to get a lock around the following code:

import {
  createThing, createSolidDataset, setThing,
  addStringNoLocale, saveSolidDatasetAt,
} from "@inrupt/solid-client"

let thing1 = createThing({ name: "1" })
let thing2 = createThing({ name: "2" })

const TITLE = "http://example.com/schema/v1/title"
thing2 = addStringNoLocale(thing2, TITLE, "In dataset1 an item of id 1 exists")

let dataset1 = createSolidDataset()
dataset1 = setThing(dataset1, thing1)
let dataset2 = createSolidDataset()
dataset2 = setThing(dataset2, thing2)

// dataset_1_URL, dataset_2_URL and solid_fetch are defined elsewhere

// get mutex lock
await saveSolidDatasetAt(dataset_1_URL, dataset1, { fetch: solid_fetch })
await saveSolidDatasetAt(dataset_2_URL, dataset2, { fetch: solid_fetch })
// release mutex lock

Otherwise you can get inconsistent data from read/write race conditions. For example, one application might save dataset1, and before it can save dataset2, a second application loads datasets 1 and 2, finds no reference in dataset2 to items in dataset1, thinks it’s OK to delete the item with id 1 in dataset1, and writes to dataset1. From the mutex you’ve described, I assume this would not protect against that, as the mutex is only around a single call to saveSolidDatasetAt?
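For what it’s worth, if a server did expose a lock resource, a client-side wrapper around the two saves might look roughly like this (entirely hypothetical: `withLock`, the lock URL, and the lock protocol are made up; no Solid server is known to me to provide this):

```javascript
// Hypothetical sketch: hold a lock across several saves. Acquisition uses
// "If-None-Match: *" (standard HTTP: only create the resource if it does
// not already exist); release deletes it. Not a real trinpod or Solid API.
async function withLock(lockUrl, fetchFn, criticalSection) {
  const acquired = await fetchFn(lockUrl, {
    method: "PUT",
    headers: { "If-None-Match": "*" }, // fails with 412 if the lock exists
    body: "",
  });
  if (acquired.status === 412) throw new Error("lock held by another client");
  try {
    return await criticalSection(); // e.g. both saveSolidDatasetAt calls
  } finally {
    await fetchFn(lockUrl, { method: "DELETE" }); // release the lock
  }
}
```

The critical section would then contain both saves, so no other client holding the same lock could read or write between them. (A real design would also need lock expiry, in case a client crashes while holding it.)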