Live shared Pod Data – making it happen

gsvarovsky · March 5, 2021, 4:53pm

I’ve been working on m-ld, a software component for live information sharing [1]. It uses RDF as its native data representation, and CRDTs for eventual consistency. So far I’ve been focused on making m-ld generally applicable and useful in many architectures, whether RDF-based or not, by working top-down from a JSON API.

I’m aware of the need for convergence though, and I’d like to start a deliberate project to lean in to Solid with m-ld: to make interoperation with Solid seamless.

This could have lots of upsides, for example working directly or indirectly on:

- Multi-collaborator editable Pod data (e.g. for private group data, as an extension of personal data)
- Local-first offline editing of Pod data (e.g. [2])
- Reliable caching of Pod data at the edge (e.g. [3])
- A generally applicable patch distribution model for Pods [4]
- An authorisation and identity model for m-ld (currently delegated to the owning app [5])

I have in mind to apply for grant funding for this project. But I think the support of the Solid community is even more important.

Is this timely?

Would anyone like to work on this with me?

Would anyone like to work on this with me if we can get funding?

Any other thoughts?

[1] Topic: A project for live data sharing
[2] Solid FAQ: Is it possible to use Solid offline (at least partially)?
[3] Topic: Constructive criticism from an experienced developer
[4] GitHub: Directions for patch distribution
[5] m-ld.org/doc/#security

newlegendmedia · March 9, 2021, 2:06pm

Is your prior work available for review?

gsvarovsky · March 9, 2021, 2:14pm

Hi Jeff. Sure:

- Topic link above
- m-ld.org
- m-ld.org · GitHub

Happy to talk through anything. I’m starting to put together the proposal for this project now.

gsvarovsky · March 10, 2021, 10:46am

Thanks to Sarven Capadisli’s prompt on solid/chat, I have scraped more of the Solid community for motivation and prior work in this area. I’m sure the references below are not yet comprehensive, but it’s a start.

Opinion

Support for live collaborative editing of Pod data in the corpus of documented use-cases and requirements is found only in allusion, and not called out as motivating for the Solid specification. However, engineers have made calls for patch-passing data distribution mechanisms, for strong reasons which do include enablement of user features. This is a disconnect.

There is an accelerating trend towards remote collaboration in software. Live collaborative editing is becoming table stakes for many kinds of user content, as is version control with branching and merging for others. I think engineers are aware of this, but it’s hard to capture feature requirements for new applications that users don’t have in front of them yet. This is especially a problem when the other motivations for patch-passing are performance and resilience, things that the user only perceives when they’re broken.

Data

solid/user-stories

As a developer, I want to be able to subscribe to a stream of pod events #22 – Open

“Event-driven architectures are very powerful in that they allow to create loosely coupled reactive systems.”
This is marginal as a user story, but refers to other feature requests:
- As a user I want to be able to have an audit log of what happened to my data #12
- As a user I want to be able to restore previous data #10

solid/specification

Standardizing state changes in resources (history, undo, sync) #161 – Open

Ruben V: “but then we should probably have collaborative editing as an issue/use case”

Act on the latest version of the resource state #91 – Open

“This discussion is re-discovering wiki edit conflicts, and source code change management, and ACID-style databasing, among other things.”

What server-based notification support is required? #49 – Open

solid/data-interoperability-panel

Problems and Goals for Interoperability, Collaboration, and Security in a Solid Pod

“data will be manipulated not only by different applications, but also by different people or automated agents.”
Does not mention live or realtime manipulations

w3

Collaboration cases in pwp-ucr, but live collaboration is not specified:

UC 72: “Andreas is working on his first collaborative research paper with a fellow student.”
UC 20: “Tanya and Kelly are collaborating on curriculum for the upcoming school year.”

Annotation cases for ‘live’ data in dpub-annotation-uc:

2.3.5 Recording State of Changing, Online Resources (focuses on representation change)
Further allusions in the following cases

Open Data collaboration in dwbp-ucr:

2.23 UK Open Research Data Forum: “data sharing and collaboration are encouraged, facilitated and rewarded.”
Also Dutch National Centre of Expertise and Repository for Research Data: ‘In opposition to “frozen” data sets, linked data can be qualified as “live” data’

Socialwg/Social API/User stories - W3C Wiki

2.70 Fork from and Request Merge of Remote Content: this is Git-style branch/merge.

josephguillaume · May 30, 2021, 11:34pm

Would it be fair to say that integration of m-ld with solid would involve using the pod as a message queue + persistent clone that would need to be updated by a client?

That seems to be the paradigm you had discussed, which informed this post: Request for Comments: CRDTish approach to Solid

More generally in the context of the issues you highlighted from solid/specification, in your opinion what specific support from the spec is needed for live data sharing to make sense? Is what’s there currently sufficient?

gsvarovsky · May 31, 2021, 8:31am

Hi Joseph!

It would definitely make sense to persist data in a Pod – it would be a natural division of labour to use m-ld for a live document, and Solid for the long-term data availability.

With my current understanding of Solid I would be very careful about proposing to use a Pod as a persistent message queue. Message queueing is hard to make robust, scalable and performant. As evidence I present Apache Kafka, RabbitMQ, Eclipse Mosquitto – all substantial long-term projects dedicated only to that task.

Further, to me it dilutes the value of Solid to have implementation artefacts like operation messages in a Pod. You might be able to hide them away using access control, but they have very different lifecycle needs to the characteristic “personal state” data. So having the right interfaces to manage such data will bloat the Solid spec and complicate the vision of switching apps without switching storage (storage independence).

I did discuss this with Noel, and his needs were not for multi-actor editing but for offline support. While these could be boiled down to the same computer science problem, his engineering needs seemed quite tractable to him, so he decided to go ahead with messages-in-Solid.

So that’s a big question, which I was hoping to answer with the project. And the answer will almost certainly be “it depends!” There are lots of possible architectural choices with different trade-offs in liveness, resilience and complexity, which apply not only to m-ld + Solid but also to any CRDT + any personal data store.

I can only say that the lowest-complexity option I can intuitively come up with does not require much that is new from the spec as I understand it. If you want it to be robust and still achieve storage independence, then tying down Solid’s consistency model (e.g. spec ticket and ticket) would certainly help.
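To make the “any CRDT + any personal data store” point above a little more concrete, here is a minimal sketch of patch-passing between two clones of a set of RDF-style triples. It is an illustration only: it is not m-ld’s actual algorithm or API, it says nothing about how a Pod would store or relay the operations, and every name in it (`Clone`, `Op`, `replay`, the `ex:` terms) is hypothetical. It assumes an observed-remove set with tombstones, so that operations can be applied in any order, and more than once, with the same result.

```python
import uuid
from dataclasses import dataclass, field
from itertools import permutations
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass(frozen=True)
class Op:
    """A patch passed between clones (hypothetical shape, for illustration)."""
    kind: str              # "insert" or "delete"
    triple: Triple
    tags: FrozenSet[str]   # tags introduced (insert) or cancelled (delete)


@dataclass
class Clone:
    """One replica of the shared triple set."""
    live: Dict[Triple, Set[str]] = field(default_factory=dict)
    tombstones: Set[str] = field(default_factory=set)

    def insert(self, triple: Triple) -> Op:
        # Each local insert gets a globally unique tag.
        op = Op("insert", triple, frozenset({uuid.uuid4().hex}))
        self.apply(op)
        return op

    def delete(self, triple: Triple) -> Op:
        # A delete cancels only the tags this clone has observed so far.
        op = Op("delete", triple, frozenset(self.live.get(triple, set())))
        self.apply(op)
        return op

    def apply(self, op: Op) -> None:
        if op.kind == "insert":
            # Ignore tags already cancelled by a delete that arrived earlier.
            fresh = op.tags - self.tombstones
            if fresh:
                self.live.setdefault(op.triple, set()).update(fresh)
        else:
            self.tombstones |= op.tags
            remaining = self.live.get(op.triple, set()) - op.tags
            if remaining:
                self.live[op.triple] = remaining
            else:
                self.live.pop(op.triple, None)

    def triples(self) -> Set[Triple]:
        return set(self.live)


def replay(ops: Iterable[Op]) -> Set[Triple]:
    """Apply operations to a fresh clone, in the order given."""
    clone = Clone()
    for op in ops:
        clone.apply(op)
    return clone.triples()


alice, bob = Clone(), Clone()
name = ("ex:fred", "ex:name", "Fred")
height = ("ex:fred", "ex:height", "6")

op1 = alice.insert(name)
bob.apply(op1)

# Concurrent edits: neither clone has seen the other's operations yet.
op2 = alice.delete(name)
op3 = bob.insert(name)    # concurrent re-insert of the same triple
op4 = bob.insert(height)

# Exchange the patches, in a different order on each side.
alice.apply(op4)
alice.apply(op3)
bob.apply(op2)

converged = alice.triples() == bob.triples()
order_independent = all(
    replay(order) == alice.triples()
    for order in permutations([op1, op2, op3, op4])
)
print(converged, order_independent)
```

In this sketch a concurrent insert survives a delete that did not observe it, which is one of several possible conflict rules. The operations and tombstones are exactly the kind of implementation artefact discussed above: they have to live somewhere, and they have a different lifecycle from the resulting triples.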

gsvarovsky · May 31, 2021, 8:52am

I’m sorry to report that there may be some delay in addressing the ideas in this Topic.

Dear Applicant,

We are sorry to inform you that, after going through the Evaluation process described in the Guide for Applicants (Section 4), your proposal has not been selected to take part in the Support Programme of NGI Pointer.

Your proposal has been evaluated by 2 recognized experts, who assessed the potential of your project. Your proposal failed to pass the overall threshold of 10 points.

Find below the final score and comments provided by those evaluators as feedback.

Final Score of your proposal: 9,5 out of 15 points.

josephguillaume · May 31, 2021, 9:41am

Sorry to hear that! There definitely seems to be a fair bit of work required for solid to offer a credible and usable solution in this space. Hopefully there’ll be other opportunities…

aschrijver · June 10, 2021, 6:17am

That is indeed a bummer @gsvarovsky … so close! Next time I’m sure you’ll make it. I find your project very interesting, and hope to look more closely in the near future if time permits.

Wanted to mention some CRDT-related information resources (which I’m sure you are already aware of, since you’ll be presenting together with @pukkamustard at the next NGI event, Semantic web and metadata solutions):

PUBLIC DREAM research
DROMEDAR data model specification

They do a lot of research about p2p, CRDTs and offline-first applications.

gsvarovsky · June 10, 2021, 7:30am

Many thanks Arnold. I have come across the DREAM project before but I have not yet been in contact. There’s no time like the present though… @pukkamustard, it would be great to sync up, so to speak! I’ll DM you on the Dream forum.

aschrijver · June 21, 2021, 6:04am

I created a post at SocialHub in the topic Querying ActivityPub collections where I mentioned m-ld as interesting (possible solution?), but I cannot gauge the extent to which this is true. Maybe you’d like to jump in and provide some more info @gsvarovsky?
