Thank you for this blog post.
As you describe, I’m also facing a limitation with documents when developing interoperable SOLID apps.
From a “database” point of view, documents do not seem flexible enough: sooner or later we run into the problem of how to organize an atomic piece of data so that it fits all the use cases.
But if we think in terms of documents, we can write documents for particular purposes/use cases. This would lead us to duplicate “atomic” data at some point.
To avoid data duplication, we could imagine a POD server that implements your hybrid graph as the data storage while still serving SOLID-compliant documents. The two designs are not necessarily opposed.
Thanks for posting this Ruben, very interesting :).
the first app to sculpt documents and containers in your pod determines where other apps need to look for data.
I agree that this is what’s happening in practice today, but I don’t think that’s the idea of Solid. Ideally, apps should be using the Type Index (or Solid Application Interoperability, etc.). But they don’t, so I think the problem is that most apps are not following best practices, not that this is how Solid is supposed to work.
I see two ways to solve that. The first one is to actually enforce them, but I worry that it would increase the barrier of entry for new developers. The second one is to make it so trivially easy to use that everyone does it; either because it’s a well-documented best practice or because most libraries in the ecosystem do it out of the box.
And maybe the second one would be the best approach. I like to think of this like how lenient HTML is with mistakes. You can write HTML that is not 100% valid, but the browser will still render it, instead of throwing an error. In that regard, I think it would be fine if 90% of Solid Apps used the Type Index but not all of them. The problem is that at the moment, almost none are using it.
This seriously hinders interoperability and serendipitous reuse, which I consider absolutely vital to a thriving personal data ecosystem.
I agree with this 100% :D. I wrote a post a while ago titled Interoperable Serendipity, and for me this is THE point of Solid.
Implementing graph-centric pods and views
My opinion of graph-centric pods is that they sound great in theory, but I worry that this will never happen in practice :/. I’ve been developing Solid Apps for 4 years now, and I’m still missing things as simple as paginating data.
Sure, I’m saying that from an impatient app developer’s point of view; I understand that things take time. But honestly, I’d much rather have a document-centric Solid POD that works in practice, than being caught up in theoretical discussions that never become a reality.
As an app developer, I also had this question of where I should place the data. But ever since I learned about the Type Index, I don’t have it any more. And even though it’s not perfect, it gets the job done for many use cases. Still, nobody is using it because it’s still a draft and it seems we’re waiting for the perfect solution before settling on anything.
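To make the lookup concrete, here is a minimal sketch in Python. It assumes the Type Index can be modelled as a plain dictionary; a real Type Index is an RDF document in the pod, and the class URLs, pod URLs, and the `locate` and `register` helpers below are all hypothetical.

```python
from typing import Optional

# Hypothetical, simplified stand-in for a Type Index: it maps an RDF class
# to the container or document where instances of that class are stored.
# A real Type Index is an RDF document in the pod; this dict is an assumption.
TYPE_INDEX = {
    "https://schema.org/Movie": "https://alice.example/movies/",
    "https://schema.org/Recipe": "https://alice.example/cookbook/recipes.ttl",
}


def locate(rdf_class: str, type_index: dict) -> Optional[str]:
    """Return the registered location for a class, or None if unregistered."""
    return type_index.get(rdf_class)


def register(rdf_class: str, location: str, type_index: dict) -> str:
    """Register a location only if no other app has claimed the class yet.

    Returns the location that apps should use from now on.
    """
    return type_index.setdefault(rdf_class, location)
```

An app would call `locate` first and only fall back to `register` when nothing is found, so the second app to arrive reuses the first app's location instead of guessing a path.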
As you mention in the article, a graph-centric POD is backwards compatible with a document-centric POD. So I think at this point it’s more useful to develop the existing document-centric architecture further, and transition it towards a graph-centric architecture eventually.
In any case, I’m just a mere app developer and I haven’t thought deeply about these topics, so I may be missing something for sure. But my impression is that such a paradigm shift would slow down the progress of Solid even more than it already is.
It seems a lot is already being said on this topic.
Here are a few cents from my perspective (as a data scientist, not someone who understands all the technical details of the current specs, code and agreements).
The use and re-use of data (including writing, editing, adding, deleting), should be possible by different apps, different people (according to ACL) and on the most granular level possible.
While I see shapes and patterns and fragments as interesting tools, there will be data in your pod that will be “all over the place”.
To some I wish to share my date of birth, to others, I wish to share only the day and month.
To some, I wish to share some info from one document, and some info of another document, but not all.
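The date-of-birth case can be sketched in a few lines of Python. This is only an illustration of field-level disclosure: the audience names and the `DISCLOSURE` table are made up, and nothing here corresponds to an existing Solid access control mechanism.

```python
from datetime import date

# Hypothetical per-audience rules: which parts of the date each audience may see.
DISCLOSURE = {
    "family": ("year", "month", "day"),
    "colleagues": ("month", "day"),
}


def birthday_view(born: date, audience: str) -> dict:
    """Return only the date parts the audience is allowed to see."""
    allowed = DISCLOSURE.get(audience, ())  # unknown audiences see nothing
    return {part: getattr(born, part) for part in allowed}
```

The point is that the unit of sharing is smaller than the stored value itself, which a per-document permission cannot express.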
I might want to query ‘relevant’ information of people “relating to cars”, regardless of the ontology, ShEx or whatever other format they have written it in.
Interoperability needs the open world assumption to hold.
Representations of reality need to be open to change (and preferably have a probabilistic reference in terms of truth value, but that can be done in rdf), as not only reality changes, the representation can and will change over time.
With this in mind, I think that all triples should be stored in a triple store (or quad store), including the many possible defined views and relations.
A knowledge graph is the strongest way to do this, and offers a lot of flexibility.
Additionally, I think that (preferably in the near future), we need to introduce vector embedding of nodes & ontologies.
When using NLP techniques, you can map “foaf:person”, “vcard:person” and “schema:person” onto one another, because the difference between them might not interest me, because their meanings are too similar for me to make that distinction, or because those “near alternatives” correlate sufficiently for my data usage purpose.
(Note that this takes Wittgenstein’s family resemblance into account).
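As a rough illustration of treating terms as "near alternatives", the sketch below compares terms by cosine similarity. The vectors are made-up toy values, not the output of any real embedding model, and the threshold of 0.9 is an arbitrary assumption.

```python
import math

# Made-up toy vectors, NOT the output of any real embedding model.
EMBEDDINGS = {
    "foaf:Person": (0.9, 0.1, 0.0),
    "schema:Person": (0.8, 0.2, 0.1),
    "schema:Car": (0.0, 0.1, 0.9),
}


def cosine(a, b) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def near_alternatives(term: str, threshold: float = 0.9) -> list:
    """Return the other terms whose vectors are close enough to `term`."""
    base = EMBEDDINGS[term]
    return sorted(
        other
        for other, vector in EMBEDDINGS.items()
        if other != term and cosine(base, vector) >= threshold
    )
```

With these toy values, `near_alternatives("foaf:Person")` returns only `schema:Person`. How strict the threshold should be would depend on the data usage purpose, as described above.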
I don’t see how all of this will ever be possible as long as you keep the graph cut up into documents with an arbitrary (and thus wildly differing) structure.
Ideally, I give permission to a query (in a human-readable way, of course). For example, I want to share or query data that is relevant for buying new shoes:
Asking for my favourite colour is fine, as is my buying history of shoes and even trousers; asking for my educational background is not; asking for the number of steps I take in a day might be interesting (how do I use or wear my footwear).
Now, a lot of these concerns can be pushed off for a little while, until adoption increases and more apps and use cases are developed for a broadening community… But it will become a huge problem when Solid becomes ubiquitous, and as I assume the growth is exponential, we shouldn’t wait too long.
Starting to clear the grounds to prepare for this and make a first working version would be a boon.
I’ve translated this to Chinese https://zhuanlan.zhihu.com/p/596655931
I haven’t taken part in SoLiD-related development for a while, so this is a good re-entry point for me to pick up new concepts like TypeIndex and ShapeTree…
But it seems it will take a few years before we can use the new graph-based solution, if a view definition language and a view processor still have to be developed for the SoLiD servers?
I love the ideas in the article! Elegant way to resolve Solid’s ongoing issues.
It’s the direction i’ve been hoping for, but have been too shy to ask.
((( link to unfinished article, maybe don’t go beyond intro )))
When i read these ideas clearly explained by one of Solid’s leading thinkers-influencers, @RubenVerborgh, it gives me a boost of optimism for Solid. I hope we pick this direction asap.
As a Solid app developer, I’ve failed to deal, in my thoughts, with the issues described in the article (rigid discovery, permissions). It caused me anxiety about investing energy into Solid. I thought it was doomed to fail due to its limits. Furthermore, thoughts about including Shape Trees in the specification (thus enforcing rigid document structure) gave me further reasons to despair. (Sorry interoperability panel…)
This approach solves a major part of my issues!
As an app developer, i don’t care for documents at all. I need to CRUD data. I want to provide an expected shape to the data storage, and receive a subgraph that matches the shape and the user’s permissions. Some form of query engine seems necessary (LDF, SPARQL).
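A minimal sketch of that request, in Python: the app hands over a "shape" and gets back only the matching triples it may read. The shape here is just a set of required predicates, which is far simpler than ShEx or SHACL, and `match_shape`, `can_read` and the example data are hypothetical names for illustration.

```python
from collections import defaultdict

# Hypothetical example graph; triples are (subject, predicate, object).
GRAPH = [
    ("ex:pasta", "rdf:type", "schema:Recipe"),
    ("ex:pasta", "schema:name", "Pasta"),
    ("ex:pasta", "schema:author", "ex:alice"),
    ("ex:secret", "rdf:type", "schema:Recipe"),
    ("ex:secret", "schema:name", "Secret sauce"),
]

# A deliberately naive "shape": the predicates every match must have.
RECIPE_SHAPE = {"rdf:type", "schema:name"}


def match_shape(triples, required_predicates, can_read):
    """Return readable triples of subjects that have every required predicate."""
    by_subject = defaultdict(list)
    for triple in triples:
        subject, predicate, _ = triple
        # Keep only triples that belong to the shape and that the user may read.
        if predicate in required_predicates and can_read(triple):
            by_subject[subject].append(triple)
    result = []
    for matched in by_subject.values():
        # A subject matches only if all required predicates are still visible.
        if {predicate for _, predicate, _ in matched} == set(required_predicates):
            result.extend(matched)
    return result
```

The app never sees where the triples were stored; it only sees the subgraph that fits its shape and the permissions evaluated by `can_read`.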
Recently i’ve enjoyed ideas behind shape repo and LDO by @jaxoncreed. Graphs and shapes - yay!
I agree with @NoelDeMartin - to focus on making things work.
Yet i hope that we make this work: Graph-centered pods with triple-and-blob-level permissions, and with query endpoint(s). (Great to read that it’s not in conflict with document approach. Thanks article!)
I also wish the Solid community would take a more practical, demo-first approach to writing specs. (developers of solid unite!)
Until then, i’m also going for type indexes, and painful dealing with documents and
I find (knowledge-)graphs, and basic RDF, and data shapes super intuitive. I think everybody can. (think colorful dots and arrows). The difficulties start when we try to express them in linear text documents. Not sure why documents should be more intuitive for devs. We work with databases, structures and queries all the time.
(ok, let’s not start about how expressive & universal, or not, RDF is. maybe it’s good enough, idk, let’s find out in practice…)
I’ll end the rant here…
Thanks for reading!
We (team PDS Interop) did some tinkering and had an idea about views and transformers that could maybe help in getting better access to data and still maintain for normal users a filesystem/document view.
A view could be a virtual document that can give data to applications. Applications would need to ask for the data they need, and the server can then check whether it has a folder/document or maybe a virtual view for it. The view can limit the data and manage the permissions. A view works closely together with transformers that can translate the data into the form requested by an application.
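To show how a view and a transformer could fit together, here is a small Python sketch. It is not the PDS Interop implementation; `names_view`, `TRANSFORMERS`, `serve` and the media types chosen are assumptions for illustration only.

```python
import json


def names_view(triples):
    """Virtual document: expose only foaf:name triples and hide the rest."""
    return [t for t in triples if t[1] == "foaf:name"]


def to_json(triples) -> str:
    """Transformer: render triples as a JSON array of objects."""
    return json.dumps(
        [{"subject": s, "predicate": p, "object": o} for s, p, o in triples]
    )


def to_lines(triples) -> str:
    """Transformer: render triples as simple 'subject predicate object .' lines."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical registry: the format an application asks for selects a transformer.
TRANSFORMERS = {"application/json": to_json, "text/plain": to_lines}


def serve(view, triples, media_type: str) -> str:
    """Apply the view first (limits the data), then the requested transformer."""
    return TRANSFORMERS[media_type](view(triples))
```

The stored data stays the same on disk; only the view decides what is visible, and only the transformer decides how it is serialized.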
Here we have posted some additional context: Solid: different views on Linked Data
We kind of like the filestore. We need to have data on disk anyway. We agree that Ruben is right when he says we have issues with data access and with accessing data that is stored very differently. Let us know what you think.
See also the ML thread about this same topic: Re: Detailed response to Ruben's blog from Melvin Carvalho on 2023-01-13 (email@example.com from January 2023)
I also included this “graph store” view as one of the various “flavours of Solid” in Using the Flavours of Solid
Let’s take a simple example:
(foaf:Person) --foaf:knows--> (foaf:Person)
you may have friends that you want to:
- share publicly
- keep private
- share with your family
- share with your friends
- share within a specific online community
- share with Alice
- share with Bob
With the current architecture, every group of foaf:knows will need to be stored in a separate document with its own access list, to keep access accurate. When you want to share a contact with your family and with Alice, you have two choices: duplicate the data (ouch) or create a new document for just this one contact (ouch ouch).
Then, to reach them, you’ll have to keep track of all these documents in the type index, or make sure you can reach them by following your nose.
And that’s a simple example. What happens when we deal with anything more complex?
Can you see how this naturally forces you to think in terms of triples (and knowledge graph); not in terms of documents?
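The document explosion can be sketched in Python. Assuming each document has exactly one access list, contacts have to be grouped by their exact audience set; the contact names and the `documents_needed` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical contacts and the audiences each one is shared with.
CONTACTS = {
    "Dana": {"family"},
    "Eli": {"family"},
    "Femi": {"family", "Alice"},
    "Gus": {"public"},
}


def documents_needed(contacts: dict) -> dict:
    """Group contacts by exact audience set: one ACL-protected document per group."""
    documents = defaultdict(list)
    for name, audiences in contacts.items():
        documents[frozenset(audiences)].append(name)
    return dict(documents)
```

Here Femi ends up alone in a third document because no existing document has the "family and Alice" access list. With n audiences there can be up to 2**n - 1 distinct non-empty combinations, so the number of documents can grow quickly.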
Instead, imagine you could
- specify access down to each triple, if you need
- ask your pod, or Cecilia’s pod questions like Give me all Cecilia’s friends (that i can access), or Give me all recipes (that i can access); e.g. with Linked Data Fragments
With this, the above issues will disappear. So will the need for type indexes, shape trees, and other increasingly complex, rigid, or unmanageable (client-to-client) specifications. We’ll only need to agree about shapes, not about document structures.
This is not the current Solid, but this is a future that i want, and the future that (imo) the article points towards.
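The two wishes above can be combined in a small Python sketch: every triple carries its own set of readers, and a question is a triple pattern. This is in the spirit of triple pattern queries but does not implement the Linked Data Fragments protocol; the `ask` helper, the `ANYONE` marker and the example data are assumptions.

```python
ANYONE = "*"  # hypothetical marker for publicly readable triples

# Each entry pairs a triple with the set of agents allowed to read it.
POD = [
    (("ex:cecilia", "foaf:knows", "ex:dan"), {ANYONE}),
    (("ex:cecilia", "foaf:knows", "ex:erin"), {"ex:alice"}),
    (("ex:cecilia", "foaf:knows", "ex:finn"), set()),  # private
]


def ask(pod, agent, subject=None, predicate=None, obj=None):
    """Answer a triple pattern (None is a wildcard) with triples the agent may read."""
    pattern = (subject, predicate, obj)
    results = []
    for triple, readers in pod:
        visible = ANYONE in readers or agent in readers
        matches = all(
            wanted is None or wanted == actual
            for wanted, actual in zip(pattern, triple)
        )
        if visible and matches:
            results.append(triple)
    return results
```

"Give me all Cecilia's friends (that i can access)" then becomes `ask(POD, "ex:alice", subject="ex:cecilia", predicate="foaf:knows")`, and a different agent simply gets a different answer to the same question.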
Folder hierarchy, JSON document, tabular format, graph: they are all views of the underlying data that make it easier for the user or the person operating on the data. They are simply different ways to index and represent the underlying data. You can go one level down and represent the data in a schema-agnostic way. Cosmos DB, by Azure, made that choice. It’s a really interesting read: Schema-Agnostic Indexing with Azure Cosmos DB.
In Cosmos DB, a container can be projected as a “collection, table, graph” for supporting APIs for “SQL, Table, document, Gremlin (graph)”. An easy-to-follow video explanation: https://www.youtube.com/watch?v=luWFgTP0IL4.
The underlying atom-record-sequence representation is proprietary, though.
Do we have any solution on npm to address this issue?
I read this article when it was published and it was very enlightening, thanks @RubenVerborgh !
There would be much to say about this subject, but here are a few comments already:
- I think we should really stop thinking of pods in analogy to Google Drive or Dropbox. End users aren’t interested in being able to move Turtle files into containers. This is well explained in this document on Solid’s interoperability problems and goals: People shouldn’t need to think about how to physically organize their data to use Solid. Unfortunately, on this same document, it says “[Users] should have as much freedom as they like to make and modify how they categorize their data”. To me, that’s not a requirement at all! We could eventually allow users to upload binaries to their Pods and move them around just like on a drive. But other solutions do it better, and IMO we shouldn’t waste time with this kind of functionality.
- In the ActivityPods project I’m working on, I use a triple store (Jena Fuseki) for data storage. When you’re working with a real knowledge graph, most of the problems Solid encounters with file hierarchies simply don’t exist. Yesterday, I reread the ShapeTree specification and simply couldn’t understand why I would need it for ActivityPods. Knowledge graphs and file hierarchies are completely different worlds, and trying to make them coincide is bound to create a lot of headaches, as Ruben’s article suggests. (Disclaimer: ActivityPods follows the Solid philosophy, but for several reasons I won’t go into here, compatibility with the entire Solid specification is not a priority for us, even though we use LDP, WAC and other Solid standards).
- To start moving away from the “Pod as Drive” philosophy, in the next version of ActivityPods, resource URIs will no longer refer to the LDP container. Instead of having pretty URIs like https://mypod.store/alice/data/events/my-birthday-event, we’ll simply have https://mypod.store/alice/data/r1dsc5sd5vvfv5e8gb86wd. It’s a little less pleasant for developers, but it improves privacy (you can’t guess a resource’s content from its URL) and, above all, it makes it easy to move resources to other containers, without affecting their URI.
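A minimal Python sketch of this idea: the URI is minted from a random token, and container membership is kept as separate data, so moving a resource never touches its URI. The token scheme and the `Pod` class are assumptions for illustration, not how ActivityPods actually generates identifiers.

```python
import secrets

BASE = "https://mypod.store/alice/data/"


def mint_uri() -> str:
    """Mint an opaque URI. The random hex token is an assumed scheme."""
    return BASE + secrets.token_hex(11)  # 22 hex characters


class Pod:
    """Toy pod that tracks container membership separately from URIs."""

    def __init__(self):
        self.container_of = {}  # resource URI -> container URI

    def create(self, container: str) -> str:
        uri = mint_uri()
        self.container_of[uri] = container
        return uri

    def move(self, uri: str, new_container: str) -> None:
        # Only the membership link changes; the resource URI stays the same.
        self.container_of[uri] = new_container
```

For example, a resource created in `BASE + "events"` and later moved to `BASE + "archive"` keeps the same URI throughout, and nothing in that URI reveals that it is an event.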
I’m admittedly not a regular user, but I use my Solid pod on a daily basis and the main reason I still find the file system analogy useful is being able to use the container hierarchy and unit of a document to think about permissions and trust. The idea of specifying this per object/resource or per triple terrifies me and I haven’t yet found an alternative metaphor to the file system/document model that rings true to me.
I’ve been experimenting very informally with the event sourcing metaphor to deal with the issues described - thinking in terms of flow of triples between documents seems quite powerful to me in an era of feeds of different sorts. I can have triples flow across trust and permission boundaries in predictable ways rather than thinking about setting permissions on them in a more granular way.
One of the things that drives me crazy with OneDrive etc. is how frequently I need to manually specify permissions for each individual document I share.
I feel like I have very poor visibility of permissions, and I end up with a lot of security by obscurity. Unless there are high risks, I make documents publicly accessible and even editable and just hope that the obscure uri means there’s a low chance of someone finding it.
When faced with this type of event uri, I would be very likely to continue this behaviour. I wonder if you already have any thoughts for mitigations on this?
Using URIs like https://mypod.store/alice/data/r1dsc5sd5vvfv5e8gb86wd changes nothing about permissions, because you can still put this resource in an LDP container (named for example https://mypod.store/alice/data/events), and the default permissions of this container will still apply to all the resources it contains, no matter their URIs. WAC permissions don’t care about URIs; what matters is that the resource is linked to the container via the
Triple-level permissions are another subject entirely. The current WAC spec doesn’t support them, as it only knows about LDP containers and LDP resources. To handle this technically, we would probably need to use RDF* (RDF-star), which allows triples to be described with other triples.
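To give an idea of what "describing triples with other triples" could look like, here is a Python sketch that models a quoted triple as a nested tuple. The `ex:readableBy` predicate is purely hypothetical and not part of WAC or any Solid spec; this is a thought experiment, not a proposal for how it should be specified.

```python
# In RDF-star syntax the first annotation below would read roughly:
#   << ex:alice foaf:knows ex:bob >> ex:readableBy ex:family .
# Here a quoted triple is simply a tuple used as the subject of another tuple.
FRIENDSHIP = ("ex:alice", "foaf:knows", "ex:bob")

ANNOTATIONS = [
    (FRIENDSHIP, "ex:readableBy", "ex:family"),
    (FRIENDSHIP, "ex:readableBy", "ex:carol"),
    (("ex:alice", "foaf:knows", "ex:dave"), "ex:readableBy", "ex:public"),
]


def readers_of(triple, annotations) -> set:
    """Collect the agents or groups that an annotation allows to read `triple`."""
    return {
        obj
        for subj, pred, obj in annotations
        if subj == triple and pred == "ex:readableBy"
    }
```

A triple without any annotation gets an empty reader set, which a server could interpret as "fall back to the container's permissions".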