[Post deleted]

[Post deleted]

24 Likes

Thank you for this blog post.

As you describe, I’m also facing a limitation with documents when developing interoperable SOLID apps.

From a “database” point of view, documents seem not to be flexible enough: sooner or later we run into the problem of how to organize an atomic piece of data so that it fits all the use cases.

But if we think in terms of documents, we can write documents for particular purposes/use cases. This would lead us to duplicate “atomic” data at some point.

To avoid data duplication, we could imagine a POD server that implements your hybrid graph as the data storage while still serving SOLID-compliant documents. The two designs are not necessarily opposed.

4 Likes

Thanks for posting this Ruben, very interesting :).

the first app to sculpt documents and containers in your pod determines where other apps need to look for data.

I agree that this is what’s happening in practice today, but I don’t think that’s the idea of Solid. Ideally, apps should be using the Type Index (or Solid Application Interoperability, etc.). But they don’t, so I think the problem is that most apps are not following best practices, not that this is how Solid is supposed to work.

I see two ways to solve that. The first one is to actually enforce them, but I worry that it would increase the barrier of entry for new developers. The second one is to make it so trivially easy to use that everyone does it; either because it’s a well-documented best practice or because most libraries in the ecosystem do it out of the box.

And maybe the second one would be the best approach. I like to think of this like how lenient HTML is with mistakes. You can write HTML that is not 100% valid, but the browser will still render it, instead of throwing an error. In that regard, I think it would be fine if 90% of Solid Apps used the Type Index but not all of them. The problem is that at the moment, almost none are using it.

This seriously hinders interoperability and serendipitous reuse, which I consider absolutely vital to a thriving personal data ecosystem.

I agree with this 100% :D. I wrote a post a while ago titled Interoperable Serendipity, and for me this is THE point of Solid.

Implementing graph-centric pods and views

My opinion of graph-centric pods is that they sound great in theory, but I worry that this will never happen in practice :/. I’ve been developing Solid Apps for 4 years now, and I’m still missing things as simple as paginating data.

Sure, I’m saying that from an impatient app developer’s point of view; I understand that things take time. But honestly, I’d much rather have a document-centric Solid POD that works in practice, than being caught up in theoretical discussions that never become a reality.

As an app developer, I also had this question of where I should place the data. But ever since I learned about the Type Index, I don’t have it any more. And even though it’s not perfect, it gets the job done for many use cases. Still, nobody is using it because it’s still a draft and it seems we’re waiting for the perfect solution before settling on anything.

As you mention in the article, a graph-centric POD is backwards compatible with a document-centric POD. So I think at this point it’s more useful to develop the existing document-centric architecture further, and transition it towards a graph-centric architecture eventually.

In any case, I’m just a mere app developer and I haven’t thought deeply about these topics, so I may be missing something for sure :sweat_smile:. But my impression is that such a paradigm shift would slow down the progress of Solid even more than it already is.

7 Likes

It seems a lot is already being said on this topic.

Here are a few cents from my perspective (as a data scientist, not someone who understands all the technical details of the current specs, code and agreements).

The use and re-use of data (including writing, editing, adding, deleting), should be possible by different apps, different people (according to ACL) and on the most granular level possible.

While I see shapes and patterns and fragments as interesting tools, there will be data in your pod that will be “all over the place”.

To some I wish to share my date of birth, to others, I wish to share only the day and month.
To some, I wish to share some info from one document, and some info of another document, but not all.
I might want to query ‘relevant’ information from people “relating to cars”, regardless of the ontology, ShEx or whatever format they have written it in.

Interoperability needs the open world assumption to hold.
Representations of reality need to be open to change (and preferably have a probabilistic reference in terms of truth value, but that can be done in RDF), as not only does reality change, the representation can and will change over time.

With this in mind, I think that all triples should be stored in a triple store (or quad store), including the many possible defined views and relations.

A knowledge graph is the strongest way to do this, and offers a lot of flexibility.

Additionally, I think that (preferably in the near future) we need to introduce vector embeddings of nodes & ontologies.
When using NLP techniques, you can map “foaf:person”, “vcard:person” and “schema:person” onto one another, because the difference between them might not interest me, their meanings might be too similar for me to make that distinction, or those “near alternatives” might correlate sufficiently for my data usage purpose.
(Note that this takes Wittgenstein’s family resemblance into account.)
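
To make the “near alternatives” idea a bit more concrete, here is a minimal Python sketch that treats two terms as interchangeable when the cosine similarity of their embeddings clears a threshold. Everything in it is an assumption for illustration: the 2-D vectors are invented toy values (not output of any model), `ex:car` is a made-up unrelated term, and the threshold and the `near_alternatives` helper are hypothetical.

```python
import math

# Toy 2-D embeddings, invented purely for illustration (not model output).
# The term names are the ones used in the post, plus a made-up "ex:car".
EMBEDDINGS = {
    "foaf:person": (1.0, 0.0),
    "schema:person": (0.8, 0.6),
    "vcard:person": (0.96, 0.28),
    "ex:car": (0.0, 1.0),
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def near_alternatives(term, threshold=0.75):
    """Return the other terms whose embedding is close enough to `term`."""
    base = EMBEDDINGS[term]
    return sorted(
        other
        for other, vec in EMBEDDINGS.items()
        if other != term and cosine(base, vec) >= threshold
    )


print(near_alternatives("foaf:person"))
```

The threshold is where the “good enough for my purpose” judgement would live: a stricter consumer raises it, a more relaxed one lowers it.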

I don’t see how all of this will ever be possible as long as you keep the graph cut up into documents with an arbitrary (and thus wildly differing) structure.

Ideally, I give permission to a query (in a human-readable way, of course). For example, I want to share or query data that is relevant for buying new shoes:
Asking for my favourite colour is fine, as is my buying history of shoes and even trousers; asking for my educational background is not; asking for the number of steps I take in a day might be interesting (how do I use or wear my footwear).

Now, a lot of these concerns can be pushed off for a little while, until adoption increases and more apps and use cases are developed for a broadening community… But it will become a huge problem when Solid becomes ubiquitous, and as I assume the growth is exponential, we shouldn’t wait too long.
Starting to clear the grounds to prepare for this and make a first working version would be a boon.

3 Likes

I’ve translated this to Chinese https://zhuanlan.zhihu.com/p/596655931

I haven’t participated in SoLiD-related development for a while, so this is a good re-entry point for me to pick up new concepts like TypeIndex and ShapeTree…

But it seems it will take a few years before we can use the new graph-based solution, if a view definition language and a view processor still have to be developed in the SoLiD servers?

4 Likes

I love the ideas in the article! Elegant way to resolve Solid’s ongoing issues.

It’s the direction i’ve been hoping for, but have been too shy to ask.
(((:arrow_up: link to unfinished article, maybe don’t go beyond intro :sweat_smile:)))

When i read these ideas clearly explained by one of Solid’s leading thinkers-influencers @RubenVerborgh, it gives me a boost of optimism for Solid. I hope we pick this direction asap. :slight_smile: :crossed_fingers:


rant follows

As a Solid app developer, I’ve failed to deal, in my thoughts, with the issues described in the article (rigid discovery, permissions). It caused me anxiety about investing energy into Solid. I thought it was doomed to fail due to its limits. Furthermore, thoughts about including Shape Trees in the specification (thus enforcing rigid document structure) gave me further reasons to despair. (Sorry interoperability panel…)

This approach solves a major part of my issues!

As an app developer, i don’t care for documents at all. I need to CRUD data. I want to provide an expected shape to the data storage, and receive a subgraph that matches the shape and the user’s permissions. Some form of query engine seems necessary (LDF, SPARQL).
Recently i’ve enjoyed the ideas behind shape repo and LDO by @jaxoncreed. Graphs and shapes - yay!

I agree with @NoelDeMartin - to focus on making things work.
Yet i hope that we make this work: Graph-centered pods with triple-and-blob-level permissions, and with query endpoint(s). (Great to read that it’s not in conflict with the document approach. Thanks article!)
I also wish the Solid community would take a more practical, demo-first approach to writing specs. (developers of solid unite! :muscle:t4: :muscle:t3: :muscle:t5: :muscle:t2: :muscle:t6: :smile:)
Until then, i’m also going for type indexes, and painfully dealing with documents and granular permissions.

I find (knowledge-)graphs, and basic RDF, and data shapes super intuitive. I think everybody can. (think colorful dots and arrows). The difficulties start when we try to express them in linear text documents. Not sure why documents should be more intuitive for devs. We work with databases, structures and queries all the time.
(ok, let’s not start about how expressive & universal, or not, RDF is. maybe it’s good enough, idk, let’s find out in practice…)

I’ll end the rant here… :sweat_smile:

rant fades


Thanks for reading! :blush:

3 Likes

We (team PDS Interop) did some tinkering and had an idea about views and transformers that could maybe help in getting better access to data and still maintain for normal users a filesystem/document view.

A view could be a virtual document that can give data to applications. Applications would need to ask for the data they need, and then the server can see if it has a folder/document or maybe a virtual view. The view can limit the data and manage the permissions. A view works in close relationship with transformers that can translate data into the form requested by an application.
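
A rough Python sketch of how a view and a transformer could fit together, under stated assumptions: the `View` class, the `foaf_to_schema` transformer, the in-memory `STORE` and all `ex:` names are hypothetical and only illustrate the idea described here, not the actual PDS Interop design.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]

# Hypothetical stored data; the "ex:" names are made up for illustration.
STORE: list[Triple] = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:nick", "Al"),
    ("ex:alice", "ex:salary", "secret"),
]


def foaf_to_schema(triple: Triple) -> Triple:
    """Transformer: rename predicates into the vocabulary the app asked for."""
    mapping = {"foaf:name": "schema:name", "foaf:nick": "schema:alternateName"}
    s, p, o = triple
    return (s, mapping.get(p, p), o)


@dataclass
class View:
    """A virtual document: limits the data, checks permissions, transforms."""
    allowed_agents: set[str]
    selects: Callable[[Triple], bool]
    transform: Callable[[Triple], Triple]

    def read(self, agent: str) -> list[Triple]:
        if agent not in self.allowed_agents:
            raise PermissionError(f"{agent} may not read this view")
        return [self.transform(t) for t in STORE if self.selects(t)]


# A profile view: only FOAF triples, readable by Bob, served as schema.org terms.
profile_view = View(
    allowed_agents={"ex:bob"},
    selects=lambda t: t[1].startswith("foaf:"),
    transform=foaf_to_schema,
)
print(profile_view.read("ex:bob"))
```

The underlying store stays as it is; only the view decides what leaves the pod and in which vocabulary.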

Here we have posted some additional context: Solid: different views on Linked Data

We kind of like the filestore. We need to have data on disk anyway. We agree that Ruben is right when he says we have issues with data access and with accessing data that is stored very differently. Let us know what you think.

3 Likes

See also the ML thread about this same topic: Re: Detailed response to Ruben's blog from Melvin Carvalho on 2023-01-13 (public-solid@w3.org from January 2023)

I also included this “graph store” view as one of the various “flavours of Solid” in Using the Flavours of Solid

2 Likes

Let’s take a simple example:

(foaf:Person) --foaf:knows--> (foaf:Person)

you may have friends that you want to:

  • share publicly
  • keep private
  • share with your family
  • share with your friends
  • share within a specific online community
  • share with Alice
  • share with Bob
  • …

With the current architecture, every group of foaf:knows will need to be stored in a separate document with its own access list, to keep access accurate. When you want to share a contact with your family and with Alice, you have two choices: Duplicate the data (ouch) or create a new document for just this one contact (ouch ouch).

Then, to reach them, you’ll have to keep track of all these documents in the type index, or make sure you can reach them by following your nose.

And that’s a simple example. What happens when we deal with anything more complex?

Can you see how this naturally forces you to think in terms of triples (and knowledge graph); not in terms of documents?
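
The two “ouch” options can be sketched in a few lines of Python. The contacts, the audience names and both helper functions are hypothetical; the sketch only counts which documents you end up with when access is set per document.

```python
from collections import defaultdict

# Hypothetical contacts and the audiences each one is shared with.
CONTACTS = {
    "ex:carol": {"family"},
    "ex:dave": {"family"},
    "ex:erin": {"family", "alice"},
    "ex:frank": {"public"},
}


def documents_per_audience(contacts):
    """Option 1: one document per audience; shared contacts get duplicated."""
    docs = defaultdict(list)
    for contact, audiences in contacts.items():
        for audience in audiences:
            docs[audience].append(contact)
    return dict(docs)


def documents_per_audience_set(contacts):
    """Option 2: one document per distinct audience set; no duplication."""
    docs = defaultdict(list)
    for contact, audiences in contacts.items():
        docs[frozenset(audiences)].append(contact)
    return dict(docs)


duplicated = documents_per_audience(CONTACTS)
split = documents_per_audience_set(CONTACTS)
print(sorted(duplicated["family"]))            # erin also lives in the "alice" document
print(split[frozenset({"family", "alice"})])   # a whole document just for erin
```

With option 2 the number of documents can grow with every new combination of audiences, which is exactly what then has to be tracked for discovery.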


Instead, imagine you could

  • specify access down to each triple, if you need
  • ask your pod, or Cecilia’s pod, questions like Give me all Cecilia’s friends (that i can access), or Give me all recipes (that i can access); e.g. with Linked Data Fragments

With this, the above issues will disappear. So will the need for type indexes, shape trees, and other increasingly complex, rigid, or unmanageable (client-to-client) specifications. We’ll only need to agree about shapes, not about document structures.
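
A minimal Python sketch of what “Give me all Cecilia’s friends (that i can access)” could look like with per-triple access. It is an in-memory toy under stated assumptions: the `POD` dictionary, the `query` and `friends_of` helpers and the `ex:` agents are all hypothetical, and the triple-pattern lookup is only in the spirit of Linked Data Fragments, not its actual interface.

```python
FOAF_KNOWS = "foaf:knows"

# Hypothetical pod contents: every triple carries the set of agents allowed
# to read it ("public" means everyone).
POD = {
    ("ex:cecilia", FOAF_KNOWS, "ex:alice"): {"public"},
    ("ex:cecilia", FOAF_KNOWS, "ex:bob"): {"ex:alice"},
    ("ex:cecilia", FOAF_KNOWS, "ex:dan"): {"ex:bob"},
}


def query(pod, agent, subject=None, predicate=None, obj=None):
    """Return the triples `agent` may read that match the pattern.

    None acts as a wildcard for that position.
    """
    results = []
    for triple, readers in pod.items():
        if "public" not in readers and agent not in readers:
            continue  # access is decided per triple, not per document
        s, p, o = triple
        if subject not in (None, s) or predicate not in (None, p) or obj not in (None, o):
            continue
        results.append(triple)
    return results


def friends_of(pod, agent, person):
    """All friends of `person` that `agent` is allowed to see."""
    return sorted(o for _, _, o in query(pod, agent, subject=person, predicate=FOAF_KNOWS))


print(friends_of(POD, "ex:alice", "ex:cecilia"))
print(friends_of(POD, "ex:eve", "ex:cecilia"))
```

The client only needs to know the shape it is asking for (here a single `foaf:knows` pattern); where the triples physically live never enters the picture.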

This is not the current Solid, but this is a future that i want, and a future (imo) the article points towards.

2 Likes

Folder hierarchy, JSON document, tabular format, graph: they are all views of the underlying data that make it easier for the user or the person operating on the data. They are simply different ways to index and represent the underlying data. You can go one level down and represent the data in a schema-agnostic way. Cosmos DB, by Azure, made that choice. It’s a really interesting read: Schema-Agnostic Indexing with Azure Cosmos DB.

In Cosmos DB, a container can be projected as a ā€œcollection, table, graphā€ for supporting APIs for ā€œSQL, Table, document, Gremlin (graph)ā€. Easy to follow video explanation https://www.youtube.com/watch?v=luWFgTP0IL4.

The underlying representation, atom-record-sequence, is proprietary, though.

Do we have any solution on npm to address this issue?

1 Like

I read this article when it was published and it was very enlightening, thanks @RubenVerborgh !

There would be much to say about this subject, but here are a few comments already:

  • I think we should really stop thinking of pods in analogy to Google Drive or Dropbox. End users aren’t interested in being able to move Turtle files into containers. This is well explained in this document on Solid’s interoperability problems and goals: People shouldn’t need to think about how to physically organize their data to use Solid. Unfortunately, the same document says “[Users] should have as much freedom as they like to make and modify how they categorize their data”. To me, that’s not a requirement at all! We could eventually allow users to upload binaries to their Pods and move them around just like on a drive. But other solutions do it better, and IMO we shouldn’t waste time with this kind of functionality.
  • In the ActivityPods project I’m working on, I use a triple store (Jena Fuseki) for data storage. When you’re working with a real knowledge graph, most of the problems Solid encounters with file hierarchies simply don’t exist. Yesterday, I reread the ShapeTree specification and simply couldn’t understand why I would need it for ActivityPods. Knowledge graphs and file hierarchies are completely different worlds, and trying to make them coincide is bound to create a lot of headaches, as Ruben’s article suggests. (Disclaimer: ActivityPods follows the Solid philosophy, but for several reasons I won’t go into here, compatibility with the entire Solid specification is not a priority for us, even though we use LDP, WAC and other Solid standards).
  • To start moving away from the “Pod as Drive” philosophy, in the next version of ActivityPods, resource URIs will no longer refer to the LDP container. Instead of having pretty URIs like https://mypod.store/alice/data/events/my-birthday-event, we’ll simply have https://mypod.store/alice/data/r1dsc5sd5vvfv5e8gb86wd. It’s a little less pleasant for developers, but it improves privacy (you can’t guess a resource’s content from its URL) and, above all, it makes it easy to move resources to other containers, without affecting their URI.

2 Likes

I’m admittedly not a regular user, but I use my Solid pod on a daily basis and the main reason I still find the file system analogy useful is being able to use the container hierarchy and unit of a document to think about permissions and trust. The idea of specifying this per object/resource or per triple terrifies me and I haven’t yet found an alternative metaphor to the file system/document model that rings true to me.

I’ve been experimenting very informally with the event sourcing metaphor to deal with the issues described - thinking in terms of flow of triples between documents seems quite powerful to me in an era of feeds of different sorts. I can have triples flow across trust and permission boundaries in predictable ways rather than thinking about setting permissions on them in a more granular way.

One of the things that drives me crazy with OneDrive etc. is how frequently I need to manually specify permissions for each individual document I share.
I feel like I have very poor visibility of permissions, and I end up with a lot of security by obscurity. Unless there are high risks, I make documents publicly accessible and even editable and just hope that the obscure URI means there’s a low chance of someone finding it.

When faced with this type of event URI, I would be very likely to continue this behaviour. I wonder if you already have any thoughts for mitigations on this?

1 Like

Using URIs like https://mypod.store/alice/data/r1dsc5sd5vvfv5e8gb86wd changes nothing about permissions, because you can still put this resource in an LDP container (named, for example, https://mypod.store/alice/data/events), and the default permissions of this container will still apply to all the resources it contains, no matter their URIs. WAC permissions don’t care about URIs; what matters is that the resource is linked to the container via the ldp:contains predicate.
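
A small Python sketch of that point, as a deliberately simplified model and not how ActivityPods or a WAC implementation actually works: `contains` stands in for ldp:contains links, `default_readers` for a container’s default permissions, and the helper names and `ex:` agents are hypothetical. Real WAC resolution has more rules than this.

```python
import secrets

BASE = "https://mypod.store/alice/data/"  # base URL taken from the example above

# Hypothetical in-memory model of containment and container defaults.
contains = {}          # container URI -> set of resource URIs (ldp:contains)
default_readers = {}   # container URI -> agents granted read by default


def mint_resource(container):
    """Create a resource with an opaque URI and link it to `container`."""
    uri = BASE + secrets.token_hex(11)  # 22 random hex characters
    contains.setdefault(container, set()).add(uri)
    return uri


def move(resource, source, target):
    """Re-link the resource to another container; its URI does not change."""
    contains[source].discard(resource)
    contains.setdefault(target, set()).add(resource)


def can_read(agent, resource):
    """Access follows container membership, not the shape of the URI."""
    return any(
        resource in members and agent in default_readers.get(container, set())
        for container, members in contains.items()
    )


events = BASE + "events"
default_readers[events] = {"ex:bob"}
party = mint_resource(events)
print(party, can_read("ex:bob", party))
```

Moving `party` to another container with `move(...)` changes who can read it while every existing link to its URI keeps working.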

Triple-level permissions are another subject entirely. The current WAC spec doesn’t support this, as it knows only about LDP containers and LDP resources. To handle it technically, we would probably need to use RDF* (RDF-star), which allows triples to be described with other triples.
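
To illustrate the “triples about triples” idea without any RDF library, here is a tiny Python sketch that models a quoted triple as a nested tuple. It only mimics the RDF-star data model; the `ex:readableBy` predicate and the `readers_of` helper are invented for the example and are not part of WAC or any existing vocabulary.

```python
# A plain triple, modelled as a tuple.
friendship = ("ex:cecilia", "foaf:knows", "ex:bob")

# The graph holds the triple itself plus annotations whose *subject* is that
# triple, which is what RDF-star makes possible.
graph = {
    friendship,
    (friendship, "ex:readableBy", "ex:alice"),   # hypothetical predicate
    (friendship, "ex:readableBy", "ex:family"),
}


def readers_of(graph, triple):
    """Collect the agents named in annotations attached to `triple`."""
    return sorted(o for s, p, o in graph if s == triple and p == "ex:readableBy")


print(readers_of(graph, friendship))
```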

1 Like