A Vision for Personal Knowledge Graphs

At the Solid Symposium 2023 I talked about my vision for personal knowledge graphs using Solid. Unfortunately the talk was not recorded, but I have since recorded it myself.

On the Fediverse: A Vision for Personal Knowledge Graphs - tchncs
On YouTube: https://www.youtube.com/watch?v=zCtoWkwSkxI

Feel free to spread the word! I also welcome feedback.

Excellent work, Angelo! Do you have a link to the beginning of a knowledge graph implementation of Solid, that meets this vision? I would like to help out. Thank you!

Hi @eric-jahn, thanks for your feedback and your offer to help. I am not sure what you mean by knowledge graph implementation. Solid as it is already implements the idea of knowledge graphs by using Linked Data. All the app ideas discussed in the talk should be possible to implement with Solid today. Are you interested in building such apps or PoCs of the ideas? That would be awesome!

I could 100% be wrong, but they might be referring to a SOLID-compliant server that uses a knowledge graph on the back end rather than a file system structure. If not, I’m curious as well.

@gaz009 You are 100% correct. I just want SOLID connected to a simple knowledge graph, with SPARQL capabilities. The file/document structure just complicates the data structure and limits the simple SPARQL access. Docs can be generated on-the-fly as needed anyway, depending on permissions, but the data shouldn’t be stored in the POD as docs.

Whether the data in a Pod forms a knowledge graph is independent of the implementation a server chooses. As far as I know, CSS, for example, can work with file system data as well as a graph database. It does not matter in the end: the data conceptually becomes a knowledge graph and enables the vision I describe in my talk.

Unless my understanding is wrong, I think a knowledge graph backend could have advantages over the standard file system structure.

One example: if I have a triple T stored at a location U in a file system, the same triple T could also be stored at U’. I could change T to T’ at either U or U’, but other applications, or even I myself, may not know which copy is the most accurate.

A knowledge graph implementation could change this. Rather than tying triples to a document that resides at a location, each triple would exist once in the knowledge graph, and each URI would expose only the set of triples accessible from that location. Therefore, if I change a triple T at location U to T’, the triple in the knowledge graph changes, and if I retrieve it from any URI that points to it, I receive the updated T’ without versioning concerns.
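
To make the thought experiment concrete, here is a minimal sketch in Python. It is not how any Solid server works; the store layout, the URIs and the function names are all hypothetical. Each triple lives once in a shared store, and each URI is only a view that lists which triples it exposes.

```python
# Hypothetical sketch: one shared triple store, with URIs acting as views.
# None of these names come from a real Solid server.
U = "https://pod.example/u"
U_PRIME = "https://pod.example/u-prime"

# Each triple is stored exactly once, under an internal key.
store = {"t1": ("ex:alice", "foaf:name", "Alice")}

# Each URI lists the keys of the triples it exposes.
views = {U: ["t1"], U_PRIME: ["t1"]}


def read(uri):
    """Return the triples currently visible at the given URI."""
    return [store[key] for key in views[uri]]


def update(key, new_triple):
    """Replace a triple in the shared store; every view sees the change."""
    store[key] = new_triple


# Changing T to T' once updates what both U and U' return.
update("t1", ("ex:alice", "foaf:name", "Alice Smith"))
```

After the update, `read(U)` and `read(U_PRIME)` return the same changed triple, which is the behaviour described above.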

This obviously raises questions such as how do you handle notifications and conflicting ACL rules regarding each individual triple and representations, but it was a nice thought experiment.

You are mixing several things here. This is my understanding, to bring a bit of structure into what we are discussing here:

  • knowledge graph: This is the result of putting Linked Data on Solid Pods. It does not have a hierarchical structure, regardless of the backend implementation
  • document-container structure: a way to organize documents in a hierarchy on Solid Pods, a subset of a knowledge graph
  • document: an HTTP resource that contains triples (i.e. the document is the 4th part of a quad)
  • dynamic document: a document that is created on the fly (when requested) with other data from the Pod
  • file system: one potential storage backend for data on Solid Pods (implementation detail)
  • graph database: one potential storage backend for data on Solid Pods (implementation detail)

This raises the philosophical question of whether it is the same triple. What you refer to as location U in a file system is, in my understanding, the document as defined above. Two triples s p o in two different documents do not necessarily mean the same thing; they are two different quads. It can also make sense to change them individually. E.g. the name on my driver’s license (a document) might change, while my birth certificate (another document) stays the same, even though both contained “the same triple” before.
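
A small Python sketch of that distinction, assuming a quad is simply a triple plus the document it appears in. The `Quad` type, the prefixed names and the document URIs are made up for illustration.

```python
from typing import NamedTuple


class Quad(NamedTuple):
    """A triple plus the document (4th part) it appears in."""
    subject: str
    predicate: str
    obj: str
    document: str


# Hypothetical documents on a Pod, both stating "the same triple".
license_quad = Quad("ex:me", "foaf:name", "Jane Doe",
                    "https://pod.example/drivers-license")
birth_quad = Quad("ex:me", "foaf:name", "Jane Doe",
                  "https://pod.example/birth-certificate")


def triple_of(quad):
    """Drop the document part and keep only s, p, o."""
    return quad[:3]


same_triple = triple_of(license_quad) == triple_of(birth_quad)
same_quad = license_quad == birth_quad

# The name on the license changes; the birth certificate is untouched.
updated_license_quad = license_quad._replace(obj="Jane Smith")
```

Here `same_triple` is true while `same_quad` is false, and updating the license leaves `birth_quad` as it was.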

To decide which is the “most accurate” or appropriate triple you need to consider the document of the triple, so this is not something you can just ignore.

A Pod might also contain dynamic documents that are derived from other documents. But they are still documents. There are no triples in a vacuum. The knowledge graph is a result of the documents (remember: not files!) and the linked data in them, not the other way around.

Thank you for explaining the terms you used, it was helpful.

I did also mean using a graph database instead of a filesystem structure, so it was a poor choice of wording to use “knowledge graph” when I was referring to implementation details.

On the philosophical question of “are these triples the same”, that is also a good point I hadn’t considered. Depending on the use case (which is a vague assumption), it may be necessary for the location of the document to be different. I should clarify that by location I am referring to the URI from which this information is accessible, as where the data is stored is a different question from what a user expects to be retrieved when they request a specific URL.

I was instead attempting to argue that a graph database implementation could have benefits over a file system implementation. From my perspective, this could reduce data redundancy by removing the “files” aspect for applications which may want the same subset of triples to perform operations, as some documents may consist of identical triples, with the only difference being the document location/URI.

other applications or even myself may not specifically know which one is the most accurate triple

I’m not sure I believe in the concept that triples have accuracy outside of their context. /foo/bar.ttl#Baz is part of a different context/document/graph than /foo/bop.ttl#Baz. That would be true with either a file or a graph-db backend. You can’t change the URI without changing the meaning since the meaning exists in a context.
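
As a rough illustration of why the two `#Baz` identifiers differ, the following Python sketch resolves both relative references against a made-up Pod base URL (`https://pod.example/` is an assumption, not a real server) and strips the fragment to find the document each one belongs to.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical Pod base URL, used only for illustration.
POD_BASE = "https://pod.example/"

# The two relative references resolve to two distinct absolute URIs.
baz_in_bar = urljoin(POD_BASE, "/foo/bar.ttl#Baz")
baz_in_bop = urljoin(POD_BASE, "/foo/bop.ttl#Baz")


def document_of(uri):
    """Return the URI of the document that a fragment identifier lives in."""
    return urldefrag(uri).url
```

The two resolved URIs are different strings and `document_of` returns a different document for each, independent of how a server stores the data.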

I agree with @aveltens that if the server responds identically to a request for a triple, it doesn’t matter from an RDF point of view what backend the triple is stored in. The choice between a file and a graph-db backend undoubtedly has implications for server performance but shouldn’t make any difference at all to a client.

Does the Knowledge Graph/triplestore implementation in the Community Solid Server still require Linked Data to be stored within “Documents”?

Documents are part of the Solid Protocol. The only way to retrieve triples is to fetch documents. So from the client perspective the triples are stored in those documents, regardless of the internal storage.

If a Solid Server does not store quads but only triples, how would it know what to include in the response when a document is retrieved?

There are two separate questions involved: 1) can a client app retrieve a thing as if it were a document, and 2) does the server store the thing in a file. The answer to the first question is yes for any Solid Server that either stores things as files or supports named graphs. The answer to the second question is that it varies from implementation to implementation and should have no impact on the first question.
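
A minimal sketch of that separation in Python, with two toy backends that are purely hypothetical and not taken from any real Solid server: one keeps triples grouped per document, like files, and the other keeps quads with the document as a named graph. Both answer a document request identically.

```python
class FileLikeBackend:
    """Toy backend: triples are grouped per document, like files on disk."""

    def __init__(self):
        self.files = {}  # document path -> set of triples

    def add(self, document, triple):
        self.files.setdefault(document, set()).add(triple)

    def get_document(self, document):
        return sorted(self.files.get(document, set()))


class QuadBackend:
    """Toy backend: one flat set of quads, the document is the named graph."""

    def __init__(self):
        self.quads = set()

    def add(self, document, triple):
        self.quads.add((*triple, document))

    def get_document(self, document):
        return sorted((s, p, o) for s, p, o, g in self.quads if g == document)


# Load the same made-up data into both backends.
DATA = [
    ("/profile/card", ("ex:me", "foaf:name", "Jane Doe")),
    ("/contacts/friends", ("ex:me", "foaf:knows", "ex:bob")),
]
file_backend, quad_backend = FileLikeBackend(), QuadBackend()
for document, triple in DATA:
    file_backend.add(document, triple)
    quad_backend.add(document, triple)
```

A client calling `get_document("/profile/card")` cannot tell which of the two backends produced the answer.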

@jeffz Thank you for your response. So, it sounds like I need to find a Solid Implementation that does not explicitly store Linked Data in a Document structure (Documents are a limiting and unnecessary abstraction in my opinion), but that can still produce Documents for the sake of the Solid Spec communication protocols (via LDP, I gather).

@aveltens I figure Document relationship tracking can simply be stored as more triples in the Knowledge Graph, leaving quads for something else more useful like namespace federation, or who knows?

Regardless, the ability to work around Documents, from a storage perspective, and having a regular, flexible Knowledge Graph instead, is reigniting my interest in Solid, for sure. Thank you both for your responses on this topic.

So, it sounds like I need to find a Solid Implementation that does not explicitly store Linked Data in a Document structure

Why? If it has a concept of named graphs, it will emit exactly the same things to exactly the same requests regardless of how the data is stored.

The Solid protocol says that regardless of how an RDF thing is stored, it must be made available to clients as, at a minimum, JSON-LD and Turtle. In other words, it supplies documents to clients. It does this whether or not it uses files to store things. The question of returning a document is completely independent of the question of storing things in files.
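
As a hedged illustration, the Python sketch below serves the same in-memory triples as either Turtle or JSON-LD depending on the requested media type. It is a simplification under stated assumptions: the `respond` function and the example data are hypothetical, only plain string literals without special characters are handled, and the Accept value is matched exactly instead of being parsed properly.

```python
import json

# Made-up example data: full IRIs for subject and predicate, a plain literal.
TRIPLES = [
    ("https://pod.example/profile/card#me",
     "http://xmlns.com/foaf/0.1/name",
     "Jane Doe"),
]


def to_turtle(triples):
    """Serialize as simple Turtle statements (no prefixes, no escaping)."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)


def to_jsonld(triples):
    """Serialize as expanded-style JSON-LD, one node object per subject."""
    nodes = {}
    for s, p, o in triples:
        node = nodes.setdefault(s, {"@id": s})
        node.setdefault(p, []).append({"@value": o})
    return json.dumps(list(nodes.values()))


def respond(accept):
    """Pick a representation for the requested media type (simplified)."""
    if accept == "application/ld+json":
        return to_jsonld(TRIPLES)
    return to_turtle(TRIPLES)  # fall back to Turtle in this sketch
```

Nothing in `respond` depends on where `TRIPLES` came from, which is the point: the document a client receives is produced on request, whatever the storage looks like.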

Documents are different from files in terms of reading as well. The fact that a given implementation has a Container structure similar to a file system does not in any way mandate that clients use that Container structure. One can create a web of RDF relationships that completely ignores the Containment structure.

I think he is simply looking for an alternative backend to the standard filesystem structure. As you said, the data will be retrieved the same so the way to store this information shouldn’t matter. And @eric-jahn may have a business/personal case where storage in a filesystem using documents compared to a dedicated database has more flaws/points of failure for them.

This should be possible, yes. I would consider it an implementation detail of the server. What benefits would you get from it? Is it that updating a triple in one (protocol side) document would implicitly update it in other documents as well, or is it about something else? I am not familiar with the term namespace federation.
