Solid backend examples/reference implementations

A small group of us has been working on a Java-based implementation of an LDP server, with the goal of full compliance with the Solid specification. It is mostly compliant now, supporting WebACL, WebID and the various LDP container types. It does not yet support WebID-TLS authentication, but that is coming.

The reason I mention this project is that it defines a set of abstractions that make it possible to write different persistence back-ends for the LDP resources – it’s not tied to a filesystem or even a particular set of technologies for handling persistence. As you can imagine, using a triple store (the reference implementation) for persisting RDF has been simple and straightforward. I have also written an RDBMS-based persistence layer. There was not much complexity there: given that LDP places quite a few constraints on the sorts of DB queries that are allowable, it was fairly simple to create a schema to support RDF data and LDP structures. Plus, these implementations are synchronous and (typically) imply single-node systems.

But your question is specifically about key-value stores. And yes, we are currently building a Cassandra-based backend, which is strictly key-value in nature. Internally, we store RDF data in N-Quads format as a blob of data (even though the output is always RDF triples). The named graph is used to separate user-managed data from other sorts of resource data (e.g. audit logs, server-managed triples, etc.). The N-Quads format, while not always as compact as Turtle, is extremely flexible, and it makes it easy to stream RDF back to clients. One thing to mention about key-value stores such as Cassandra is that, while LDP Basic Containment isn’t too difficult to implement, Indirect Containment is almost a non-starter. (Direct Containment is somewhere in the middle, implementation-wise.) Fortunately, Solid only requires support for Basic Containment, so there are no plans to support Indirect Containers with the Cassandra-based server. Support for Direct Containment is under discussion. With a relational database or triple store, on the other hand, these other sorts of LDP containment structures are not too difficult to implement.
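To make the "N-Quads blob per key" idea a bit more concrete, here is a minimal sketch. It is not the actual server code: a plain `Map` stands in for the key-value store, the class name and the two graph IRIs are made up for illustration, and the line handling assumes canonical one-quad-per-line N-Quads with an IRI graph label rather than doing real parsing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the key-value store (e.g. one row per resource).
class NQuadsBlobStore {
    // Made-up graph names, used only to separate the kinds of data in the blob.
    static final String USER_GRAPH = "urn:example:graph:user";
    static final String AUDIT_GRAPH = "urn:example:graph:audit";

    // Key: resource IRI. Value: the whole resource serialized as N-Quads.
    private final Map<String, String> blobs = new HashMap<>();

    void put(String resourceIri, String nquads) {
        blobs.put(resourceIri, nquads);
    }

    // Returns the quads of one named graph as N-Triples lines (graph label removed).
    // Assumes each line ends with " <graphIri> ." as in canonical N-Quads.
    List<String> triplesIn(String resourceIri, String graphIri) {
        String suffix = " <" + graphIri + "> .";
        List<String> triples = new ArrayList<>();
        String blob = blobs.getOrDefault(resourceIri, "");
        for (String line : blob.split("\n")) {
            String quad = line.trim();
            if (quad.endsWith(suffix)) {
                triples.add(quad.substring(0, quad.length() - suffix.length()) + " .");
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        NQuadsBlobStore store = new NQuadsBlobStore();
        String res = "http://example.org/res";
        store.put(res,
            "<" + res + "> <http://purl.org/dc/terms/title> \"Hello\" <" + USER_GRAPH + "> .\n"
          + "<" + res + "> <http://www.w3.org/ns/prov#wasGeneratedBy> "
          + "<http://example.org/activity/1> <" + AUDIT_GRAPH + "> .\n");
        // Only the user-managed triples are sent back to the client.
        for (String triple : store.triplesIn(res, USER_GRAPH)) {
            System.out.println(triple);
        }
    }
}
```

Because each quad sits on its own line, the blob can be filtered and written to the response line by line, which is what makes streaming straightforward.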

One other (much more experimental) persistence layer implementation worth mentioning made use of an “RDF-Delta” format. I was particularly interested in keeping track of an RDF resource over time, and this allowed me to model the RDF of a given resource as an append-only journal (i.e., add these quads, remove those quads). Here again, the persistence layer was strictly key-value blobs of RDF (or RDF-Delta) data. This was used in conjunction with the Memento standard to make it possible to retrieve the state of an RDF resource at any arbitrary point in time. The resource journal – every resource had its own journal – also had a compacted form (i.e. plain N-Quads), which was read for any simple GET request. With this implementation, pretty much everything was asynchronous, and it made extensive use of Kafka as an event bus and Apache Spark as a data processing layer. The architecture was considerably more complicated than a simple filesystem or relational database, but it was also extremely fast and could scale out across an arbitrary number of server nodes.
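The journal idea can be sketched roughly as follows. This is an in-memory illustration only: the class and method names are hypothetical, quads are kept as plain N-Quads strings, and it does not reproduce the actual RDF-Delta syntax or anything from the Kafka/Spark pipeline.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a per-resource, append-only journal of quad changes.
class ResourceJournal {
    // One journal entry: the quads removed and added at a given instant.
    static final class Delta {
        final Instant time;
        final List<String> removed;
        final List<String> added;

        Delta(Instant time, List<String> removed, List<String> added) {
            this.time = time;
            this.removed = new ArrayList<>(removed);
            this.added = new ArrayList<>(added);
        }
    }

    private final List<Delta> entries = new ArrayList<>();

    // Entries are only ever appended, and must arrive in non-decreasing time order.
    void append(Instant time, List<String> removed, List<String> added) {
        if (!entries.isEmpty() && time.isBefore(entries.get(entries.size() - 1).time)) {
            throw new IllegalArgumentException("journal entries must be appended in time order");
        }
        entries.add(new Delta(time, removed, added));
    }

    // Replays the journal up to and including 'when' (the kind of lookup Memento needs).
    Set<String> stateAt(Instant when) {
        Set<String> quads = new LinkedHashSet<>();
        for (Delta delta : entries) {
            if (delta.time.isAfter(when)) {
                break;
            }
            quads.removeAll(delta.removed);
            quads.addAll(delta.added);
        }
        return quads;
    }

    // The compacted form is simply the current state, suitable for a plain GET.
    Set<String> compacted() {
        return stateAt(Instant.MAX);
    }

    public static void main(String[] args) {
        ResourceJournal journal = new ResourceJournal();
        String v1 = "<urn:example:s> <urn:example:p> \"v1\" <urn:example:g> .";
        String v2 = "<urn:example:s> <urn:example:p> \"v2\" <urn:example:g> .";
        journal.append(Instant.parse("2018-01-01T00:00:00Z"), new ArrayList<>(), List.of(v1));
        journal.append(Instant.parse("2018-06-01T00:00:00Z"), List.of(v1), List.of(v2));
        System.out.println(journal.stateAt(Instant.parse("2018-03-01T00:00:00Z")));
        System.out.println(journal.compacted());
    }
}
```

Replaying from the start on every read gets expensive as a journal grows, which is why keeping a compacted copy alongside the journal pays off for ordinary GET requests.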

One of the main bits of complexity when dealing with a key-value (or any sort of asynchronous, distributed) persistence store in the context of LDP has to do with managing resource state. For example, what happens when one HTTP client POSTs to a container to create a child resource while another client simultaneously sends a DELETE request to that container?
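One way to reason about that race is to make every change to a container an atomic, per-key operation, so that the POST and the DELETE are forced into some order. The sketch below shows the idea on a single node using `ConcurrentHashMap`. The class name and the chosen policy (DELETE removes the container and hands back its children for cleanup, and a later POST fails) are my assumptions for illustration, not what any particular server does. A distributed store would need its own equivalent, such as a conditional write.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: container IRI -> child IRIs. An absent key means "no such container".
class ContainerRegistry {
    private final ConcurrentHashMap<String, Set<String>> containers = new ConcurrentHashMap<>();

    // Returns true if the container was created, false if it already existed.
    boolean createContainer(String containerIri) {
        return containers.putIfAbsent(containerIri, new TreeSet<>()) == null;
    }

    // POST: the existence check and the update happen atomically for this key.
    // Returns false if the container is gone or the child already exists.
    boolean addChild(String containerIri, String childIri) {
        boolean[] added = {false};
        containers.computeIfPresent(containerIri, (key, children) -> {
            Set<String> copy = new TreeSet<>(children);
            added[0] = copy.add(childIri);
            return copy;
        });
        return added[0];
    }

    // DELETE: atomically removes the container and returns the children it had,
    // so the caller can clean them up. Returns null if there was no container.
    Set<String> deleteContainer(String containerIri) {
        return containers.remove(containerIri);
    }

    public static void main(String[] args) throws InterruptedException {
        ContainerRegistry registry = new ContainerRegistry();
        String container = "http://example.org/c/";
        registry.createContainer(container);
        Thread post = new Thread(() ->
            System.out.println("POST created child: " + registry.addChild(container, container + "1")));
        Thread delete = new Thread(() ->
            System.out.println("DELETE removed children: " + registry.deleteContainer(container)));
        post.start();
        delete.start();
        post.join();
        delete.join();
    }
}
```

Whichever request wins, the result is consistent: either the new child appears in the set returned by the DELETE, or the POST is refused because the container no longer exists. No child is left behind without a parent.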

I hope you find that helpful! Let me know if you would like more details on any of this.
