Solid backend examples/reference implementations

I’m familiar with the essentials of a Solid platform (necessary to support Solid Pods). That is, LDP storage with some extras such as WebID and Access Control Lists. And I have looked at how node-solid-server achieves this using the file system on a server to host directories of Turtle and other resources as files, alongside access control files.

Are there any other examples, or models of implementation, actual, experimental or discussed?

I’m particularly interested if anyone has tried, or plans to implement the Solid platform on top of a database type structure such as a key/value store, but anything would be of interest.

My interest is in exploring different ways to use the SAFE Network as a backend for Solid. I have implemented a crude (just storage, really) Solid platform and demonstrated a Solid app (Solid Plume) running on SAFE Network with minimal modifications to the original app, by emulating the Solid protocol within the browser. For the backend I took the easy route: creating an LDP interface to a file-system-like API (SAFE NFS), which stored resources, including Turtle files, very much like node-solid-server does. There is a video and slides presentation of this, including links to the code, on the SAFE forum: SAFE Plume demo

Following on from this, MaidSafe have been exploring other options and implemented an experimental RDF API, which was used to demonstrate a decentralised WebID manager and chat application, all living on SAFE Network, shown at DWebSummit 2018: see this SAFE forum topic (if there’s a better link, @bochaco, let me know). This proof of concept shows a different approach: storing RDF fragments in the key/value-based Mutable Data, rather than the approach I used, where the key/value feature stores pointers to immutable blobs (corresponding to node-solid-server files).
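To make the contrast concrete, here is a minimal sketch of the two key/value layouts described above. Everything here is illustrative only: the names (`kv`, `blob_store`), the key formats, and the use of Python `dict`s as stand-ins for the actual SAFE Network data types are my own assumptions, not the real APIs.

```python
# Hypothetical sketch of two ways to back LDP resources with a key/value store.

blob_store = {}          # stand-in for content-addressed immutable storage
kv = {}                  # stand-in for a mutable key/value store

# Approach 1 (node-solid-server-like): the key/value store holds only
# pointers to immutable blobs containing whole serialized resources.
def put_resource_as_blob(path, content):
    blob_id = f"blob-{hash(content)}"    # stand-in for a content address
    blob_store[blob_id] = content
    kv[path] = blob_id

# Approach 2 (experimental RDF API-like): RDF fragments are stored
# directly as entries in the key/value structure.
def put_resource_as_fragments(path, triples):
    for subject, predicate, obj in triples:
        # one entry per (resource, predicate) pair, purely illustrative
        kv[f"{path}#{predicate}"] = (subject, predicate, obj)

put_resource_as_blob("/profile/card", "<#me> a foaf:Person .")
put_resource_as_fragments("/profile/card", [("#me", "foaf:name", "Alice")])
```

The practical difference is where a partial update lands: in the first approach a changed triple means rewriting and re-pointing a whole blob; in the second it is a single key/value mutation.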

There’s a topic with more details of how the experimental RDF API stores RDF fragments in the key/value store, so if you want to read about it, or ideally give feedback, see this Solid forum topic.

So to reiterate my question: are there any other examples or models of Solid platform implementation or Linked Data storage that might help in figuring out other storage backends (not just SAFE)?


So this is not stored as RDF then, right? We are considering that everything should be RDF, i.e. we think the list of pointers to immutable blobs should also be RDF.

That sentence describes how I got Solid Plume to work on SAFE. As mentioned, it mirrored how node-solid-server works, i.e. storing LDP resources as files/blobs, both Turtle and other arbitrary file types.

A small group of us has been working on a Java-based implementation of an LDP server with the intention of making it fully compliant with the Solid specification. It is mostly compliant now, supporting WebAC, WebID and the various LDP container types. It does not yet support WebID-TLS authentication, but that is coming.

The reason I mention this project is that it defines a set of abstractions to make it possible to write different persistence back-ends for the LDP resources – it’s not tied to a filesystem or even a particular set of technologies for handling persistence. As you can imagine, using a triple store (the reference implementation) for persisting RDF has been simple and straightforward. I have also written an RDBMS-based persistence layer. There was not much complexity there: given that LDP places quite a few constraints on the sorts of DB queries that are allowable, it was fairly simple to create a schema to support RDF data and LDP structures. Also, these implementations are synchronous and (typically) imply single-node systems.
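The shape of such a persistence abstraction might look roughly like the following. This is a sketch in Python rather than the project’s actual Java; the interface name, methods, and in-memory backend are all my own invention, shown only to illustrate how backends stay swappable behind one contract.

```python
# Sketch of a backend-agnostic persistence abstraction for LDP resources.
from abc import ABC, abstractmethod

class ResourceService(ABC):
    """Contract that each persistence backend implements."""

    @abstractmethod
    def get(self, identifier: str):
        """Return the stored representation of a resource, or None."""

    @abstractmethod
    def put(self, identifier: str, data: str) -> None:
        """Create or replace a resource."""

class InMemoryResourceService(ResourceService):
    """Trivial backend for illustration; a triple-store or RDBMS backend
    would implement the same interface against its own storage technology."""
    def __init__(self):
        self._store = {}

    def get(self, identifier):
        return self._store.get(identifier)

    def put(self, identifier, data):
        self._store[identifier] = data
```

The server core then programs only against `ResourceService`, so choosing a triple store, an RDBMS, or a key/value store becomes a deployment decision rather than a code change.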

But your question is specifically about key-value stores. And yes, we are currently building a Cassandra-based backend, which is strictly key-value in nature. Internally, we store RDF data in N-Quads format as a blob of data (even though output is always RDF triples). The named graph is used to separate user-managed data from other sorts of resource data (e.g. audit logs, server-managed triples, etc.). The N-Quads format, while not always as compact as Turtle, is extremely flexible, and it makes it easy to stream RDF back to clients. One thing to mention about key-value stores such as Cassandra is that, while LDP Basic Containment isn’t too difficult to implement, Indirect Containment is almost a non-starter (Direct Containment is somewhere in the middle, implementation-wise). Fortunately, Solid only requires support for Basic Containment, so there are no plans to support Indirect Containers with the Cassandra-based server. Support for Direct Containment is under discussion. A relational database or triple store, on the other hand, makes it not too difficult to implement these other sorts of LDP containment structures.
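The named-graph trick can be sketched as follows. The graph IRIs and the serialization helpers here are hypothetical stand-ins (not the actual Cassandra schema): the point is only that one N-Quads blob can carry both user-managed and server-managed statements, and the graph component lets the server filter on the way out.

```python
# Illustrative: one N-Quads blob per resource, with the named graph
# separating user-managed data from server-managed data.

USER_GRAPH = "<urn:x-graph:user>"        # hypothetical graph names
SERVER_GRAPH = "<urn:x-graph:server>"

def to_nquads(user_triples, server_triples):
    lines = []
    for s, p, o in user_triples:
        lines.append(f"{s} {p} {o} {USER_GRAPH} .")
    for s, p, o in server_triples:
        lines.append(f"{s} {p} {o} {SERVER_GRAPH} .")
    return "\n".join(lines)

def user_triples_only(nquads):
    # On output, only the user-managed graph is serialized, as plain triples.
    return [line.rsplit(" ", 2)[0] + " ."
            for line in nquads.splitlines()
            if line.endswith(f"{USER_GRAPH} .")]
```

A GET would stream the result of something like `user_triples_only`, while audit or server-managed quads stay in the blob for internal use.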

One other (much more experimental) persistence layer implementation to mention made use of an “RDF-Delta” format. I was particularly interested in keeping track of an RDF resource over time, and this allowed me to model the RDF of a given resource as an append-only journal (i.e., add these quads, remove those quads). Here again, the persistence layer was strictly key-value blobs of RDF (or RDF-Delta) data. This was used in conjunction with the Memento standard to make it possible to retrieve the state of an RDF resource at any arbitrary point in time. The resource journal – every resource had its own journal – also had a compacted form (i.e. plain N-Quads), which was served for simple GET requests. With this implementation, pretty much everything was asynchronous, and it made extensive use of Kafka as an event bus and Apache Spark as a data processing layer. The architecture was considerably more complicated than a simple filesystem or relational database, but it was also extremely fast and could scale out across an arbitrary number of server nodes.
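The journal idea can be reduced to a small data structure. This is a deliberately simplified sketch, not the real RDF-Delta syntax or the Kafka/Spark machinery: each entry is a delta (quads added, quads removed), historical state is recovered by replaying the journal up to a version, and compaction collapses it to plain quads.

```python
# Append-only journal of RDF deltas for a single resource (simplified).

class ResourceJournal:
    def __init__(self):
        self.entries = []          # list of (added, removed) quad sets

    def append(self, added=(), removed=()):
        self.entries.append((set(added), set(removed)))

    def state_at(self, version):
        """Replay deltas up to `version` - Memento-style time travel."""
        quads = set()
        for added, removed in self.entries[:version]:
            quads |= added
            quads -= removed
        return quads

    def compact(self):
        """Collapse the journal into its current state (plain quads)."""
        return self.state_at(len(self.entries))
```

A plain GET would read the compacted form; a Memento-style request for an earlier datetime would map that datetime to a version and replay.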

One of the main bits of complexity when dealing with a key-value (or any sort of asynchronous, distributed) persistence store in the context of LDP has to do with managing resource state. For example, what happens when one HTTP client POSTs to a container to create a child resource while another client simultaneously sends a DELETE request to that container?
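One common way to handle that race is to guard each container update with an optimistic version check (compare-and-set), so that a POST creating a child fails cleanly if the container was deleted or modified in the meantime. The sketch below is entirely illustrative; real stores expose this as lightweight transactions, ETag preconditions, and the like.

```python
# Optimistic concurrency sketch for the POST-vs-DELETE race on a container.

class ContainerStore:
    def __init__(self):
        self._containers = {}      # path -> (version, set of children)

    def create(self, path):
        self._containers[path] = (1, set())

    def delete(self, path):
        self._containers.pop(path, None)

    def add_child(self, path, child, expected_version):
        entry = self._containers.get(path)
        if entry is None:
            # the container vanished between read and write
            raise LookupError("container was deleted concurrently")
        version, children = entry
        if version != expected_version:
            raise RuntimeError("container changed since it was read")
        self._containers[path] = (version + 1, children | {child})
```

The server can then translate these failures into an appropriate HTTP status (e.g. 404 or 409) rather than silently creating an orphaned child.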

I hope you find that helpful! Let me know if you would like more details on any of this.


I was once working on a JavaScript LDP implementation. It’s not complete and it’s outdated, but it could be interesting from an architecture perspective. It uses RDF-Ext for RDF resources. For blobs, one of the blob-store packages could be used. Nowadays the RDF-Ext stuff should be replaced with the RDFJS packages, but the blob-store stuff should still be up to date.

A while ago I wrote down some ideas for HDT, a binary RDF serialization. It’s labeled IoT, but I also had IPFS and SAFE in mind.


Thank you @acoburn and @bergos, these are both what I was hoping for.

Hello Aaron,

I’ve been working on a prototype Solid app over the last few months, based on the NSS reference implementation. Now I’d like to evaluate alternatives on the server side, and I found your project very appealing with regard to scalability and its flexible architecture.

I have a Docker container of the last release running locally, the variant without an external DB, and I tried to use our prototype app with this server instance. I also switched JWT on in the server config file. When I try to authenticate using the Solid auth popup, the latter fires a GET request for the resource /.well-known/openid-configuration, and the response is “404 Not Found”. Is some server configuration missing, or doesn’t Trellis support the Solid auth protocol yet?

Kind regards.

Hello @rodant,
The Trellis project implements an LDP-based resource server. While Trellis supports authorization (WebAC) according to the Solid specification, it does not implement its own authentication layer. That is, the identity provider is assumed to be entirely external to Trellis. This is a fairly typical pattern with OAuth and OIDC: separating the resource server from the identity provider.

In particular, in the case you describe, the /.well-known/openid-configuration resource won’t be found on the Trellis server because Trellis is not, itself, an OpenID Connect server.

Best regards, Aaron

Hi Aaron,

I see. Then one possibility would be to use the NSS server only for authentication (as identity provider) and Trellis as the Pod provider, right?