Social bookmarking as an example where we need queries instead of documents

This post is adjacent to the thread “Is a Solid pod a set of documents—or is it a knowledge graph?”. I did not post it there because I’d like your perspective on my use case, and you may come to different conclusions than I did.

I was thinking about writing a Bookmark storage app similar to pinboard.in or shaarli, as a SOLID app. The use cases are

  • Add/Edit/Delete Bookmarks. Each bookmark has a URL, a title, an optional description and tags. It can also have public/private and read/unread flags.
  • Browse through the list of bookmarks, newest first
  • Search through the bookmarks by using one or more tags (intersection search, i.e. bookmarks must have all tags)
  • Social Bookmarking - read public bookmark files of other users, discover new content
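
The use cases above imply a fairly small data model. Here is a minimal in-memory sketch in Python. The names `Bookmark`, `newest_first` and `with_all_tags` are made up for illustration, and the `created` timestamp is an assumption: it isn't in the field list, but the newest-first listing needs something like it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Bookmark:
    url: str
    title: str
    created: datetime                    # assumed field, needed for newest-first ordering
    description: str = ""                # optional
    tags: frozenset[str] = frozenset()
    public: bool = False                 # public/private flag
    read: bool = False                   # read/unread flag


def newest_first(bookmarks, limit=100):
    """Return at most `limit` bookmarks, most recently added first."""
    return sorted(bookmarks, key=lambda b: b.created, reverse=True)[:limit]


def with_all_tags(bookmarks, wanted):
    """Intersection search: keep only bookmarks that carry every wanted tag."""
    wanted = frozenset(wanted)
    return [b for b in bookmarks if wanted <= b.tags]
```

Both functions assume the whole collection is already in memory, which is exactly what becomes a problem with the storage layouts discussed below.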

I see the following benefits of using SOLID:

  • Storage and Authentication backend are already taken care of.
  • Data is Machine- and Human-Readable (RDF) and has a universally defined schema.
  • Reusability/Interop with other apps - no API needed. Other apps could enrich the existing information (thumbnails, description) or implement different visualizations
  • Social bookmarking with decentralized storage, with the app collecting bookmarks from followers/friends

However, with the document-centric SOLID data model of containers and data sets, I see some challenges implementing this application as a pure client-side application that only interacts with the pod:

  • At one extreme, I put all bookmarks into a single data set. That is easy to query with client-side SPARQL, but it consumes a lot of bandwidth, memory and CPU time and makes the initial load very slow. My current bookmark collection is about 12MB; imagine loading that over a mobile connection several times a day.
  • At the other extreme, I store each bookmark in its own data set. That creates lots of small requests (with bad performance implications as long as the server doesn’t support HTTP/2), and each read use case might need its own index file (a URL index for looking up an entry for edit/delete, a “date index” for listing the first 100 bookmarks, a “tag index” to query bookmarks by tags). Creating index files adds more complexity:
    • How would I reflect the permissions for bookmarks in my indexes? With a simple public/private permission model, I’d have to keep two different index files for public and private bookmarks, basically multiplying the number of index files by 2.
    • Index files might become too large, needing more complex pagination or sharding mechanisms
    • The application needs to manually update several index files for each add/edit/delete, ideally in an atomic fashion to avoid a broken index when one write action fails.
  • There might be a middle ground (sharding bookmarks into “bucket files” tailored to the most common use cases, so that some indexes are not needed) that balances transferred data against the number of requests, but it would still need indexes with all of their complexity.
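
To make the index bookkeeping concrete, here is a rough sketch of the one-data-set-per-bookmark extreme, with a plain dict standing in for the index documents in the pod. Everything here is invented for illustration (`index_paths`, `add_bookmark`, the `index/...` paths); it is not how any existing Solid app lays out its data.

```python
from collections import defaultdict


def index_paths(tags, public):
    """Index documents that a bookmark with these tags has to be listed in."""
    scope = "public" if public else "private"   # separate index set per permission level
    paths = [f"index/{scope}/by-url", f"index/{scope}/by-date"]
    paths += [f"index/{scope}/tag/{tag}" for tag in sorted(tags)]
    return paths


def add_bookmark(indexes, url, tags, public):
    """Append the bookmark URL to every affected index; return the paths written."""
    written = []
    for path in index_paths(tags, public):
        indexes[path].append(url)   # in a real pod, each of these is a separate write
        written.append(path)
    return written


indexes = defaultdict(list)
writes = add_bookmark(indexes, "https://example.org/a", {"solid", "rdf"}, public=True)
print(len(writes))
```

Adding a single bookmark with two tags already means four index writes on top of the bookmark itself, and nothing in this sketch makes them atomic: if one write fails halfway, the indexes disagree with each other.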

All these challenges would go away if the pod had some API (SPARQL or something similar) to query the bookmarks as a graph and get only the data I’m interested in. The app would be much easier to write, at least for my personal bookmark collection, because all the indexing, pagination and data assembly would be taken care of. But if other people wanted to use my app, or if I wanted to implement the “social bookmarking” functionality, the app would require other pods to have the same query functionality.

What do you think?


Please take a look at my library linked-bookmarks (jeff-zucker/linked-bookmarks on GitHub, “Distributed shared bookmarking using linked data”), which does much of what you ask. I have an editing system for it almost ready, including SPARQL querying of bookmarks and bookmark topics.


I agree the developer experience needs work, but in my opinion the document model is still the right one from the point of view of the user maintaining control of their data. A document is a natural unit over which to specify permissions.

While the context of use is slightly different, I find the data publishing priority shield useful: p7 of https://github.com/ddvlanck/Publishing-Base-Registries-As-LDES/raw/master/Linked-Data-Event-Streams.pdf
Query interfaces should come on top of data publishing interfaces, precisely for the reason you point out, that it avoids other pods requiring the same query functionality.

I find the LDES approach to be promising in terms of building in both scale and versioning.
https://woutslabbinck.github.io/LDESinLDP/index.html

I would say that, yes, indexing is the solution that allows collections to scale, and that it isn’t unacceptably complex when using append-only data structures and the push notifications already built in, or being built, within Solid.
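
As a rough illustration of the append-only idea (this is not the LDES specification, just a toy in-memory version with made-up names such as `replay` and the `op`/`url`/`tags` entry fields): changes are only ever appended to a log, and a client builds its tag index by replaying the entries it has not seen yet.

```python
def replay(log, index=None, start=0):
    """Apply log entries from position `start` to a tag -> set-of-URLs index.

    Returns the updated index and the position to resume from next time.
    """
    index = {} if index is None else index
    for entry in log[start:]:
        for tag in entry["tags"]:
            urls = index.setdefault(tag, set())
            if entry["op"] == "add":
                urls.add(entry["url"])
            else:  # "remove": the tag no longer applies to this URL
                urls.discard(entry["url"])
    return index, len(log)


log = [
    {"op": "add", "url": "https://example.org/a", "tags": ["solid", "rdf"]},
    {"op": "add", "url": "https://example.org/b", "tags": ["solid"]},
]
index, position = replay(log)

# Later, a new entry is appended; only the tail of the log is replayed.
log.append({"op": "remove", "url": "https://example.org/a", "tags": ["rdf"]})
index, position = replay(log, index, position)
```

Existing entries are never rewritten, so a failed write cannot corrupt what is already there, and a push notification only has to tell the client that the log has grown.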
