This post is adjacent to the thread “Is a Solid pod a set of documents—or is it a knowledge graph?”. I didn’t post it there because I’d like your perspective on my specific use case, and you may come to different conclusions than I did.
I was thinking about writing a bookmark storage app similar to pinboard.in or shaarli as a Solid app. The use cases are:
- Add, edit, and delete bookmarks. Each bookmark has a URL, a title, an optional description, and tags. It can also have public/private and read/unread flags.
- Browse the list of bookmarks, newest first.
- Search bookmarks by one or more tags (intersection search, i.e. a bookmark must have all of the given tags).
- Social bookmarking: read other users’ public bookmark files and discover new content.
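Once the bookmarks are in memory (for example after loading everything from a single data set), the reading use cases boil down to two simple operations. A minimal TypeScript sketch; the `Bookmark` shape and the helper names `newestFirst` and `withAllTags` are hypothetical and not tied to any particular RDF vocabulary or Solid library:

```typescript
// Hypothetical in-memory shape of a bookmark; the field names are illustrative
// and not taken from any existing Solid or RDF vocabulary.
interface Bookmark {
  url: string;
  title: string;
  description?: string;
  tags: string[];
  isPublic: boolean;
  isRead: boolean;
  created: Date;
}

// "Browse, newest first": sort a copy by creation time (descending) and take one page.
function newestFirst(bookmarks: Bookmark[], limit = 100): Bookmark[] {
  return [...bookmarks]
    .sort((a, b) => b.created.getTime() - a.created.getTime())
    .slice(0, limit);
}

// "Intersection search": keep only bookmarks that carry every requested tag.
function withAllTags(bookmarks: Bookmark[], wanted: string[]): Bookmark[] {
  return bookmarks.filter((b) => {
    const tags = new Set(b.tags);
    return wanted.every((t) => tags.has(t));
  });
}

// Made-up example data.
const sample: Bookmark[] = [
  { url: "https://example.org/a", title: "A", tags: ["solid", "rdf"],
    isPublic: true, isRead: false, created: new Date("2021-01-10T00:00:00Z") },
  { url: "https://example.org/b", title: "B", tags: ["solid"],
    isPublic: false, isRead: true, created: new Date("2021-02-10T00:00:00Z") },
];

// Only https://example.org/a carries both tags.
console.log(withAllTags(sample, ["solid", "rdf"]).map((b) => b.url));
// B was added later, so it comes first.
console.log(newestFirst(sample).map((b) => b.url));
```

The logic itself is trivial; the difficulty lies in deciding which data has to be fetched from the pod before it can run.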
I see the following benefits of using SOLID:
- Storage and authentication backends are already taken care of.
- Data is machine- and human-readable (RDF) and has a universally defined schema.
- Reusability/interop with other apps, no API needed: other apps could enrich the existing information (thumbnails, descriptions) or implement different visualizations.
- Social bookmarking with decentralized storage, with the app collecting bookmarks from followers/friends
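For the social part, the app would fetch the public bookmark files of the people it follows and merge them. A minimal sketch, assuming those files have already been fetched and parsed into plain objects (both steps are omitted); `PublicBookmark`, `FeedEntry`, and `mergeFeeds` are hypothetical names, and keying entries by the bookmarked URL is just one possible design:

```typescript
// Hypothetical shape of a public bookmark read from another user's pod
// (fetching and RDF parsing are left out).
interface PublicBookmark {
  url: string;
  title: string;
  tags: string[];
  created: Date;
}

// One feed entry per bookmarked URL, aggregated across all followed users.
interface FeedEntry {
  url: string;
  title: string;     // title from the first bookmark seen for this URL
  savedBy: string[]; // owner identifiers, e.g. WebIDs
  tags: string[];    // union of everyone's tags for this URL
  latest: Date;      // most recent time anyone saved it
}

// Merge several users' public bookmarks, keyed by the bookmarked URL.
function mergeFeeds(feeds: Map<string, PublicBookmark[]>): FeedEntry[] {
  const byUrl = new Map<string, FeedEntry>();
  for (const [owner, bookmarks] of feeds) {
    for (const b of bookmarks) {
      let entry = byUrl.get(b.url);
      if (!entry) {
        entry = { url: b.url, title: b.title, savedBy: [], tags: [], latest: b.created };
        byUrl.set(b.url, entry);
      }
      if (!entry.savedBy.includes(owner)) entry.savedBy.push(owner);
      for (const t of b.tags) {
        if (!entry.tags.includes(t)) entry.tags.push(t);
      }
      if (b.created.getTime() > entry.latest.getTime()) entry.latest = b.created;
    }
  }
  // Most widely saved first; ties broken by the most recent save.
  return [...byUrl.values()].sort(
    (x, y) => y.savedBy.length - x.savedBy.length || y.latest.getTime() - x.latest.getTime(),
  );
}
```

Keying by URL lets the feed show how many followed users saved the same link, which is one way to surface new content.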
However, with Solid’s document-centric data model of containers and data sets, I see some challenges in implementing this as a pure client-side application that only interacts with the pod:
- At one extreme, I put all bookmarks into a single data set. That would be easy to query with client-side SPARQL, but it would consume lots of bandwidth, memory, and CPU time and make the initial load very slow: my current bookmark collection is about 12 MB. Think about loading that over a mobile connection several times a day.
- The other extreme is to store each bookmark in its own data set. That would create lots of small requests (with bad performance implications as long as the server doesn’t support HTTP/2), and each reading use case might need its own index file: a URL index for looking up an entry to edit or delete, a “date index” for listing the first 100 bookmarks, and a “tag index” for querying bookmarks by tag. Creating index files adds more complexity:
  - How would I reflect bookmark permissions in my indexes? With a simple public/private permission model, I’d have to keep separate index files for public and private bookmarks, doubling the number of index files.
  - Index files might become too large and need more complex pagination or sharding mechanisms.
  - The application needs to update several index files manually for each add/edit/delete, ideally atomically, so that a single failed write doesn’t leave a broken index.
- There might be a middle ground that balances transferred data against the number of requests: sharding bookmarks into “bucket files” tailored to the most common use cases, so that some indexes become unnecessary. But the remaining indexes would still bring all of their complexity.
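To illustrate how the write cost grows with per-bookmark documents and indexes, here is a sketch that lists every document a single “add bookmark” would have to write. The container layout (`public/` vs `private/`, `items/`, `index/url.ttl`, monthly `index/date/YYYY-MM.ttl` buckets, one `index/tag/<tag>.ttl` per tag) and all function names are hypothetical, not a Solid convention:

```typescript
// Hypothetical layout: every bookmark and every index exists once per visibility.
type Visibility = "public" | "private";

interface NewBookmark {
  url: string;
  tags: string[];
  isPublic: boolean;
  created: Date;
}

// Month bucket such as "2021-03", used to shard the date index.
function monthBucket(d: Date): string {
  const month = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${d.getUTCFullYear()}-${month}`;
}

// Every document an "add bookmark" has to write under the hypothetical layout.
function documentsToWrite(root: string, b: NewBookmark, id: string): string[] {
  const vis: Visibility = b.isPublic ? "public" : "private";
  const docs = [
    `${root}${vis}/items/${id}.ttl`,                          // the bookmark itself
    `${root}${vis}/index/url.ttl`,                            // URL -> bookmark lookup
    `${root}${vis}/index/date/${monthBucket(b.created)}.ttl`, // date bucket
  ];
  // One tag index document per tag on the bookmark.
  for (const tag of b.tags) {
    docs.push(`${root}${vis}/index/tag/${encodeURIComponent(tag)}.ttl`);
  }
  return docs;
}

// Flipping public <-> private touches both index sets: remove from one, add to the other.
function documentsTouchedByVisibilityChange(root: string, b: NewBookmark, id: string): string[] {
  const before = documentsToWrite(root, b, id);
  const after = documentsToWrite(root, { ...b, isPublic: !b.isPublic }, id);
  return [...before, ...after];
}
```

With two tags, one add already touches five documents, and switching a bookmark between public and private touches ten. Nothing in this sketch makes those writes atomic, which is exactly where one failed request can leave an index inconsistent.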
All these challenges would go away if the pod had some API (SPARQL or something similar) to query the bookmarks as a graph and get only the data I’m interested in. The app would be much easier to write, at least for my personal bookmark collection, because all the indexing, paginating and data assembling would be taken care of. But if other people wanted to use my app or if I want to implement the “social bookmarking” functionality, the app would require other pods to have the same query functionality.
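For comparison, if a pod exposed a SPARQL endpoint, tag intersection plus newest-first pagination would become a single query, and only one page of results would cross the network. A sketch that builds such a query; the `ex:` vocabulary (`ex:url`, `ex:title`, `ex:created`, `ex:tag`) and the function names are hypothetical, and it does not assume that any Solid server actually offers such an endpoint:

```typescript
// Minimal escaping for a SPARQL double-quoted string literal
// (backslash, double quote, and line breaks only).
function sparqlString(value: string): string {
  const escaped = value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
  return `"${escaped}"`;
}

// Build a query returning bookmarks that carry ALL given tags, newest first, one page at a time.
function tagIntersectionQuery(tags: string[], limit = 100, offset = 0): string {
  // One triple pattern per tag on the same ?bm: a bookmark matches only if every pattern matches.
  const tagPatterns = tags.map((t) => `  ?bm ex:tag ${sparqlString(t)} .`);
  return [
    "PREFIX ex: <https://example.org/bookmark-vocab#>",
    "SELECT ?bm ?url ?title ?created WHERE {",
    "  ?bm ex:url ?url ;",
    "      ex:title ?title ;",
    "      ex:created ?created .",
    ...tagPatterns,
    "}",
    "ORDER BY DESC(?created)",
    `LIMIT ${limit}`,
    `OFFSET ${offset}`,
  ].join("\n");
}
```

`ORDER BY`, `LIMIT`, and `OFFSET` take over the job of the hand-maintained date index and pagination, and the tag patterns replace the tag indexes.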
What do you think?