Browsing large-ish lists: best representation?

I am getting a bit ahead of myself here, but in my attempt to build a logbook with Solid, I am starting to wonder about the best way to represent and fetch a list of logbook entries.

The use case is:

  • As a user I can see a list of my logbook entries ordered chronologically, so that I can see what I have been doing.

  • As a user I can filter my list of logbook entries by, for instance, the location they are related to (identified by a URL), so that I can see all entries related to, for instance, “Copenhagen”.

  • As a user I can add a new logbook entry where I specify date, location and comment.
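Once the entries have been fetched, the first two use cases reduce to a sort and a filter on the client. A minimal sketch, assuming a hypothetical in-memory `LogEntry` shape (date, location URL, comment) and made-up example.org location URLs:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LogEntry:
    # Hypothetical in-memory shape of one logbook entry resource.
    entry_date: date
    location: str  # the location, identified by a URL
    comment: str


def chronological(entries):
    """Return the entries ordered oldest first."""
    return sorted(entries, key=lambda e: e.entry_date)


def at_location(entries, location_url):
    """Return only the entries related to one location URL, oldest first."""
    return chronological(e for e in entries if e.location == location_url)


COPENHAGEN = "https://example.org/places/copenhagen"  # hypothetical URL
AARHUS = "https://example.org/places/aarhus"  # hypothetical URL

entries = [
    LogEntry(date(2017, 5, 3), COPENHAGEN, "Meetup"),
    LogEntry(date(2017, 4, 1), AARHUS, "Workshop"),
    LogEntry(date(2017, 3, 9), COPENHAGEN, "Conference"),
]
print([e.comment for e in at_location(entries, COPENHAGEN)])  # ['Conference', 'Meetup']
```

The catch, and the point of the questions below, is that this assumes every entry has already been downloaded.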

My immediate approach is to follow the “bug tracker” example at https://www.w3.org/TR/ldp-primer/: a single container for all the entries and one sub-resource for each entry.

Reading would then be done using “globbing” as described here https://github.com/solid/solid-spec/blob/master/api-rest.md#globbing-inlining-on-get.

But what happens when I have entered my first few hundred, or maybe thousand, log entries?

  • Is there any way to do paging (selecting only a subset of N entries at a time)?

  • What happens if I read the logbook container? How does it perform? In the data browser I can see that the logbook container has an “ldp:contains” statement for each item in it. Does that mean that reading data about the container involves returning hundreds of statements that are irrelevant to what I asked for?
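Without server-side paging, one client-side fallback is to read the container once, take the URIs from its `ldp:contains` statements, and then fetch only one page of entry resources at a time. A minimal sketch, assuming (hypothetically) that entry names sort chronologically, for example because they start with a zero-padded number or timestamp. Note that this limits how many entries are fetched, but the full container listing is still transferred:

```python
def page(member_uris, page_number, page_size):
    """Return one 0-based page of container member URIs in a stable order."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Assumption: lexicographic order of the names matches chronological order.
    ordered = sorted(member_uris)
    start = page_number * page_size
    return ordered[start:start + page_size]


# Hypothetical member URIs, as read from the container's ldp:contains statements.
members = [f"https://alice.example.org/logbook/{i:04d}.ttl" for i in range(10)]
print(page(members, 2, 4))  # the last two entries: 0008.ttl and 0009.ttl
```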

Related question: is there a way to ask the server to create an opaque and unique name for the entry resources, like an auto-incremented ID in a database? Otherwise it is left entirely to the client to create a unique name, and avoiding collisions would mean adding something like a GUID to the resource name.
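For reference, the LDP specification does describe server-assigned names: a client POSTs to the container, the server picks the new resource's URI and returns it in the `Location` header of the 201 response, and the client may suggest a name with a `Slug` header that the server is free to ignore. Whether a particular Solid server honours `Slug` is an assumption here. A minimal sketch that only builds such a request (the container URL is hypothetical), plus a UUID-based fallback for when the client must choose the name itself:

```python
import urllib.request
import uuid

CONTAINER = "https://alice.example.org/logbook/"  # hypothetical container URL


def create_entry_request(turtle_body, slug=None):
    """Build (but do not send) an LDP POST that creates a new entry.

    The server chooses the final URI and reports it in the Location
    response header; Slug is only a hint that the server may ignore.
    """
    headers = {"Content-Type": "text/turtle"}
    if slug is not None:
        headers["Slug"] = slug
    return urllib.request.Request(
        CONTAINER, data=turtle_body.encode("utf-8"), headers=headers, method="POST"
    )


def client_side_name():
    """Fallback when the client picks the name: a random UUID, e.g. '<uuid>.ttl'."""
    return f"{uuid.uuid4()}.ttl"


req = create_entry_request("<> a <#Entry> .", slug="copenhagen")
print(req.get_method(), req.full_url)  # POST https://alice.example.org/logbook/
```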


I would think you need to consider the underlying storage implementation here, which for node-solid-server is a filesystem.

So my instinct is to limit the number of entries per container (directory) using a suitable scheme. If the entries are likely to be referred to by URI or browsed manually, an obvious way to divide them up would be by year, then month, and so on, until the smallest container holds a manageable number of entries. But you could use any scheme for the subdivision.
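The year/month scheme can be sketched as two small helpers: one maps an entry's date to its sub-container, and one lists the months a chronological browse has to visit, so only the containers in the requested date range are read. The root URL and the `YYYY/MM/` layout are hypothetical choices, not something node-solid-server prescribes:

```python
from datetime import date


def container_for(entry_date, root="https://alice.example.org/logbook/"):
    """Map an entry date to its year/month sub-container URL."""
    return f"{root}{entry_date.year:04d}/{entry_date.month:02d}/"


def months_between(start, end):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month == 13:
            year, month = year + 1, 1


print(container_for(date(2017, 5, 3)))  # https://alice.example.org/logbook/2017/05/
print(list(months_between(date(2016, 11, 20), date(2017, 2, 1))))
# [(2016, 11), (2016, 12), (2017, 1), (2017, 2)]
```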

If the underlying storage were a relational database or a triple store, none of that would make sense, and a different approach would be needed.

Those are just back of the envelope thoughts though. I imagine it will be a common pattern, so useful to hear ideas and experiences.

to avoid collisions that would mean adding something like a GUID to the resource name

A GUID might be overkill. When I forked solid-plume, it just used the title of the blog post as the name, which of course might be duplicated, so I appended the integer datetime at creation.
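Naming from a title plus a creation time can be sketched as below. Reading “integer datetime” as a Unix timestamp in whole seconds is an assumption, and so is the slug rule; two entries with the same title created in the same second would still collide:

```python
import re
import time


def entry_name(title, created=None):
    """Build a resource name from a title plus an integer creation timestamp."""
    if created is None:
        created = time.time()
    # Lowercase and collapse anything outside [a-z0-9] into single hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "entry"
    return f"{slug}-{int(created)}"


print(entry_name("Trip to Copenhagen!", 1496275200))  # trip-to-copenhagen-1496275200
```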

Exactly my thoughts :wink: