Need help storing a large amount of small data

I have always been interested in storing my TiddlyWiki-based app’s data in SoLiD and sharing it with other apps.

But a typical knowledge base may contain a large number of small files, each with its own metadata. See my TiddlyWiki Note OS based digital garden for an example: wiki/tiddlers at master · linonetwo/wiki · GitHub

Should each file and its metadata be stored as a single JSON-LD file? Or should I bundle files with similar metadata (for example, files sharing the same tag)? AFAIK, if I don’t bundle files, I can’t use any RDF-related features such as SPARQL search, and I would only be using SoLiD as a BaaS (backend as a service), like MinIO.

I also hope that the calendar data generated by the Calendar app running on TiddlyWiki OS can be read by other apps, such as GitHub - timbot1789/solid-calendar: a lit app built for a hackathon. How can this be made possible? Here is an example of the data in TiddlyWiki:

calendarEntry: yes
caption: 给太微加性能检测
created: 20221225102740620
creator: 林一二
draft.title: 
endDate: 20221225103000000
modified: 20221225102740620
modifier: 林一二
startDate: 20221225083000000
tags: 解决问题
title: 2022-12-25T16:30:00+08:00
type: text/vnd.tiddlywiki
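
For reference, one way an entry like this could be expressed as JSON-LD is sketched below. This is only an illustration under stated assumptions: the schema.org terms (`schema:Event`, `schema:name`, `schema:startDate`, `schema:endDate`) are chosen here for illustration and are not confirmed to be what solid-calendar reads, the `CalendarTiddler` interface and both function names are hypothetical, and the date fields are assumed to be UTC timestamps in TiddlyWiki's 17-digit format (which is consistent with the `+08:00` title above).

```typescript
// Hypothetical shape for the calendar tiddler fields shown above.
interface CalendarTiddler {
  title: string;
  caption: string;
  startDate: string; // assumed format: YYYYMMDDHHmmssSSS in UTC
  endDate: string;   // assumed format: YYYYMMDDHHmmssSSS in UTC
}

// Convert a 17-digit TiddlyWiki timestamp (assumed UTC) to ISO 8601.
function twDateToIso(tw: string): string {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(\d{3})$/.exec(tw);
  if (!m) {
    throw new Error(`Not a TiddlyWiki timestamp: ${tw}`);
  }
  const [, y, mo, d, h, mi, s, ms] = m;
  return `${y}-${mo}-${d}T${h}:${mi}:${s}.${ms}Z`;
}

// Build a JSON-LD document for one calendar entry.
// The vocabulary is an assumption made for this sketch, not a Solid requirement.
function tiddlerToJsonLd(
  t: CalendarTiddler,
  resourceUrl: string
): Record<string, unknown> {
  return {
    "@context": { schema: "http://schema.org/" },
    "@id": `${resourceUrl}#it`,
    "@type": "schema:Event",
    "schema:name": t.caption,
    "schema:startDate": twDateToIso(t.startDate),
    "schema:endDate": twDateToIso(t.endDate),
  };
}
```

Storing one such document per tiddler keeps each resource small; whether that layout or a bundled one is better is exactly the open question of this post.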

I tried SoLiD in 2019, and I come back from time to time to see whether I can store knowledge base data in it. Is that possible now?


In my 2019 design, I would store up to 100MB of metadata in a single RDF file, which holds links to the actual resources (text content). I’m not sure this still works, because my notes now have a larger metadata store due to heavy recording and web clipping.

And I don’t know whether there is now a way to do the following:

  1. Partial update based on the title field.
  2. Quick SPARQL search on the server side to retrieve partial data.
  3. Subscribing to RDF file changes, and lock-free editing of the same file, for collaborative editing.

Some features were difficult to use and debug in NSS back in 2019. Maybe CSS works out of the box now?

You may want to get in touch with the Espresso project out of U. Southampton, U. London, and U. Singapore. They are exploring large-scale Solid search. INRIA in France is also exploring Solid indexing over large numbers of files. Let me know if you need help getting in touch with them.

  1. This should be possible using N3 Patch.
  2. You could probably set up a service that uses Comunica for fast SPARQL search over the server-side data.
  3. The notifications protocol lets a user subscribe to changes in their resources. Lock-free editing of the same file and collaboration are handled client side by some apps using conflict-free replicated data types (CRDTs). One possible technology is m-ld, a CRDT engine built on JSON-LD, which is a Linked Data format.

Thank you Jeff, I remember you helped me a lot in 2019. My attempt failed at that time, but I hope I can build something useful this time.

Since then I have become a contributor to TiddlyWiki’s core, and I proposed something like Suggestion for all Plugin developers about field name: Let's use Ontology to maintain Interoperability - Plugins - Talk TW to its plugin community. At least some TiddlyWiki plugins can interoperate with each other now.

But I’m not very interested in the school projects in your link. Usually, once the students graduate, the repo and npm package are abandoned, like GitHub - kezike/solid-vc, so as a developer I can’t rely on them unless there is a stable, working npm package or the creator still replies to issues on the repo.

Thank you too. Do you know of npm packages for these features? I build open source software in my spare time, and I can’t build an app without npm packages that work out of the box.

Protocol drafts are mainly for people who build the underlying packages, but sadly I’m still just a SoLiD app developer for now. Maybe I can dive into them after several successful projects.

I think the Community Solid Server (CSS) has an N3 Patch function somewhere in its repo. NSS has N3 Patch too, but I don’t know how well it works. Comunica is itself an npm package. The notifications protocol is a protocol; both CSS and Inrupt have an implementation, though the CSS one follows the standardized version while Inrupt’s follows a different draft edition.

AFAIK, N3 Patch works fine on NSS. In addition to the ESS and CSS notifications, the new PREP notifications standard is another option, and an NSS test of it is in progress.

Your main question

Assuming your data is already RDF (e.g. JSON-LD) in your Pod, for now you need to perform some sort of (RDF) reasoning in your application to support interoperability. There are many ways to do this, either through logical reasoning engines (e.g. eye, or OWL reasoners) or by hard-coding the reasoning rules in a programming language.

The goal is to add more data (triples) to your RDF (JSON-LD) resource before it is consumed by a component that requires a specific shape.
I haven’t read the details of timbot1789/solid-calendar, but suppose, for example, that solid-calendar requires an sc:name property of type xsd:string to denote the title of a calendar entry. You would then write a reasoning rule that finds ?s ex:caption ?Title and adds ?s sc:name ?Title (e.g. on finding ex:caption "给太微加性能检测", add sc:name "给太微加性能检测" for the same subject). (The two prefixes here are only illustrative; I use them to highlight that the vocabularies differ.)
For example, in N3, this rule can be written as:

{ ?n ex:caption ?Title } => { ?n sc:name ?Title }.

(Or, you may want to do it the other way around.)
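
If you prefer to hard-code the rule instead of running a reasoner, a minimal sketch might look like the following. It works on a plain in-memory list of triples rather than on any particular RDF library, and the two predicate IRIs are hypothetical stand-ins for the illustrative ex: and sc: prefixes above.

```typescript
// A deliberately simple triple representation for this sketch.
type Triple = { subject: string; predicate: string; object: string };

// Hypothetical IRIs standing in for ex:caption and sc:name.
const EX_CAPTION = "https://example.org/tw#caption";
const SC_NAME = "https://example.org/solid-calendar#name";

// Hard-coded equivalent of: { ?n ex:caption ?Title } => { ?n sc:name ?Title }.
// Returns the input triples plus every derived triple not already present.
function applyCaptionRule(triples: Triple[]): Triple[] {
  const key = (t: Triple): string =>
    `${t.subject}\u0000${t.predicate}\u0000${t.object}`;
  const seen = new Set(triples.map(key));
  const result = [...triples];
  for (const t of triples) {
    if (t.predicate !== EX_CAPTION) continue;
    const derived: Triple = {
      subject: t.subject,
      predicate: SC_NAME,
      object: t.object,
    };
    if (!seen.has(key(derived))) {
      seen.add(key(derived));
      result.push(derived);
    }
  }
  return result;
}
```

Because already-present triples are skipped, running the function again on its own output adds nothing, so it is safe to apply each time the resource is loaded.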

Whether you want to persist this new resource (with enriched data) into your Pod is a choice for your application. I would recommend doing so.

In the long term, a better solution is needed. But for now, this is the approach you have to take.

Also, remember that your application needs to accept that a resource in a user’s Pod can contain more triples than the application understands. This aligns with the ethos of the semantic web and related technologies (the open-world assumption), but it may not be how other applications and libraries are implemented.

Two questions to ask yourself first:

  1. What do you mean by “knowledge base”? Is it any conceptual thing that stores knowledge (in a structured or unstructured format), or an RDF knowledge base/graph? If it’s the former, Solid won’t provide any support (other than BaaS). If it’s the latter, see the next question.
  2. Do you need the knowledge base stored in exactly the same structure, and how performant does it need to be? Unfortunately, Solid cannot support everything yet. Either you store the knowledge base using Solid’s storage schema (i.e. LDP), or you store it as a single file and suffer performance issues when loading it in your application. (This is why Comunica with a cache, or a SPARQL backend for a Solid service, may help.)

Further topics

N3 patch

N3 Patch is easy to use directly, but I agree the documentation is not easy to follow for people who are not used to reading specs.
In fact, N3-patch is just the following:

  1. An HTTP PATCH request, with Content-Type: text/n3, and
  2. Its content is a valid N3 Patch document (containing a query as the where clause, and then modifications through insert and delete), such as this example in the spec.

So you can use the standard JS fetch() function for this, and thus no need for a library.
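
As a rough sketch of the two points above, the code below builds an N3 Patch document that replaces the caption of the note with a given title, and sends it with a fetch-style function. The ex: vocabulary (`ex:title`, `ex:caption`), the function names, and the `FetchLike` type are assumptions made for this example; only the `solid:InsertDeletePatch` structure and the `text/n3` content type come from the spec.

```typescript
// Minimal fetch-like signature so the sketch does not depend on a specific
// runtime; the global fetch or an authenticated fetch can be passed in.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number }>;

// Escape a string for use inside a double-quoted N3/Turtle literal.
function escapeLiteral(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Build an N3 Patch that finds the note by title and swaps its caption.
// ex:title and ex:caption are hypothetical predicates for illustration.
function buildCaptionPatch(title: string, newCaption: string): string {
  return [
    "@prefix solid: <http://www.w3.org/ns/solid/terms#>.",
    "@prefix ex: <https://example.org/tw#>.",
    "",
    "_:patch a solid:InsertDeletePatch;",
    `  solid:where   { ?note ex:title "${escapeLiteral(title)}"; ex:caption ?old. };`,
    "  solid:deletes { ?note ex:caption ?old. };",
    `  solid:inserts { ?note ex:caption "${escapeLiteral(newCaption)}". }.`,
  ].join("\n");
}

// Send the patch as an HTTP PATCH request with Content-Type: text/n3.
async function patchCaption(
  fetchFn: FetchLike,
  resourceUrl: string,
  title: string,
  newCaption: string
): Promise<void> {
  const response = await fetchFn(resourceUrl, {
    method: "PATCH",
    headers: { "Content-Type": "text/n3" },
    body: buildCaptionPatch(title, newCaption),
  });
  if (!response.ok) {
    throw new Error(`N3 Patch failed with status ${response.status}`);
  }
}
```

For a private resource, `fetchFn` would need to be an authenticated fetch from whichever Solid authentication library you use. Note that the server rejects the patch if the where clause does not match the resource, so the title lookup doubles as a precondition.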

It is called N3 Patch because the patch document is written in N3, a superset of Turtle.

Notification

I haven’t used that feature yet, but there is solid-client-notifications.
Maybe @CxRes knows more as I remember he has a lot of experience with this feature.

Cooperative editing

Cooperative editing is a separate topic, as it’s not only related to notifications, but also to the synchronization mechanism. CRDT is a promising (somewhat newer) technology for decentralized collaboration, which I like; Operational Transformation is another (older?) technology requiring a central server.
@gaz009 mentioned m-ld, which is one CRDT library I looked into recently; another possibility is Soukai, which has an example here. (I also opened this discussion at Soukai and took part in this discussion at m-ld, which may or may not be useful to you.)

In summary, m-ld does not use a Solid Pod for synchronization, and you’ll run into issues when storing its data in your Pod (because the data carries state). In principle a storage backend could support Solid Pods, but a) it doesn’t exist yet; and b) always serializing the CRDT state might cause performance issues. Soukai uses the Solid Pod to store the CRDT history, but you need to design part of the synchronization mechanism yourself within your application.
