Query things in solid pod

Hi guys,

I am working on a small school project where I am setting up my own Solid Pod server using Community Solid Server and building a simple app to interact with data. In my project, each user has their own pod (just one) that contains medical data. The structure of the data is as follows:

rootPod

  • Medicines (container)
    • Medicine1 (Dataset)
      • Name (thing)
      • Warning
      • Country…
    • Medicine2 (Dataset)
      • Name (thing)
      • Warning
      • Country…
  • Vital signs (container)…

I have been using the documentation provided at Structured Data (RDF Resources) — Inrupt JavaScript Client Libraries to interact with the data. However, I have only been able to retrieve data if I know the exact name of the thing, such as Medicine1 or Medicine2, by using their respective URLs.

I am wondering if it is possible to query the data (thing) in the Solid Pod based on user input. For example, I would like to retrieve or update the medicine that has the attribute ‘Country’ set to ‘Vietnam’ and the ‘Warning’ attribute starting with "Don’t use ".

Please let me know if there is a way to achieve this functionality.

Thank you!

I’d recommend taking a look at this thread and the discussion around it Is a Solid pod a set of documents—or is it a knowledge graph?.

Do you store your data as RDF? If yes, the structure of your knowledge graph may matter equally or more than your document structure.

You may need to be somewhat familiar with linked data and knowledge graphs first…

Solid has no standardized query API: no SPARQL endpoint and no Linked Data Fragments endpoint that you can count on. So while you can probably build or extend a Solid Pod with such functionality, in general you can’t rely on it being there.

The closest thing we have are type indexes: special documents that record what kind of data can be found at which locations. (You or your app has to keep them updated and in sync manually, which is a bit troublesome.) For example, such a document would say: my data about medicines is here, here, and here. That may be a viable starting point for your search.

To answer a question like yours, you would need a list of your countries somewhere and find Vietnam among them (it will have its own URI). Perhaps in the same document, or in a linked one, you could have relations like (Vietnam URI) -- isCountryOfMedicine --> (medicine URI). If you then fetch each medicine’s URI, you can locate all its warnings via (medicine URI) -- hasWarning --> (warning URI) and (warning URI) -- hasDescription --> "description", filter the fetched descriptions for the string you want, and thus answer the question.
I think this way of querying is called “follow your nose”. It’s much slower than having a single database with indexes, as with SQL.
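
To make that traversal a bit more concrete, here is a rough sketch of it. Everything in it is made up for illustration: the pod.example URLs, the in-memory `documents` map that stands in for HTTP requests, and the short predicate labels (real data would use full IRIs from a vocabulary).

```javascript
// Hypothetical documents keyed by URL; each holds [subject, predicate, object] triples.
// Predicate names are shortened to the labels used above; real data would use full IRIs.
const POD = "https://pod.example";
const documents = {
  [`${POD}/countries`]: [
    [`${POD}/countries#vietnam`, "isCountryOfMedicine", `${POD}/medicines/1`],
    [`${POD}/countries#vietnam`, "isCountryOfMedicine", `${POD}/medicines/2`],
  ],
  [`${POD}/medicines/1`]: [
    [`${POD}/medicines/1`, "hasWarning", `${POD}/medicines/1#w1`],
    [`${POD}/medicines/1#w1`, "hasDescription", "Don't use with alcohol"],
  ],
  [`${POD}/medicines/2`]: [
    [`${POD}/medicines/2`, "hasWarning", `${POD}/medicines/2#w1`],
    [`${POD}/medicines/2#w1`, "hasDescription", "Keep refrigerated"],
  ],
};

// Stand-in for "GET the document and parse its RDF"; a real app would do this over HTTP.
function loadTriples(uri) {
  return documents[uri.split("#")[0]] ?? [];
}

// All objects of the triples that have the given subject and predicate.
function objectsOf(triples, subject, predicate) {
  return triples
    .filter(([s, p]) => s === subject && p === predicate)
    .map(([, , o]) => o);
}

// Follow country -> medicine -> warning -> description and keep matching medicines.
function medicinesWithWarning(countryUri, prefix) {
  const countryDoc = loadTriples(countryUri);
  return objectsOf(countryDoc, countryUri, "isCountryOfMedicine").filter((medicineUri) => {
    const medicineDoc = loadTriples(medicineUri); // one fetch per medicine: the slow part
    return objectsOf(medicineDoc, medicineUri, "hasWarning")
      .flatMap((warningUri) => objectsOf(medicineDoc, warningUri, "hasDescription"))
      .some((description) => description.startsWith(prefix));
  });
}

console.log(medicinesWithWarning(`${POD}/countries#vietnam`, "Don't use "));
```

The per-medicine `loadTriples` call is where the cost sits: each hop is another request, which is why this is slower than an indexed database.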

If each user doesn’t have much data, you could fetch everything from the medicine containers first and query on the client.

Some people have thought about these questions in much more depth than I have, for example @RubenVerborgh, @rubensworks, and others. Ruben Verborgh’s website has interesting reading. There was an article and a talk about generalized querying over distributed linked data on that website and at Solid World, but I can’t locate them right now. Maybe somebody else can provide links? :slightly_smiling_face:

With this approach, most of the querying work is done by the client (your app plus libraries like Comunica, LDO, or rdflib), not by the Solid Pod, which is pretty dumb in this regard.


If you have a lot of data and these approaches are not satisfactory, you could consider building a separate indexing service. You could keep track of your medicine relationships there, or of the full text of your warnings, and the index would help you locate the documents with more details on the Solid Pods. But I’m not sure. Maybe somebody else can enlighten us more about this…


You may also use multiple approaches together.

In any case, you can start with a well-structured knowledge graph of linked data.


It would be lovely to have a standardized query API in Solid Pods (I root for LDF), but we don’t, and it is what it is for now…

You could possibly use the Comunica SPARQL query engine, which is also available as a Node.js module, to query the datasets, as long as they are stored as RDF documents. Another thing I am unsure about is your document structure: if Medicine1 and Medicine2 are both Things of the same type (the same structure of fields), then they could all live in the same file (dataset) and would be easier to query with a query engine. If the medicines need to be in separate files for security purposes, then disregard the above.
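
For the question in the first post, the SPARQL side could look roughly like the sketch below. The `ex:` namespace and the `ex:Country` / `ex:Warning` properties are assumptions (use whatever vocabulary your data really has), and the query as written only matches country values stored as plain strings without a language tag.

```javascript
// Escape a value so it is safe inside a double-quoted SPARQL string literal.
function sparqlString(value) {
  const escaped = value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
  return `"${escaped}"`;
}

// Build a SELECT query for medicines from a country whose warning starts with a prefix.
// The ex: vocabulary is hypothetical.
function buildMedicineQuery(country, warningPrefix) {
  return [
    "PREFIX ex: <https://example.org/ns#>",
    "SELECT ?medicine ?warning WHERE {",
    `  ?medicine ex:Country ${sparqlString(country)} ;`,
    "            ex:Warning ?warning .",
    `  FILTER(STRSTARTS(STR(?warning), ${sparqlString(warningPrefix)}))`,
    "}",
  ].join("\n");
}

console.log(buildMedicineQuery("Vietnam", "Don't use "));
```

As far as I know, a string like this is what you would hand to Comunica’s `queryBindings` together with the list of source URLs. Querying private pod data also needs an authenticated session, which is not shown here.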

Comunica may indeed be helpful for your use case.
This Comunica engine is able to query across multiple documents within and across Solid pods, with document discovery happening at query-time.

Hi @tranbau !

One way of achieving what you describe using the @inrupt/solid-client library is to use the getThingAll method, which will return an array of all the Thing instances in the SolidDataset you fetched. You can then filter on the attributes you are interested in:

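import { getThingAll, getStringWithLocaleAll } from "@inrupt/solid-client";

// mydataset is a SolidDataset you fetched earlier.
// Keep only the Things whose Country property has an English ("en") value of "Vietnam".
const vietnameseMedicines = getThingAll(mydataset)
  .filter((thing) =>
    getStringWithLocaleAll(thing, "https://example.org/ns#Country", "en").includes("Vietnam")
  );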

This would be a simplistic query mechanism native to the JavaScript API, much less expressive than SPARQL, but it can get you started for simple queries such as the ones you described. You still need to know which resource your data is located in, and in that sense a lot of the suggestions made in this thread very much apply, but in the short term this can get you sorted out.
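
Here is a small sketch of how the same filtering idea could cover both conditions from the original question (country equals “Vietnam” and a warning starting with “Don’t use ”). To keep it runnable without a pod, the Things are plain objects and the value accessor is passed in as a parameter. The `#Warning` IRI, the sample data, and `plainAccessor` are all made up; with @inrupt/solid-client, a function like `getStringNoLocaleAll` has the `(thing, property) => string[]` shape this expects.

```javascript
// Hypothetical predicate IRIs; replace with the vocabulary your pod actually uses.
const COUNTRY = "https://example.org/ns#Country";
const WARNING = "https://example.org/ns#Warning";

// `getStrings(thing, predicate)` must return an array of string values.
function findMedicines(things, getStrings, country, warningPrefix) {
  return things.filter(
    (thing) =>
      getStrings(thing, COUNTRY).includes(country) &&
      getStrings(thing, WARNING).some((warning) => warning.startsWith(warningPrefix))
  );
}

// Stand-in data and accessor so the sketch runs without a pod.
const sampleThings = [
  { id: "Medicine1", [COUNTRY]: ["Vietnam"], [WARNING]: ["Don't use with alcohol"] },
  { id: "Medicine2", [COUNTRY]: ["Vietnam"], [WARNING]: ["Keep refrigerated"] },
  { id: "Medicine3", [COUNTRY]: ["France"], [WARNING]: ["Don't use while driving"] },
];
const plainAccessor = (thing, predicate) => thing[predicate] ?? [];

const matches = findMedicines(sampleThings, plainAccessor, "Vietnam", "Don't use ");
console.log(matches.map((thing) => thing.id));
```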

This means I will be querying on the client-side? I’m afraid this approach may result in slow data retrieval.

Thank you. I will review its documentation and give it a try.

Thank you. I will review its documentation and give it a try. My teacher wants me to use FHIR to store medical data. If I want to store information about a medication and use a FHIR form, I noticed that there are many nested attributes that require creating new entities and adding URLs to the main medication entity.

Thank you. I will take a look at it and search for some helpful solutions.

Thank you for your thorough response.

Actually, my task involves transferring data from a SQL database (consisting of approximately 6 tables) and storing it in a pod server that adheres to the FHIR medical standard. Additionally, I need to create alternative APIs or a service that can replace the list of APIs used to query data from the SQL database, enabling data retrieval from the pod.

Here are the steps I have completed so far:

  1. I extracted the data from the SQL database and selected the corresponding FHIR resource (data model) that captures the same information. For instance, I chose the “Patient” resource in FHIR, which has a JSON format resembling the following example:

     {
       "resourceType": "Patient",
       "name": {
         "familyName": {
           "v": "abc"
         }
       }
     }
  2. I converted the entire vocabulary to RDF format and stored it in the pod using the solid-client library. However, I encountered an issue with the Community Solid Server, as it doesn’t support nested objects or blank nodes. As a workaround, I had to create three separate entities: “patient,” “name,” and “familyName,” and store them within a dataset.
  3. Currently, my most challenging task is to convert the APIs that were originally used to retrieve data from the SQL server into APIs or a service that can query and transmit data from the pod back to the client application.
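
The workaround in step 2 (one named entity per nested object, linked by URL) could be automated roughly as in the sketch below. This is only an illustration: the pod.example URL, the `#patient-name` naming scheme, and the `@id` key are assumptions, and arrays, which real FHIR resources use a lot, are simply copied as values rather than split up.

```javascript
// Turn a nested object into flat entities that point at each other by URL,
// so that no nested objects (or blank nodes) are needed.
function flatten(value, datasetUrl, localName, entities = []) {
  const entity = { "@id": `${datasetUrl}#${localName}` };
  entities.push(entity);
  for (const [key, child] of Object.entries(value)) {
    const isNestedObject =
      child !== null && typeof child === "object" && !Array.isArray(child);
    if (isNestedObject) {
      const childName = `${localName}-${key}`;
      entity[key] = { "@id": `${datasetUrl}#${childName}` }; // link instead of nesting
      flatten(child, datasetUrl, childName, entities);
    } else {
      entity[key] = child; // plain value (or array), kept as-is
    }
  }
  return entities;
}

const patient = { resourceType: "Patient", name: { familyName: { v: "abc" } } };
const entities = flatten(patient, "https://pod.example/patients/1", "patient");
console.log(JSON.stringify(entities, null, 2));
```

For the example Patient this yields three entities (`#patient`, `#patient-name`, `#patient-name-familyName`), which corresponds to the three separate entities described in step 2.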

The actual server-to-client transfer will be at dataset granularity. Once the dataset is in memory, going through the Thing instances individually should be reasonably quick, depending on the size of the dataset of course.

If your data is scattered across multiple resources, you are correct, this approach may lead to performance issues, and you may get better results out of querying with Comunica. However, you may also structure the data in a way that works better with your purpose, for instance having all medicines in one dataset rather than split up in individual resources. That would change how you can structure access permissions, but if there is no distinction in access permission between medicines that would not be an issue, and it would reduce the network traffic significantly.

Considering it is only text data, I think you should retrieve it all, manipulate it locally in the browser, and write it back if you need to change data.

Don’t worry about transfer size, it is only text.

You do have to rethink the frontend/client. Data will be local in memory. It does make browsing through the data very fast.
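
A minimal sketch of that read, change locally, write back flow, with plain objects standing in for the medicines already loaded into memory. The field names and URLs are invented for the example, and the actual write-back call is left out.

```javascript
// Apply `change` to every record that satisfies `matches`, without mutating the input.
// `changed` lists the records that now differ, so only those need to be written back.
function updateMatching(records, matches, change) {
  const changed = [];
  const updated = records.map((record) => {
    if (!matches(record)) return record;
    const next = { ...record, ...change };
    changed.push(next);
    return next;
  });
  return { updated, changed };
}

// Hypothetical in-memory copy of the medicines.
const medicines = [
  { url: "https://pod.example/medicines/1", country: "Vietnam", warning: "Don't use with alcohol" },
  { url: "https://pod.example/medicines/2", country: "France", warning: "Keep refrigerated" },
];

const { updated, changed } = updateMatching(
  medicines,
  (m) => m.country === "Vietnam" && m.warning.startsWith("Don't use "),
  { warning: "Don't use with alcohol or grapefruit juice" }
);
console.log(changed.map((m) => m.url));
```

With @inrupt/solid-client, writing a changed dataset back would normally go through `saveSolidDatasetAt`. Tracking `changed` separately keeps the number of write requests down.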

Solid noob here as well. This thread is reminding me of my bucket list of things I would want in Solid to make it meet my needs, before I could use it:

  1. Solid must store data as RDF. I have no need for Documents or Collections of Documents. Such docs can be crafted “on the fly” after the RDF data is retrieved. Then, FHIR/Healthcare RDF could be used natively, with no transformation.
  2. RDF is queryable with SPARQL. LDP focuses on collections and such, which aren’t what I need. SPARQL-LD would be excellent.
  3. The granularity of consent permissions should be at the most atomic level, which is the data element level, not the document level. I’m not sure Solid has this currently, since it does not store RDF natively.
  4. A built-in way to request access to a pod holder’s data. Consent permissions are then checked against the user’s ID/role.

If I had all these things, I could start implementing Solid, but for now I feel that I would need to add on a bunch of Comunica modules to get around current Solid limitations.
