Searching across a large number of Solid pods

Hello, I am new here. I have also only recently discovered the Solid project, so forgive my ignorance.

I haven’t gone through everything yet, but this question keeps bugging me, so I’d like a quick answer.

Imagine millions of Solid pods, all of which I have access to, and I want to query them for specific data. Is this possible? What methods are used? Is it performant?

Thank you in advance,

@lecoqlibre has started doing research on this topic; maybe he can help you.

Welcome @majesty,

Sure, you can make SPARQL requests using Comunica for instance (client or server side).

The performance depends on what you are searching for… To get fast results, your PODs should expose some indexes [1], which make it possible to select the relevant PODs (some PODs may not contain any relevant data). You will have to run your SPARQL query on these indexes.

You could get the first results within a few seconds, since Comunica uses streams.

[1] These indexes should be defined in a Solid client-to-client standard.
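
To make the index idea a bit more concrete, here is a minimal Python sketch of the selection step only. It assumes each pod exposes an index that can be reduced to a set of terms; that format, the pod URLs, and the function name `select_relevant_pods` are all made up for illustration, since no such index standard is defined yet.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical per-pod indexes: pod URL -> set of terms the pod says it holds.
# In practice each index would be fetched from the pod; this format is an assumption.
PodIndexes = Dict[str, Set[str]]


def select_relevant_pods(indexes: PodIndexes, wanted_terms: Iterable[str]) -> List[str]:
    """Return the pods whose index mentions every wanted term, sorted by URL."""
    wanted = set(wanted_terms)
    return sorted(pod for pod, terms in indexes.items() if wanted <= terms)


indexes: PodIndexes = {
    "https://alice.example/": {"photo", "tag:cats"},
    "https://bob.example/": {"note"},
    "https://carol.example/": {"photo", "tag:cats", "tag:dogs"},
}

# Only the selected pods would then be handed to the query engine as sources.
relevant = select_relevant_pods(indexes, ["photo", "tag:cats"])
print(relevant)
```

Here bob's pod is skipped without ever being queried, which is the whole point of exposing an index.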


Throwing in my 2 cents on this: Personally, I don’t think that Solid scales well to centrally use data of millions of pods. I don’t know how you’d do this, and I haven’t seen it in practice either. It doesn’t seem built for these use cases in my honest opinion.

If you haven’t downloaded the data, then I would say that it is not performant. If you really require the data from millions of pods, you will need to do millions of requests, at least one per pod. If you don’t know in advance where the data is on the pod, or want to compile it from multiple files (and SPARQL is not supported server-side, it’s not part of the server specification afaik), then even more per pod.
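
A back-of-the-envelope estimate shows why the request count matters. This is only a sketch: the latency and concurrency figures are assumed for illustration, not measured on any real pod server, and `estimate_crawl_seconds` is a hypothetical helper.

```python
import math


def estimate_crawl_seconds(pods: int, requests_per_pod: int,
                           latency_s: float, concurrency: int) -> float:
    """Rough wall-clock time if requests run in batches of `concurrency`."""
    total_requests = pods * requests_per_pod
    batches = math.ceil(total_requests / concurrency)
    return batches * latency_s


# Assumed numbers: 1 million pods, 200 ms per request, 100 requests in flight.
one_request = estimate_crawl_seconds(1_000_000, 1, 0.2, 100)
three_requests = estimate_crawl_seconds(1_000_000, 3, 0.2, 100)
print(f"{one_request / 60:.0f} min vs {three_requests / 60:.0f} min")
```

With these assumed figures, one request per pod takes about 33 minutes and three requests per pod about 100 minutes, so a live query across all pods is far from interactive.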

If you and the users are fine with it, you could keep a local copy of the data and update it when the data in the pods changes. There’s a notification specification that you could use for this, subscribing to changes. Assuming the pods properly implement this (and retry posting events on connection errors), I think this is a reasonable use case. However, I’m not sure about the exact workings, so it may be that this has limits too (maybe it would require too many connections being open at the same time? Not sure).
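
As a rough sketch of the local-copy idea in Python: the class below keeps a dictionary of resources and refreshes it whenever a change notification comes in. It assumes each notification can be reduced to a `type` ("Create", "Update" or "Delete") and an `object` holding the resource URL; the `LocalMirror` class and the `fetch` callback are hypothetical, and subscribing to a real pod is left out entirely.

```python
from typing import Callable, Dict

# Hypothetical fetcher: takes a resource URL and returns its current content.
Fetcher = Callable[[str], str]


class LocalMirror:
    """Keeps a local copy of pod resources, refreshed from change notifications."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch
        self.resources: Dict[str, str] = {}

    def handle_notification(self, message: dict) -> None:
        # Assumed shape: {"type": "Create" | "Update" | "Delete", "object": "<URL>"}
        url = message["object"]
        if message["type"] == "Delete":
            self.resources.pop(url, None)
        elif message["type"] in ("Create", "Update"):
            # Re-fetch the resource instead of expecting the message to carry data.
            self.resources[url] = self._fetch(url)


# A plain dict stands in for real HTTP requests to a pod.
photo = "https://alice.example/photos/1"
remote = {photo: "cat picture, v1"}
mirror = LocalMirror(fetch=lambda url: remote[url])

mirror.handle_notification({"type": "Create", "object": photo})
remote[photo] = "cat picture, v2"
mirror.handle_notification({"type": "Update", "object": photo})
print(mirror.resources)
```

Searches would then run against `mirror.resources` locally, so their cost no longer depends on the number of pods, only on how reliably the notifications arrive.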

Thanks for the input.

Think about an Instagram clone implemented with pods. I also highly doubt that search would be performant for it.

I am thinking of ways to make it better. Maybe a centralized database that also caches some data in queues driven by the tags.
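
One possible reading of "queues driven by the tags", sketched in Python: a central cache that keeps a bounded queue of the most recent post URLs per tag, while the posts themselves stay in the pods. The `TagCache` class, its method names, and the example URLs are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List


class TagCache:
    """Central cache holding the most recent post URLs for each tag."""

    def __init__(self, max_per_tag: int = 1000) -> None:
        # A deque with maxlen drops the oldest entry once the queue is full.
        self._queues: Dict[str, Deque[str]] = defaultdict(
            lambda: deque(maxlen=max_per_tag)
        )

    def add_post(self, post_url: str, tags: Iterable[str]) -> None:
        for tag in tags:
            self._queues[tag].appendleft(post_url)  # newest first

    def search(self, tag: str, limit: int = 10) -> List[str]:
        queue = self._queues.get(tag)
        return list(queue)[:limit] if queue else []


cache = TagCache(max_per_tag=2)
cache.add_post("https://alice.example/posts/1", ["cats"])
cache.add_post("https://bob.example/posts/7", ["cats", "dogs"])
cache.add_post("https://carol.example/posts/3", ["cats"])
print(cache.search("cats"))
```

A search by tag then hits only this cache, at the price of results being limited to whatever the queues still hold.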

Maybe I can give some priority to people who are in close physical proximity. Hmm, location-based databases that store some of the data at the same time it is stored to a pod?

Think about an Instagram clone implemented with pods. I also highly doubt that search would be performant for it.

Just a thought…

We’ve become accustomed to using websites that are gigantic public databases. There are currently 1 billion videos on YouTube, and you can search for them in a matter of seconds. But am I really interested in a cat video of someone living on the other side of the planet? Or the Facebook event proposed by someone living 1,000km away?

For me, the re-decentralization of the web also means getting back to human proportions, and returning to a logic closer to word-of-mouth. We don’t need to know everything. What counts is knowing what’s going on near us, or what our friends and contacts are up to.

So for me, there’s no point in creating an Instagram clone. On the other hand, thinking up innovative applications that make visible what’s going on in our close network seems more motivating to me. And technically more feasible with Solid-like architectures.