I am a Computer Science undergraduate at the University of Southampton, and for my final-year project I have built a POD text indexing and search application called PodQuest.
The application creates indexes whilst respecting the access rights of documents, so users can only see information they have access to, in the indexes they have access to.
If you are a SOLID user, I would greatly appreciate 5 minutes of your time to open the application, search the provided POD’s publicly accessible documents, and then fill in a short, anonymous 5-question survey.
Only POD owners can index their POD, so this request is only to test the search functionality. Multiple indexing schemes are supported, but the application has been restricted to one in order to keep the focus on how well it achieves its overall goal.
The application should pre-fill the POD URL; you do NOT need to use your own pod.
This pod is pre-configured: https://podquest_example.solidcommunity.net/
Application URL:
Survey URL:
University of Southampton ethics approval ERGO: 103367
Looks like great stuff. I invite you to come present your work at the Solid Practitioners Group video call. This is an opportunity to spread awareness of your app and to get feedback from active Solid developers. Our next meeting is already booked, but if you’d like to present on either the first or third Thursday of May or June, please DM me.
I tried out the app and filled in the survey. It looks great. One problem I noticed is that even when I changed the container to search from / to /profile/, it still only searched / and showed me only things directly in / but not things in /profile/. When I put in /profile/ it did show me the resources in that container but the search found only /README.md, not /profile/card as it should have.
Am I correct in assuming this is a full-text search, such that a search for storage would find a predicate like space:storage? If so, it should have shown me both README.md and card.
Hi Jeff, thank you for checking out the app and responding to the survey, I greatly appreciate it.
Indeed, the search bar previously always searched the entire pod, regardless of the current location. This has now been changed so that it only shows results within the current directory tree.
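Restricting results to the current directory tree can be pictured as a prefix filter over result URLs. The following is only a minimal sketch in Python, not PodQuest's actual code; the function name `filter_to_container` and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def filter_to_container(result_urls, container_url):
    """Keep only results located at or below the given container (hypothetical helper)."""
    container = urlsplit(container_url)
    # Containers end with a slash; normalise so "/profile" cannot match "/profile-old/".
    base_path = container.path if container.path.endswith("/") else container.path + "/"
    kept = []
    for url in result_urls:
        parts = urlsplit(url)
        same_origin = (parts.scheme, parts.netloc) == (container.scheme, container.netloc)
        if same_origin and parts.path.startswith(base_path):
            kept.append(url)
    return kept


if __name__ == "__main__":
    # Made-up result URLs, for illustration only.
    results = [
        "https://example.org/README.md",
        "https://example.org/profile/card",
    ]
    print(filter_to_container(results, "https://example.org/profile/"))
```

Filtering after the query keeps a single pod-wide index usable from any container, at the cost of fetching that whole index even for a narrow search.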
Regarding /profile/card, it seems the indexer was discarding it: it only indexed text files, and because this resource has no file extension it was being excluded. This has been fixed and the pod has been re-indexed.
Indeed, this is full-text search, so searching for storage should (and now does) return both README.md and card.
Thank you again for your feedback and checking out the application!
Thanks Will, that makes sense. One thing you should be aware of is that card is only one example of a resource without a standard file extension. AFAIK, the only way to reliably tell the content-type of something is to look at the Content-Type in the header or to find it in the RDF of the container itself (which lists its contents and, at least for some servers, the content-type of those contents).
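As a rough illustration of the header-based check, here is a minimal Python sketch that decides whether a resource looks like indexable text from its Content-Type rather than its file extension. The names `is_indexable_text` and `fetch_content_type`, and the set of extra media types, are assumptions made for this example and are not taken from PodQuest.

```python
from urllib.request import Request, urlopen

# Assumed list of non-"text/*" media types that still hold indexable text.
TEXTUAL_TYPES = {"application/ld+json", "application/json", "application/xml"}


def is_indexable_text(content_type_header):
    """Decide from a Content-Type header value whether a resource is textual."""
    if not content_type_header:
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/") or media_type in TEXTUAL_TYPES


def fetch_content_type(url):
    """Ask the server for a resource's Content-Type using a HEAD request."""
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return response.headers.get("Content-Type")
```

With this approach an extensionless profile document served as `text/turtle` would be accepted, while an image would still be skipped. Private resources would additionally need an authenticated request, which this sketch leaves out.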
Hi Jeff, I am interested in presenting at the SOLID Practitioners meeting on the first Thursday of May; however, I cannot figure out how to DM on this forum, apologies! How can I pursue this? - Will
To DM in this forum, click on the user name. This will pop up a box with a “message” button. Another good way to get in touch with the Practitioners is in the Matrix chatroom at https://matrix.to/#/#solid-practitioners:matrix.org .
A change in plans means that our May 1 meeting will be dedicated to electing our chairs and adopting a charter so, if possible, could we go for May 15 instead?
Searching and indexing have often been discussed at Practitioners meetings. You may want to look at the minutes from sessions on search and index:
Hey @will, great to see more people working on searching a Pod, I think this is a really important feature.
I tried your app and was able to find the provided documents based on their textual content, but I ran into some issues indexing my own Pods.
First of all, my main Pod at https://angelo.veltens.org is far too large to index in full. I waited for several minutes, but it was still far from finished. It could help to support partial indexing of, e.g., just a specific container.
I tried another Pod with fewer things in it and the indexes were created, but I still could not find the things I expected to find. I just released my own search feature on PodOS and uploaded a video about it. I would expect the Ice cream recipe shown there to be found as well using PodQuest after creating an index with it, but this was not the case for some reason.
I also cannot debug this, because the indexes are using some cryptic format. In PodOS I am storing index data in plain RDF and build the “real” search index on client side using lunr.js. It’s quite simple but effective. The disadvantages of my approach are that I only index labels / names, not other texts in the Pod, and (at least for now) you have to choose each thing to index manually.
If you are interested in exchanging ideas on all this, let me know!
Thank you for checking out the application. Your PodOS looks very cool!
Regarding the indexing issue you raised: after watching your video I can see that the file in question does not have a file extension, which I believe caused it to be excluded from the index. This has now been fixed, and I would greatly appreciate it if you could verify that you can now index and then search your pod.
Regarding not being able to read the indexes, this is because the indexes are stored using Huffman coding in order to reduce the index sizes significantly and reduce query time (which includes fetching the index from the pod).
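To show why a Huffman-coded index is compact but not human-readable, here is a small self-contained Python sketch of character-level Huffman coding. It is an illustration only: how PodQuest actually serialises its indexes (symbol alphabet, how the code table is stored) is not described in this thread, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def build_codes(text):
    """Return a dict mapping each character in text to its Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {char: code built so far}).
    heap = [(weight, i, {ch: ""}) for i, (ch, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the bit strings of all characters."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Invert encode(); works because Huffman codes are prefix-free."""
    reverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return "".join(out)
```

Frequent characters receive shorter codes, so the encoded bit string is shorter than a fixed 8 bits per character, but the result can only be read back with the matching code table.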
Regarding more granular indexes, this is definitely something I have considered and I agree it is a useful feature. However, I decided to remove it from scope for the current implementation due to time constraints and this being part of my degree. (Multiple indexing techniques have been implemented in PodQuest and I am evaluating their advantages and disadvantages. However, I removed the options and kept only Flexsearch for the hosted version in order to focus on the community's opinions on the functionality offered, as opposed to the implementation.)
Indeed, since you have chosen to index only names and labels, your approach likely supports much larger PODs. PodQuest currently attempts to index the entirety of all text documents (with tokenization of terms, including stemming) as well as their respective access control information, in order to create multiple different indexes for WebIDs with different access rights.
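The idea of separate indexes per WebID can be sketched as follows. This is a simplified Python illustration under stated assumptions, not PodQuest's implementation: the toy suffix-stripping `stem` stands in for a real stemmer, access rights are reduced to a plain set of reader WebIDs per document (real Solid access control also covers groups and public access), and all names are hypothetical.

```python
import re
from collections import defaultdict

SUFFIXES = ("ing", "ed", "es", "s")  # toy stemmer rules, not a real stemming algorithm


def stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and stem each token."""
    return [stem(token) for token in re.findall(r"[a-z0-9]+", text.lower())]


def build_indexes(documents):
    """Build one inverted index per WebID.

    documents: iterable of (url, text, readers), where readers is the set of
    WebIDs allowed to read the document (an assumed, simplified ACL model).
    Returns {webid: {term: set of urls}}.
    """
    indexes = defaultdict(lambda: defaultdict(set))
    for url, text, readers in documents:
        terms = set(tokenize(text))
        for webid in readers:
            for term in terms:
                indexes[webid][term].add(url)
    return indexes


def search(indexes, webid, query):
    """Return URLs readable by webid that contain every query term."""
    index = indexes.get(webid, {})
    term_sets = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*term_sets) if term_sets else set()
```

Because each agent's index only ever mentions documents that agent may read, a search cannot leak terms or URLs from restricted documents. The trade-off is that the work and storage grow with the number of distinct access configurations.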
I would greatly appreciate any more feedback you may have on PodQuest!