Maintaining LLM conversations in Solid

Would it be possible to use Solid as a space for safely maintaining LLM data?

Right now, companies are using our data (from a billion people already, I read somewhere) to train their LLMs. This raises privacy concerns (so I’ve posted to r/EFF about preventing commercial use) and interesting opportunities (such as letting these conversations be used by an open-source AI such as Aperture to produce a public good similar to Wikipedia). Solid seems well poised as a technology to maintain that data (i.e. the conversations that we have with AIs, whether they are commercial or not).

I also think that a truthful/rational AI trained in the open would be resistant to capture, but that is an added bonus: using Solid to store chats, and optionally leasing that data to train LLMs, is the main proposal that I wish this community to consider.


@timbl wrote this article a while ago where he describes his vision for how AIs can work with Solid: Charlie works for Bob.

I also really like that idea, but so far I’m not aware of anyone who’s actually working on this with Solid.

Inrupt have mentioned that they have a working version of Charlie, but as we discussed in the Inrupt's Data Wallet thread, I don’t think Inrupt wallets are actually Solid. Others may disagree, of course, in which case you could say that Inrupt is already doing what you’re talking about. But again, I’ve only read that blog post and heard them talk about it; I haven’t seen how it actually works.


Thanks for pointing me to this article. From what I read, Solid has fewer proprietary parts, so I’m inclined to use a pod instead of a wallet.

That said, I’m currently throwing together a prototype to showcase several things, so perhaps the first iteration will use HTTP cookies.

-alec

There are a number of people working on LLM/Solid interactions. ODI (the Open Data Institute) has at least two AI-related projects - one to develop a Solid-based MCP and another to use AI as a content-moderation tool for solidcommunity.net. We at Solid Practitioners ( GitHub - solid-contrib/practitioners: A hub for Solid developers ) are planning an upcoming virtual meeting to discuss AI in relation to Solid. Please contact me if you’re interested in presenting a short demo or description of your work at the meeting.

There is a fork of LibreChat on GitHub at solid/LibreChat.
Can I presume that is intended to use Solid pods to host conversations?

-alec

Yes, I believe so.

As others have mentioned, ODI is working on LibreChat with Solid as a storage backend. That looks promising.

In addition to that, I’d like to mention our previous research demo from 2024, SocialGenPod (and its demo paper; see further down this page for a short intro video), which combines LLMs with Solid – not just storing conversation history, but also sharing and reusing it through social interactions, powered by Solid’s mechanisms, which is impossible in other (centralized) solutions.

Alas, it’s a research demo, and we didn’t continue hosting it due to resource constraints (though I heard Streamlit provides a hosting service now? I may investigate that). But I hope its principles can be learned from and applied in any further developments :slight_smile:


Is the hosting burden the traffic or the LLM fees? Because if you can guarantee pod security, I’d be happy to store my LLM API keys in it.

Sincerely,
-alec


The maintenance burdens for us and for users are different.

For us, it’s both hosting the app and maintaining an example LLM service; that’s the only way we can demonstrate the system. The LLM service is expensive, as you pointed out; but users can specify which service they want to use, which reduces this burden when we don’t need to “demonstrate”.

But the app hosting was also a burden, as it requires a server (it’s written in Streamlit and Python, so it can’t be put on, e.g., GitHub Pages) and involves interesting/annoying hosting rules from our institution. (That’s why I mentioned the possibility of Streamlit providing hosting, which may improve the situation.)

It’s still a WIP; we are working to get it to a point where it can be upstreamed to the core LibreChat codebase. In the meantime, here is a preview: https://drive.google.com/file/d/14BumPFZ58IZb8cGK-mB4GONfafXcm0LS/view?usp=sharing

Would it be possible to use Solid as a space for safely maintaining LLM data?

Absolutely! This has been discussed across various in-person gatherings (SoSy and SolidLabs) and video calls.

There are several different perspectives people are coming from, each with their own solutions.

I think the upcoming Solid Practitioners meeting on AI could be a very interesting point of convergence for all parties involved in this, both those who have already commented here and others.

From what you describe, I was thinking that another route might be to combine an ActivityPub MCP with ActivityPods, and regulate ingress/egress to and from pods that way. :thinking:

Within Muze (part of PDS Interop, creators of Solid-Nextcloud and the PHP Solid Server), we’ve been discussing ways a Solid pod could be used as a central hub for “all your AI activity”.

We’ve got plans to add ANP (Agent Network Protocol) to Solid-Nextcloud and PSS (others are already working on MCP solutions).

ANP seems an especially good match for Solid, as it requires Linked Data (which Solid also demands) and uses Decentralized Identifiers (DIDs), which are also being looked into for the Solid spec.

I’ve not had time to publish anything, but it might be worth throwing a proof-of-concept together, if I can ever find the time… :sweat_smile:


I finished a small prototype that uses a stateless server; all data is stored client-side in sessionStorage, but it’s passed over HTTPS on every call, and it uses RAG, so it really ought to be hosted closer to the LLM running on the server.

Enjoy!

I’ve been building a pair of tools that use Solid Pods as the storage backbone for personal AI memory, and I’d love to share them and get your feedback.


The problem I wanted to solve

Every day I have dozens of conversations across ChatGPT, Claude, and Gemini, plus I highlight articles, watch videos, and visit pages I’ll never find again. All of that context is siloed — locked inside commercial platforms I don’t control, with no way to search across it or ask follow-up questions later. I wanted a system where I own the archive, and no company could take it away or monetise it.

Solid was the obvious answer.


What Synara is

Synara is two things that work together:

1. Synara Extension — a Chrome browser extension (Manifest V3) that acts as a passive memory collector. It watches your AI sessions and browsing and writes structured JSON events to your Solid Pod. It currently captures:

  • Full conversations from ChatGPT, Claude, and Gemini (via DOM adapters with a MutationObserver for auto-capture)

  • Text highlights — anything you select and choose to save

  • YouTube/media — title, creator, platform

  • Page visits and manual “Remember This Page” bookmarks

All capture goes through a privacy filter before hitting the network — it redacts credit card numbers, API keys, tokens, SSNs, and password-field patterns. Sensitive URLs (banking, medical) are blocked from capture entirely.
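To illustrate the kind of redaction pass described above, here is a minimal TypeScript sketch; the patterns and the `redact` function are illustrative assumptions, not Synara’s actual filter:

```typescript
// Illustrative privacy-filter sketch; patterns and names are assumptions,
// not Synara's actual implementation.
const REDACTIONS: Array<[RegExp, string]> = [
  [/\b(?:\d[ -]?){13,16}\b/g, "[REDACTED_CARD]"],     // credit-card-like digit runs
  [/\bsk-[A-Za-z0-9]{20,}\b/g, "[REDACTED_API_KEY]"], // OpenAI-style secret keys
  [/\b\d{3}-\d{2}-\d{4}\b/g, "[REDACTED_SSN]"],       // US SSN pattern
];

function redact(text: string): string {
  // Apply every pattern in turn; each match is replaced by its placeholder.
  return REDACTIONS.reduce((acc, [re, repl]) => acc.replace(re, repl), text);
}
```

A real filter would need more robust detection (e.g. Luhn checks for card numbers), but the shape is the same: every captured payload passes through it before any network write.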

Events are written to the pod at:
/apps/synara/events/{eventType}/{YYYY}/{MM}/{DD}/{ulid}.json

Each file is keyed by a ULID so they sort chronologically and can be cached indefinitely (immutable once written). The extension uses @inrupt/solid-client and @inrupt/solid-client-authn-browser with a token credentials flow stored in chrome.storage.local, so auth persists across browser restarts without re-login prompts.
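For concreteness, a path of that shape can be derived from the event type, a ULID, and the capture time. This is a minimal sketch with assumed names, not the extension’s actual code:

```typescript
// Sketch of deriving the pod path for an event (assumed names; the real
// extension may build this differently).
function eventPath(eventType: string, ulid: string, when: Date): string {
  // A ULID's first 10 characters encode a 48-bit timestamp, so plain
  // lexicographic sorting of filenames is already chronological.
  const y = when.getUTCFullYear();
  const m = String(when.getUTCMonth() + 1).padStart(2, "0");
  const d = String(when.getUTCDate()).padStart(2, "0");
  return `/apps/synara/events/${eventType}/${y}/${m}/${d}/${ulid}.json`;
}
```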

There’s also a durable event queue — writes go to chrome.storage.local first, then drain to the pod, with automatic retry for failed writes.

2. Synara AI — a React app (the recall half). Once the extension has been collecting, you open the app (ai.privatedatapod.com) to explore and query everything:

  • Timeline — chronological feed of all captured events

  • Library — a filterable, searchable, browsable view

  • Ask Synara — natural language queries answered by OpenAI, grounded in your own memories. It ranks your stored events by relevance, shows you a prompt preview before anything is sent (so you can see and remove specific memories), then returns an answer with clickable source citations.

The web app is read-only against the pod — it never writes. It does an IndexedDB-backed delta sync: on first load it traverses the pod container tree and fetches all event files; on subsequent loads it only fetches files it hasn’t seen before (with a 48-hour exception for recent daily files still being actively written by the extension).
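The delta-sync decision can be sketched as a small predicate (names are assumptions, not the app’s actual code): skip date containers already recorded in IndexedDB, except for a recent window that the extension may still be writing into.

```typescript
// Sketch of the delta-sync decision described above.
const RESYNC_WINDOW_MS = 48 * 60 * 60 * 1000; // the 48-hour exception

function shouldTraverse(
  containerDate: Date,     // the {YYYY}/{MM}/{DD} a container path encodes
  alreadySynced: boolean,  // recorded in IndexedDB on a previous load
  now: Date
): boolean {
  // Unseen containers are always fetched; seen ones are re-checked only
  // while the extension may still be appending new event files to them.
  if (!alreadySynced) return true;
  return now.getTime() - containerDate.getTime() < RESYNC_WINDOW_MS;
}
```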

The OpenAI API key lives exclusively in an AWS Lambda proxy — it never reaches the browser.


The Solid-first design choices

A few things I tried to get right from a Solid perspective:

  • No Synara servers store any user data. The pod is the only datastore. The web app is a static site (CloudFront + S3); the only server-side component is the Lambda proxy that forwards anonymised text to OpenAI.

  • The pod path convention is designed to be legible and independently usable — another app could read or write the same event files if it followed the schema.

  • Auth in the extension uses token credentials rather than the browser OIDC session, because service workers don’t have access to browser cookies/sessions. Happy to discuss whether there are better approaches here — this felt like the right tradeoff.

  • The ULID + immutable file per event pattern keeps delta sync cheap and avoids any need for SPARQL or server-side indexes.


What I’d love feedback on

  • Is the pod path structure (/apps/synara/events/{type}/{YYYY}/{MM}/{DD}/{ulid}.json) the right shape? I considered putting everything in one container and using metadata for filtering, but the nested date hierarchy made delta sync much simpler.

  • Are there Solid conventions I should be following for app data paths or event schemas that I’ve missed?

  • The extension uses token credentials for pod auth from the service worker — has anyone solved this more elegantly?

  • Any interest in standardising an event schema for browser-captured data so other Solid apps could interoperate?

The extension is in active use and I’m adding adapters for more platforms. Happy to share more detail on any part of the architecture.

Thanks for building such a compelling platform — Solid made this whole thing possible.

The path itself is not so important, but it seems like this app stores data in JSON files, not RDF. In Solid, there are basically two types of files:

  • RDF files, with semantic data. This is the main way Solid Apps use to store their data.
  • Binary blobs, for everything else (videos, images, etc.).

Of course, as you’ve seen with this application, there’s nothing stopping you from storing JSON, Markdown, or even .sql files. But the chances that other Solid applications will understand this are very slim.

I’m curious, how did you come to the conclusion to use plain JSON in the Solid POD? Did you just ask an AI to make “a Solid App” for you, and this is what it did? Or did you make a conscious choice to store the data in JSON format? If it’s the latter, what led you to this decision? I’m sure most Solid tutorials and resources that you will find online point you towards using RDF. Or if they haven’t, I’d be curious to see those tutorials.

As I said, the path is not so important. What is important, though, is that you leave a trace so that other apps can find the data created by your app (without knowing any specific container paths). Usually, you can do this with the type index. But since you’re not using RDF, it probably won’t be possible to do that.

I actually wrote a blog post about this topic a while ago, check it out if you want to learn more: Interoperable Serendipity.

By the way, I have started working on a project that does just that: use an LLM to talk to your Solid POD :).

It’s just a prototype at this point, but you can already “talk to your POD”:

Check it out here: https://anima.noeldemartin.com/

The JSON choice was deliberate, though your post is pushing me to revisit the cost of it.

The reasoning at the time: Synara uses an append-only, immutable event model — each file is a typed event envelope (ai_chat.captured, page.visit, highlight.created, etc.) with a ULID, an integrity hash, a consent block, and a strongly-typed payload. The schema is richer and more domain-specific than I expected to express naturally in Turtle, and the nested-JSON payload (e.g. an array of chat messages with roles and sequence numbers) felt awkward to flatten into triples.
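For readers curious what such an envelope might look like, here is a hedged TypeScript sketch; the field names are my paraphrase of the description above, not the published schema:

```typescript
// Illustrative shape of the event envelope described above; field names
// are assumptions, not Synara's published schema.
interface SynaraEvent {
  id: string;                // ULID; its timestamp prefix makes filenames sort chronologically
  type: "ai_chat.captured" | "page.visit" | "highlight.created";
  integrity: string;         // hash over the payload
  consent: { capturedWithConsent: boolean };
  payload: unknown;          // strongly typed per event type in the real app
}

const example: SynaraEvent = {
  id: "01ARZ3NDEKTSV4RRFFQ69G5FAV",
  type: "ai_chat.captured",
  integrity: "sha256:0000", // placeholder value
  consent: { capturedWithConsent: true },
  payload: { messages: [{ role: "user", seq: 1, text: "hello" }] },
};
```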

The date hierarchy in the path was also intentional: delta sync becomes O(days since last sync) rather than O(total files), which matters when someone has years of browsing history captured.

But you’re right about the interoperability cost, and I don’t have a good answer to the type index gap.

I think a good middle ground would be to use real RDF for the basic data (such as memory title, URL, etc.), and have a JSON “field” (basically, a string field holding serialized JSON) for more complex data that only your app understands. Of course, it would be ideal if everything were RDF, but I understand that’s not as easy as it sounds (I’m also storing some “JSON strings” in my app, though I’m still considering whether that will be the final solution).

This would allow you to use the type index, and make the data possibly usable by other apps. Not all the data needs to be reusable everywhere, but if only some of it is understandable by other apps, that’s already a win :). Don’t be afraid to create your own vocabularies for edge cases that aren’t covered by existing ontologies. See Tim’s Bag of Chips talk for more on this idea.
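A hedged Turtle sketch of what that middle ground could look like (the `synara:` vocabulary and property names are invented for illustration; `bookm:` is the 2002 W3C bookmark vocabulary):

```turtle
@prefix dct: <http://purl.org/dc/terms/> .
@prefix bookm: <http://www.w3.org/2002/01/bookmark#> .
@prefix synara: <https://synara.example/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#memory> a bookm:Bookmark ;
    dct:title "Example article" ;
    bookm:recalls <https://example.com/article> ;
    dct:created "2024-05-01T12:00:00Z"^^xsd:dateTime ;
    # Opaque app-specific payload, serialized as a JSON string literal:
    synara:payload "{\"messages\":[{\"role\":\"user\",\"seq\":1}]}" .
```

The title, target URL, and date are reusable by any bookmarking app, while the payload stays private to the capturing app.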

On that note, there are many Solid Apps out there doing bookmarks… Maybe you could explore how some of them work, and use the same base vocabularies for your memories.

Yeah, this part is tricky… Some implementations in Solid already work like that, for example chats in SolidOS.

Personally I don’t think it’s ideal, because the idea of Solid is that apps can just “follow their nose” to find everything. But the reality is there are performance issues, as you mention.

If you keep that folder hierarchy but still register the root container in the Type Index, that would be a decent solution in my opinion.
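For example, a public type index registration along these lines would let other apps discover the container without hard-coding the path (the registration name and class choice are illustrative, not a prescription):

```turtle
# In the pod's public type index document:
@prefix solid: <http://www.w3.org/ns/solid/terms#> .
@prefix bookm: <http://www.w3.org/2002/01/bookmark#> .

<#synara-events> a solid:TypeRegistration ;
    solid:forClass bookm:Bookmark ;
    solid:instanceContainer </apps/synara/events/> .
```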

With respect to the path-structure question, I suggest nesting type inside of date, since “date” is immutable while the “type” of a thing might change (although I don’t know exactly what “type” refers to in this context).

Sincerely,
-alec