Would it be possible to use Solid as a space for safely maintaining LLM data?
Right now, LLM providers are using our data (from 1 billion people already, I read somewhere) to train their models. This raises privacy concerns (so I’ve posted to r/EFF about preventing commercial use) and interesting opportunities (such as letting these conversations be used by an open-source AI such as Aperture to produce a public good similar to Wikipedia). Solid seems well placed as a technology to maintain that data (i.e. the conversations that we have with AIs, whether they are commercial or not).
I also think that a truthful/rational AI could be trained online, as it would be resistant to capture, but that is an added bonus: the use of Solid to store chats, with the optional lease of that data to train LLMs, is the main proposal that I wish this community to consider.
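To make the "store chats in a Pod" part concrete, here is a minimal sketch of what writing one conversation to a Pod could look like. It only uses the Python standard library and it builds the HTTP request without sending it. The `ChatMessage` class, both function names, and the pod URL are hypothetical, and mapping a chat onto schema.org's `Conversation` and `Message` types is my own assumption, not an agreed Solid vocabulary. Authentication (Solid-OIDC) is left out entirely.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from urllib.request import Request


@dataclass
class ChatMessage:
    """One turn of a chat (hypothetical structure, for illustration only)."""
    role: str          # e.g. "user" or "assistant"
    text: str
    sent_at: datetime


def conversation_to_jsonld(title, messages):
    """Serialize a chat as JSON-LD.

    Assumption: schema.org's Conversation/Message types are used here only
    because they are widely known; a real app may prefer another vocabulary.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Conversation",
        "name": title,
        "hasPart": [
            {
                "@type": "Message",
                "sender": {"name": m.role},
                "text": m.text,
                "dateSent": m.sent_at.isoformat(),
            }
            for m in messages
        ],
    }


def build_put_request(resource_url, document):
    """Prepare (but do not send) an HTTP PUT that would create or replace
    the resource at resource_url with the JSON-LD document.

    Authentication headers are deliberately omitted; a real Pod would
    normally require a Solid-OIDC login before accepting this write.
    """
    body = json.dumps(document, indent=2).encode("utf-8")
    headers = {"Content-Type": "application/ld+json"}
    return Request(resource_url, data=body, headers=headers, method="PUT")


if __name__ == "__main__":
    chat = [
        ChatMessage("user", "What is Solid?", datetime(2025, 1, 1, tzinfo=timezone.utc)),
        ChatMessage("assistant", "A specification for personal data stores.",
                    datetime(2025, 1, 1, 0, 0, 5, tzinfo=timezone.utc)),
    ]
    doc = conversation_to_jsonld("Intro to Solid", chat)
    # The pod URL below is a placeholder, not a real server.
    req = build_put_request("https://pod.example/chats/intro.jsonld", doc)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The point of the sketch is only that a chat is small, structured data that fits naturally into a Pod resource; whether it is then leased for training is a separate access-control decision.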
@timbl wrote this article a while ago where he describes his vision for how AIs can work with Solid: Charlie works for Bob.
I also really like that idea, but so far I’m not aware of anyone who’s actually working on this with Solid.
Inrupt have mentioned that they have a working version of Charlie, but as we discussed in the Inrupt's Data Wallet thread, I don’t think Inrupt wallets are actually Solid. Others may disagree, of course, in which case I guess you could say that Inrupt is already doing what you’re talking about. But again, I’ve only read that blog post and heard them talk about it; I haven’t seen how it actually works.
There are a number of people working on LLM/Solid interactions. ODI (the Open Data Institute) has at least two AI-related projects: one to develop a Solid-based MCP and another to use AI as a content-moderation tool for solidcommunity.net. We at Solid Practitioners (solid-contrib/practitioners on GitHub, a hub for Solid developers) are planning an upcoming virtual meeting to discuss AI in relation to Solid. Please contact me if you’re interested in presenting a short demo or description of your work at the meeting.
As others have mentioned, ODI is working on getting LibreChat to use Solid as a storage backend. That looks promising.
In addition to that, I’d also like to mention our previous research demo from 2024, SocialGenPod (and its demo paper; see later down this page for a short intro video), which combines LLMs with Solid: not just storing conversation history, but also supporting social interaction, reuse, and sharing of that history, powered by Solid’s mechanisms, which is impossible in other (centralized) solutions.
Alas, it’s a research demo, and we didn’t continue hosting it due to resource constraints (but I heard that Streamlit provides a hosting service now? I may investigate that). But I hope its principles can be learned from and implemented in any further developments.
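As an illustration of the kind of Solid mechanism that sharing can build on, here is a small sketch that generates a Web Access Control (WAC) document granting read access on one stored chat to selected WebIDs. This is not taken from SocialGenPod; the function name and all URLs are hypothetical. It also assumes a Pod server that uses WAC; servers that use Access Control Policy (ACP) need a different kind of document.

```python
def build_acl_turtle(resource_url, owner_webid, reader_webids):
    """Return a WAC access-control document (Turtle) for one resource.

    The owner keeps full control; each WebID in reader_webids gets
    read-only access. Where this document must be stored is decided by
    the server (it advertises the location in a Link header with
    rel="acl"), so no ACL URL is assumed here.
    """
    lines = [
        "@prefix acl: <http://www.w3.org/ns/auth/acl#>.",
        "",
        "<#owner>",
        "    a acl:Authorization;",
        f"    acl:agent <{owner_webid}>;",
        f"    acl:accessTo <{resource_url}>;",
        "    acl:mode acl:Read, acl:Write, acl:Control.",
    ]
    if reader_webids:
        agents = ", ".join(f"<{webid}>" for webid in reader_webids)
        lines += [
            "",
            "<#readers>",
            "    a acl:Authorization;",
            f"    acl:agent {agents};",
            f"    acl:accessTo <{resource_url}>;",
            "    acl:mode acl:Read.",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All URLs are placeholders for illustration.
    print(build_acl_turtle(
        "https://pod.example/chats/intro.jsonld",
        "https://pod.example/profile/card#me",
        ["https://friend.example/profile/card#me"],
    ))
```

Revoking the share is then just a matter of rewriting the document without the `<#readers>` block; the same pattern could express a time-limited "lease" to a training service if the app removes the grant when the lease ends.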
The maintenance burdens for us and for users are different.
For us, it’s both hosting the app and maintaining an example LLM service. That’s the only way we can demonstrate the system. An LLM service is evidently expensive, as you also pointed out; but the user can specify which service they want to use, which reduces this burden if we don’t need to “demonstrate”.
But the app hosting was also a burden, as it requires a server (the app is written using Streamlit and Python, so it can’t be put onto, e.g., GitHub Pages), and involves interesting/annoying hosting rules from our institution. (That’s why I mentioned the possibility of Streamlit providing hosting… which may improve the situation.)
Would it be possible to use Solid as a space for safely maintaining LLM data?
Absolutely! This has been discussed at various in-person gatherings (SoSy and SolidLabs) and on video calls.
There are several different perspectives people are coming from, each with their own solutions.
I think the upcoming Solid Practitioners meeting on AI could be a very interesting point of convergence for all parties involved in this, both those who have already commented here and others.
From what you describe, I was thinking that another route might be to combine an ActivityPub MCP with ActivityPods, and regulate ingress/egress to and from Pods that way.
Within Muze (part of PDS Interop, creators of the Solid-Nextcloud and the PHP Solid Server) we’ve been discussing ways a Solid Pod could be used as a central hub for “All your AI activity”.
We’ve got plans to add ANP (Agent Network Protocol) to Solid-Nextcloud and PSS (as others are already working on MCP solutions).
ANP seems a particularly good match for Solid, as it requires Linked Data (which Solid also demands) and uses Decentralized Identifiers (DIDs), which are also being looked into for the Solid spec.
I’ve not had time to publish anything, but it might be worth throwing a proof-of-concept together, if I can ever find the time…