A consumer personal-AI built on Solid: current status and a couple of questions around interop

Hello all. I’ve been building a personal AI assistant for individuals and households on Solid foundations, for about a year, in the evenings and on weekends. It runs daily on a small, second-hand server in my house. It’s at a stage where I can share it here and be told where I’ve got it wrong.

A note on who is writing this: I’m a technology portfolio director by profession, not a career engineer. I can reason about architecture, and I make the design decisions. My own programming background is Python, specialised in EdTech, and the first version of this system was Python talking to Ollama. The current system is TypeScript, which turned out to be the better fit once a web client and cloud model integration entered the picture. I am not a strong TypeScript programmer, though, and AI coding assistants wrote a large share of that code under my direction and to my design. I mention this because I feel a serious attempt at a customer-facing Solid app of this scale needs the output of a team, and harness engineering has shifted that constraint somewhat. With enough guidance and guardrails, these tools put a team-sized attempt within reach of one person. It also means I would rather you check the claims below than take them on faith.

What’s implemented (current main, unit-tested)

  • A Solid-OIDC identity provider (Panva oidc-provider v8): webid claim and scope, PKCE required, DPoP enabled, dynamic client registration, discovery and JWKS. Each person has a WebID at /people/<slug>#me, content-negotiated as Turtle or JSON-LD, declaring solid:oidcIssuer, storage, and solid:publicTypeIndex.
  • An LDP read/write surface for the user’s data, internally called “Spaces”, at /spaces/<category>/<slug>: GET, HEAD, PUT and DELETE, LDP containers, a type index, and content negotiation across markdown, Turtle and JSON-LD.
  • A DPoP verifier implementing the full RFC 9449 binding (htm, htu, iat, jti, ath, jkt). A request carrying a DPoP-bound token is rejected without a valid proof.
  • Code-level, per-resource, default-deny authorisation, derived from a single visibility source of truth that the application’s own access logic also uses.
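For readers less familiar with RFC 9449, here is a rough sketch of the binding checks listed above. It is not the verifier in the repo. It is a minimal illustration using Node's built-in crypto, and it assumes two things: the proof JWT's signature has already been verified and decoded, and the key is an EC key. All type and function names are hypothetical. `jti` is only checked for presence here, because detecting reuse needs the replay cache listed as missing further down.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for a proof whose JWS signature has already been verified.
export interface EcPublicJwk { kty: "EC"; crv: string; x: string; y: string }
export interface DpopProof {
  header: { typ: string; alg: string; jwk: EcPublicJwk };
  payload: { htm: string; htu: string; iat: number; jti: string; ath?: string };
}

// base64url(SHA-256(input)), unpadded.
const sha256b64url = (input: string): string =>
  createHash("sha256").update(input, "utf8").digest().toString("base64url");

// ath: hash of the (ASCII) access token value, RFC 9449 section 4.2.
export const accessTokenHash = (token: string): string => sha256b64url(token);

// RFC 7638 thumbprint of an EC key: required members only, lexicographic order, no whitespace.
export const jwkThumbprint = (jwk: EcPublicJwk): string =>
  sha256b64url(JSON.stringify({ crv: jwk.crv, kty: jwk.kty, x: jwk.x, y: jwk.y }));

// htu is compared ignoring query and fragment (RFC 9449 section 4.3).
const withoutQuery = (u: string): string => {
  const url = new URL(u);
  url.search = "";
  url.hash = "";
  return url.href;
};

// Returns the names of failed checks; an empty array means the binding holds.
export function checkDpopBinding(
  proof: DpopProof,
  request: { method: string; url: string },
  accessToken: string,
  boundJkt: string,      // cnf.jkt from the access token
  nowSeconds: number,
  maxSkewSeconds = 60,   // illustrative clock-skew window
): string[] {
  const { header, payload } = proof;
  const failed: string[] = [];
  if (header.typ !== "dpop+jwt") failed.push("typ");
  if (payload.htm !== request.method.toUpperCase()) failed.push("htm");
  if (withoutQuery(payload.htu) !== withoutQuery(request.url)) failed.push("htu");
  if (Math.abs(nowSeconds - payload.iat) > maxSkewSeconds) failed.push("iat");
  if (typeof payload.jti !== "string" || payload.jti.length === 0) failed.push("jti");
  if (payload.ath !== accessTokenHash(accessToken)) failed.push("ath");
  if (jwkThumbprint(header.jwk) !== boundJkt) failed.push("jkt");
  return failed;
}
```

Returning the list of failures rather than a boolean makes it easier to log exactly which binding a rejected request broke.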

Roughly 117 passing unit tests across the OIDC and Solid modules.
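Here is a rough sketch of the default-deny rule from the last bullet. One visibility map acts as the single source of truth, and both the Solid surface and the app's own access logic call the same functions. The `Visibility` variants, the resource keys and the `canRead`/`canWrite` names are hypothetical. The real rules in the repo will differ.

```typescript
// Hypothetical visibility levels; a resource with no entry is treated as inaccessible.
export type Visibility =
  | { kind: "private"; owner: string }
  | { kind: "collective"; owner: string; members: ReadonlySet<string> }
  | { kind: "public"; owner: string };

// Single source of truth: resource path -> visibility.
export type VisibilityMap = ReadonlyMap<string, Visibility>;

// Default deny: unknown resources, anonymous agents and unmatched rules all return false.
export function canRead(map: VisibilityMap, resource: string, agentWebId: string | null): boolean {
  const v = map.get(resource);
  if (!v) return false;                    // no entry means no access
  if (v.kind === "public") return true;
  if (agentWebId === null) return false;   // anonymous agents see public resources only
  if (agentWebId === v.owner) return true;
  if (v.kind === "collective") return v.members.has(agentWebId);
  return false;
}

// Writes are owner-only in this sketch.
export function canWrite(map: VisibilityMap, resource: string, agentWebId: string | null): boolean {
  const v = map.get(resource);
  return v !== undefined && agentWebId !== null && agentWebId === v.owner;
}
```

Because every branch that does not explicitly grant access falls through to `false`, a new visibility kind added without a rule is denied rather than leaked.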

Where it’s honestly rough, and where I’d value this group

  • Interop testing has only just begun. Until this week everything round-tripped my own serialiser against my own parser. The first handshake against a Community Solid Server instance immediately found a real bug: I PUT Turtle, CSS returned it as expanded JSON-LD (a top-level array, no @graph), and my reader threw on a shape my own server never emits. Fixed, with the CSS response captured as a test fixture; the harness is scripts/handshake-css.ts in the repo. From the same pass: CSS v7 appears not to implement conditional GET, returning 200 even for a byte-exact If-None-Match, so my client falls back to a full re-fetch. I have not tested against NSS, PodSpaces, or the Inrupt pod, and have not run a conformance suite. A pointer to the conformance harness you would aim a newcomer at is the single most useful reply to this post.
  • Local-first deployment collides with WebID resolution, and I have sidestepped it rather than solved it. Wim’s recent MySolido thread surfaced the same tension: a WebID has to be resolvable by the apps and issuers that consume it, which a pod on someone’s own machine cannot offer without a bridge or a tunnel. Mine runs behind a private overlay network, which works for my own clients but not for an arbitrary Solid app dereferencing a WebID. If the community is converging on bridges, tunnels, or a protocol-level fix, I would like to know which.
  • DPoP is enforced when the token is bound, but plain Bearer tokens (no cnf.jkt) are still accepted, and there is no replay cache or nonce yet.
  • No PATCH (N3 Patch or SPARQL Update). Clients re-PUT the whole resource.
  • My external-pod reader parses the shapes my own server emits. Against an arbitrary pod it falls back to defaults. “Read any Solid pod” is aspirational rather than done.
  • Per-person data at rest is not yet detachable. Identity is genuinely per-person, in that everyone has a real WebID, and tenant isolation is enforced at the database kernel, but within a single deployment a member’s Spaces still live in shared storage rather than on a drive that person could physically unplug and take with them. The per-person drive layer is designed but still on the roadmap. If you have solved portable per-person storage under one roof, I want to learn how you did it.
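The JSON-LD bug from the CSS handshake has a general shape. The same graph can arrive in three forms:

- a single node object,
- an object wrapping `@graph`,
- or, as CSS returned it, a top-level array of expanded node objects.

A tolerant reader can normalise all three before doing anything else. This is a minimal sketch with hypothetical names. It handles only these top-level shapes and does no expansion or compaction, so contexts are not applied (a real reader would use a full JSON-LD processor). It also assumes `@id` is not aliased.

```typescript
// A node object as it appears in compacted or expanded JSON-LD.
export type NodeObject = Record<string, unknown>;

const isNode = (v: unknown): v is NodeObject =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Flatten the top-level shapes a reader may receive into a list of node objects:
//   [ {...}, {...} ]                       expanded form, top-level array
//   { "@context": ..., "@graph": [...] }   compacted form with several nodes
//   { "@id": ..., ... }                    compacted form with a single node
export function topLevelNodes(doc: unknown): NodeObject[] {
  const items: unknown[] = Array.isArray(doc) ? doc : [doc];
  const nodes: NodeObject[] = [];
  for (const item of items) {
    if (!isNode(item)) continue;            // ignore stray literals
    const graph = item["@graph"];
    if (Array.isArray(graph)) nodes.push(...graph.filter(isNode));
    else if (isNode(graph)) nodes.push(graph);
    else nodes.push(item);
  }
  return nodes;
}

// Look a subject up by "@id" (assumes "@id" is not aliased by the context).
export function findNode(doc: unknown, id: string): NodeObject | undefined {
  return topLevelNodes(doc).find((n) => n["@id"] === id);
}
```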

The part I’d offer back, having built at the consumer end

“Own your data” does not move an ordinary person. It is an abstraction about a harm most people cannot see, and that some do not worry about at all, and it asks them to want more administration in a life that already has too much. The harm people actually feel is estrangement from their own lives, which sit fragmented across platforms that hold the pieces and monetise the seams. Erich Fromm’s distinction between having and being has been a useful design lens: the surveillance model treats a life as something to be had, and had by someone else at that. What a person wants is for their information to serve the business of being: their relationships, their several roles, the “we” rather than the “they”. I named the product for that idea: మేము, which is Telugu, my mother tongue, for “we”.

In my own life the concrete shape of this is that one person runs several worlds that must not bleed into each other: a job, a side project, a household, a caregiving role. Each of us is the manual integration layer between our own contexts. As an immigrant I find the worlds sit further apart, across two countries and two languages, so every switch costs more. And the load peaks exactly when a person moves from one collective to another: a change of country or role, a child grown and gone, a death in the family. The current model fails hardest there, because your context does not travel with you, and you arrive in the new world stripped of the old one.

That last case is the one I believe Solid is uniquely positioned to answer. It shaped a philosophy, now expressed in code and design, that I would offer back to this community:

  • The individual is the unit of identity. Each person has their own WebID and their own access policies, rather than existing as a row inside someone else’s account.
  • Bounded contexts are isolated below the application layer. In my implementation that means Postgres row-level security with FORCE, a non-superuser application role, and per-collective scoping, so that a boundary between worlds holds even when the application code above it has a bug. There are 30 database-backed isolation tests running against a live database, and this is the part of the system I am most confident in.
  • The group is emergent rather than primary. A household is a collective that individuals are members of, rather than a shared account that everyone’s data dissolves into. Members share specific resources into the group by reference, revocably, with a grace-period leave; the household keeps a cache for availability but never becomes the owner of a copy. Someone can carry their personal node out of one home and into another with their context intact. That portability of the self across a rupture is the reason for the whole thing.
  • What gets shared is editable understanding rather than raw data. The resources here are compiled, human-editable notes: the system drafts them from what it observes, and the person corrects them. Data you merely hold is inert; understanding you can read and edit accrues. In daily family use this has mattered more than any storage detail.

That inversion is the part I think is reusable beyond my own project, and I would welcome being told where it is naive.
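The sharing model in the third point (share by reference, revocable, with a grace-period leave) can be pictured as a small data structure. This is a hypothetical sketch, not the schema in the repo. The `Share` record, the `Collective` class, the 14-day default grace period and the method names are all illustrative assumptions. Timestamps are passed in explicitly so that the behaviour is deterministic.

```typescript
// A collective holds references into members' own storage, never copies.
export interface Share {
  resourceUrl: string;  // stays in the owner's storage
  ownerWebId: string;
  sharedAt: number;     // epoch ms
  revokedAt?: number;   // set when the owner withdraws the share
}

const DAY_MS = 24 * 60 * 60 * 1000;

export class Collective {
  private readonly graceMs: number;
  private readonly shares: Share[] = [];
  // member WebID -> time they announced leaving, or null while still a full member
  private readonly leavingAt = new Map<string, number | null>();

  constructor(graceMs = 14 * DAY_MS) {
    this.graceMs = graceMs;
  }

  join(webId: string): void {
    this.leavingAt.set(webId, null);
  }

  share(ownerWebId: string, resourceUrl: string, now: number): void {
    if (!this.leavingAt.has(ownerWebId)) throw new Error("only members can share");
    this.shares.push({ resourceUrl, ownerWebId, sharedAt: now });
  }

  // Only the owner's own shares are matched; revocation takes effect immediately.
  revoke(ownerWebId: string, resourceUrl: string, now: number): void {
    for (const s of this.shares) {
      if (s.ownerWebId === ownerWebId && s.resourceUrl === resourceUrl && s.revokedAt === undefined) {
        s.revokedAt = now;
      }
    }
  }

  // Leaving starts a grace period; the leaver's shares stay resolvable until it ends.
  leave(webId: string, now: number): void {
    if (this.leavingAt.has(webId)) this.leavingAt.set(webId, now);
  }

  // References the household may resolve right now (it may cache them, but never owns them).
  visibleShares(now: number): Share[] {
    return this.shares.filter((s) => {
      if (s.revokedAt !== undefined && s.revokedAt <= now) return false;
      const left = this.leavingAt.get(s.ownerWebId);
      if (left === undefined) return false;   // not a member
      return left === null || now < left + this.graceMs;
    });
  }
}
```

Because the collective stores only URLs and membership state, a member who leaves takes the underlying resources with them, and the household's view of them simply expires.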

A note on the current moment

Sir Tim’s Charlie, which collects your information into a vault, routes the right pieces to the best model, and disguises personal details before they reach the model, is near enough what I have been building toward at the individual end. I have a working consumer data point for it, including a transmission-boundary anonymiser that substitutes real names with stable placeholders, deterministically and reversibly, before any text reaches a cloud LLM. Images and external tool queries are a known gap, and it is a transmission control rather than encryption at rest. I would value comparing notes with anyone working on the consumer surface of this.
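Placeholder substitution of this kind can be sketched as a reversible mapping. Detecting names in free text is the hard part and is not shown: this sketch assumes the list of names is already known, for example from the household's own contacts. The `Pseudonymiser` class and the `[PERSON_n]` format are hypothetical, not the anonymiser in the repo. Placeholders stay stable for as long as the mapping is kept, which is what makes round trips through the model reversible.

```typescript
// Escape a literal string for use inside a RegExp.
const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

export class Pseudonymiser {
  private readonly toPlaceholder = new Map<string, string>();
  private readonly toName = new Map<string, string>();

  // The same name always maps to the same placeholder while this mapping exists.
  private placeholderFor(name: string): string {
    let p = this.toPlaceholder.get(name);
    if (p === undefined) {
      p = `[PERSON_${this.toPlaceholder.size + 1}]`;
      this.toPlaceholder.set(name, p);
      this.toName.set(p, name);
    }
    return p;
  }

  // Replace whole-word occurrences of known names before text crosses the boundary.
  redact(text: string, knownNames: string[]): string {
    // Longest first, so "Anu Rao" is replaced before "Anu".
    const sorted = Array.from(new Set(knownNames)).sort((a, b) => b.length - a.length);
    let out = text;
    for (const name of sorted) {
      // Unicode-aware word boundaries (letters, combining marks, digits), so Telugu names work too.
      const re = new RegExp(`(?<![\\p{L}\\p{M}\\p{N}])${escapeRegExp(name)}(?![\\p{L}\\p{M}\\p{N}])`, "gu");
      // A placeholder is only allocated when the name actually occurs.
      out = out.replace(re, () => this.placeholderFor(name));
    }
    return out;
  }

  // Put real names back into the model's response once it returns.
  restore(text: string): string {
    return text.replace(/\[PERSON_\d+\]/g, (p) => this.toName.get(p) ?? p);
  }
}
```

Example: with known names `Anu`, `Anu Rao` and `Ravi`, the text "Anu Rao asked Ravi to call Anu's school." becomes "[PERSON_1] asked [PERSON_2] to call [PERSON_3]'s school." Calling `restore` on that placeholder text returns the original sentence.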

Close

The code is AGPL-3.0 on GitHub at kanchanepally/memu. It includes the CSS handshake script, the DPoP verifier, and the isolation tests, so every claim above is checkable. If any of it is wrong or overstated, I would be hugely grateful for your pointers.

Great write-up. A few things that might be directly useful.

On the conformance harness question: Honestly, there is no single “point this at your server and get a score” tool that is well-maintained and newcomer-friendly right now. The most practical substitute is to test against multiple real pod servers and treat divergence as a signal. On that front, privatedatapod.com would be a useful target for your handshake scripts; pods are at {username}.privatedatapod.com, OIDC issuer is the root domain, and we run CSS v7.1.9 in subdomain mode. Your scripts/handshake-css.ts approach of capturing real server responses as test fixtures is probably the right methodology anyway, since it tests the servers that actually exist rather than an idealized spec.

On the CSS conditional GET behavior you observed: We run CSS v7 in production and have hit the same thing. The If-None-Match round-trip does not behave as the spec requires in all cases. A full re-fetch fallback is the right defensive move. Worth filing against the CSS repo if it hasn’t been already.
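The defensive pattern works like this. When an ETag is cached, send `If-None-Match`, and treat a 304 as a cache hit. Treat anything else, including a 200 for an unchanged resource, as a full, authoritative response. A minimal sketch under these assumptions: the in-memory cache and the `getWithEtag` name are hypothetical, authentication (DPoP headers) is omitted, and `fetchFn` is injected. In a real client, `fetchFn` would be the global `fetch`, possibly wrapped.

```typescript
// Minimal shapes so the sketch does not depend on DOM typings; the global fetch satisfies them.
export interface ResponseLike {
  status: number;
  ok: boolean;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}
export type FetchLike = (url: string, init: { headers: Record<string, string> }) => Promise<ResponseLike>;

interface CachedEntry { etag: string; body: string }
const cache = new Map<string, CachedEntry>(); // hypothetical in-memory cache keyed by URL

export async function getWithEtag(url: string, fetchFn: FetchLike): Promise<{ body: string; fromCache: boolean }> {
  const cached = cache.get(url);
  const headers: Record<string, string> = { Accept: "text/turtle" };
  if (cached) headers["If-None-Match"] = cached.etag;

  const res = await fetchFn(url, { headers });
  if (res.status === 304 && cached) return { body: cached.body, fromCache: true }; // condition honoured

  if (!res.ok) throw new Error(`GET ${url} failed with ${res.status}`);
  // A 200 may be a real change or a server that ignored If-None-Match;
  // either way the full body is authoritative, so refresh the cache from it.
  const body = await res.text();
  const etag = res.headers.get("ETag");
  if (etag) cache.set(url, { etag, body });
  else cache.delete(url);
  return { body, fromCache: false };
}
```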

On encrypted storage at rest: You mentioned this is on the roadmap. We shipped a client-side encrypted storage layer for Solid pods called the Vault SDK that might save you some design work, or at least be a useful reference. It uses AES-256-GCM with PBKDF2 key derivation (600k iterations), per-app key isolation via HKDF, and falls back gracefully to plaintext on pods that have not initialized a keystore. Zero dependencies, pure Web Crypto API. The key never leaves the client, which complements the transmission-boundary anonymizer you described rather than replacing it. One caveat: the encrypted mode currently requires a privatedatapod.com pod; plaintext mode works on any Solid server.
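The per-app isolation described here maps directly onto Web Crypto: HKDF derives one key per app from the vault key, and AES-256-GCM encrypts content. The sketch below is not the Vault SDK's code or API. It only illustrates the general pattern, and it rests on several assumptions:

- the unwrapped vault key is available as raw bytes;
- the `vault-app:` info label and the function names are hypothetical;
- the runtime provides `globalThis.crypto.subtle` (browsers, Node 19+).

```typescript
const subtle = globalThis.crypto.subtle;
const enc = new TextEncoder();

// Derive an AES-256-GCM key for one app; distinct `info` labels yield independent keys.
export async function appKey(vaultKeyBytes: Uint8Array, appId: string): Promise<CryptoKey> {
  const ikm = await subtle.importKey("raw", vaultKeyBytes, "HKDF", false, ["deriveKey"]);
  return subtle.deriveKey(
    // Zero salt (equivalent to "no salt" in RFC 5869), since the input is already a random key.
    { name: "HKDF", hash: "SHA-256", salt: new Uint8Array(32), info: enc.encode(`vault-app:${appId}`) },
    ikm,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

// AES-GCM with a fresh 96-bit IV per message; the IV is prepended to the ciphertext.
export async function sealText(key: CryptoKey, plaintext: string): Promise<Uint8Array> {
  const iv = globalThis.crypto.getRandomValues(new Uint8Array(12));
  const ct = new Uint8Array(await subtle.encrypt({ name: "AES-GCM", iv }, key, enc.encode(plaintext)));
  const out = new Uint8Array(iv.length + ct.length);
  out.set(iv, 0);
  out.set(ct, iv.length);
  return out;
}

// Fails (rejects) if the key is wrong or the ciphertext was altered.
export async function openSealed(key: CryptoKey, sealed: Uint8Array): Promise<string> {
  const iv = sealed.slice(0, 12);
  const pt = await subtle.decrypt({ name: "AES-GCM", iv }, key, sealed.slice(12));
  return new TextDecoder().decode(pt);
}
```

Because GCM authenticates the ciphertext, one app's key cannot read another app's data. An attempt fails with an error rather than returning garbage.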

On your local-first WebID resolution problem: We have not solved this at the protocol level either. The practical answer for consumer-facing apps right now is to give users a hosted pod with a stable public WebID and let the AI assistant run wherever it runs. The pod is the identity anchor and the data store; the AI is a client with delegated access. That decoupling also resolves the WebID resolution tension without needing a tunnel.

On the broader philosophy: The framing around context portability across life transitions is the most useful description I have read of why Solid matters to a person who is not already sold on data sovereignty as an abstract principle. That is worth writing up separately.

Thank you, this is exactly the kind of reply I hoped for.

On conformance, “test against multiple real servers and treat divergence as a signal” is a more honest method than I expected the answer to be, but it matches what the first handshake taught me: the bug I found was not a spec misreading, it was a shape a real server emits that no amount of spec reading had prepared me for. I will point scripts/handshake-css.ts at a privatedatapod.com pod as the second target and add the responses to the fixture set. Subdomain mode is a useful contrast in its own right, since my deployment is path-based.

On If-None-Match, agreed, and I will search the CSS issue tracker and file it with a minimal reproduction if nobody has beaten me to it.

On the Vault SDK, thank you. I will read it before designing anything of my own, since the cipher choices are close to what I had pencilled in. One question in the spirit of the thread, though. Encrypted mode requiring a privatedatapod.com pod is the kind of dependency I have learned to poke at. What does the pod need to provide beyond a standard CSS v7 surface, and is it a keystore convention that could in principle be published so that any pod could satisfy it? I ask because the property I care most about is that a person can relocate without loss. An encryption layer that only decrypts on one provider would cut against that, however good the cryptography is. If it is a convention rather than a service, I would gladly implement the other side of it.

On WebID resolution, the decoupling you describe, with the pod as the identity anchor and the assistant as a client holding delegated access, is roughly where I am converging for the hosted tier of this product, so it is useful to hear it stated as the practical consensus. My residual worry is that it relocates the dependency to the most sensitive spot: if the hosted pod is the anchor, then leaving the provider breaks identity unless the WebID lives on a domain the person controls or can alias later. Do you support custom-domain WebIDs? That feels like the missing half of the answer.

And on the last point, thank you. I am in the middle of writing exactly that piece for a small newsletter I run, and your sentence is a useful sign the framing carries beyond my own head. I will post a link here when it is published.

Two good questions, and I owe you a correction on the first one.

On the Vault SDK: I overstated the platform dependency in my previous reply. The keystore is a plain JSON document stored at /vault/.keystore on the pod via a standard LDP PUT. Decryption happens entirely in the browser using the Web Crypto API. The format is:

```json
{
  "version": 1,
  "kdf": "PBKDF2",
  "kdfParams": { "hash": "SHA-256", "iterations": 600000, "salt": "<base64>" },
  "wrappedKey": "<base64 AES-256 vault key wrapped under AES-KW>",
  "recoveryWrappedKey": "<base64, optional>"
}
```

Any Solid server that supports authenticated LDP reads and writes can host that file. The SDK is MIT-licensed and the source is at github.com/pod42/PDPVault-SDK if you want to read through it. What is specific to privatedatapod.com is not the cryptography but the provisioning UI: first-time vault setup and the delegation grant approval step both currently live on our account page. The delegation flow (passphrase-free access) also publishes ECDH public keys to /vault/.delegation-keys/{namespace}/{thumbprint}.json on the pod, and those files follow the same provider-agnostic convention. If you want to implement a compatible provisioning flow on your own deployment, the source is the full reference and we are happy to answer questions.
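Since the keystore is plain JSON and every primitive is in Web Crypto, a compatible reader and writer can be sketched in a few dozen lines. This sketch is based only on the format above, not on code from the Vault SDK, and several details are assumptions:

- the function names and the base64 helpers;
- unwrapping the vault key as an AES-GCM key;
- ignoring the optional recovery key;
- a runtime with `globalThis.crypto.subtle`, `atob` and `btoa` (browsers, Node 19+).

```typescript
interface Keystore {
  version: 1;
  kdf: "PBKDF2";
  kdfParams: { hash: "SHA-256"; iterations: number; salt: string };
  wrappedKey: string;
  recoveryWrappedKey?: string; // not handled in this sketch
}

const subtle = globalThis.crypto.subtle;
const fromB64 = (s: string): Uint8Array => Uint8Array.from(atob(s), (c) => c.charCodeAt(0));
const toB64 = (b: Uint8Array): string => btoa(String.fromCharCode(...b));

// Passphrase -> PBKDF2 -> AES-KW key-encryption key (KEK).
async function deriveKek(passphrase: string, salt: Uint8Array, iterations: number): Promise<CryptoKey> {
  const base = await subtle.importKey("raw", new TextEncoder().encode(passphrase), "PBKDF2", false, ["deriveKey"]);
  return subtle.deriveKey(
    { name: "PBKDF2", hash: "SHA-256", salt, iterations },
    base,
    { name: "AES-KW", length: 256 },
    false,
    ["wrapKey", "unwrapKey"],
  );
}

// Write side: the document a provisioning flow would PUT to /vault/.keystore.
export async function createKeystore(passphrase: string, iterations = 600_000): Promise<Keystore> {
  const salt = globalThis.crypto.getRandomValues(new Uint8Array(16));
  const kek = await deriveKek(passphrase, salt, iterations);
  const vaultKey = await subtle.generateKey({ name: "AES-GCM", length: 256 }, true, ["encrypt", "decrypt"]);
  const wrapped = new Uint8Array(await subtle.wrapKey("raw", vaultKey, kek, "AES-KW"));
  return { version: 1, kdf: "PBKDF2", kdfParams: { hash: "SHA-256", iterations, salt: toB64(salt) }, wrappedKey: toB64(wrapped) };
}

// Read side: any client that can GET the keystore can unlock it with the passphrase.
export async function unlockKeystore(ks: Keystore, passphrase: string): Promise<CryptoKey> {
  if (ks.version !== 1 || ks.kdf !== "PBKDF2") throw new Error("unsupported keystore");
  const kek = await deriveKek(passphrase, fromB64(ks.kdfParams.salt), ks.kdfParams.iterations);
  // AES-KW has a built-in integrity check, so a wrong passphrase makes unwrapKey reject.
  return subtle.unwrapKey("raw", fromB64(ks.wrappedKey), kek, "AES-KW", "AES-GCM", false, ["encrypt", "decrypt"]);
}
```

Nothing here depends on the server beyond storing and returning the JSON document, which supports the point that the cryptography is provider-agnostic. The 600000 default matches the example above. A lower count is only reasonable in tests.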

On custom-domain WebIDs: You have identified the right gap, and the honest answer is that we do not support it yet. It is on the roadmap. Right now a user’s WebID is tied to {username}.privatedatapod.com, which means the identity travels with the provider, not the person. Your point stands: true portability requires either a user-controlled domain or a clean migration path that does not break existing WebID references. We know that and it is not solved today.

One distinction worth drawing: the data itself is not tied to the provider. Pod contents are stored as standard LDP resources on a CSS instance, in Turtle and JSON-LD, readable and exportable by any conformant Solid client. The portability problem you are describing is an identity problem, not a data format problem. Getting the data out is solved. Getting the WebID to travel with the person is what custom domain support would address, and that is the part that is not done yet.
