Hello all. I’ve been building a personal AI assistant for individuals and households on Solid foundations, for about a year, in the evenings and on weekends. It runs daily on a small, second-hand server in my house. It’s at a stage where I can share it here and be told where I’ve got it wrong.
A note on who is writing this: I’m a technology portfolio director by profession, not a career engineer. I can reason about architecture and I make the design decisions. My own programming background is Python, specialised in EdTech, and the first version of this system was Python talking to Ollama. The current system is TypeScript, which turned out to be the better fit once a web client and cloud model integration entered the picture, but I am not a strong TypeScript programmer, and a large share of that code was written with AI coding assistants under my direction and to my design decisions. I mention it because a serious attempt at a customer-facing Solid app of this scale would, I feel, normally need the output of a team, and that constraint has shifted somewhat with harness engineering. With enough guidance and guardrails, these tools put a team-sized attempt within reach of one person. It also means I would rather you check the claims below than take them on faith.
What’s implemented (current main, unit-tested)
- A Solid-OIDC identity provider (Panva `oidc-provider` v8): `webid` claim and scope, PKCE required, DPoP enabled, dynamic client registration, discovery and JWKS. Each person has a WebID at `/people/<slug>#me`, content-negotiated as Turtle or JSON-LD, declaring `solid:oidcIssuer`, storage, and `solid:publicTypeIndex`.
- An LDP read/write surface for the user’s data, internally called “Spaces”, at `/spaces/<category>/<slug>`: GET, HEAD, PUT and DELETE, LDP containers, a type index, and content negotiation across markdown, Turtle and JSON-LD.
- A DPoP verifier implementing the full RFC 9449 binding (`htm`, `htu`, `iat`, `jti`, `ath`, `jkt`). A request carrying a DPoP-bound token is rejected without a valid proof.
- Code-level, per-resource, default-deny authorisation, derived from a single visibility source of truth that the application’s own access logic also uses.
Roughly 117 passing unit tests across the OIDC and Solid modules.
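To make the DPoP item above easier to check, here is a minimal TypeScript sketch of the claim comparisons that RFC 9449 asks for. It is an illustration and not the repository's code: it assumes the proof's signature has already been verified, that the caller has already computed the access-token hash and the key thumbprint, and that the token is DPoP-bound. `checkDpopClaims`, `DpopClaims`, `RequestContext` and the 60-second freshness window are hypothetical names and values.

```typescript
// Sketch only: claim checks on a DPoP proof whose signature has ALREADY been
// verified against the public key in its header. All names are hypothetical.

interface DpopClaims {
  htm?: string; // HTTP method the proof was created for
  htu?: string; // HTTP target URI, without query or fragment
  iat?: number; // issued-at, seconds since the epoch
  jti?: string; // unique identifier of this proof
  ath?: string; // base64url(SHA-256(access token))
}

interface RequestContext {
  method: string;
  url: string;
  nowSeconds: number;
  expectedAth: string;        // computed by the caller from the presented token
  proofKeyThumbprint: string; // RFC 7638 thumbprint of the proof's public key
  tokenJkt: string;           // cnf.jkt taken from the bound access token
  maxAgeSeconds?: number;     // assumed freshness window, defaults to 60
}

const stripQueryAndFragment = (u: string): string => u.replace(/[?#].*$/, "");

// Returns null when every check passes, otherwise a short rejection reason.
function checkDpopClaims(claims: DpopClaims, ctx: RequestContext): string | null {
  const maxAge = ctx.maxAgeSeconds ?? 60;

  if (ctx.tokenJkt !== ctx.proofKeyThumbprint) return "jkt mismatch";
  if (claims.htm !== ctx.method) return "htm mismatch";
  if (
    claims.htu === undefined ||
    stripQueryAndFragment(claims.htu) !== stripQueryAndFragment(ctx.url)
  ) {
    return "htu mismatch";
  }
  if (typeof claims.iat !== "number" || Math.abs(ctx.nowSeconds - claims.iat) > maxAge) {
    return "iat outside freshness window";
  }
  // Presence only: tracking seen jti values (replay detection) is out of scope here.
  if (!claims.jti) return "jti missing";
  if (claims.ath !== ctx.expectedAth) return "ath mismatch";
  return null;
}
```

Returning a reason string rather than a boolean keeps each failure distinguishable in tests and logs; whether the real verifier reports failures this way is not something this sketch claims.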
Where it’s honestly rough, and where I’d value this group
- Interop testing has only just begun. Until this week everything round-tripped my own serialiser against my own parser. The first handshake against a Community Solid Server instance immediately found a real bug: I PUT Turtle, CSS returned it as expanded JSON-LD (a top-level array, no `@graph`), and my reader threw on a shape my own server never emits. Fixed, with the CSS response captured as a test fixture; the harness is `scripts/handshake-css.ts` in the repo. From the same pass: CSS v7 appears not to implement conditional GET, returning 200 even for a byte-exact `If-None-Match`, so my client falls back to a full re-fetch. I have not tested against NSS, PodSpaces, or the Inrupt pod, and have not run a conformance suite. A pointer to the conformance harness you would aim a newcomer at is the single most useful reply to this post.
- Local-first deployment collides with WebID resolution, and I have sidestepped it rather than solved it. Wim’s recent MySolido thread surfaced the same tension: a WebID has to be resolvable by the apps and issuers that consume it, which a pod on someone’s own machine cannot offer without a bridge or a tunnel. Mine runs behind a private overlay network, which works for my own clients but not for an arbitrary Solid app dereferencing a WebID. If the community is converging on bridges, tunnels, or a protocol-level fix, I would like to know which.
- DPoP is enforced when the token is bound, but plain Bearer tokens (no `cnf.jkt`) are still accepted, and there is no replay cache or nonce yet.
- No PATCH (N3 Patch or SPARQL Update). Clients re-PUT the whole resource.
- My external-pod reader parses the shapes my own server emits. Against an arbitrary pod it falls back to defaults. “Read any Solid pod” is aspirational rather than done.
- Per-person data at rest is not yet detachable. Identity is genuinely per-person, in that everyone has a real WebID, and tenant isolation is enforced at the database kernel, but within a single deployment a member’s Spaces still live in shared storage rather than on a drive that person could physically unplug and take with them. The per-person drive layer is designed but still on the roadmap. If you have solved portable per-person storage under one roof, I want to learn how you did it.
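For anyone who hits the same JSON-LD shape mismatch described in the interop item above, this is a minimal sketch of one way a reader could tolerate both shapes. It is not the fix that landed in the repository; `toNodeList` and `JsonLdNode` are hypothetical names. It only normalises the outer container and does not perform JSON-LD expansion or compaction, so property keys may still differ between full IRIs and prefixed terms.

```typescript
// Sketch only: accept the common outer shapes of a JSON-LD document and
// return a flat list of node objects. Names are hypothetical.

type JsonLdNode = { [key: string]: unknown };

const isNode = (v: unknown): v is JsonLdNode =>
  typeof v === "object" && v !== null && !Array.isArray(v);

function toNodeList(doc: unknown): JsonLdNode[] {
  // Expanded form: a top-level array of node objects, no "@graph".
  if (Array.isArray(doc)) return doc.filter(isNode);

  if (isNode(doc)) {
    const graph = doc["@graph"];
    // Object form carrying an explicit "@graph".
    if (Array.isArray(graph)) return graph.filter(isNode);
    if (isNode(graph)) return [graph];
    // A single node object.
    return [doc];
  }

  // Anything else (string, number, null) carries no nodes.
  return [];
}
```

A reader built on a full JSON-LD processor would avoid this class of bug altogether; the sketch is only meant to show the two shapes side by side.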
The part I’d offer back, having built at the consumer end
“Own your data” does not move an ordinary person. It is an abstraction about a harm most people cannot see and some do not even worry about, and it asks them to want more administration in a life that already has too much. The harm people actually feel is estrangement from their own lives, which sit fragmented across platforms that hold the pieces and monetise the seams. Erich Fromm’s distinction between having and being has been a useful design lens: the surveillance model treats a life as something to be had, and had by someone else at that. What a person wants is for their information to serve the business of being: their relationships, their several roles, the “we” rather than the “they”. I named the product for that idea: మేము, which means “we” in Telugu, my mother tongue.
In my own life the concrete shape of this is that one person runs several worlds that must not bleed into each other: a job, a side project, a household, a caregiving role. Each of us is the manual integration layer between our own contexts. As an immigrant I find the worlds sit further apart, across two countries and two languages, so every switch costs more. And the load peaks exactly when a person moves from one collective to another: a change of country or role, a child grown and gone, a death in the family. The current model fails hardest there, because your context does not travel with you, and you arrive in the new world stripped of the old one.
That last case is the one I believe Solid is uniquely positioned to answer, and it shaped the philosophy, carried into code and design, that I would offer back to this community:
- The individual is the unit of identity. Each person has their own WebID and their own access policies, rather than existing as a row inside someone else’s account.
- Bounded contexts are isolated below the application layer. In my implementation that means Postgres row-level security with `FORCE`, a non-superuser application role, and per-collective scoping, so that a boundary between worlds holds even when the application code above it has a bug. There are 30 database-backed isolation tests running against a live database, and this is the part of the system I am most confident in.
- The group is emergent rather than primary. A household is a collective that individuals are members of, rather than a shared account that everyone’s data dissolves into. Members share specific resources into the group by reference, revocably, with a grace-period leave; the household keeps a cache for availability but never becomes the owner of a copy. Someone can carry their personal node out of one home and into another with their context intact. That portability of the self across a rupture is the reason for the whole thing.
- What gets shared is editable understanding rather than raw data. The resources here are compiled, human-editable notes: the system drafts them from what it observes, and the person corrects them. Data you merely hold is inert; understanding you can read and edit accrues. In daily family use this has mattered more than any storage detail.
That inversion is the part I think is reusable beyond my own project, and I would welcome being told where it is naive.
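For the isolation item in the list above, here is a minimal sketch of what forced row-level security can look like, written as a TypeScript helper that emits the Postgres statements. The table name, the `collective_id` column, the `collective_isolation` policy and the `app.collective_id` setting are all hypothetical and are not the project's schema. The point of `FORCE` is that the policy also applies to the table's owner; superusers and roles with `BYPASSRLS` still skip it, which is why a non-superuser application role matters.

```typescript
// Sketch only: emit the statements that put a table behind forced row-level
// security, scoped by a per-transaction setting. All identifiers are hypothetical.

function isolationStatements(table: string): string[] {
  // Refuse anything that is not a plain lowercase identifier.
  if (!/^[a-z_][a-z0-9_]*$/.test(table)) {
    throw new Error(`unsafe table name: ${table}`);
  }
  return [
    `ALTER TABLE ${table} ENABLE ROW LEVEL SECURITY`,
    // FORCE makes the policy apply to the table owner as well.
    `ALTER TABLE ${table} FORCE ROW LEVEL SECURITY`,
    // current_setting() raises an error when the setting is absent,
    // so a request that forgot to set its scope fails closed.
    `CREATE POLICY collective_isolation ON ${table} ` +
      `USING (collective_id = current_setting('app.collective_id')::uuid)`,
  ];
}

// Parameterised statement to run at the start of each transaction;
// the third argument (true) limits the setting to that transaction.
const scopeStatement = "SELECT set_config('app.collective_id', $1, true)";
```

With this shape, a query that omits its own `WHERE collective_id = ...` clause still returns only rows from the collective set for that transaction, which is the property the isolation tests would exercise.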
A note on the current moment
Sir Tim’s Charlie, which collects your information into a vault, routes the right pieces to the best model, and disguises personal details before they reach the model, is near enough what I have been building toward at the individual end. I have a working consumer data point for it, including a transmission-boundary anonymiser that substitutes real names with stable placeholders, deterministically and reversibly, before any text reaches a cloud LLM. Images and external tool queries are a known gap, and it is a transmission control rather than encryption at rest. I would value comparing notes with anyone working on the consumer surface of this.
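To make "deterministically and reversibly" concrete for comparison purposes, here is a minimal sketch of that kind of substitution. It is not the anonymiser in the repository: `anonymise`, `restore` and the `[PERSON_n]` placeholder format are hypothetical, and it assumes the list of known names is supplied by the caller. Placeholders are numbered by sorted name order, so they stay stable only while that list is unchanged; keeping them stable over time would need a persisted table.

```typescript
// Sketch only: replace known names with numbered placeholders before text
// leaves the machine, and put them back in the reply. Names are hypothetical.

interface Anonymised {
  text: string;
  placeholders: Record<string, string>; // placeholder -> real name
}

const escapeRegExp = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function anonymise(text: string, knownNames: string[]): Anonymised {
  // De-duplicate and sort so numbering does not depend on input order.
  const names = knownNames
    .filter((n, i, all) => n.length > 0 && all.indexOf(n) === i)
    .sort();
  const placeholders: Record<string, string> = {};
  if (names.length === 0) return { text, placeholders };

  const byName: Record<string, string> = {};
  names.forEach((name, i) => {
    byName[name] = `[PERSON_${i + 1}]`;
  });

  // Longest names first, so "Anna Rao" wins over "Anna" at the same position.
  const longestFirst = names.slice().sort((a, b) => b.length - a.length);
  const pattern = new RegExp(longestFirst.map(escapeRegExp).join("|"), "g");

  const out = text.replace(pattern, (match) => {
    const placeholder = byName[match];
    if (placeholder === undefined) return match;
    placeholders[placeholder] = match;
    return placeholder;
  });
  return { text: out, placeholders };
}

function restore(text: string, placeholders: Record<string, string>): string {
  return Object.keys(placeholders).reduce(
    (acc, ph) => acc.split(ph).join(placeholders[ph] ?? ph),
    text,
  );
}
```

This matches plain substrings, so a known name inside a longer word is also replaced; that is still reversible, but a production version would want proper tokenisation and would need to cover the image and tool-query gaps mentioned above.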
Close
The code is AGPL-3.0 on GitHub at kanchanepally/memu, including the CSS handshake script, the DPoP verifier, and the isolation tests, so every claim above is checkable. If any of it is wrong or overstated, I would be hugely grateful for your pointers.