Why Backend-for-Frontend for Solid is categorically wrong

Am I correct (as I think you suggest in the final paragraph) that this is a limitation in Solid overall (as currently discussed), rather than just in Inrupt’s implementation? Again I’m new here, but as I dig into the Solid Ecosystem via the docs, GitHub, forum, etc., I don’t think I’m seeing a built-in technical guarantee that data isn’t exfiltrated. The “guarantee” is that the business has to ask explicitly for permission, rather than using a lengthy ToS?

I say “limitation” rather than flaw b/c different levels of security guarantees for different use cases seem potentially fine to me. Architecturally, I can see saying something like “there should be a base-level Solid protocol/design/etc. that supports diverse use cases, with additional levels of security built on top where that makes sense”.

(Also maybe I’m misunderstanding the goals of Solid? I do see at least some discussion here that interoperability, i.e. the decoupling of app and data, is more primary than the data control piece).

Has anything been planned, discussed, etc, in which Pods would integrate with Confidential Computing solutions like TEEs, restricted browser worklets, etc, even if not as part of “core Solid”? This would seem to open up a class of use cases requiring both intense compute and strong privacy guarantees, if a business could provide the compute resources in a sandbox (organization hosted pod or server fetching data) with the user’s pod only releasing data to an attested server.

So the better solution for now is to do all the data access, or as much of it as possible, directly from the client, such that most of the compute is done in the browser locally, and results cached in the pod.

The “Backend For Frontend” pattern only encourages potential misuse of data, and gives another API surface area that must be appropriately secured and protected, along with all the trust that must be given to a remote entity that they’re not exfiltrating your data for undocumented purposes.

That’s why I’ve been suggesting for a long time that for Solid to really deliver on its promises, it will need a storage-local compute option.

If you have applications that use their own custom backends and just store data in the Solid pod, you’re not likely to make good on those data interoperability desires.

it will need a storage-local compute option.

That would open up a lot. Do you see that “storage local” processing having any restrictions on disk/network access?

If you have applications that use their own custom backends and just store data in the Solid pod, you’re not likely to make good on those data interoperability desires.

Do you see any room at all for applications that won’t be able to run on a user’s local machine (BFF or even just having a regular backend), or are those just inherently not in the use case list for a Solid App?

Yes, I think this should have very strict ingress and egress controls, as well as controls on resource access. Think similar to the Deno or Node.js application policy files.
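
To make that a bit more concrete, here is a minimal sketch of what a deny-by-default policy for a storage-local job could look like. None of this is part of any Solid specification: the `ComputePolicy` class, its field names, and the example paths are all hypothetical, and it only borrows the allow-list idea from the Deno and Node.js permission models.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class ComputePolicy:
    """Hypothetical allow-list policy for a job running next to a pod."""

    readable_prefixes: tuple[str, ...] = ()      # pod containers the job may read
    writable_prefixes: tuple[str, ...] = ()      # pod containers the job may write
    egress_hosts: frozenset[str] = frozenset()   # hosts the job may call out to

    def may_read(self, resource: str) -> bool:
        return any(resource.startswith(p) for p in self.readable_prefixes)

    def may_write(self, resource: str) -> bool:
        return any(resource.startswith(p) for p in self.writable_prefixes)

    def may_connect(self, url: str) -> bool:
        # Deny by default: only hosts listed explicitly are reachable.
        return urlparse(url).hostname in self.egress_hosts


# Example: a job that may read health data, write results, and never phone home.
policy = ComputePolicy(
    readable_prefixes=("/health/",),
    writable_prefixes=("/health/results/",),
    egress_hosts=frozenset(),
)
```

With an empty `egress_hosts`, the job can compute over the data it was granted but has no network path to send it anywhere, which is the property a remote backend cannot offer.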

I think anything that takes your pod and interacts with it as you on a remote server fundamentally breaks the trust model of Solid. You can certainly have a server-side remote agent interact with your pod as that application, and that will come with a need to disclose what that agent does, and we’ll need to rely on legal solutions rather than technical measures to protect users’ data. But most of the applications we’re seeing built now can interact with the pod directly, and it should be your pod.

Sure, pod providers may institute policies to allow moderation or trust & safety (e.g., scanning for CSAM such that the pod provider isn’t publishing/distributing that content), or auditing of pods in order to ensure data integrity.

This is an interesting thread on Inrupt’s Backend-for-Frontend (BFF) blog posting and I’ve learned a bit from the dialog (and I’m still digesting some of the finer points).

In reading the BFF blog I find it useful to consider the source: a company providing enterprise-grade software tools. I believe the intended audience is corporate technology and purchasing decision makers. The article’s bullet points appear to be targeted toward enterprise integration pain points.

“…see the need for this pattern in large enterprises, especially those with existing infrastructure that are creating a managed ecosystem of applications and Pods. This pattern helps these organizations deploy Solid in a way that better fits their existing technology governance and security policies, and it allows for simpler integrations.”

Please also consider that, for most enterprises, end users don’t own the data. The enterprise owns the data and, more specifically, the data is dispersed across line-of-business applications that are controlled and managed by the different business groups in the organization.

In the Inrupt BFF scheme, I see Solid client apps and Pods playing a vital role in facilitating systems integration with a uniform approach to data integrity, data security, and control of data flow built using technology based on a standard protocol. That’s a pretty big deal!

“…We also see the need more generally for server-side agents to have access to Pods to perform calculations on the users behalf, including in the emerging realm of machine learning for personal AI assistants. It is not expected (nor most times even feasible) to execute massive computations in resource-constrained environments like a web browser. In such cases, processes that run algorithms on data from Pods, potentially many Pods, are better done in the backend, reserving the frontend for data visualization and decision making.”

In the enterprise, the trend I have observed is to eliminate storage of business data on end-user devices (i.e., minimize threat exposure and loss of intellectual property by reducing the attack surface). Line-of-business app computing may still be done on end-user devices, but these tend to be hold-over users or in organizations that have considerable lag in upgrading computing infrastructure.

In the United States, the article is on-point that government institutions:

“…need to cater for an extremely wide array of devices, both new and very old, to ensure they meet their commitment to provide services to all citizens”

Indeed, government internal infrastructure may also be quite old.

Years ago, I participated in a group that was developing a tool for submitting data to a regulatory agency using XML. When the tool was presented to the business stakeholders, it was rejected due to its complexity and the cost of adoption. When I first started attending the CG meetings I wondered if Solid would meet the same fate.

Inrupt’s positioning of its tech to enterprises looks like a smart business strategy and, if successful, should be good for Solid and the community, since increased adoption will secure Solid’s place on the web and create more opportunities for those with skills in Solid.

Right, but if your app only talks to your specific hardcoded BFF API, and your BFF API does who knows what to my pod (but really it’s your pod, because I don’t own it), what’s even the point of Solid? It’s just a glorified blob storage service at that point.

If I can access my pod, and tell you to store my data on my pod, then that’s the Solid vision. If a company then says “we want to run XYZ on your data” and I can say “that’ll cost you this much per CPU hour to run on my server”, then the company literally pays to run their software on my data, whilst my data never leaves my pod provider’s infrastructure (whether that’s a big corporate provider or my own self-hosted pod server).

Edit: More to the point, RDF, JSON-LD, and triple store databases have been around for decades, so what is Solid giving that a triple store isn’t, in the context of enterprise? User permissions, somehow, kinda, but my BFF wants full access to all the data anyway… so… oops.

Thank you @ThisIsMissEm for your response. This dialog is helping me to start to think of how Solid can best be used.

As a product of my own experience, I’m thinking about this in terms of enterprises that fall under the regulatory regime of Good Manufacturing Practices (GMP) [1] or other good practices (GxP) [2]. I believe the important distinction is these organizations are governed by Quality Management Systems (QMS) [3]. GxP Use Cases are very different from social media platforms or other consumer orientated web applications that provide a service but also host user data (e.g., creative works) [4], and/or aggregate user data for the purpose of selling and/or sharing to third parties.

In the Use Cases I’m thinking about, the organization has specified, tested, and verified what the BFF API has access to and what operations it can perform on what data. The BFF absolutely does not need, nor should it get, full access to pod data.
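
As a rough illustration of what “specified, tested, and verified” access might look like in practice, here is a small sketch of a scoped grant table with an audit trail. The mode names mirror the Web Access Control modes, but everything else (the `BFF_GRANTS` table, the container paths, the `bff_may` function) is hypothetical and not taken from Inrupt’s products or the Solid specifications.

```python
from enum import Enum


class Mode(Enum):
    # Names mirror WAC access modes; the rest of this sketch is hypothetical.
    READ = "Read"
    WRITE = "Write"
    APPEND = "Append"


# Hypothetical, pre-verified grants for one BFF agent: container -> allowed modes.
BFF_GRANTS: dict[str, frozenset[Mode]] = {
    "/batch-records/": frozenset({Mode.READ}),
    "/audit-log/": frozenset({Mode.APPEND}),
}

# Every decision is recorded so that access can be reviewed later.
audit_trail: list[tuple[str, str, bool]] = []


def bff_may(mode: Mode, resource: str) -> bool:
    """Allow only operations covered by a grant, and record every decision."""
    allowed = any(
        resource.startswith(container) and mode in modes
        for container, modes in BFF_GRANTS.items()
    )
    audit_trail.append((mode.value, resource, allowed))
    return allowed
```

Anything not listed is denied, so the BFF can read batch records and append to the audit log but cannot touch, say, an HR container, and each attempt leaves a record either way.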

Inrupt wrote: “The demands of large enterprises can be more readily satisfied by employing the BFF pattern, because it preserves the necessary controls and integrations with existing systems, and separates governance, security and audit concerns from the user interfaces that use the data.”

I think this statement is true for the Use Cases I’m thinking of, as it provides a pathway to adopt Solid while minimizing disruption to users and the business. Users carry out their work assignments as per their existing Standard Operating Procedures (SOPs) (hence no need to retrain) while the Solid protocol is implemented across the organization (which may take years).

First of all, I generally agree with OP’s point. We have explored several cases (where the “social” of SoLiD is a main driver) which all require some form of “services” or long-living automated agents performing actions (KNoodle+Orchestrator for calendar sharing and meeting scheduling, and Libertas for collective privacy-preserving computation). At the moment, the best we can do is to allow users to choose the provider of these agents. But that is not ideal, for both practical and privacy/security reasons.

Having said that, I don’t feel we should go straight to the conclusion that Solid must have a storage-local compute mechanism.
What we can conclude is that Solid needs a mechanism to allow custom computation to be performed in a user-trusted (thus user-specified) location, be it the Pod or somewhere else.

In this model, storage-local compute is one option (albeit the most appreciated one) to realise that.

Why is this difference important?

Because if we require every Solid service to provide storage-local compute, two resource issues directly emerge:

  1. How much computation power should be provided?
    1. For running Solid servers, how much computation power should be provided to each user? Should/Can they pay for it?
    2. For less-resourced servers, admins will be frightened to launch a Solid service, thus accelerating centralization.
  2. What if the computation power on a Solid server is not enough for my task?
    1. Can I use a different trusted server (e.g. my own other server) to do the computation?
    2. Or must I migrate all my data to a different Pod?

That is why I consider storage-local computation to be one option (especially for less-intensive tasks), but other options should be supported as well. This gives flexibility to everyone (storage provider, computation provider, and data owner).
Maybe a user-centric business model can emerge from that, who knows.
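
To illustrate the idea (not any existing protocol), here is a minimal sketch of picking a compute location from a user-specified list, preferring storage-local when it has enough capacity. The `ComputeLocation` type, its fields, and the `choose_location` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComputeLocation:
    """Hypothetical description of a location the user trusts to run tasks."""

    name: str
    cpu_hours_available: float
    storage_local: bool  # True if the compute runs next to the pod itself


def choose_location(
    trusted: list[ComputeLocation], cpu_hours_needed: float
) -> Optional[ComputeLocation]:
    """Pick a user-trusted location with enough capacity, preferring storage-local."""
    candidates = [
        loc for loc in trusted if loc.cpu_hours_available >= cpu_hours_needed
    ]
    # list.sort() is stable, so ties keep the order the user listed them in.
    candidates.sort(key=lambda loc: not loc.storage_local)
    return candidates[0] if candidates else None


# Example: a small pod-side sandbox plus the user's own, larger server.
trusted_locations = [
    ComputeLocation("my-home-server", cpu_hours_available=50.0, storage_local=False),
    ComputeLocation("pod-sandbox", cpu_hours_available=2.0, storage_local=True),
]
```

A light task lands on the pod-side sandbox, a heavier one falls back to the user’s own server, and a task too large for any trusted location is simply refused rather than shipped to an untrusted one.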

Of course, standard protocols must be defined to achieve this, which is a separate issue.

This partially reflects the difference between Noel’s proposal #390 and my proposal #393. That proposal, though, does not entirely cover my current thinking on what such a flexible user-defined mechanism should contain, especially the possibility of extending service endpoints under the Pod’s namespace. I will find time to detail that.

In addition to that, the storage-local model has an inherent bottleneck: it is not compatible with social interactions.
For example, if Alice wants to use Bob’s and Charlie’s data to do a computation, where should the task run?

This is actually a bottleneck we saw, in the Libertas work, in other existing PDS solutions that focus on truly “personal” storage (e.g. Databox, which has local computation and ingress and egress controls). Solid has a “social” in its name, and it should not give it up.
