Can I restrict where a user's data can go?

It seems that Solid protects user data from unwanted access and modification by the service provider, but not from data leaking. Any Solid app can send user data anywhere, even to servers that are not the user's pod. Users can't know where their data goes.

It's a little bit scary. Imagine a user of a Todolist app. A user of a Solid application may think: "All my todo lists are saved in my pod, so my data is safe and controlled by me." But what if the Todolist app has secretly sent all of the user's todo list data to its own server?

Can I restrict where a user's data can go? Or is there a feature that forces a Solid application to connect only to the user's pod?

1 Like

No, there’s no such restriction, as you’re working within the context of a browser. An application could knowingly (or unknowingly) exfiltrate data to a destination that is not the user’s pod. Solid doesn’t, and can’t, protect against that.

Instead, you’ll need to rely on good old terms of service documents, privacy policies & local legal frameworks.

In theory you could also write an extension for your browser that prompts for every outbound request a page/site makes, but that’d likely be overwhelming.
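As a very rough sketch of that idea – assuming the WebExtensions webRequest API, a Manifest V2 background script with the "webRequest" and "webRequestBlocking" permissions, and a made-up pod URL – the extension could cancel any request whose origin isn't on a user-approved allow-list (a real version would prompt instead of silently cancelling):

```typescript
// Hypothetical background script: allow requests to the user's pod,
// cancel everything else. The pod URL is made up.
const allowedOrigins = new Set<string>(["https://alice.solidpod.example"]);

browser.webRequest.onBeforeRequest.addListener(
  (details) => {
    const origin = new URL(details.url).origin;
    // A real extension would prompt the user here rather than just cancel.
    return { cancel: !allowedOrigins.has(origin) };
  },
  { urls: ["<all_urls>"] },
  ["blocking"]
);
```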

Another potential future possibility might be the use of Realms, where you could sandbox the JavaScript so it can only interact with a finite set of domains. But as these are set up by JavaScript, they're kinda useless in the context of first-party code doing something malicious.

4 Likes

Solid doesn’t, and can’t, protect against that.

Was just curious: is there any possible way, in theory, to prevent apps from duplicating our data on their own servers? From my understanding, apps can make requests to our Pod from a non-browser client, so they can set up automated programs that frequently duplicate our Pod's data, assuming they have read permissions (see the sketch below). So that kind of creates the same problem we have today: we can't control what apps do with our data once it's on their servers. I guess we must rely on privacy policies, but I'm just wondering if theoretically there is a technical solution to this. I thought of using encryption, but apps may need to process decrypted data on their own servers. I came across "homomorphic encryption" and "confidential computing", which try to secure data in use, but both are in early stages of research & development.
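For illustration, here's a minimal sketch of that duplication threat, with everything hypothetical: a made-up resource URL and a placeholder bearer token (real Solid authentication uses DPoP-bound access tokens, but the point stands either way):

```typescript
// A server-side job that re-fetches a pod resource it was once granted
// read access to. No Solid server can tell whether the caller keeps a copy.
const POD_RESOURCE = "https://alice.solidpod.example/todos/list.ttl";
const ACCESS_TOKEN = "placeholder-token"; // hypothetical, from a normal login flow

async function mirrorOnce(): Promise<void> {
  const res = await fetch(POD_RESOURCE, {
    headers: { Authorization: `Bearer ${ACCESS_TOKEN}` },
  });
  const body = await res.text();
  // Nothing in the protocol prevents persisting this copy elsewhere.
  console.log(`mirrored ${body.length} bytes at ${new Date().toISOString()}`);
}

setInterval(mirrorOnce, 60 * 60 * 1000); // re-copy every hour
```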

Also, I was thinking that apps can still generate new data points about the user while they're using the apps, and store them on their own servers without ever going through Pods. For example, Facebook can still track user activity in their mobile app (clicks, time spent looking at a post, scrolling data, other user actions) and store these data points directly on their own servers. This, along with frequently duplicating user data from Pods, kind of leads to the same problem we have today.

Not sure if these problems are inevitable, though. I guess we need to rely on privacy policies and choose apps that minimize data collection, if that's what we want.

1 Like

The basic problem is that once an app can read data, there is no technical way to prevent it from doing whatever it wants with that data. We can and should write laws against data theft. We can and should develop a web of trust whereby apps that steal data are shunned. And with Solid we can make a robust system where only the apps and people we choose can read our data – and even that permission can be limited or revoked.

2 Likes

Once a system or a person knows something, you can't make them un-know it. It is theoretically impossible. It's not a software thing, it's a philosophy thing.

Every now and again people ask for some form of magic data which will expire at a certain time, no matter how many times it has been copied, like magic disappearing ink. But that isn’t going to happen. It’s impossible.

So we have to rely on conventions and policies and, in the end, the law – and social reputation pressure. We rely on whistleblowers within organizations which break the rules.

What we do have are tools to help organizations do the right thing. We can help people track what they have promised to do and not do with data. We have languages – ontologies – for describing different types of data use. We have languages for tracking where data came from – its provenance. We can build accountable systems which track what an agent uses data for and check it against what has been allowed or consented to.

5 Likes

I’ve only read about the usage of homomorphic encryption and tried out this example, where the user enters a country name and the server returns the (encrypted) capital city without knowing what the user searched for. The user is able to use the data and computation power of the server, but the server is not able to see the user data.

This should also be doable with Solid pods. In its simplest form, you give access to a trusted app which encrypts the data and sends it to a server (e.g. the country–capital program above) which processes the query and returns the result. In this case you would have one simple, easy-to-verify, trusted application that interacts with an untrusted server application (see the sketch below).
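As a rough sketch of that flow, with hypothetical names throughout (heEncrypt/heDecrypt are runnable base64 stand-ins for a real homomorphic-encryption library – base64 is of course not encryption – and the server URL is made up), only the trusted client app can read plaintext; the untrusted server only ever sees ciphertext:

```typescript
// Stand-ins for a real HE library, kept trivial so the sketch runs.
const heEncrypt = (plain: string): string => btoa(plain);
const heDecrypt = (cipher: string): string => atob(cipher);

// The trusted app: encrypts the query locally and decrypts the reply locally.
async function privateCapitalLookup(country: string): Promise<string> {
  const res = await fetch("https://untrusted-capitals.example/query", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: heEncrypt(country) }), // ciphertext out
  });
  const { result } = await res.json(); // ciphertext back
  return heDecrypt(result);            // only the client can read it
}
```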

In addition to the limitations of homomorphic encryption, I think the limitation in Solid is: some trusted app will need to homomorphically encrypt the data before it is sent to the server that processes the request. Solid pods won’t do this for you, so you will need to trust at least one application.

I don’t know about “confidential computing”, but if it can be used with an API, I guess the same limitation applies and you will need a trusted app as an intermediary.

2 Likes

To put my answer in the context of the other answers:

If the application reads the data, then the application can do with the data whatever it is technically able to do (send it to servers, etc.).

Therefore, if you want to use an untrusted application without losing control of your data, you can:

  • use an application that operates on encrypted data (e.g. with homomorphic encryption)
  • use an application that runs in a sandbox (e.g. no internet connection except to your Solid pod; requires a well-thought-through sandbox – see the sketch after this list)
  • maybe something else I’m missing currently :person_shrugging:
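
One conceivable shape for such a sandbox – assuming Chromium's iframe csp attribute ("Embedded Enforcement") and made-up URLs – is a small trusted launcher page that embeds the untrusted app and requires a Content-Security-Policy whose connect-src only allows the user's pod:

```typescript
// Hypothetical launcher page. `allow-scripts` lets the app run; the
// required CSP confines fetch/XMLHttpRequest to the pod origin. Note the
// `csp` attribute is Chromium-only, and the framed app has to acknowledge
// the required policy or the browser refuses to load it.
const frame = document.createElement("iframe");
frame.sandbox.add("allow-scripts");
frame.setAttribute("csp", "connect-src https://alice.solidpod.example");
frame.src = "https://untrusted-todo-app.example/";
document.body.appendChild(frame);
```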
3 Likes

I get asked this question basically every time I give a talk about Solid, and I use it as an opportunity to clarify some important concepts.

First, Solid is not just a technology. No problem can ever be fully solved by technology alone. A successful Solid ecosystem requires the interplay between technical, societal, ethical, legal, economic, and many more aspects. So what you are touching upon here is the technology–legal barrier: the moment an entity has access to data, it ceases to be a technical problem (because the access has happened) and becomes a matter for the legal framework.

Second, there are technical protections. With Solid, we'll restrict what third parties can see to the bare minimum they need to know. In the future, we'll also want to send the usage policy along with the data, for auditing purposes. And we can always revoke access to updated data.

Third, and perhaps most importantly going forward, we should not look at the future with today’s eyes. Yes, today several companies would harvest and keep as much data as they possibly can. Because that’s how you survive in today’s data rat race. But most companies don’t really want the responsibility of having all of this data, and the legal obligations that come with it. They just want to use the data they need to deliver the service you want, and that’s it.

So, can they technically make a copy of the data? Of course.
Do they really want to have the liability that comes with doing this? I doubt it.

9 Likes

I feel a little bit shy adding to a thread which already has great answers from such web and Solid heavyweights, but the question reminded me of something that has occurred to me in the past. Ruben's comment about not looking at the future with today's eyes also encouraged me to post.

One could imagine a privacy-focused browser of the future that, as well as letting the user allow or block use of the camera or microphone, or the opening of pop-ups, on an origin-by-origin basis, would also provide a way to control which origins the site in question could make XMLHttpRequests to. Because of the architecture of Solid, I think it would be feasible for a user to choose to block XMLHttpRequests to the site itself but allow them to the user's Solid pod, thereby ensuring that the data remains on the user's pod-hosting server and in the user's client. For a simple todo app, whose logic can easily be implemented entirely in the client, the user could then be confident that their data was not being leaked.
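Browsers already have a declarative vocabulary for exactly this restriction: the connect-src directive of Content-Security-Policy governs where a page's scripts may send requests. As a minimal sketch – with a made-up pod URL and app origin, and an extension standing in for the imagined browser setting – such a policy could be imposed on a site via the WebExtensions onHeadersReceived hook:

```typescript
// Hypothetical extension code: append a CSP header to responses from the
// app's origin so its scripts can only talk to the user's pod.
browser.webRequest.onHeadersReceived.addListener(
  (details) => {
    details.responseHeaders?.push({
      name: "Content-Security-Policy",
      // connect-src governs fetch/XMLHttpRequest/WebSocket destinations.
      value: "connect-src https://alice.solidpod.example",
    });
    return { responseHeaders: details.responseHeaders };
  },
  { urls: ["https://todo-app.example/*"] }, // made-up app origin
  ["blocking", "responseHeaders"]
);
```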

There would obviously be lots to consider on the societal and economic aspects of this (how could browser maintainers be motivated to implement it, and how could the publishers of apps that might have to operate in this mode be rewarded for their work?), but in the short term I'd be interested to hear whether others think there is any mileage in this, and whether it answers the OP's question at all.

Thanks for reading.

3 Likes

I haven't looked into homomorphic encryption too much, but I'm wondering: in your example, would the server be able to deduce which country the user searched for given the city? E.g. by mapping encrypted cities → encrypted countries → decrypted countries. I briefly looked over the GitHub page, but I don't know much about this – I was just wondering whether it'd be possible for servers to deduce the user input given the output.

Confidential computing seems to be an industry-led initiative with a hardware solution to keep data encrypted in use. I think the idea is that data is kept encrypted in memory and only decrypted when, I'm assuming, loaded into CPU registers. I think there is a digital circuit that decrypts/encrypts this data as it moves between CPU and memory (can't find the source for this explanation…). Very interesting approach! Not sure if it can be used with an API; maybe they're trying to build specialized CPUs/GPUs that support this. Haven't looked into this too much. (A few sources: What is Confidential Computing? | IBM, https://cloud.google.com/confidential-computing, Azure Confidential Computing – Protect Data In Use | Microsoft Azure)

I wonder if, in the future, it may be possible to create a Pod Provider that takes care of homomorphically encrypting the data, instead of having to trust a third-party intermediary to do this. But that might be the same as trusting an intermediary – you still need to trust your Pod Provider.

1 Like

I also felt a bit shy and very underqualified to be replying in this thread. I appreciate all the responses – they really helped me understand the role of Solid!

I really like Ruben's comment about not looking at the future with today's eyes, and the blog post was an interesting read! I totally agree – I think we've become so accustomed to the data-driven tech economy that it's hard for us to see other ways. I'm sure in the early stages of the Internet and Web, people didn't really think the Web would become so data-centric. So I think we just wait and see what the future brings. I'm optimistic that we can move away from data driving monetization in tech, maybe.

1 Like

No, the server should know nothing about what the client searched for, nor about what the client received as a response. Both are encrypted. The intermediate results (i.e. the 0/1 masks) are also encrypted, and the server does not know their values. But I don't know exactly how this encryption works – for instance, why the server can't differentiate between all the encrypted 0s and 1s (probably the encryption includes randomness, so encryption(0) != encryption(0), but :person_shrugging:)
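
That guess is right: additively homomorphic schemes such as Paillier blind every ciphertext with fresh randomness, so two encryptions of 0 look unrelated, yet both decrypt correctly. Here is a toy sketch (my own illustration, not the linked demo's code; tiny primes and Math.random() make it insecure by design) that also shows the encrypted 0/1-mask lookup pattern:

```typescript
// Toy Paillier cryptosystem. Tiny hard-coded primes and weak randomness:
// fine for illustration, useless for real security.

function modPow(base: bigint, exp: bigint, mod: bigint): bigint {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

const gcd = (a: bigint, b: bigint): bigint => (b === 0n ? a : gcd(b, a % b));

function modInverse(a: bigint, m: bigint): bigint {
  // extended Euclidean algorithm
  let [oldR, r] = [a % m, m];
  let [oldS, s] = [1n, 0n];
  while (r !== 0n) {
    const q = oldR / r;
    [oldR, r] = [r, oldR - q * r];
    [oldS, s] = [s, oldS - q * s];
  }
  return ((oldS % m) + m) % m;
}

// Key generation (public: n; private: lambda, mu).
const p = 61n, q = 53n;
const n = p * q, n2 = n * n;
const lambda = ((p - 1n) * (q - 1n)) / gcd(p - 1n, q - 1n); // lcm(p-1, q-1)
const mu = modInverse(lambda, n);

function encrypt(m: bigint): bigint {
  // Fresh randomness r makes each ciphertext unique: encryption(0) != encryption(0).
  let r: bigint;
  do { r = BigInt(2 + Math.floor(Math.random() * 3000)); } while (gcd(r, n) !== 1n);
  // With generator g = n + 1, g^m = 1 + m*n (mod n^2).
  return (((1n + m * n) % n2) * modPow(r, n, n2)) % n2;
}

function decrypt(c: bigint): bigint {
  const L = (x: bigint) => (x - 1n) / n;
  return (L(modPow(c, lambda, n2)) * mu) % n;
}

console.log(encrypt(0n) === encrypt(0n)); // false: same plaintext, different ciphertexts
console.log(decrypt(encrypt(0n)));        // 0n – both still decrypt correctly

// The 0/1-mask lookup from the country→capital demo: the client sends
// encrypted masks selecting one row; the server combines them with its
// plaintext table without learning which row was picked.
const capitals = [100n, 200n, 300n];                    // stand-in values
const masks = [encrypt(0n), encrypt(1n), encrypt(0n)];  // select index 1
let acc = 1n;
for (let i = 0; i < capitals.length; i++) {
  // enc(m)^k = enc(m*k); multiplying ciphertexts adds plaintexts.
  acc = (acc * modPow(masks[i], capitals[i], n2)) % n2;
}
console.log(decrypt(acc)); // 200n – the selected 'capital'
```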

1 Like

Very interesting, thank you!!

I recall Zama AI or Google Research having some good material on this, explaining how it all works.

Though tbh, it's definitely still new, not mainstream yet. The simpler short-term option is offering compute next to storage, with strict permissions.

2 Likes