It seems that SOLID protects user data from unwanted access and modification by the service provider, but not from data leaks. Any SOLID app can send user data anywhere, not only to the user’s pod, and users can’t know where their data goes.
It’s a little bit horrible. Imagine a user of a todo list app. Someone using a SOLID application may think: “All my todo lists are saved in my pod, so my data is safe and controlled by me.” But what if the todo list app has secretly sent all of the user’s todo list data to its own server?
Can I restrict where a user’s data can go? Or is there a feature that forces a SOLID application to connect only to the user’s pod?
No, there’s no such restriction, as you’re working within the context of a browser. An application could knowingly (or unknowingly) exfiltrate data to a destination that is not the user’s pod. Solid doesn’t, and can’t, protect against that.
Instead, you’ll need to rely on good old terms of service documents, privacy policies & local legal frameworks.
In theory you could also write an extension for your browser that prompts for every outbound request a page/site makes, but that’d likely be overwhelming.
Was just curious, is there any possible way in theory to prevent apps from duplicating our data on their own servers? From my understanding, apps can make requests to our Pod from a non-browser client, so they can set up automated programs that frequently duplicate our Pod’s data, assuming they have read permission. That kind of creates the same problem we have today: we can’t control what the apps will do with our data once it’s on their servers. I guess we must rely on privacy policies, but I’m wondering if theoretically there is a technical solution to this. I thought of using encryption, but apps may need to process decrypted data on their own servers. I came across “homomorphic encryption” and “confidential computing”, which try to secure data in use, but both are in early stages of research & development.
Also, was thinking that apps can still generate new data points about the user while they’re using their apps and store it in their own servers without ever going through Pods. For example, Facebook can still track user activity on their mobile app (clicks, amount of time spent looking at a post, scrolling data, other user actions) and store these data points directly on their own servers. This, along with frequently duplicating user data from Pods, kind of leads to the same problem we have today.
Not sure if these problems are inevitable though, I guess we need to rely on privacy policies and choose to use apps that minimize our data collection if that’s what we want.
The basic problem is that once an app can read data, there is no technical way to prevent it from doing whatever it wants with that data. We can and should write laws against data theft. We can and should develop a web of trust whereby apps that steal data are shunned. We can, with Solid, make a robust system where only apps and people we choose can read our data, and even that permission can be limited or removed.
Once a system or a person knows something, you can’t make them un-know it. It is theoretically impossible. It’s not a software thing, it’s a philosophy thing.
Every now and again people ask for some form of magic data which will expire at a certain time, no matter how many times it has been copied, like magic disappearing ink. But that isn’t going to happen. It’s impossible.
So we have to rely on conventions and policies and in the end, the law – and social reputation pressure. We rely on whistleblowers within organizations which break the rules.
What we do have are tools to help organizations to do the right thing. We can help people track what they have promised to do and not do with data. We have languages – ontologies – for describing different types of data use. We have languages for tracking where data came from – its provenance. We can build accountable systems which track what an agent uses data for and check that against what has been allowed or consented to.
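To make the last point a bit more concrete, here is a minimal sketch of such an accountability check: a log of declared data uses is compared against what the user consented to. The `UsageRecord` type, the `find_violations` function, the purpose labels and the example URLs are all hypothetical, and the check only helps if the agent records its uses honestly, so it supports policy rather than enforcing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    agent: str     # who used the data (hypothetical app identifier)
    resource: str  # which resource was used
    purpose: str   # the purpose the agent declared for this use


def find_violations(consents, usage_log):
    """Return every usage record whose purpose was not consented to."""
    return [
        record
        for record in usage_log
        if record.purpose not in consents.get((record.agent, record.resource), set())
    ]


# What the user has allowed: (agent, resource) -> set of permitted purposes.
consents = {
    ("https://todo-app.example", "/todos/list.ttl"): {"display", "sync"},
}

# What the agent reports having done with the data.
usage_log = [
    UsageRecord("https://todo-app.example", "/todos/list.ttl", "display"),
    UsageRecord("https://todo-app.example", "/todos/list.ttl", "advertising"),
]

violations = find_violations(consents, usage_log)
for record in violations:
    print(f"{record.agent} used {record.resource} for '{record.purpose}' without consent")
```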
I’ve only read about the usage of homomorphic encryption and tried out this example, where the user enters a country name and the server returns the (encrypted) capital city without knowing what the user searched for. The user is able to use the data and computation power of the server, but the server is not able to see the user data.
This should also be doable with solid pods. In the simplest form, you give access to a trusted app which encrypts the data and sends it to a server (e.g. the country-capital program as above) which processes the query and returns the result. In this case you would have one simple, easy-to-verify and trusted application that interacts with an untrusted server application.
In addition to the limitations of homomorphic encryption, I think the limitation in Solid is: some trusted app will need to homomorphically encrypt the data before it is sent to the server that processes the request. Solid pods won’t do this for you, so you will need to trust at least one application.
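As a rough illustration of the flow described above (a trusted client encrypts, an untrusted server computes on ciphertexts only), here is a minimal sketch of a private country-to-capital lookup. It uses a toy Paillier scheme, which is only additively homomorphic and is not necessarily what the linked example uses. The fixed key built from two Mersenne primes, the `CAPITALS` table and all function names are assumptions made for illustration, and the parameters are not secure. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import secrets
from math import gcd

# Toy Paillier key from two known Mersenne primes (illustration only, NOT secure).
P, Q = 2**89 - 1, 2**107 - 1     # private to the client
N = P * Q                        # public
N2 = N * N                       # public
PHI = (P - 1) * (Q - 1)          # private
MU = pow(PHI, -1, N)             # private (modular inverse, Python 3.8+)

# Server-side table; the order of the countries is assumed to be public.
CAPITALS = [("Belgium", "Brussels"), ("France", "Paris"), ("Japan", "Tokyo")]


def encrypt(m: int) -> int:
    """Client side: randomized encryption of an integer 0 <= m < N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Client side: needs the private values PHI and MU."""
    u = pow(c, PHI, N2)
    return ((u - 1) // N * MU) % N


def encode(text: str) -> int:
    return int.from_bytes(text.encode("utf-8"), "big")


def decode(value: int) -> str:
    return value.to_bytes((value.bit_length() + 7) // 8, "big").decode("utf-8")


def server_lookup(encrypted_mask):
    """Server side: uses only public values and never sees the mask in the clear.

    Multiplying ciphertexts adds plaintexts, and raising a ciphertext to a
    power multiplies its plaintext, so the result encrypts sum(mask[i] * capital[i]).
    """
    result = 1
    for c, (_, capital) in zip(encrypted_mask, CAPITALS):
        result = (result * pow(c, encode(capital), N2)) % N2
    return result


def client_query(country: str) -> str:
    """Trusted client: send an encrypted one-hot mask, decrypt the answer."""
    mask = [encrypt(1 if name == country else 0) for name, _ in CAPITALS]
    return decode(decrypt(server_lookup(mask)))


print(client_query("France"))
```

In this sketch the client and server code live in one file for brevity; in the scenario above, only `server_lookup` and the public values would run on the untrusted server.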
I don’t know about “confidential computing”, but if it can be used with an API, I guess the same limitation applies and you will need a trusted app as an intermediary.
I get asked this question basically every time I give a talk about Solid, and I use it as an opportunity to clarify some important concepts.
First, Solid is not just a technology. No problem can ever be fully solved by technology. A successful Solid ecosystem requires the interplay between technical, societal, ethical, legal, economic, and many more aspects. So what you are touching upon here is the technology–legal barrier: the moment an entity has access to data, it ceases to be a technical problem (because the access has happened), and now needs the legal framework.
Second, there are technical protections. With Solid, we’ll restrict what third-parties can see to the bare minimum they need to know. In the future, we’ll also want to send the usage policy around the data, for auditing purposes. And we can always revoke access to updated data.
Third, and perhaps most importantly going forward, we should not look at the future with today’s eyes. Yes, today several companies would harvest and keep as much data as they possibly can. Because that’s how you survive in today’s data rat race. But most companies don’t really want the responsibility of having all of this data, and the legal obligations that come with it. They just want to use the data they need to deliver the service you want, and that’s it.
So, can they technically make a copy of the data? Of course.
Do they really want to have the liability that comes with doing this? I doubt it.
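The second point above (access can be restricted and later revoked, but a copy that was already made stays with whoever made it) can be sketched with a small in-memory model. The `Pod` class and its methods are hypothetical and are not the Solid API; they only mimic the idea of per-resource read permissions.

```python
class Pod:
    """Hypothetical in-memory stand-in for a pod with per-resource read access."""

    def __init__(self):
        self._resources = {}
        self._readers = {}  # path -> set of agents allowed to read

    def write(self, path, content):
        self._resources[path] = content

    def grant_read(self, path, agent):
        self._readers.setdefault(path, set()).add(agent)

    def revoke_read(self, path, agent):
        self._readers.get(path, set()).discard(agent)

    def read(self, path, agent):
        if agent not in self._readers.get(path, set()):
            raise PermissionError(f"{agent} may not read {path}")
        return self._resources[path]


pod = Pod()
pod.write("/todos", "buy milk")
pod.grant_read("/todos", "todo-app")

# While it has access, the app can read the data and keep its own copy.
app_copy = pod.read("/todos", "todo-app")

# The user revokes access and then updates the data.
pod.revoke_read("/todos", "todo-app")
pod.write("/todos", "buy milk; call lawyer")

try:
    pod.read("/todos", "todo-app")
    saw_update = True
except PermissionError:
    saw_update = False

print(app_copy)    # the old copy is still in the app's hands
print(saw_update)  # the update is out of reach after revocation
```

What happens to `app_copy` after revocation is exactly the part that technology cannot decide and that the legal framework has to cover.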
I feel a little bit shy to add to a thread which already has great answers from such web and Solid heavyweights as this does but the question reminded me of something that has occurred to me in the past. Ruben’s comment about not looking at the future with today’s eyes also encouraged me to post.
One could imagine a privacy-focused browser of the future that, as well as providing the user the opportunity to allow or block use of the camera or microphone, or to open pop-ups, on an origin-by-origin basis, would also provide a way to control which origins the site in question could make XMLHttpRequests to. Because of the architecture of Solid, I think it would be feasible for a user to choose to block XMLHttpRequests to the site itself but allow them to the user’s Solid pod, thereby ensuring that the data remains on the user’s pod hosting server and in the user’s client. For a simple todo app, the logic of which can be easily implemented entirely in the client, the user could then be confident that their data was not being leaked.
There would obviously be lots to consider on the societal and economic aspects of this (how could browser maintainers be motivated to implement this, and how could the publishers of apps that might have to operate in this mode be rewarded for their work), but in the short term I’d be interested to know if others think there is any mileage in this, and also if it answers the OP’s question at all.
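The decision such a browser would have to make for each outgoing request could look roughly like the sketch below. It is not an existing browser feature or API; the function names and the example URLs are hypothetical, and the comparison is done on the (scheme, host, port) origin of each URL.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Reduce a URL to its origin: (scheme, host, port)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname or "", port)


def is_request_allowed(request_url, allowed_origins):
    """Allow a request only if its origin is on the user's allowlist."""
    return origin_of(request_url) in {origin_of(o) for o in allowed_origins}


# The user allows the todo app to talk to their pod and nothing else.
allowed = ["https://alice.pod.example"]

print(is_request_allowed("https://alice.pod.example/todos/list.ttl", allowed))
print(is_request_allowed("https://todo-app.example/api/collect", allowed))
```

With a policy like this, a purely client-side todo app keeps working, while any attempt to send data back to the app’s own origin would be blocked.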
I haven’t looked into homomorphic encryption too much, but I’m wondering, in your example, would the server be able to deduce which country the user searched for given the city? Eg. by mapping encrypted cities → encrypted countries → decrypted countries. But I briefly looked over the Github page and I don’t know too much about this, was just wondering if it’d be possible for servers to deduce the user input given the output.
I wonder if in the future it may be possible to create a Pod Provider that takes care of homomorphically encrypting the data, instead of having to trust a third-party intermediary to do this. But that might be the same as trusting an intermediary: you still need to trust your Pod Provider.
I also felt a bit shy and very underqualified to be replying in this thread. I appreciate all the responses, really helped me understand the role of Solid!
I really like Ruben’s comment about not looking at the future with today’s eyes, and the blog post was an interesting read! I totally agree, I think we’ve gotten so accustomed to the data-driven tech economy that it’s hard for us to see other ways. I’m sure in the early stages of the Internet and Web, people didn’t really think that the future of the Web would become so data-centric. So, I think we just wait and see what the future will bring. I’m optimistic that we can move away from data driving the monetization in tech, maybe.
No, the server should know nothing about what the client searched for, and also what the client received as a response. Both are encrypted. Also the intermediary results (ie the 0/1 masks) are encrypted and the server does not know the values of them. But I don’t know how exactly this encryption works, for instance why the server can’t differentiate between all the encrypted 0s and 1 (probably the encryption includes randomness, so encryption(0) != encryption(0), but )
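The guess about randomness can be illustrated with a toy version of the Goldwasser-Micali scheme, a classic probabilistic encryption of single bits. This is a sketch under stated assumptions and not necessarily what the example uses: the key is built from two known Mersenne primes purely for illustration and is not secure, and the function names are made up.

```python
import secrets
from math import gcd

# Toy Goldwasser-Micali key (illustration only, NOT secure).
P, Q = 2**89 - 1, 2**107 - 1  # private; both are congruent to 3 mod 4
N = P * Q                     # public
X = N - 1                     # public: -1 is a non-residue mod P and mod Q


def encrypt_bit(bit: int) -> int:
    """Encrypt 0 or 1 using fresh randomness y on every call."""
    while True:
        y = secrets.randbelow(N - 2) + 2
        if gcd(y, N) == 1:
            break
    return (y * y * pow(X, bit, N)) % N


def decrypt_bit(c: int) -> int:
    """Private-key holder: 0 if c is a square mod P (Euler's criterion), else 1."""
    return 0 if pow(c, (P - 1) // 2, P) == 1 else 1


a = encrypt_bit(0)
b = encrypt_bit(0)
print(a != b)                          # two encryptions of 0 differ
print(decrypt_bit(a), decrypt_bit(b))  # yet both decrypt to 0
print(decrypt_bit(encrypt_bit(1)))
```

Because every call draws a new random `y`, the server cannot tell which ciphertexts hide a 0 and which hide a 1 just by comparing them.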
Inrupt.net currently runs NSS, so I think it’d only be using disk-level encryption, and data isn’t encrypted by NSS prior to being written to disk. I might be wrong about this though, as it’s been a while since I did anything on it; @Timea is the person who knows most about inrupt.net.
But access to that server is limited to a handful of people, and none of them would be inclined to go reading your data for kicks.
That said though, PodSpaces (https://start.inrupt.com) is powered by Inrupt’s Enterprise Solid Server and has much better protections around data privacy (e.g., it’s backed by PostgreSQL and S3, both strongly secured, and deployed in Kubernetes, so there isn’t even really a “server” to gain access to (i.e., we can’t just SSH into a machine and read the data)).
But what if my data is highly confidential? Then it needs to be stored in encrypted form.
For encryption, what should I rely on in that case: a third party, or something offered by the pod provider I am using? If the pod provider stores the data in encrypted form, then that is good, and I don’t have to do anything with the data myself, which is what I actually want.
Is it true that data is saved in an encrypted format regardless of which provider I use?
No, it’s not. Are you encrypting the content before you send it to the remote (potentially untrusted) server? No? Then assume that your data could be read by a third party (though there can be legal ramifications for doing so, and different systems can be employed to ensure data isn’t easily accessible or at risk).
Homomorphic Encryption is fantastic, but people are still only just figuring it out and how to make use of it. End to End encryption is likely what you want, but we’re still figuring out what this means, the implications it has, and how to do it in Solid.
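To show what “encrypting the content before you send it” means in its simplest form, here is a minimal sketch using a one-time pad built from the Python standard library, so that it runs without extra packages. This is an assumption made for illustration, not how Solid does end-to-end encryption: a real application would use a vetted authenticated cipher from an established library, and the `remote_store` dictionary is only a stand-in for an untrusted server.

```python
import secrets


def encrypt(plaintext: bytes):
    """One-time pad: a fresh random key as long as the message, used once."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return bytes(c ^ k for c, k in zip(ciphertext, key))


remote_store = {}  # stand-in for a remote, potentially untrusted server

note = "dentist appointment on Friday".encode("utf-8")
key, ciphertext = encrypt(note)

# Only the ciphertext leaves the client; the key stays with the user.
remote_store["/private/note"] = ciphertext

recovered = decrypt(key, remote_store["/private/note"]).decode("utf-8")
print(recovered)
```

Whoever operates the server sees only random-looking bytes, but the user now has to keep the key safe and share it with any app that should be able to read the note.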
If you’re wanting to store truly sensitive data, then anything you’re not paying for is likely to not be the place to store it, unless you explicitly trust the people operating the service.
Edit: another way to put it: your data is only as safe as the measures put in place to protect it. It’s just like how you shouldn’t take photos of the keys to your new flat or your boarding pass, or write passwords on sticky notes attached to your monitor: those things might feel “safe” given your level of security awareness, but in actuality they are really unsafe.
I should also note: inrupt.net is a test environment, it’s an early deployment of NSS, and isn’t covered by any warranty or guarantees. As is written on the homepage for it: