Encrypted POD? Is solid designed with this in mind? If not, would it be possible to add?


#1

@taggart wrote:

My organization has several applications that might be interesting candidates to move to solid. We currently store the application data on our own backends, but we could potentially become a POD provider and allow our users to use us or another provider instead. One of our requirements for doing so would be to have a way that data on the POD provider is encrypted in a way that the POD provider itself cannot read. Or maybe put in more generic terms, access control should be cryptographically ensured.

Some of the reasons for doing this include:

  • organized crime, nation-state, etc hacking of services to steal user data
  • data breach through misconfiguration, API bugs, etc
  • physical theft of equipment
  • see https://haveibeenpwned.com/
  • malicious POD providers monetizing user data, injecting ads/malware/etc

The best way to protect the users is for the POD provider to not have access to the data.
Is solid designed with this in mind? If not, would it be possible to add?

Thanks

There has been some discussion on https://github.com/solid/solid/issues/170 with @panleya and @tatwater, and I would like to invite more here.


#2

I’m curious (and have a rather limited knowledge of JavaScript crypto) - how could that be done in JavaScript? Is there a secure way to create and store an encryption certificate/secret in the browser, such that the browser can encrypt data before sending it to the POD (and vice versa)?

How would such a secret be moved to a different device such that I can access my POD from different devices?
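
One possible answer to the multi-device question, offered as a sketch and not as anything Solid specifies: rather than moving a key between devices, derive it from a passphrase with PBKDF2 through the Web Cryptography API. Any device that knows the passphrase and the (non-secret) salt can then recreate the same key. The names `deriveKeyFromPassphrase` and `newSalt`, the default iteration count, and the idea of keeping the salt next to the data are assumptions made for illustration.

```typescript
// Sketch only: derive an AES-GCM key from a passphrase with PBKDF2 (Web Crypto).
// Assumes a runtime that exposes the global `crypto.subtle`
// (modern browsers, recent Node.js). Names and defaults are illustrative.

async function deriveKeyFromPassphrase(
  passphrase: string,
  salt: BufferSource,
  iterations = 600_000, // assumed default; tune for your threat model
): Promise<CryptoKey> {
  // Import the passphrase bytes as PBKDF2 input material.
  const baseKey = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(passphrase),
    "PBKDF2",
    false,
    ["deriveKey"],
  );
  // Stretch it into a non-extractable 256-bit AES-GCM key.
  return crypto.subtle.deriveKey(
    { name: "PBKDF2", salt, iterations, hash: "SHA-256" },
    baseKey,
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

// The salt is not secret; it only has to be the same on every device.
function newSalt() {
  return crypto.getRandomValues(new Uint8Array(16));
}
```

The trade-off is that the key is only as strong as the passphrase, and the derived key still lives in the page's memory while the app runs.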


#3

OK, so I interpret this question as a desire to not need to trust your POD provider.

Now, don’t take my answer as authoritative, and not as an official Inrupt answer. It is more like a braindump, because I think this question is really important and deserves an answer, and it has been open for long enough without one. Anyway, before I try to answer, let me just note a few things:

  1. You don’t need to trust any random POD provider, you can be your own, like I am, by installing the Solid server on hardware under your control. I think this should satisfy many of those who do not want to trust a POD provider, but I realize, not necessarily the poster.
  2. Stopping short of full end-to-end cryptography, the connection is always TLS in Solid, so that part is encrypted. Moreover, it is trivial to encrypt the file system the data resides on. However, this kinda misses the point, since the data will still be in clear text at some point in the Solid backend, and can be intercepted by an intruder or the POD provider itself.
  3. App developers can always encrypt on the client side, and make sure the literals in the RDF are encrypted with the user’s key.
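
Point 3 could look roughly like the following with the Web Cryptography API. This is a minimal sketch, not an agreed Solid convention: the function names, and the choice to store a literal as base64 of the IV followed by the AES-GCM ciphertext, are assumptions made for illustration.

```typescript
// Sketch only: encrypt the value of an RDF literal on the client (Web Crypto).
// Assumes the globals `crypto.subtle`, `btoa` and `atob`
// (modern browsers, recent Node.js). All names here are hypothetical.

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

function fromBase64(text: string) {
  const binary = atob(text);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// The user's key; how it is stored or recovered is out of scope here.
async function newUserKey(): Promise<CryptoKey> {
  return crypto.subtle.generateKey({ name: "AES-GCM", length: 256 }, false, ["encrypt", "decrypt"]);
}

// Returns base64(iv || ciphertext), suitable for use as a plain string literal.
async function encryptLiteral(key: CryptoKey, value: string): Promise<string> {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh IV per literal
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(value),
  );
  const packed = new Uint8Array(iv.length + ciphertext.byteLength);
  packed.set(iv, 0);
  packed.set(new Uint8Array(ciphertext), iv.length);
  return toBase64(packed);
}

// Reverses encryptLiteral; throws if the stored value was modified.
async function decryptLiteral(key: CryptoKey, stored: string): Promise<string> {
  const packed = fromBase64(stored);
  const iv = packed.slice(0, 12);
  const ciphertext = packed.slice(12);
  const plaintext = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
  return new TextDecoder().decode(plaintext);
}
```

With this approach the server still sees subjects, predicates and links in the clear; only the literal values are opaque to it.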

To really address the concerns of the poster, we would need to have the data encrypted on the server side all the way down to disk with a key that the user controls. If the user is the only one who would have access to the data, that’s easy enough, but that wouldn’t be very social. I’m not a crypto guy, but it appears that TLS doesn’t make this easy, since the TLS connection terminates before we get to the disk. Again, I’m not a crypto guy; there may well be solutions to this, and it would be interesting to hear about them.

However, there are two other things that make it hard:

  1. People want to be social, they want to share data.
  2. More advanced apps are likely to soon require a more advanced query system, we might want to use e.g. SPARQL.

Without being a crypto guy, I could imagine that we build a protocol on Solid that uses the Web Access Control for key management to enable sharing keys with the people that you share data with. It doesn’t seem to me that this would require very substantial changes to Solid as it is today. It would require additional protocols, but I suspect it could be done with additions to Solid rather than a more extensive change, and so, to some extent, the design of Solid should be accommodating.
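
To make the key-sharing idea a little more concrete, here is a rough sketch of one common pattern (envelope encryption), which may or may not be what such a protocol would end up using: the resource is encrypted once with a random content key, and that content key is wrapped separately for each reader with the reader's RSA-OAEP public key. Where the wrapped keys live and how Web Access Control would govern them is left open. All function names are hypothetical and none of this is an existing Solid API.

```typescript
// Sketch only: share one encrypted resource with several readers by wrapping
// the content key per reader (Web Crypto, global `crypto.subtle` assumed).
// All names are hypothetical illustrations.

async function newReaderKeyPair(): Promise<CryptoKeyPair> {
  return crypto.subtle.generateKey(
    { name: "RSA-OAEP", modulusLength: 2048, publicExponent: new Uint8Array([1, 0, 1]), hash: "SHA-256" },
    false, // the private key stays non-extractable
    ["wrapKey", "unwrapKey"],
  );
}

async function newContentKey(): Promise<CryptoKey> {
  // Extractable, because it has to be wrapped for each reader.
  return crypto.subtle.generateKey({ name: "AES-GCM", length: 256 }, true, ["encrypt", "decrypt"]);
}

async function encryptResource(contentKey: CryptoKey, text: string) {
  const iv = crypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    contentKey,
    new TextEncoder().encode(text),
  );
  return { iv, ciphertext };
}

async function decryptResource(
  contentKey: CryptoKey,
  box: { iv: BufferSource; ciphertext: ArrayBuffer },
): Promise<string> {
  const plaintext = await crypto.subtle.decrypt({ name: "AES-GCM", iv: box.iv }, contentKey, box.ciphertext);
  return new TextDecoder().decode(plaintext);
}

// One wrapped copy of the content key per reader; only that reader can unwrap it.
async function wrapForReader(contentKey: CryptoKey, readerPublicKey: CryptoKey): Promise<ArrayBuffer> {
  return crypto.subtle.wrapKey("raw", contentKey, readerPublicKey, { name: "RSA-OAEP" });
}

async function unwrapAsReader(wrapped: ArrayBuffer, readerPrivateKey: CryptoKey): Promise<CryptoKey> {
  return crypto.subtle.unwrapKey(
    "raw",
    wrapped,
    readerPrivateKey,
    { name: "RSA-OAEP" },
    "AES-GCM",
    false,
    ["decrypt"],
  );
}
```

Granting access then means adding one more wrapped key. Revoking access is harder: a reader who already unwrapped the content key keeps it, so revocation in this scheme implies re-encrypting with a new content key.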

Evaluating queries over encrypted data is very much an active area of research in academia. It has been going on for a number of years, and I have noted that quite a lot of this research revolves around RDF data on the Web, and thus, much of the research that is going into this is immediately applicable to Solid.

In conclusion, I think that Solid can enable a future where you don’t need to trust your POD provider, but right now, the shortest path to that is to install it on your own hardware. Beyond that, it will require quite a lot of work, but I certainly see the value of thinking in that direction.


#4

Our users are non-technical and aren’t the kind of people who could set up their own POD provider; they count on us to provide them communication services in the most privacy-protecting ways possible. We might be able to educate them about what a POD is, and explain the concepts of providers and that they have a choice in who they choose as a provider. If we also knew that the POD provider couldn’t see the data, we could explain that rather than give stern warnings that “you better strongly trust your POD providers because they will have access to all your data in the clear”.

It sounds like it would always be possible to build the end-to-end encryption into our web apps on top of solid, but I was hoping there would be some way solid could provide this in a standard way (which would also help prevent app developers from getting it wrong, crypto is hard to do right).


#5

Thank you for your insight, @taggart! Indeed, I think there is a pretty strong case for end-to-end crypto-enabled PODs, as I also think there is a pretty strong case for verifiable claims, zero-knowledge proofs and that kind of stuff in Solid. Maybe we should see this as a whole. I think it would take some time to get there though.


#6

Let’s say that I want to save my files to a server that is not fully trusted.

For me, integrity is the most important security aspect. If I share some content publicly, I would like some guarantee that the content is not tampered with. This would mean that my (web) application has to cryptographically sign the content, and the web framework needs methods to verify the integrity.

Closely followed comes confidentiality. I have some content I would like to keep private. This means that my (web) application has to encrypt the content before sending it to the server.

For this to work, I would need a cryptographic key that I can use for signing and encryption. The key cannot be stored together on the same server as my content. So I would need a trusted party that can safely hold my key.

For instance, the identity, cryptographic keys and the content/files could be serviced by different organizations… Maybe I would pay my identity provider and key holder a bit more than I would for file storage…? :thinking:


#7

An opinion on encryption from @JornWildt in another thread: “Basic question about authentication”


#8

In short, I think it should be possible to encrypt PODs and it should be on top of existing specs, not changing them.

Already a couple of people (@JornWildt, @hovenko) are suggesting storing keys separately from data itself. This is relevant (as said) where data is large and cheaper to store and keys are small but expensive to store in a trusted place. This is an attempt to gather and expand on previous comments.

  1. Signing is the easier part as no secrets must be distributed, but it requires all apps to verify the signatures on all data. The public key is still stored in a separate trusted location (e.g., with the Web-ID). This ensures data is not tampered with. Recommendations early on in the documentation to generate, store and verify signatures will help.

  2. Encryption for privacy requires keys to be distributed. Encryption keys will be stored separately. Sending these is not good as each accessor should be managed separately. Storing the keys on another (trusted) Solid server would make sense but is not required. ACLs are handled by the key storing server. Two ways for doing this so far:

2.1. Encrypt content only but leave the links/meta data in plain text. Hosts can still be Solid servers.

2.2. Encrypting entire PODs makes the hosts plain web hosts only, and presumably another Solid server will have the keys.

Concerns

  1. Server functionality – encryption of entire PODs, or even of text or other browsable content, will prevent server functionality. As mentioned above, encrypted PODs are hosted on a dumb web server and Solid functionality will be handled by another Solid server, possibly the key-storing one, but probably yet another. Equally, apps can do it. Some would even argue for apps doing the processing in any case (“all is frontend with SOLID”).

  2. SPARQL is limited by encryption – yes, of course, but if the app doing the query has all the keys, it can still do the query. Admittedly, this is not straightforward now, but a layer should be added to handle the decryption (and another one for verification).

  3. Queries over encrypted data are probably relevant.

  4. Leakage – if an app provides summary data but preserves individually private data, there is a risk of leakage, but I think this is up to the app, not Solid, to resolve.

In conclusion, I think it will be good (in order)

  1. Documentation – add recommendations in the Solid documentation for signing and public key storage and advertising.
  2. Code – add client library functionality for data verification.

Encryption looks much more difficult to agree on but should be easy to add without changing library specs. Signing should be possible later too, but looks much easier and gives credibility that security is considered early on.
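
As a rough illustration of what client-side signing and verification could look like, here is a minimal sketch using ECDSA over P-256 from the Web Cryptography API. It is not an existing Solid library feature: the function names are hypothetical, and publishing the exported public key alongside the Web-ID is the suggestion made above, not something the code does.

```typescript
// Sketch only: sign content on write and verify it on read (Web Crypto).
// Assumes the global `crypto.subtle`; all names are hypothetical.

const ECDSA_PARAMS = { name: "ECDSA", hash: "SHA-256" };

async function newSigningKeyPair(): Promise<CryptoKeyPair> {
  // The private key is non-extractable; the public key can always be exported.
  return crypto.subtle.generateKey({ name: "ECDSA", namedCurve: "P-256" }, false, ["sign", "verify"]);
}

async function signContent(privateKey: CryptoKey, content: string): Promise<ArrayBuffer> {
  return crypto.subtle.sign(ECDSA_PARAMS, privateKey, new TextEncoder().encode(content));
}

async function verifyContent(publicKey: CryptoKey, content: string, signature: ArrayBuffer): Promise<boolean> {
  return crypto.subtle.verify(ECDSA_PARAMS, publicKey, signature, new TextEncoder().encode(content));
}

// Export the public key as a JWK so it can be published in a trusted location.
async function exportPublicKey(publicKey: CryptoKey): Promise<JsonWebKey> {
  return crypto.subtle.exportKey("jwk", publicKey);
}

// What a reading app would do with a published JWK before verifying.
async function importPublicKey(jwk: JsonWebKey): Promise<CryptoKey> {
  return crypto.subtle.importKey("jwk", jwk, { name: "ECDSA", namedCurve: "P-256" }, true, ["verify"]);
}
```

A reading app would fetch the content and its signature from the POD, fetch the public key from the separate trusted location, and refuse to use the content if `verifyContent` returns false.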


#9

Hi,
Just one comment on this: “I think that Solid can enable a future where you don’t need to trust your POD provider, but right now, the shortest path to that is to install it on your own hardware. Beyond that, it will require quite a lot of work, but I certainly see the value of thinking in that direction”

I agree, but building on top of open source software already, might not be as difficult as one might think. Pushing security off into the future leads to problems later - look at DNS, SMTP, HTTP etc where the security had to be bolted on and it has taken decades to do so and it is not complete in many cases. e.g. spoofing senders, while humorous back in the 1980s, is still causing problems today with spam, phishing etc and while there have been proposals to deal with it, none have succeeded and now there are tons of bolt-on techniques to deal with suspicious emails, spam or whatever. Adding it in early, will save a world of hurt later.

Ditto for decentralization for DNS, and distributed PODs - having central points of failure or (perhaps worse) central points of pressure (e.g. from governments around the world) is already a problem. If someone in China or Russia wants their POD at MIT, they shouldn’t have to worry about a government hack from China, Russia, the US, UK, Iran, France or anywhere. Likewise, if someone from the US wants their POD in Australia, one shouldn’t have to trust the Australian government. Whatever the jurisdiction if the POD is solely located there, it is at risk from the government of that jurisdiction to be either hacked or taken down with no automatic network backup. When the contents are plaintext, it will make it easy for a government to say “delete this POD or we’ll take your entire server offline.” One doesn’t want to build a new system while leaving the issues of censorship, centralized control, and surveillance unaddressed.

I think it is clear that having a POD in plaintext on a server will mean it will be hacked and lost at some point, either through API bugs or, more likely, through server problems in the software (e.g. many, including Heartbleed) or hardware (e.g. Spectre, Meltdown). The lesson since the 1970s is that security needs to be built-in and even then won’t be perfect, but at least there will be a default.

To the question above, there are a good number of crypto libraries in JavaScript - e.g. bitcoinlib-js, twister-lib-js as two examples - which can be leveraged, but something to note is that if the app developers have to implement the crypto themselves vs having it “built-in”, it will cause screw-ups. Likewise, if it isn’t included, a lot of developers won’t use it - they’ll do it “for version 2”.

There are decentralized domains and decentralized hosting so you don’t get tied to a particular provider or lose your POD if a provider goes down, is hacked, or is taken down. :slight_smile:


#10

I’m pretty new to the whole SOLID topic but I find the idea fascinating and very attractive.

However one of the first things that came into my mind was: “how will the POD - and hence my data - be protected server sided?”.

From what I understand, SOLID wants to bring back control over your personal data by storing them in a place where every person can control exactly who is allowed to access those data (the POD). I think it is safe to say that over the years we all learned that we cannot “trust” any system or institution with taking care of our data (basically the points @taggart and @centaur outlined).

If we assume that the “average” user will try to store most - if not all - of his personal data in one POD the POD becomes a real nice and shiny gold nugget for data sellers, hackers etc. In some way it would be a lot more attractive and worth a lot more effort to get into possession of a well maintained POD with personal data.

Of course I could just host the POD myself (naively assuming my non-professional home hosting environment is perfectly safe), but I don’t think this solution is practicable for the broad mass of users, since:

  • they are not capable of setting up a server

  • they don’t want to deal with the struggle of setting up a server

  • their internet provider doesn’t allow them to host a “website like” service

So in order to introduce POD serving “as a service”, people need to be able to trust that their PODs are safe, no matter if the hosting environment is “corrupted” by the hosting company’s intention or by a security breach.

I hope that POD encryption (or any other kind of securing the POD) makes it to the agenda.


#11

As a member of Netsso.com, I can load my files onto my Dropbox with (Netsso-provided) encryption, done properly on my local machine (whatever machine, no local software required). That is, when logged in to Netsso. Then I can download the files to any other machine, no local software required and they arrive decrypted, via my Netsso login. Or I can send them, encrypted, to any other Netsso member, i.e. sharing encrypted data, end-to-end. And all files can be managed individually via remote links, no need to log in to Dropbox, etc.
All my files may be stored in third party storages, managed via Netsso. Always encrypted, unreadable by storage administrators. Does this make Netsso a POD? Could Solid do the same?


#12

Yes, but people don’t always want to be social and they don’t always want to share data.


#13

Take a look at the Web Cryptography API


#14

Encryption will be required for any POD server residing within Europe, since we are talking here about new technology that would pose a high risk to natural persons. Reference: Examples of processing ‘likely to result in high risk’

As defined in Article 4, a ‘controller’ means the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data.

Under EU privacy, Inrupt inc., the owner of the Solid pod server and the legal entity that creates the Solid application will be considered data controllers according to article 26. Which means a whole range of obligations comes into focus for all of the above. One example is that all the data controllers will need to do a DPIA before they can consider starting the development of an application that uses data from a solid pod. One of the mechanisms for securing data that should be considered is encryption.

In regards to encryption, not encrypting the pod means that Inrupt inc., the Pod server provider and the legal entity that owns the Solid application have to notify each affected data subject living within the EU, immediately, in case of a security breach. I have written an article about GDPR and encryption that mentions this in particular.

Having the option to encrypt the data could be a very good idea both in transit and at rest, but don’t mix them. I see you have covered the transit part. Now all you have to do is to figure out how you will encrypt the data at rest.


#15

@sydseter “and the legal entity that creates the Solid application will be considered data controllers” If I was to create an app for my own use and I then decide I give that app / code away for free; I fail to see how I am in any way a “data controller” or am I missing something?

[Edit: for clarification, when I say app, I mean an app where ultimately the users data storage location is under the control of the user and not the app developer]


#16

That depends on whether you still are “the natural or legal person that alone or jointly with others, determines the purposes and means of the processing”.

Let’s not turn this into a legal discussion. What actually is more important is that the Solid community is on a mission to improve privacy for all of us. In order to do so, you need to properly make sure that the Solid ecosystem protects the confidentiality, integrity and availability of the application user’s personal data. That can only be done by making sure the data is signed and encrypted both in transit and at rest.


#17

You’re really just jumping up and down on the shaky security foundation that is ironically called Solid. Most serious security related questions seem to be ignored or dismissed with, “That’s not a problem. This platform is about sharing!” or “Just register a domain name and host your pod at home on a Raspberry Pi!”.


#18

@zacharywhitley I suppose it depends on the type of app, but this is worth a read;
particularly the code injection stuff; https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2011/august/javascript-cryptography-considered-harmful/


#19

To protect my POD (rpi3, at home), I will need to figure out a low-cost solution similar to a layer 7 gateway in front of it before leaving it exposed to the Internet 24/7 long term.

Encrypting its contents addresses a different need (or needs) which can be important depending on the context of the data exchanged. For me, POD encryption (entire or partial) is a later priority because my first use case is a chat app for family and friends use only. NSS not having encryption support out-of-box is not a show-stopper at this time for me.


#20

Security is never a show stopper at this time. It’s a show stopper when someone steals, deletes or holds your data ransom. Then it’s a show stopper and it’s not something you can bolt on after the fact.

It’s strange to have a project that says, “You should be in control of your data and be able to move it anywhere you want” but when you ask, “Can we make sure my data doesn’t get to people I don’t want to have it”, it’s silence. I guess that part just isn’t as fun as pods and sharing and all that.

We love to hate the Facebooks and the Googles but in the end what we’ve got is a solution that would only allow me to choose between Facebook collecting my data, Google collecting my data or running some home made raspberry pi solution in my basement so in addition to my day job I have to be a part time sys admin. Better make sure I’ve got offsite backups, better buy two Pi’s and make that an HA setup. Better get myself an UPS and tighten down those firewall rules and make sure my packages are all up to date.