How exactly does Solid decouple data and applications?


#1

I am trying to understand how exactly Solid decouples data and applications.
I understand that whatever data I create is stored in my Solid pod instead of on an application’s servers.
Now, if I want to use an application, I think the application will ask me for permission to read/write my data, and I can choose which data I let it see.
But once I have given it permission to access my data, what prevents it from storing my data on its own server somewhere? Or does an app built on Solid simply have no way to store that data elsewhere? I am confused about how this works.

Please help me understand this better.
Thank you!


#2

As I understand it, Solid can’t stop an application from copying the data. If an application can use it, then digital data can be copied (at least until practical, general-purpose homomorphic encryption, or something else that allows data to be interrogated without being revealed, arrives). However, it would be much more inconvenient than it is today for a third party to build huge central databases where the data is pulled from millions of different sources, any of which could be turned off at any minute, and to keep it all up to date. Maybe later there’ll be ways of tagging data to make it trackable, or of decrypting it only in line with automated preconditions, but AFAIK this is not possible yet.

Solid also allows competing applications to use the data instead of it being locked away inside a walled garden (so SolidTwitter could use your SolidFacebook data if you allowed it to), and if your ID is under your control, you need only point a service towards it rather than having to retype your details over and over again into each company’s web forms as we do now.


#3

Okay, I see. Thanks for the response.
Another question that comes to mind, though: what about applications like machine learning and AI?
Let’s say we shift to a world where all data is stored within each individual’s pod.
How will companies be able to apply AI, which needs a lot of data to run?

Perhaps we could have a system where the company has to pay users a certain amount in order to store their data and train AI on it. Might that help create a more balanced society than the one we would get without decentralizing this data?
What are your thoughts on this?


#4

Yes, that’s an interesting one. ML needs a lot of data for learning. I don’t foresee a future where all data is stored in pods; that would not make a lot of sense, as 99.999…% of data is just ‘white noise’ machine-to-machine streams or public domain. So it will just be personally identifiable information and consciously self-created data, and possibly some other derived data too. Once that’s under our control we can say how we want it to be used and under what conditions, possibly, as you suggest, creating a market for personal data where we can earn something by sharing it. It’s early days, but without first establishing the means of control nothing like that can happen.


#5

You could limit which connections an app can make if you run the app on your own server (e.g. by whitelisting allowed URLs). This is a bit more difficult on hosted solutions, but is hopefully something we can deliver tools for in the future :)
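To make the whitelisting idea concrete, here is a minimal sketch of the check such a tool might run before letting an app open an outbound connection. The host names and the `is_allowed` helper are made up for illustration; a real setup would more likely enforce this in a proxy or firewall in front of the app.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only (scheme, host) pairs the app may contact.
ALLOWED_ORIGINS = {
    ("https", "alice.example"),  # the user's pod (made-up host)
    ("https", "idp.example"),    # the identity provider (made-up host)
}


def is_allowed(url: str) -> bool:
    """Return True only if the URL's scheme and host are on the allowlist."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return (parts.scheme.lower(), host) in ALLOWED_ORIGINS


if __name__ == "__main__":
    for candidate in (
        "https://alice.example/profile/card#me",
        "https://tracker.example/collect",
    ):
        print(candidate, "->", "allow" if is_allowed(candidate) else "block")
```

Matching on the exact host rather than on a string prefix means a look-alike such as `alice.example.evil.test` is still blocked.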


#6

I disagree that a third-party application will find it difficult to make a replica of the pod data. As a developer, I guess it would be quite easy to save the data as replicas and update them from time to time.

IMO what Solid is trying to do is to more or less criminalise that process. Currently companies can legally store your data, but if the Solid terms of service specifically state that an application cannot keep a copy of user data, the chances of that happening are much lower. Some companies might still keep a copy, and there is no way to know. It’s similar to wondering whether Facebook logs my ID and password to a text file every time I log in: you never know!


#7

Maybe so, but I guess it’s much less convenient than having everything in a central database, and certainly harder to wall off. I think redesigning the app to be a view into data is also part of the Solid vision (see @RubenVerborgh’s blog post “Paradigm shifts for the decentralized Web”), but quite how it plays with today’s data-centric apps I’m unsure. I’d like to find out.


#8

I agree, I think Solid reduces the power of big companies and the Internet monopolies, but it doesn’t secure users’ privacy. It reduces exposure to surveillance significantly, but unfortunately a little leakage here and there is what does most of the damage.

For me the big wins of Solid are adding value to your data, and all data, and decoupling applications from owning your data, so you can easily move from one application to another.

Those are very powerful features, and why I’m such a fan. But I think we can go further, and improve privacy and security much more, if we can remove the vulnerabilities that servers create, and the inevitable centralisation that will come with ‘Solid as a service’.


#9

Isn’t there a legal aspect here to consider based on the current individual/company relationship, typically presented in the form of “terms of agreement” that have to be accepted for an individual to use a company’s site, consume products and services from them, etc?

Solid appears to offer an opportunity to turn that around! I think that if an entity wants to engage me and obtain my business, it would have to agree to MY terms for MY data and its use. While THEY may choose to save my data for the purposes of facilitating OUR “contract”, they may not be permitted to use it for any other purpose - of course all those details would have to be clearly stipulated.

I would expect the open source community and the legal advocates to recognize this opportunity and provide the resources needed to devise such terms in a similar fashion to how various “non-proprietary” software licensing came to be.

I think that’s where the value of the decentralized approach to data really is - with the ability for me to take back what’s been given under the perception of “making it free” - data is obviously very valuable, and I look forward to having more control over this valuable resource that is fundamentally MINE.

thanks!


#10

Great question and discussion. My points are:

  1. Encourage service providers to make replicas of the pod data with authorization, rather than preventing or criminalizing it. It’s all about service quality and thus the user experience. Few people can stand a forum app that makes HTTPS calls to a whole bunch of pod servers (some of which even time out) on a single refresh. If the UX is not competitive with existing centralized apps, it’s very hard to get people engaged.

  2. We cannot count on the legal aspect for privacy protection. Doing so would put Solid-based apps and pod server providers in much the same position as existing web service companies, and the protection would easily be breached once the interests at stake grow big enough. Few things are reliable or sustainable except technology itself.

  3. Openness is the strength. I totally agree with @happybeing that privacy protection is not something we can easily practice so far (I may be missing something since I’m new). The potential change should be in reducing the power of big companies and the Internet monopolies. But again, to put it into practice, we need to think more about what better UX we can offer users compared with existing services. Users will not jump in and learn a whole bunch of new concepts without enough convenience in return.


#11

@dprat0821 (and the others here), thanks for the discussion!

As a dev seriously considering using Solid for my next app, this thread has really helped assuage my concerns around the efficiency/practicality of the platform. If you are creating an app that is serving media, then having the data only live in the user’s pod could seriously impact performance and lead to some availability issues too…

It seems to me that the best way to handle data will frequently be to store the data on the app side but treat it more like a cache that is refreshed while the app is connected to the Solid provider. That way the app can efficiently serve the user data while constantly making sure that they are authorized to use the data by the server.
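As a rough illustration of the cache-with-recheck idea, here is a minimal sketch. `PodCache` and its `fetch` callback are hypothetical names, not part of any Solid library; `fetch` stands in for an authenticated request to the pod and is assumed to return `None` when the pod refuses access.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PodCache:
    """Keeps app-side copies of pod resources for at most `ttl` seconds."""

    def __init__(
        self,
        fetch: Callable[[str], Optional[bytes]],
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # assumed to return None when access is denied
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, url: str) -> Optional[bytes]:
        entry = self._entries.get(url)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve the cached copy
        body = self._fetch(url)  # stale or missing: ask the pod again
        if body is None:
            self._entries.pop(url, None)  # access revoked: drop our copy
            return None
        self._entries[url] = (self._clock(), body)
        return body
```

The shorter the `ttl`, the sooner a revoked permission takes effect, at the cost of more round trips to the pod. Nothing here stops a dishonest app from keeping the data anyway; it only shows what a well-behaved app could do.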

I love the Solid philosophy and I’m glad that it doesn’t have to conflict with needs for performance, user experience, or advances in data science and technology.

Of course, I’m new here so let me know if I’ve misunderstood something.


#12

I agree with the summary from @dprat0821 above, but would like to add my 2c.

Openness is great. Privacy needs new tools. Performance is important but adequate for now.

The replication mentioned above I would classify as a performance improvement similar to caching, much like the web caches we’ve had for a long time. Their legality has been tested many times, but the challenges seem to have subsided. Caches are well defined, and cache coherency protocols don’t need to be reinvented.

I also disagree that decentralization helps with privacy directly, or that it makes harvesting lots of data very difficult. Yes, it is more difficult than querying a centralised DB, but only slightly. Just look at Google indexing the web for an example.

The great benefit of Solid is the openness of the data. The data is mine: I can copy it, I can delete it, I can edit it, and I can also use it for new purposes. The data also belongs to the app that created it. More work is required to reconcile the shared ownership, though.

Improved privacy will need auditing applications (not there yet) and permissioning templates, but they will be possible if the data is visible, not walled off only for the app’s benefit. At the moment standard ACLs are too granular for normal users; even power users will struggle to maintain a lot of differently shared data. But it can be worked on right now. Takers? :)
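To give an idea of what a permissioning template could look like, here is a small sketch that expands a friendly label into a Web Access Control rule in Turtle. The template names and the `acl_rule` helper are made up for illustration; only the `acl:` vocabulary terms (`acl:Authorization`, `acl:agent`, `acl:accessTo`, `acl:mode` and the `Read`/`Write`/`Append` modes) come from the WAC vocabulary.

```python
# Hypothetical user-facing templates mapped to WAC access modes.
TEMPLATES = {
    "view only": ["Read"],
    "can comment": ["Read", "Append"],
    "can edit": ["Read", "Write"],
}


def acl_rule(template: str, agent: str, resource: str) -> str:
    """Expand a template name into a single WAC authorization in Turtle."""
    modes = ", ".join(f"acl:{mode}" for mode in TEMPLATES[template])
    return (
        "@prefix acl: <http://www.w3.org/ns/auth/acl#> .\n"
        "<#rule> a acl:Authorization ;\n"
        f"    acl:agent <{agent}> ;\n"
        f"    acl:accessTo <{resource}> ;\n"
        f"    acl:mode {modes} .\n"
    )


if __name__ == "__main__":
    print(acl_rule(
        "can comment",
        "https://bob.example/profile/card#me",  # made-up WebID
        "https://alice.example/posts/1",        # made-up resource
    ))
```

The user only ever picks a label; the granular modes stay hidden behind it.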

Re performance: it is a valid concern, albeit secondary so far. I don’t see a conflict, and the caching mentioned is already implemented in one of the access libraries, as far as I understand.


#13

Fundamental to Solid, and inherited by any Solid App, is the loose-coupling of the following items:

  1. Identity – via WebID
  2. Identification – via WebID-Profile Doc (your Identity Card)
  3. Authentication – via OpenID Connect (OIDC) or TLS authentication protocols
  4. Authorization – via WebACLs that offer fine-grained Attribute-based Access Controls (ABAC)
  5. Storage – via HTTP PATCH to an authorized document location (or URI/URL).
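The storage item can be sketched as follows: the app builds an HTTP PATCH aimed at a document URL it has been authorized to write to, and nothing in the request ties it to a particular app backend. The `build_patch` helper and all URLs are made up for illustration, and the `application/sparql-update` content type is an assumption, since servers differ in which patch formats they accept. Authentication headers are left out.

```python
from urllib.request import Request


def build_patch(document_url: str, subject: str, predicate: str, literal: str) -> Request:
    """Build (but do not send) a PATCH that adds one triple to a pod document."""
    # Only backslashes and double quotes are escaped in this sketch.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    body = f'INSERT DATA {{ <{subject}> <{predicate}> "{escaped}" . }}'
    return Request(
        document_url,
        data=body.encode("utf-8"),
        method="PATCH",
        # Assumed patch format; check what the target server supports.
        headers={"Content-Type": "application/sparql-update"},
    )


if __name__ == "__main__":
    req = build_patch(
        "https://alice.example/notes/todo.ttl",        # made-up document
        "https://alice.example/notes/todo.ttl#item1",  # made-up subject
        "http://www.w3.org/2000/01/rdf-schema#label",
        "Buy milk",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Because the target is just a URL the user controls, pointing the same app at a different pod only changes `document_url`.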

Examples in the wild?

  • Dokie.li – user WebID-Profile document informs app about storage preferences
  • MarkBook – Bookmarking App that’s also informed by the WebID-Profile doc of its user
  • OpenLink Structured Data Sniffer – Browser Extension (available from the Chrome Store) that includes an “Upload Raw Data” action for writing RDF-based metadata gleaned from HTML documents to a Solid Pod (this is also informed by the WebID-Profile doc of its user)