How exactly does Solid decouple data and applications?

I am trying to understand how exactly Solid decouples data and applications.
I understand that whatever data I create is stored in my Solid POD instead of an application’s servers.
Now, if I want to use an application: I think the application will probably ask me for permission to read/write my data, and I can choose which data I let it see.
But once I have given it permission to access my data, what is preventing it from storing my data on its own server somewhere? Or does an app built on Solid simply have no way to store that data elsewhere? I am confused about how this works.

Please help me understand this better.
Thank you!

4 Likes

As I understand it, Solid can’t stop an application from copying the data. If an application can use it, then digital data can be copied (at least until practical, general purpose homomorphic encryption, or something else that allows data to be interrogated without being revealed, arrives). However, compared with today, it would be much more inconvenient for a third party to build huge central databases when the data has to be pulled from millions of different sources, any of which could be turned off at any minute, and then kept up to date. Maybe later there’ll be ways of tagging data to make it trackable, or of only decrypting it in line with automated preconditions, but AFAIK this is not possible yet.

Solid also allows competing applications to use the data instead of it being locked away inside a walled garden (so SolidTwitter could use your SolidFacebook data if you allowed it to), and if your ID is under your control, you need only point a service towards it rather than having to retype your details over and over again into each company’s web forms as we do now.

4 Likes

Okay, I see. Thanks for the response.
Another question that comes to mind though, is what about applications like Machine Learning and AI.
Let’s say we shift to a world where all data is stored within every individual’s POD.
How will companies be able to apply AI, which needs a lot of data to run?

Perhaps we could have a system where a company has to pay users a certain amount in order to store their data and train AI on it. That might help create a more balanced society than the one we would get without decentralizing this data.
What are your thoughts on this?

1 Like

Yes, that’s an interesting one. ML needs a lot of data for learning. I don’t foresee a future where all data is stored in pods; that would not make a lot of sense, as 99.999…% of data is just ‘white noise’ machine-to-machine streams or public domain. So it will just be personally identifiable information and consciously self-created data, and possibly some other derived data too. Once that’s under our control we can say how we want it to be used and under what conditions, possibly, as you suggest, creating a market for personal data where we can earn something by sharing it. It is early days, but without first establishing the means of control nothing like that can happen.

3 Likes

You could limit which connections an app could make if you run the app on your own server (e.g. whitelisting of allowed URLs). This is a bit more difficult on hosted solutions, but is hopefully something we can deliver tools for in the future :slight_smile:
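To make the allowlisting idea a bit more concrete, here is a minimal Python sketch of the kind of check a self-hosted setup could apply before letting an app open a connection. The host name and the `is_allowed` helper are made up for illustration; this is not an existing Solid tool.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only (scheme, host) pairs this app may contact.
ALLOWED_ORIGINS = {
    ("https", "alice.example.org"),  # the user's own pod (made-up host)
}


def is_allowed(url: str) -> bool:
    """Return True only if the URL's scheme and host are on the allowlist."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname) in ALLOWED_ORIGINS
```

A proxy or firewall rule in front of the app would then drop any request for which `is_allowed` returns `False`, for example a call to some third-party collection endpoint.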

2 Likes

I disagree that a 3rd party application will find it difficult to make a replica of the pod data. As a developer, I guess it would be quite easy to save the data as replicas and update them from time to time.

IMO what Solid is trying to do is kind of criminalise that process. Currently, companies can legally store your data, but if the Solid Terms of Service specifically state that an application cannot store a copy of user data, then the chances of that happening are much lower. Some companies might still keep a copy, and there is no way to know. It’s similar to wondering whether Facebook logs my ID and password to a text file every time I log in: you never know!

3 Likes

Maybe so, but I guess it’s much less convenient than having everything in a central database, and certainly harder to wall off. I think redesigning the app to be a view into data is also part of the Solid vision (see @RubenVerborgh’s blog post Paradigm shifts for the decentralized Web), but quite how it plays with today’s data-centric apps I’m unsure. I’d like to find out.

1 Like

I agree, I think Solid reduces the power of big companies and the Internet monopolies, but it doesn’t secure users’ privacy. It reduces exposure to surveillance significantly, but unfortunately a little leakage here and there is what does most of the damage.

For me the big wins of Solid are adding value to your data, and all data, and decoupling applications from owning your data, so you can easily move from one application to another.

Those are very powerful features, and why I’m such a fan. But I think we can go further, and improve privacy and security much more, if we can remove the vulnerabilities that servers create, and the inevitable centralisation that will come with ‘Solid as a service’.

5 Likes

Isn’t there a legal aspect here to consider based on the current individual/company relationship, typically presented in the form of “terms of agreement” that have to be accepted for an individual to use a company’s site, consume products and services from them, etc?

Solid appears to offer an opportunity to turn that around! I think that if an entity wants to engage me and obtain my business, it would have to agree to MY terms for MY data and its use. While THEY may choose to save my data for the purposes of facilitating OUR “contract”, they may not be permitted to use it for any other purpose - of course all those details would have to be clearly stipulated.

I would expect the open source community and the legal advocates to recognize this opportunity and provide the resources needed to devise such terms in a similar fashion to how various “non-proprietary” software licensing came to be.

I think that’s where the value of the decentralized approach to data really is - with the ability for me to take back what’s been given under the perception of “making it free” - data is obviously very valuable, and I look forward to having more control over this valuable resource that is fundamentally MINE.

thanks!

3 Likes

Great question and discussion. My points are:

  1. Encourage service providers to make replicas of the pod data with authorization, rather than preventing or criminalizing it. It’s all about the service quality and thus the user experience. Few people can stand a forum app that makes HTTPS calls to a whole bunch of POD servers (some of which even time out) for a single refresh action. If the UX is not competitive with existing centralized apps, it’s very hard to get people engaged.

  2. We cannot count on the legal aspect for privacy protection. Doing so would put Solid-based apps and pod server providers in a very similar position to existing web service companies, and the rules would easily get breached once the interest at stake has grown big enough. Few things are reliable or sustainable except technology itself.

  3. Openness is the strength. I totally agree with @happybeing that privacy protection is not something we can easily put into practice so far (I may be missing something, since I’m new). The potential change should be in reducing the power of big companies and the Internet monopolies. But again, to put it into practice, we need to think more about what better UX we can provide to users compared with existing services. Users will not jump in and learn a whole bunch of new concepts without enough convenience in return.

5 Likes

@dprat0821 (and the others here), thanks for the discussion!

As a dev seriously considering using Solid for my next app, this thread has really helped assuage my concerns around the efficiency/practicality of the platform. If you are creating an app that is serving media, then having the data only live in the user’s pod could seriously impact performance and lead to some availability issues too…

It seems to me that the best way to handle data will frequently be to store the data on the app side but treat it more like a cache that is refreshed while the app is connected to the Solid provider. That way the app can efficiently serve the user data while constantly making sure that they are authorized to use the data by the server.
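Here is a rough Python sketch of what I have in mind, in case it helps the discussion. Everything in it is hypothetical: `PodCache`, `is_authorized` and `fetch` are names I made up, and in a real app the two callbacks would be HTTP requests to the pod rather than local functions.

```python
from typing import Callable, Dict, Optional


class PodCache:
    """App-side cache that re-checks authorization before every read."""

    def __init__(
        self,
        is_authorized: Callable[[str], bool],
        fetch: Callable[[str], str],
    ) -> None:
        self._is_authorized = is_authorized  # asks the pod: may I still read this?
        self._fetch = fetch                  # pulls the current copy from the pod
        self._store: Dict[str, str] = {}

    def get(self, url: str) -> Optional[str]:
        """Serve from the cache, but only while access is still granted."""
        if not self._is_authorized(url):
            # Access was revoked: drop the local copy and serve nothing.
            self._store.pop(url, None)
            return None
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]

    def refresh(self, url: str) -> None:
        """Re-fetch an already cached resource so the copy tracks the pod."""
        if url in self._store and self._is_authorized(url):
            self._store[url] = self._fetch(url)
```

The point of the design is that the cached copy is only ever served while the pod still says yes, and it is deleted as soon as the pod says no. Of course this only works if the app chooses to behave this way.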

I love the Solid philosophy and I’m glad that it doesn’t have to conflict with needs for performance, user experience, or advances in data science and technology.

Of course, I’m new here so let me know if I’ve misunderstood something.

4 Likes

I agree with the summary from @dprat0821 above, but would like to add my 2c.

Openness great. Privacy needs new tools. Performance is important but adequate for now.

The replication mentioned above I would classify as a performance improvement similar to caching, very much like the web caches we’ve had for a long time. Their legality has been tested many times over, but that debate seems to have subsided. Caches are well defined and cache coherency protocols don’t need to be reinvented.

I also disagree that decentralization helps with privacy directly, or that it makes it very difficult to harvest lots of data. Yes, it is more difficult than querying a centralised DB, but only slightly. Just look at Google indexing the web for an example.

The great benefit of Solid is the openness of the data. The data is mine: I can copy it, I can delete it, I can edit it, and I can also use it for new purposes. The data also belongs to the app that created it. More work is required to reconcile the shared ownership, though.

Improved Privacy will need auditing applications (not there yet) and permissioning templates, but they will be possible if the data is visible, not walled only for the app benefit. At the moment standard ACLs are too granular for normal users, even power users will struggle maintaining a lot of differently shared data. But it can be worked on right now. Takers ? :slight_smile:

Re Performance: it is a valid concern, albeit secondary so far. I don’t see a conflict, and the caching mentioned is already implemented in one of the access libraries, as far as I understand.

1 Like

Fundamental to Solid, and inherited by any Solid App, is the loose-coupling of the following items:

  1. Identity – via WebID
  2. Identification – via WebID-Profile Doc (your Identity Card)
  3. Authentication – via OpenID Connect (OIDC) or TLS authentication protocols
  4. Authorization – via WebACLs that offer fine-grained Attribute-based Access Controls (ABAC)
  5. Storage – via HTTP PATCH to an authorized document location (or URI/URL).
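
As a rough illustration of the authorization item, here is a simplified Python model of how a WebACL-style rule could be evaluated. The terms `acl:accessTo`, `acl:agent`, `acl:agentClass`, `acl:mode` and `foaf:Agent` come from the Web Access Control vocabulary; the `Authorization` class, the `may` function and the example WebIDs are made up for this sketch, which ignores inherited defaults, groups and origin checks.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable

PUBLIC = "foaf:Agent"  # WebACL uses the foaf:Agent class to mean "everyone"


@dataclass(frozen=True)
class Authorization:
    """Simplified stand-in for one acl:Authorization rule."""

    access_to: str          # resource URL (acl:accessTo)
    agents: FrozenSet[str]  # WebIDs (acl:agent), or PUBLIC (acl:agentClass)
    modes: FrozenSet[str]   # subset of {"Read", "Write", "Append", "Control"}


def may(webid: str, mode: str, resource: str,
        rules: Iterable[Authorization]) -> bool:
    """Return True if any rule grants `mode` on `resource` to `webid`."""
    return any(
        rule.access_to == resource
        and mode in rule.modes
        and (webid in rule.agents or PUBLIC in rule.agents)
        for rule in rules
    )
```

Because identity (the WebID), authorization (the rules) and storage (the resource URL) are separate inputs here, any of them can change without touching the app.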

Examples in the wild?

  • Dokie.li – user WebID-Profile document informs app about storage preferences
  • MarkBook – Bookmarking App that’s also informed by the WebID-Profile doc of its user
  • OpenLink Structured Data Sniffer – Browser Extension (available from Chrome Store) that includes an “Upload Raw Data” action for writing RDF-based metadata gleaned from HTML documents to a Solid Pod (this is also informed by the WebID-Profile doc of its user)

4 Likes

Can I ask how the caching was implemented? Does it need any authorization from the user?

There are a few comments on here that imply businesses will still have access to your information in a Solid POD, but that is only true of information that you make public.

Currently, whether you mark something as ‘public’ or ‘friends only’ on Facebook, Facebook itself holds all your information on its servers and, with or without your permission and with or without your knowledge, may pass all that information on to other companies.

With a Solid POD, you can make something ‘public’, ‘private’ or ‘friends only’. That information may all still be stored in a single POD, but every individual POD may be on a different server rather than on Facebook’s servers.

Companies will still want access to your content, but they will only have access to information that YOU make public. There will be no central server, and no central business, from which they can buy your information. You can of course add any company as a ‘friend’ and give that company access to information that is ‘friends only’, but that would be YOUR choice, not an arbitrary decision of some faceless company.

2 Likes

Good question about caching authorization. I spoke of two different cases of caching.

  1. Caching in the library already implemented: the cache is inside the library, which is inside an app which accesses the data. So, the cache has the same rights as the overall app — it is the same as the app, so there is no conflict.

  2. Caching as replication in different places for performance and resilience: @dprat0821 already covered this above under “with authorization” and explained why you would want it, mainly for resilience, though it helps performance too. Caching public data is the same as web caches today. The interesting bit is caching semi-private data with authorization: you need to trust the caching provider with that particular data just as you would the original POD host.

Would it not be easier for an app - like a social web page - to just store links to your data (profile, comments…) than the data itself? And is this not the whole point of Solid? I’m a newbie to Solid myself, and trying to figure out how a social platform would work with pods. My understanding so far is that instead of storing data, you’d just store references. Thus, when looking at a profile, or a conversation, the app would have to pull all the data from pods using the links provided by each user, basically re-generating content on the fly. And this would (at least considering storage space) be easier for the app or platform in question, albeit maybe providing a somewhat slower experience for the users. Right?
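
To check my own understanding, I sketched the "store links, not data" idea in Python. The names are made up: `render_thread` is hypothetical, `fetch` stands in for an HTTP GET against each user's pod, and `LookupError` stands in for whatever error an offline pod or a revoked permission would produce.

```python
from typing import Callable, List


def render_thread(comment_links: List[str],
                  fetch: Callable[[str], str]) -> List[str]:
    """Rebuild a conversation from links; the app stores no comment text."""
    rendered: List[str] = []
    for link in comment_links:
        try:
            # Pull the comment from its author's pod at render time.
            rendered.append(fetch(link))
        except LookupError:
            # Pod offline or access revoked: show a placeholder instead.
            rendered.append("[unavailable]")
    return rendered
```

So the platform only keeps the list of links, and each page view costs one request per comment, which is where the slower experience would come from.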

Currently, as far as I have understood, there is no technical solution around that prevents app providers from caching, storing or even selling user data once it has been released from a POD, the same way they currently do. Right?

So, currently Solid-Apps will just relieve end-users from having to type in the same profile-data over and over again and will provide a mechanism to some kind of white-spiritual-holy-boly-apps that will promise, that they will never ever store, transfer or even sell (how evil) POD data to third parties (keep fingers crossed).

So, what can we really do to prevent our data going clear-texted into some not so clerical datasilos?

Legal enforcement

To really prevent being locked into the mercy of the all-dominating “Fearsome Five”, a new generation of Solid-apps must be developed, that all must comply with a legally prosecutable agreement to copy and store POD-data on their side only the way it is intended by their POD owners.

Expiration Time

Legal rules could even require that app-side cached or stored data be regularly checked against the POD data (similar to web cookies expiring), both for accuracy and for access rights that might have changed in the meantime. Such apps would then have to operate with POD data that always comes with a lease date defined by the POD user, and legal authorities would have to penalize transactions made with expired leases.
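
A compliant app could implement the lease idea along these lines. This is only a Python sketch under my own assumptions: `LeasedCopy`, its `expires_at` field and `purge_expired` are invented names, not part of any Solid specification.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass
class LeasedCopy:
    """App-side copy of POD data together with its owner-defined lease."""

    value: str
    expires_at: datetime  # lease date set by the POD owner (hypothetical field)


def purge_expired(copies: Dict[str, LeasedCopy],
                  now: datetime) -> Dict[str, LeasedCopy]:
    """Keep only the copies whose lease has not yet run out."""
    return {url: copy for url, copy in copies.items() if copy.expires_at > now}
```

An app would run `purge_expired` before every use of the data and would have to go back to the POD to renew anything that was dropped.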

Encryption

Another category of apps might comply with a standard where data is never given away from the POD in clear text, but in the form of encrypted tokens that can only be displayed in some kind of standardized mini-viewer component. Such a component would display the data in a non-printable, non-copy/pastable way and never as a whole. Imagine an operator who has to actively click every single piece of information to get it from the POD, with the data displayed only as long as the mouse is neither moved nor clicked. You move the mouse - data is physically gone! We could even go so far that such components display realtime-decrypted data just through a tiny pinhole below the mouse - no way to copy all the context.

Microcodes

A different approach would be to underlay displayed clear text with micro-codes that could be traced in screenshots, would confuse OCR readers, or could be tracked in some other way, similar to nano-particles in car paint finishes.

1 Like

Aren’t app providers doing just that, providing JavaScript apps which run in the client (= the end user’s browser)? So only the client can access data from the POD, and the data never goes to the app provider’s server?

Of course the JavaScript app can send the data to the app provider, but that is possible to detect.