What is the best way to gather data from multiple pods so that differential privacy can be applied?

Hi,

CSS provides an API to manage accounts, which is documented here: JSON API - Community Solid Server
Here is a sample usage of it that I quickly wrote down some weeks ago (replace baseUrl with your server's URL): API for creating solid pods - #15 by A_A

I’m not really familiar with differential privacy or how exactly you want to achieve it, so I may be misinterpreting the details behind your question. However, I see three possible implementations:

(1) You tell the users to give the researchers access to the data in question. Then the researchers download the raw data and process it locally (on the researchers' server) in a way that preserves differential privacy.
(2) You have 3 entities: the users, a differential privacy proxy, and the researchers. The users give the differential privacy proxy access to the data. The proxy fetches the data and processes it in a way that preserves differential privacy. The proxy stores this output in its own pod and gives the researchers access to it so they can use it for their statistics. It’s more or less the same as (1), except that the proxy could be run by a trusted third party instead of the individual researchers.
(3) You create a custom CSS implementation that has a custom API for differential privacy. You tell users to use this custom CSS pod (or migrate their data there). Then they give researchers access to the data and the researchers use the custom (non-standard) API to access the data.

The first two have the advantage that any user with a Solid pod could contribute data. The third one would diverge from the Solid standards, so only users of your custom CSS implementation could contribute data.
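
To make option (2) a bit more concrete, here is a minimal TypeScript sketch of what the proxy's aggregation step could look like. It is only an illustration under several assumptions that are not part of the question: each pod contributes a single numeric value, the values can be clamped to a known range [lo, hi], the number of contributors is treated as public, and the Laplace mechanism is used to release a mean. All names (`Fetcher`, `collectValues`, `privateMean`) are hypothetical, and the fetcher is passed in so that an authenticated fetch from a Solid client library could be plugged in.

```typescript
// Hypothetical sketch of the proxy's aggregation step in option (2).
// A Fetcher reads one numeric value from a pod resource; how it authenticates
// and parses the resource is left to the caller (assumption).
type Fetcher = (url: string) => Promise<number>;

// Collect one value per pod URL. Pods that cannot be read are skipped,
// so the result may be shorter than the list of URLs.
async function collectValues(urls: string[], fetchValue: Fetcher): Promise<number[]> {
  const values: number[] = [];
  for (const url of urls) {
    try {
      values.push(await fetchValue(url));
    } catch (err) {
      // Access denied or resource missing: leave this pod out.
    }
  }
  return values;
}

// Draw Laplace(0, scale) noise by inverse transform sampling.
// `rand` must return a uniform number in [0, 1); it is injectable for testing.
function laplaceNoise(scale: number, rand: () => number = Math.random): number {
  // Keep u strictly above -0.5 so the logarithm below stays finite.
  const u = Math.max(rand(), Number.EPSILON) - 0.5;
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

function clamp(x: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, x));
}

// Release the mean of the clamped values with epsilon-differential privacy.
// Assumes the count n is public and that neighbouring datasets differ in one
// contributor's value, so the mean has sensitivity (hi - lo) / n.
function privateMean(
  values: number[],
  lo: number,
  hi: number,
  epsilon: number,
  rand: () => number = Math.random
): number {
  if (values.length === 0) throw new Error("no contributions to aggregate");
  if (epsilon <= 0 || hi <= lo) throw new Error("need epsilon > 0 and hi > lo");
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + clamp(v, lo, hi), 0) / n;
  const sensitivity = (hi - lo) / n;
  return mean + laplaceNoise(sensitivity / epsilon, rand);
}
```

The proxy would call `collectValues` with the resource URLs the users shared with it, pass the result to `privateMean`, and write only the noisy mean to its own pod. `Math.random` is not a cryptographically secure source, so a real deployment would likely want a vetted differential privacy library instead of this hand-rolled noise.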

Regarding tools to achieve this, I’d suggest looking at Tools and libraries overview · Solid. In particular, Inrupt’s solid-client is pretty good imo.
