General questions and clarifications on app development


#1

Hi everyone,

I’m a backend developer who has spent the past 4 to 5 years developing backend platforms for social apps (the latest being keakr.com) so I believe I’ve gathered quite a lot of experience in building real-world social systems. I’m working on a new project now - yet another social app - and we’re considering building it on top of Solid. I think I’ve grasped the basic principles of how Solid works, but I’m having a hard time mapping them to actual use-cases.

I guess the best way to proceed is to describe some typical user flows and try to figure out how they would be implemented with Solid. Please feel free to comment on, invalidate - or even confirm - what I’ll describe below. I’ll highlight open questions with [?]. It may become a long but hopefully interesting thread that can benefit any developer considering Solid!

Registration
This is usually done by creating an account with email/password or using another social platform as identity provider. In the case of Solid:

  • we eventually expect a WebID, so either the user already has one and we let her log in, or she creates one on our servers
  • the user also needs a POD, so same story: either she gives us access to an existing POD or creates one with us

Creating posts
Standard flow, but instead of storing the posts and their assets in our DB, we store them in the user’s POD.
[?] Should we first create a directory where we would store all the data coming from our app?

Listing posts
That’s where it becomes less obvious to me. If a user wants to list all the posts she has created, we just have to list them from her POD, but in many applications we need to issue complex queries to filter and sort data (we may want to do full-text searches on the posts’ content or sort the posts by a number of different criteria). So [?] how do we achieve that? I understand that we may use SPARQL as a query language, but what if the user’s POD does not support some SPARQL constructs that we need?

Sharing posts
In our model, posts are not automatically shared with friends/followers but are explicitly shared by the author with other users. I guess those shares would take the form of triples written in the target PODs and pointing to the content in the author’s POD.
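
To make the idea of a “share” triple concrete, here is a minimal sketch in Python. It only builds the triple as an N-Triples line; the predicate IRI, the WebID and the post URL are all made up for illustration and are not taken from any Solid vocabulary.

```python
# Hypothetical sketch: a "share" expressed as one RDF triple in N-Triples syntax.
# The predicate below is invented for this example, not a real Solid term.
SHARED_POST = "https://example.org/vocab#sharedPost"


def share_triple(recipient_webid: str, post_url: str) -> str:
    """Return one N-Triples line saying that a post was shared with the recipient."""
    return f"<{recipient_webid}> <{SHARED_POST}> <{post_url}> ."


# The triple would be written to the recipient's POD; the post itself
# stays in the author's POD and is only referenced by its URL.
triple = share_triple(
    "https://bob.example/profile/card#me",
    "https://alice.example/myapp/posts/42",
)
print(triple)
```

The point of the sketch is that the recipient’s POD only holds a pointer, so reading the shared post still requires a request to the author’s POD.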

Listing shared posts (or displaying a “feed”)
If we want to display all the posts that have been shared with a particular user, we would start by listing the “share” triples I mentioned just before, but then [?] would we need to peek at all the destination PODs to build the results (with the same considerations about filtering and sorting)?
I can’t see any decent way to make that work, considering how much we need to fine-tune our queries and indexes even when all the data resides on just one database. If some users’ PODs are down or even slow to respond, it would considerably affect the overall performance of the app.
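
The concern can be illustrated with a small simulation. This is only a sketch under stated assumptions: there are no real network calls, the POD URLs and latencies are invented, and `fetch_post` stands in for an HTTP request to a POD. It fans out to all PODs concurrently and drops any POD that exceeds a timeout, which keeps the feed fast but leaves it incomplete.

```python
import asyncio

# Simulated response times in seconds (assumption: no real PODs are contacted).
POD_LATENCY = {
    "https://alice.example/": 0.01,
    "https://bob.example/": 0.02,
    "https://carol.example/": 1.0,  # a slow or unreachable POD
}


async def fetch_post(pod: str) -> dict:
    """Stand-in for an HTTP GET against a POD."""
    await asyncio.sleep(POD_LATENCY[pod])
    return {"pod": pod, "title": f"post from {pod}"}


async def build_feed(pods: list, timeout: float) -> tuple:
    """Fetch from all PODs concurrently; PODs slower than `timeout` are skipped."""

    async def guarded(pod: str):
        try:
            return await asyncio.wait_for(fetch_post(pod), timeout)
        except asyncio.TimeoutError:
            return None

    results = await asyncio.gather(*(guarded(p) for p in pods))
    posts = [r for r in results if r is not None]
    missing = [p for p, r in zip(pods, results) if r is None]
    return posts, missing


posts, missing = asyncio.run(build_feed(list(POD_LATENCY), timeout=0.2))
print(len(posts), "posts,", len(missing), "POD(s) timed out")
```

Without the timeout the whole feed would wait for the slowest POD; with it, the feed silently misses that POD’s posts. Neither outcome is great, which is exactly the trade-off I’m worried about.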

I’ve got many other questions but I think that my initial assumptions may be totally wrong so let’s just start with that. I thank you all in advance for your answers and look forward to the discussion!

Thomas


#2

A few thoughts about listing and searching: your app could, server side, index the data your users are given access to, by sending it via ajax from the browser to your server - in effect “harvesting” available data for your users to search through. But I pretty much think that is going against the spirit of Solid …

The issue is real though: as a user, I grant a browser app access to my data and thereby indirectly also grant it access to my friends’ data (the data they have chosen to share with me). But since sharing says nothing about the app that uses the data (*), we can easily end up with browser apps that harvest my friends’ data and use ajax to send it to their own server without any consent from my friends. My friends think they shared data with me and only me, but in reality they share the data with me AND any application I choose to use as an agent to work with my POD.

There are lots of good ideas in Solid, but w.r.t. privacy there are also a couple of broken places (as far as I understand it - and it would be great to be proven wrong here!). See also my other thread: Inter-app access control

(*) Sharing is only about granting other users access to your data without considering what browser app they choose to use.

/Jørn


#3

Thanks Jorn for your answer. Indeed the app could “augment” the data from the POD by indexing it in its own DB, but as you said it’s already moving away from the spirit of Solid, not to mention the consistency issues this may cause (having to update the index when the data is updated or deleted on the POD, possibly through the pub/sub mechanism described in the spec?).

I would love to hear the opinion from the people behind the Solid initiative. As far as I understand, Solid has been built as a foundation for social apps, and the use-cases I’ve described are very common in such apps, so maybe I’m just missing some point.


#4

Likewise. This seems to be a pretty fundamental question.


#5

Do we know if the creators of the Solid spec are reading this forum? Or is there any other way to engage conversations with them?


#6

Yes they do read this forum, at least some of them do. The other place is https://gitter.im/solid/chat


#7

Quick summary for those interested in the topic:

First, I’ve realized that the Solid concept does not involve any backend development, which means that client apps directly communicate with PODs (and only with PODs). It doesn’t answer my questions, it only clarifies which parts we can work with (i.e. no custom server logic).

Then, I had a very interesting chat with @kjetil on the Gitter chat, here are the main takeaways:

  • The approach recommended by the team to handle queries involving data from other users’ PODs is to maintain a local, client-side in-memory cache
  • The cache should be populated with data that lets local queries return initial results quickly, then resort to remote PODs if the user pages through results
  • This approach seems feasible in scenarios where data is queried chronologically, as we can rely on notifications / activity streams to have fresh data in the cache
  • It’s obviously much more “challenging” (i.e. less feasible or not feasible at all) in cases where we can’t predict what should be in the cache! An example of that being a spatial search requiring to have in the cache data that’s around the user’s current position
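
To illustrate the approach for the chronological case, here is a minimal sketch of such a cache. Everything in it is an assumption made for the example: the `FeedCache` class, the `(timestamp, url)` tuple shape and the `fetch_remote` callback are hypothetical and not part of Solid or of anything @kjetil described in detail.

```python
from dataclasses import dataclass, field


@dataclass
class FeedCache:
    """In-memory cache of (timestamp, url) entries, kept newest-first."""
    capacity: int = 100
    posts: list = field(default_factory=list)

    def on_notification(self, timestamp: int, url: str) -> None:
        """Called when a notification announces a new or updated post."""
        self.posts = [p for p in self.posts if p[1] != url]  # drop stale entry
        self.posts.append((timestamp, url))
        self.posts.sort(reverse=True)   # newest first
        del self.posts[self.capacity:]  # evict the oldest entries

    def page(self, offset: int, size: int, fetch_remote) -> list:
        """Serve a page from the cache, falling back to remote PODs past its end."""
        cached = self.posts[offset:offset + size]
        if len(cached) < size:
            remote_offset = offset + len(cached)
            cached = cached + fetch_remote(remote_offset, size - len(cached))
        return cached


def fake_remote(offset: int, count: int) -> list:
    # Stand-in for the slower queries against remote PODs.
    return [(0, f"remote-{i}") for i in range(offset, offset + count)]


cache = FeedCache(capacity=3)
for ts in (1, 2, 3, 4):
    cache.on_notification(ts, f"https://alice.example/posts/{ts}")

first_page = cache.page(0, 2, fake_remote)   # answered entirely from the cache
second_page = cache.page(2, 2, fake_remote)  # partly cache, partly remote
```

The first page comes back instantly because notifications already put the newest posts in the cache. The sketch also shows the limitation: the cache is ordered by one key (time), so a query on any other criterion, such as distance from the user, cannot be answered from it.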

#8

As a quick follow-up, I just wanted to let you know that we’ve decided not to proceed with Solid for now; we just feel that its architecture is currently not compatible with our requirements. We’ve carefully evaluated the technical approach described above (using a client-side cache for data from other users’ PODs) and came to the conclusion that this approach will inevitably yield slow and/or incorrect/incomplete results. Considering the very short attention span we get from users these days, an app that is slow or doesn’t work well may be used once but not twice!

Now in order to be constructive, I would like to share my personal thoughts. My (humble) point of view is that there’s a missing layer; a lot of focus has been given to the user’s side of things, but similar work should be done on the app’s side. I think that in order to fulfill the requirements of many serious real-world apps, we will need a way to address an endpoint that can serve the entirety of an app’s data, and I see 2 ways to achieve that (that’s highly speculative and just food for thought!):

  • Just like we have user PODs, we could have app PODs that gather all the data related to a particular app; user PODs would automatically sync their data with the app POD by pushing updates (so this would be part of the Solid protocol between user and app PODs, and transparent to app developers). From the app’s perspective, app PODs would be read-only as we would still be writing to the user PODs, but this additional read-only layer could give us the ability to perform the type of queries I’ve mentioned before. I guess that this would still respect the spirit of Solid as updates or deletions of user data would be reflected in the app PODs, which btw can be considered as some kind of app-level cache. The main issue with this approach is related to scaling: those app PODs may end up storing a lot of data, which would require horizontal scaling, sharding of data across nodes etc. The second issue I see is that the default querying capabilities of the app POD may not be enough for some apps (think full-text search, spatial queries, application of ML models etc.).
  • Another approach would be to have a similar kind of replication happening through custom webhooks, so app developers would have to build backends exposing webhooks that user PODs would call whenever (app-related) data is updated. This has the great advantage of letting us app developers take care of storing, searching and scaling (which some of us have become pretty good at!) but it obviously branches away from the spirit of Solid, as data is explicitly replicated outside of the Solid ecosystem.
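
Both ideas boil down to user PODs pushing updates to an app-side replica that can then be queried freely. Here is a rough sketch of the receiving side, with a naive inverted index standing in for real full-text search. The event payload shape (`type`, `url`, `text`) and the `AppIndex` class are purely hypothetical; nothing like this exists in the Solid spec as far as I know.

```python
from collections import defaultdict


class AppIndex:
    """App-side replica with a naive inverted index for word search."""

    def __init__(self):
        self.docs = {}                 # post URL -> text
        self.terms = defaultdict(set)  # lowercase word -> set of post URLs

    def _unindex(self, url: str) -> None:
        for word in self.docs.pop(url, "").lower().split():
            self.terms[word].discard(url)

    def handle_webhook(self, event: dict) -> None:
        """Apply one update pushed by a user POD (payload shape is hypothetical)."""
        url = event["url"]
        self._unindex(url)  # drop any stale copy first
        if event["type"] != "delete":
            self.docs[url] = event["text"]
            for word in event["text"].lower().split():
                self.terms[word].add(url)

    def search(self, word: str) -> list:
        """Return the URLs of all replicated posts containing the word."""
        return sorted(self.terms.get(word.lower(), ()))


index = AppIndex()
index.handle_webhook({"type": "update", "url": "https://alice.example/posts/1", "text": "Hello Solid"})
index.handle_webhook({"type": "update", "url": "https://bob.example/posts/7", "text": "solid pods"})
index.handle_webhook({"type": "delete", "url": "https://alice.example/posts/1"})
hits = index.search("solid")
```

Because deletions are pushed just like updates, the replica drops Alice’s post as soon as her POD says so, which is the property that would keep this at least partly in the spirit of Solid.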

No silver bullet I’m afraid, but I hope that it can help get the conversation going. And even though we won’t be building on top of Solid, I’m very happy to continue this conversation with you!