How is data partitioned?

Hi,

I’m looking into how to organize my data to be used by an application.

https://github.com/solid/node-solid-server uses the file system, which makes it simple to use because we all understand folder hierarchies; we can also define permissions and so on for folders.

My question is:
Is the folder structure the primary way to partition data in a pod?

I guess that people will organize data differently, as happens on the desktop. How do applications maintain consistency?

What do you mean by that? Most people still use a folder structure on their desktop, no?

I mean that different people organize things in different locations under different names.

I was referring to partitions as the datasets to which one defines permissions, policies etc. Perhaps a better name is container.

Related to how people may want to organise their data is how they explore and retrieve it. I put forward some thoughts here which might be of interest:

Hi, I agree with those thoughts. In the real world, people organize things in a myriad of ways, far beyond the file structure. I’m wondering what the basic unit is that groups resources logically.

How do we define things like ‘music that Alice and Bob like’ so that we can put them in an application (and add permissions, etc.)?

… what are those? Folders? Collections? Containers? Views? resource-spaces?

A user/developer can define them as files, path patterns or queries in a language the user/developer knows.

Just wondering :)

Containers, Containers, Containers. It’s the Containers. The spec is very clear that the logical relationship of containment is an organizational principle of Solid, unrelated to how it is implemented (file system, database, whatever).

In Solid, when you place X inside Y, you are not putting a file in a folder or a row in a database; you are adding a triple <Y> ldp:contains <X>.
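To make that concrete, here is a minimal sketch in Python, with plain tuples standing in for RDF triples. The helper names and the alice.example URLs are hypothetical; only the ldp:contains predicate comes from LDP. On a real server these containment triples are managed by the server rather than written by the client.

```python
# The LDP containment predicate (ldp:contains), written out in full.
LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"

def add_to_container(graph, container, resource):
    """Record 'resource is inside container' as a triple, not a file operation."""
    graph.add((container, LDP_CONTAINS, resource))

def members(graph, container):
    """List everything the container is stated to contain."""
    return sorted(o for s, p, o in graph if s == container and p == LDP_CONTAINS)

# Hypothetical pod URLs, for illustration only.
graph = set()
music = "https://alice.example/music/"
add_to_container(graph, music, music + "track1.ogg")
add_to_container(graph, music, music + "track2.mp3")

print(members(graph, music))
```

Nothing in the sketch says whether the resources live on a disk or in a database, which is the point being made above.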

Direct and Indirect containment are not supported, and there aren’t any immediate plans to support them. Per the Solid Protocol - Storage:

Solid has the notion of containers to represent a collection of linked resources to help with resource discovery and lifecycle management.

There is a 1-1 correspondence between containment triples and relative reference within the path name hierarchy. [Source]. It follows that all resources are discoverable from a container and that it is not possible to create orphan resources. [Source]
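One way to picture that correspondence: given only a resource URL, the chain of containment triples can be derived from its path. The following is a rough sketch, assuming container URLs end in a slash and using a hypothetical pod at alice.example; the function names are mine, not part of any Solid library.

```python
from urllib.parse import urlsplit, urlunsplit

LDP_CONTAINS = "http://www.w3.org/ns/ldp#contains"

def parent_container(url):
    """Return the URL of the container holding `url`, or None at the storage root."""
    parts = urlsplit(url)
    if parts.path in ("", "/"):
        return None
    trimmed = parts.path.rstrip("/")  # drop a container's own trailing slash
    parent_path = trimmed[: trimmed.rfind("/") + 1]
    return urlunsplit((parts.scheme, parts.netloc, parent_path, "", ""))

def containment_triples(url):
    """Containment triples implied by the path, ordered from the root downwards."""
    triples = []
    child, parent = url, parent_container(url)
    while parent is not None:
        triples.append((parent, LDP_CONTAINS, child))
        child, parent = parent, parent_container(parent)
    return list(reversed(triples))

for triple in containment_triples("https://alice.example/music/afrobeat/track1.ogg"):
    print(triple)
```

Because every non-root resource has exactly one parent in its path, every resource is reachable from some container, which is why orphan resources cannot exist.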

The representation and behaviour of containers in Solid corresponds to LDP Basic Container and MUST be supported.

Probably worth adding that containers provide limited semantic information about their content, so e.g. contacts should be stored in an address book, not just in a container. While a human can guess that the contacts form an address book, a computer cannot/should not.

For similar reasons, data discovery should be by registration, not by selecting a container in which the human knows the data is stored.
The role of containers is really just for logical separation, not to express semantic relationships.
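As a rough illustration of discovery by registration, the sketch below looks data up by its class rather than by a folder name. It is loosely modelled on the Solid type index idea (a registration links a class to an instance or an instance container), but the dictionary layout, the `locate` function and all alice.example URLs are simplifying assumptions of mine, not an actual API.

```python
# Class IRIs used for the lookup; the choice of classes is illustrative.
VCARD_ADDRESS_BOOK = "http://www.w3.org/2006/vcard/ns#AddressBook"
SCHEMA_MUSIC_RECORDING = "https://schema.org/MusicRecording"

# Hypothetical registrations: each maps a class to where its data lives.
registrations = [
    {"forClass": VCARD_ADDRESS_BOOK,
     "instance": "https://alice.example/stuff/people/index.ttl#this"},
    {"forClass": SCHEMA_MUSIC_RECORDING,
     "instanceContainer": "https://alice.example/tunes/"},
]

def locate(registrations, class_iri):
    """Return the registered locations for a class, ignoring container names."""
    found = []
    for reg in registrations:
        if reg["forClass"] == class_iri:
            found.append(reg.get("instance") or reg.get("instanceContainer"))
    return found

print(locate(registrations, VCARD_ADDRESS_BOOK))
```

Note that the address book is found even though its container is called "stuff/people/", a name software could not have guessed.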

Absolutely right. It’s worth noting that aside from server-managed containment triples, a container is just another RDF resource, which can house that semantic information.

Regarding a pattern for how data is partitioned / stored - that’s exactly what we’re working to standardize in the Interoperability Panel - specifically under the Solid Application Interoperability and Shape Trees work tracks.

Excellent points @josephguillaume. I’d like to elaborate on your statement “containers provide limited semantic information about their content” because that fact is a key sticking point in how new users transition into Solid.

On my home computer, I have a folder labeled “Music” and, in that, a hierarchy of “World Music”, “African Music”, “Afrobeat”, “Fela Kuti” and, within that, files that end in extensions .ogg, .mp3, etc. While not everyone is as obsessive as I am, something like this is the way most people relate to their personal computers.

As @justin points out above, "There is a 1-1 correspondence between containment triples and relative reference within the path name hierarchy."

So if containers correspond to path hierarchies, why can’t I just slap my folder structure from my home computer onto my pod and call it quits? Well, I can. Pods will let me treat them that way. I’m done, right? Terms like “Music” and “African Music” appear semantic to me. So I’m living the Solid life, cool!

But the labels I put on the containers have no meaning to software. Software can’t guess from the label that “Afrobeat” is a kind of “African Music”. It can’t even tell from the .ogg and .mp3 extensions that the files contain music - they might be voice recordings. And what happens if I want a list of other saxophone players or musicians from Nigeria?

That’s the real Solid life - being able to tell the software that Afrobeat is a kind of African music and that Fela Kuti is an Afrobeat musician, a person, a Nigerian, a strong advocate of African autonomy, and a saxophone player and that those particular .ogg and .mp3 files contain his music.
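A toy version of those statements, sketched in Python with tuples as triples. The vocabulary under vocab.example, the file URL and the second artist entry are all made up for illustration; a real pod would use published vocabularies and RDF tooling.

```python
EX = "https://vocab.example/"  # hypothetical vocabulary namespace

triples = {
    (EX + "Afrobeat", EX + "subgenreOf", EX + "AfricanMusic"),
    (EX + "FelaKuti", EX + "genre", EX + "Afrobeat"),
    (EX + "FelaKuti", EX + "nationality", EX + "Nigeria"),
    (EX + "FelaKuti", EX + "plays", EX + "Saxophone"),
    ("https://alice.example/music/track1.ogg", EX + "byArtist", EX + "FelaKuti"),
    # A second saxophonist, so the query below has something to filter out.
    (EX + "ManuDibango", EX + "plays", EX + "Saxophone"),
    (EX + "ManuDibango", EX + "nationality", EX + "Cameroon"),
}

def subjects(triples, predicate, obj):
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}

sax_players = subjects(triples, EX + "plays", EX + "Saxophone")
nigerians = subjects(triples, EX + "nationality", EX + "Nigeria")

print(sorted(sax_players))              # every saxophone player
print(sorted(sax_players & nigerians))  # saxophone players from Nigeria
```

Neither query looks at a folder name or a file extension; the answers come entirely from the statements themselves.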
