Solid STM about shape repo(s)

Hello everyone,

At last week’s CG meeting, we discussed shape repositories.

For ActivityPods, we have an urgent need for this because we are currently finishing our full implementation of SAI, and so all our apps will need to point to shapes and shape trees in order to be interoperable with each other. So we quickly created our own shape repository, based on Jesse’s previous work, except we use ShEx instead of SHACL-Compact, and we also host shape trees.

We also know of the work of @jaxoncreed for shaperepo.com

And the work of @michielbdejong for pdsinterop.org

@jeswr suggested there may be enough interest (and importance) in that subject to propose a Special Topic Meeting, so that we can discuss it together. So here’s a poll to find a date when, hopefully, all concerned actors can be here. Please cast your vote now :slight_smile:

Below I’m listing some of the requirements that I see for such a shape repository, but it’s of course open to discussion!

Requirements

  • ShEx + SHACL + SHACL-Compact negotiation for shapes
  • Turtle + JSON-LD + Quads negotiation for shape trees
    • Predefined JSON-LD context to make JSON-LD more readable ?
  • Shape tree descriptions ?
  • Easy to deploy locally, so that developers don’t need to be online to build applications and so that they can test new shapes before submitting them.
  • Clear and easy process to add and deploy new shapes (possibly with GitHub PRs)
  • Frontend similar to https://shaperepo.com to explore/find existing data
  • Some tool to automatically validate shapes and shape trees that are added
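
The first requirement (format negotiation for shapes) could be sketched roughly as follows. This is only a minimal illustration of picking a serialization from an `Accept` header, not how any of the existing shape repos work. The function names, the file extensions and the `text/shaclc` media type are assumptions made up for the example; `text/shex` and `text/turtle` are the types commonly used for ShEx and for SHACL written in Turtle.

```python
from typing import Optional

# Serializations a shape repository might offer for one shape.
# Assumption: "text/shaclc" is a placeholder, not a registered media type.
SHAPE_FORMATS = {
    "text/shex": ".shex",      # ShEx compact syntax
    "text/turtle": ".ttl",     # SHACL serialized as Turtle
    "text/shaclc": ".shaclc",  # hypothetical type for SHACL-Compact
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def negotiate(header: str, offered: dict[str, str]) -> Optional[str]:
    """Return the best offered media type for the header, or None."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type in offered:
            return media_type
        if media_type == "*/*":
            return next(iter(offered))  # fall back to the first offer
    return None
```

For example, `negotiate("text/turtle;q=0.5, text/shex", SHAPE_FORMATS)` picks `text/shex`. The sketch ignores partial wildcards such as `text/*`, and it says nothing about the harder part, which is converting between ShEx and SHACL in the first place.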

Thank you for organizing the meeting, I’m looking forward to it.

I’ll be happy to see some common source of truth for shapes that Solid app developers could use.

Some months ago I wrote a proposal that’s somewhat related: GitHub - ldsham/proposal: Linked Data Shape Manager - The Proposal

The main idea was to have a bottom-up/crowdsourced way of publishing shapes (including shapes used by specific apps), and aggregating them into a larger shape of shapes. This would make it easier to preview the relations that matter, and to shape the missing ones with a low entry barrier.

OTOH I still don’t get what’s good about shapetrees. If I understand them correctly, they feel a lot like an unnecessarily strict constraint on data organization within Pods. Specific folder structure for specific data — enforced hierarchy in the world of graphs. But perhaps that’s offtopic in this thread.

A blog post about hierarchy vs graph in Pods was published back in 2022: Let’s talk about pods | Ruben Verborgh. It sparked wide enthusiasm (including mine), but also some pushback, and unfortunately it didn’t really make it into the Solid mainstream.

The article must have been a product of deep understanding of the topic, and made a lot of intuitive sense to me. Perhaps LWS WG, with Ruben’s involvement, may revisit and bring back those ideas.

Very interesting! I’ll look into it :slight_smile:

That’s exactly what I thought when I first read about shape trees: “they’re trying to put a square peg in a round hole!”. But then @elf-pavlik showed me that shape trees can also be used with a flat hierarchy. This is very well explained in the first part of the Application Interoperability Walkthrough, which I recommend watching (it’s a bit old, but it lays the foundation for the SAI spec).

The advantage of using a shapetree instead of a single shape is that you can link resources / shapes together, and give – for example – the rights to view a project, but also all linked issues. It may be something close to your idea of “shape of shapes”, but I’ll need to read your proposal.

I shared this enthusiasm when this article came out! :slight_smile: But his proposal for “views” also seemed difficult to implement concretely, especially on filesystem-based Pod providers (we use a triplestore for ActivityPods so it’s possibly easier to achieve). IMHO SAI solves many problems that Ruben points out. It’s unfortunate that few Solid developers know about this spec! (it’s true I needed to read it several times to get into it)


Btw see also GitHub - solid/shapes: Solid Shapes which is Solid CG’s initiative. Background: Code search results · GitHub
Thanks to Sarven for the link


Thanks for your votes @elf-pavlik @michielbdejong @mrkvon @ebremer
So our STM about shape repos will take place on Tuesday 25 February at 15:00 UTC
We can meet on that BigBlueButton link.
@jeswr I hope you’ll be available at that date!


Thanks for organising - see you then!

Minutes from the meeting - 25/02/2025

Present: Jackson Morgan @jaxoncreed, Jeff Zucker @jeffz, Erich Bremer @ebremer, Laurin Weger, Michal @mrkvon, Noel De Martin @NoelDeMartin, Jesse Wright @jeswr, Michiel de Jong @michielbdejong, elf Pavlik @elf-pavlik, Sylvain Roquebert, Joshua Cornejo, Sébastien Rosset

What is your reason to be here?

  • eP: I want to have this solved for the community. I have a background in the topic, with SAI.
  • Jackson: Made a shaperepo a few years ago. Also did LDO. Released a feature that takes a shape and generates a library that can be released on NPM (…) I want to see us using various native repositories like NPM. It doesn’t need to be centralized technologically, but we should have a central point where we can find the shapes and pointers to native platforms for people to use in their projects.
  • Jesse Wright: Did a shape repo also a few years ago. A lot of work on ShEx and SHACL. We need a way to align on shapes in Solid. We must discuss not only infrastructure, but also processes.
  • Jeff Zucker: Worked on shapes before and concerned with categorizing and finding the shapes. How does the community learn which shapes to use for a given domain, and which features those shapes have? It would be great to create a catalog of shapes / for describing shapes.
  • Michiel: Worked on PDS Overview | Conventions and on how data is stored on pods; without that, Solid is useless. I don’t think we need to have one single repository.
  • Noel: Recently gave a talk at Solid World, Interoperable Serendipity. In Solid, multiple apps need to work on the same data. We also need something to translate shapes, like Cambria lenses (Project Cambria: Translate your data with lenses). Developed a library called Soukai that uses shapes. I realized it’s not so difficult to support different shapes. I’m not against using RDF, but what I’m doing is in JS. The shapes there are not using SHACL or ShEx. But I have an issue open, and I don’t think it would be difficult for the library to parse these shape formats. Nobody asked for it, but I’m willing to take a look if requested.
    • Jackson: I prefer SHACL and ShEx because they are programming language-agnostic formats.
    • Jesse: […lenses] I imagine a world with not just shapes, but also alignment rules. We can have both shapes and mappings to other shapes.
  • Athanasios: I have been following the developments around Solid for some years now (no hands-on experience). Yesterday I heard about the concept of “lenses” and this sounded interesting, so I joined to hear some more about this (not sure if this is the correct call :)) - thanks for the explanation!
  • Sébastien: In ActivityPods we are fully implementing SAI at the moment. We have a few applications that have to be interoperable. I looked at what Jackson and Jesse have already done. There is the question of ShEx vs. SHACL, as well as shape trees. For the requirements, how are we going to categorize and order the shapes? I started with having the same name for shapes and types, but shapes can have multiple classes. How are we going to find ourselves in this possible mess of many shapes?
  • Laurin: I’m working on ActivityPods. DX is a problem, and having untyped objects is a problem. LDO will be a big improvement. Link traversal is also important.
  • Michal: I’m interested in shapes from an app dev perspective. I’m using LDO and it helps. I’m interested in how shapes are created, and I like the idea of crowdsourcing. Not only which shapes are used for which data, but also which applications are using which shapes.
    • Jackson: Release coming in March. GraphQL-based link traversal on RDF data. You define the structure (e.g. using ShEx or something like GraphQL), and the engine will dereference URIs if necessary.
  • Erich: Primarily interested in the medical domain. It would be nice to be able to post a need for a shape.
  • Arne Hassel: Solid developer, interested in shapes as a way to tie data together with applications (e.g. facilitate app stores), to create a better user experience. I’ve used LDO quite a lot, and enjoy the use of shapes there as well.
  • Joshua: Mostly here to listen. We need to be careful about complexity, around content negotiation. Learning Solid is hard, we need to abstract it away.

What are the requirements for a common shape repo ?

ShEx + SHACL + SHACL-Compact negotiation ?

  • Sébastien: ShEx is not compatible with SHACL, unfortunately.
  • Jesse: SHACL and ShEx have been competing for a long time. SHACL is a W3C standard, and within Solid we should push for one standard, and it should be SHACL, since it’s a standard. ATM there is the Data Shapes working group. They want to update SHACL to RDF 1.2. As part of that work, we take the SHACL-Compact syntax and upgrade it from a CG draft to a WG specification. And it should improve usability thanks to ergonomics […]
  • eP: Another requirement for a shape repo: it’s also about authorization and discovery of shape data. People have a mess in their filesystems. Imagine many file systems of different apps, people, coworkers. Having different file systems can be unmanageable. For example all the calendars… […] We should keep authorization and discovery of data in the ShEx/SHACL conversations.
  • Jackson: I don’t have a problem with using SHACL instead of ShEx.
  • eP: I agree with Jesse that SHACL (compact) should be the way to go. Though: most developers shouldn’t be required to write shapes themselves. There should be a curation process and assistance with creating quality shapes.
  • Sébastien: It’s already hard to agree on an ontology. Shapes are even more difficult; some applications might enforce certain properties (e.g. profile picture) - this might kill interop.
  • Joshua: What are the advantages of this “new world”? […] All shapes should have a version, and later / extended versions should maintain compatibility with older versions. We could define “base” shapes as a minimal common ground on top of which extensions can be developed (example: passports, where Indian passports have multiple names that western passports don’t have).
  • Jesse: There is probably going to be a read-write incompatibility. An application signals what it writes/outputs which is more than what is required for the application to process foreign data. […]
  • eP: Possibly some aspects of shape trees can be used.
  • Jackson: When building a shape repo, we submitted an NLNet grant application with Jesse and Jeff, and we got feedback that we want to have people in the community who use the shapes. In practice, we want to have community, business, and application builders involved. If somebody wants to become a stakeholder, contact us.
  • Sébastien: It seems there is consensus on SHACL (with the hope that shapes will be easier to write with the Compact version)
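
To make Joshua’s point about versioned “base” shapes and extensions a bit more concrete, here is a toy sketch. It is not SHACL or ShEx: a shape is reduced to a set of required and optional property names, and the `Shape` class, both functions and all property names are hypothetical. It also assumes one particular reading of “compatible”, namely that anything conforming to an extension must still conform to the base shape.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Shape:
    """Toy stand-in for a shape: name, version and property constraints."""
    name: str
    version: str
    required: frozenset = field(default_factory=frozenset)
    optional: frozenset = field(default_factory=frozenset)


def conforms(record: dict, shape: Shape) -> bool:
    """Open-shape check: required properties present, extras ignored."""
    return shape.required.issubset(record.keys())


def extends(base: Shape, extension: Shape) -> bool:
    """True if anything conforming to `extension` also conforms to `base`."""
    return base.required.issubset(extension.required)


# Hypothetical base shape and two candidate extensions.
base = Shape("Passport", "1.0.0",
             required=frozenset({"familyName", "birthDate"}))
with_aliases = Shape("PassportWithAliases", "1.1.0",
                     required=frozenset({"familyName", "birthDate"}),
                     optional=frozenset({"alternateName"}))
breaking = Shape("PassportNoBirthDate", "2.0.0",
                 required=frozenset({"familyName"}))
```

Here `extends(base, with_aliases)` holds, while `extends(base, breaking)` does not, because data written for the `breaking` shape may lack `birthDate`. This also hints at the read/write asymmetry Jesse mentions: an app may write more properties than it needs in order to read foreign data.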

Should we include shape trees in such a repository ?

  • Sébastien: Was skeptical of shape trees at first, but you can also use them with a flat hierarchy. With shape trees, you can specify your requirements when sharing data. Should shape trees be part of the same repository?
  • eP: Related: [UC] Context aware access policies · Issue #17 · w3c/lws-ucs · GitHub. For me it’s important that we can build authorization screens. We also need internationalized shapes. An application could lie about the labels. URLs also don’t make sense for readability. It needs to be curated, including internationalized descriptions and labels. I don’t mind the format. In SAI we take data from shape trees. Human labels need to come from a trusted source.
  • Sébastien: With shapetrees also come shape tree descriptions, that would also need to be stored in this common repo.
  • Jackson: The valuable thing about shape trees: here are the things that I’m giving access to. There was a controversial blog post: is Solid a knowledge graph, or a file system? I lean towards an amorphous knowledge graph. My concern is that shape trees lean more in the direction of a collection of documents. The part which says “these shapes are authorized for these things” is valuable from my POV.
  • Jesse: The view of this as a knowledge graph is nice, but it can slow down adoption/understanding for newcomers. We are thinking of Solid as a file system. What can emerge over time is ways to access that data: query endpoints etc. that would expose the knowledge graph. But we can’t build with an amorphous knowledge graph, we need to use APIs that already exist.
  • eP: Taking a pragmatic approach in SAI. I don’t assume that you have a folder-like structure. You should be able to create collections of certain data types / shapes for a given use case. Collections based on relations. I think a middle ground is possible.
  • Joshua: I find both are correct but it depends on your problem perspective. The graph is abstract and from the flow perspective, you will have an entrypoint.

Easy to deploy locally, so that developers don’t need to be online to build applications and so that they can test new shapes before submitting them ?

  • Sébastien: When you build a new application with new shapes, it’s important to deploy them locally to get started.
  • eP: If we rely on references to a shape/shape tree, would it be sufficient to have an env variable: use this shape URL, and a different one for production? So in different environments you could easily swap the root of that repo.
  • Sébastien: I was thinking of a local server hosting the shapes.
  • Noel: One problem can be: if I have one variable locally, it’s a problem if it works differently in production. I think it goes beyond the local environment. It’s a question of how strict we want to be. E.g. I use https for schema.org, and prefix.cc uses http. My app didn’t work then. I think this should be the same in dev and production.
  • eP: if you use different shapes in dev and prod, you want to easily switch from dev to prod once your shape becomes curated and official. We need to keep in mind that there will be a transition from dev to canonical URL.
  • Sébastien: For the ActivityPods shape repo, you can host the shapes locally and once they are ready to go online, you can create a PR directly. No risk of losing data.
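
eP’s idea of swapping the repo root per environment could look roughly like this. It is only a sketch: the `SHAPE_REPO_BASE` variable name, the default URL and the `shape_url` helper are placeholders invented for the example, not something any existing shape repo defines.

```python
import os
from urllib.parse import urljoin

# Assumption: placeholder for the canonical (production) repository root.
DEFAULT_SHAPE_REPO = "https://shapes.example.org/"


def shape_url(relative_path: str, env=os.environ) -> str:
    """Resolve a shape path against the configured repository root."""
    # Assumption: SHAPE_REPO_BASE is a hypothetical variable name.
    root = env.get("SHAPE_REPO_BASE", DEFAULT_SHAPE_REPO)
    if not root.endswith("/"):
        root += "/"  # keep urljoin from dropping the last path segment
    return urljoin(root, relative_path.lstrip("/"))
```

With `SHAPE_REPO_BASE=http://localhost:3000/repo`, `shape_url("shapes/Event.shex")` resolves to the local server, and without the variable it resolves to the canonical root. This does not address Noel’s concern: any shape URL that ends up written into data in dev will differ from the production one, so the dev-to-canonical transition still has to be handled somewhere.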

Clear and easy process to add and deploy new shapes (possibly with GitHub PRs) ?

  • Jackson: We discussed this when applying to NLNet. They were wary of GitHub, which is not so FOSS. Building our own source control, or using FOSS solutions, would mean having funding resources for this.
  • eP: There should be multiple people working on curation. While it should be easy to add or update shapes, you shouldn’t get this set up overnight. I believe schema.org has a process for adding things.
  • Jackson: I disagree. We can build a set of tools that encourages people to use shapes that exist, and this does not have to be centralized. I’d prefer to build converters, and to restrict the tooling provided for the whole process only as much as needed for the converters to do their job. This is going to be important to improve DevX because people don’t want to get started first.
  • eP: Please think about authorization: how do you manage discovery, and how do you prevent malicious applications (with shape descriptions that are wrong)? UX must be good.
  • Michal: I want to make the application, and not be blocked. It’s especially relevant when exploring a new domain.
  • Sébastien: One resource can have multiple classes, how do we manage that? For example Docker has official images; should we push to have one shape for every class in ontologies ? Or have something more flexible, that uses a mix of different ontologies on the same (kind of) data, and still have them compatible ? With ontologies there is no “required” predicate. But with shape trees this constraint may be there, and it is what makes shapes more interesting than simple ontologies. But we may have too many constraints and not be able to develop. So applications which don’t conform won’t be able to read/write.
  • Noel: We can have a look at TypeScript. A lot of people have worked on this. You can have type definitions, but also module augmentation. So a library may include types. There is the DefinitelyTyped repo. We were joking we should have DefinitelyShaped. We can look at TypeScript for inspiration.
  • eP: Curation will require funding. Hopefully ODI can have a few people working on that part-time, or there can be NLNet grants, but there also needs to be something long-term. At some point we’ll have competing shapes, different proposals, … and hopefully we can have them converging. Regarding required/not-required: we shouldn’t have closed shapes, i.e. we should support extension. With shape trees you can do that. Different deployments can have different requirements.
  • Jesse: Shape extensions & TS: there is the concept of shape conjunction, which is similar to type extension.
  • eP: Most developers don’t want to innovate on the shapes. If there are good shapes, they will use them. That’s why schema.org is more popular than linked open vocabularies.

Frontend similar to ShapeRepo.com to explore/find existing data ?

  • Michal: This is the concept of crowdsourcing. It would be good to see the shapes that are used the most. Neither bottom-up nor top-down are right or wrong.
  • Sébastien: In a crowd-sourced environment, you could have a voting/ranking of which shapes are used the most.
  • Jackson: How much effort do we put in that ? Optimally: host shapes on a Pod which provides the tools and features discussed earlier. Then get this published to npm etc. We need tooling for that and I’d volunteer to work on that. Dogfooding with Solid: the data may be stored in Solid Pods. I think shaperepo.com is a good domain name and I’d be willing to provide it to the org doing this effort.
  • Jesse: There is a middle ground. Use github/gitlab as a way to do crowd-sourced PRs to discuss things there. Then publish shapes from that repository when agreed upon.
  • Sébastien: LOV can be inspiration for us. There are links and visualizations…
  • eP: I think we may want to have integration with an app directory. From a shape, discover the apps using it, and vice versa. With SAI you can already see what shapes the app is using.
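
The “most used shapes” ranking and the shape-to-app lookup mentioned above can be illustrated with a few lines. This is only a sketch over made-up data: the app names, shape names and both helper functions are hypothetical, and a real frontend would presumably read this information from an app directory or from SAI data rather than a hard-coded dict.

```python
from collections import Counter

# Hypothetical registry data: which app declares which shapes.
APP_SHAPES = {
    "calendar-app": ["Event", "Person"],
    "meetup-app": ["Event", "Place"],
    "contacts-app": ["Person"],
}


def rank_shapes(app_shapes: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Count how many apps use each shape, most used first."""
    counts = Counter(
        shape for shapes in app_shapes.values() for shape in set(shapes)
    )
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def apps_using(shape: str, app_shapes: dict[str, list[str]]) -> list[str]:
    """Reverse lookup: the apps that declare a given shape."""
    return sorted(app for app, shapes in app_shapes.items() if shape in shapes)
```

With the sample data, `Event` and `Person` are each used by two apps and `Place` by one, and `apps_using("Event", APP_SHAPES)` lists the calendar and meetup apps.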

Automatic validation of shapes that are added ?

  • Sébastien: Skip this topic for now. The idea is that we need some kind of (automatic) validation if the shapes are correct. That will be better than human validation.

Clear categorization ?

  • eP: It connects to shape references in shape trees. You can start having a structure of how they relate.

Next steps

  • Noel: Find real use cases, that are doing similar things but are not interoperable, and make them interoperable. So first step could be finding non-interoperable apps and make them interoperable.
    • Sébastien: I have ideas for apps that would share similar data, just lacking funds.
  • Jesse: SolidOS is another place where we have things built, and there is desire for better UX. Sébastien, if you have a list of apps with interoperability, why don’t we start, in the Solid namespace, a list of all applications we want to convert to shape declarations, then get people who want to do the work and get resources for them.
  • eP: We should explore both: data migration to common shape; or come up with a system of “lenses” to translate between shapes
  • Jackson: It would be good to have more projects like LDO, maybe in other languages. Also, I’ll be working on developing tooling for publishing shapes to npm.
  • Laurin: We had a call with University of Vienna person who would be interested in proposing theses on this topic, and this may be an option for getting resources in the effort.
  • Sébastien: With ActivityPods, we will have to get going with our own shape repository because we need it now. We will refactor it a bit to use SHACL.
    • eP: Can we come up with lowest-common-denominator shapes?
    • Sébastien: It can get difficult when you have many required fields (e.g. when having application-specific events)
  • Noel: It’s also an issue to decide how the resources are stored: one resource per document, or many resources per document? For example, for recipes I would have one document per recipe, but for a chat it’s common to have all messages in the same document.
    • eP: You need to split shapes in multiple resources where applicable
  • Sébastien: We could make an NLNet application but we will probably need to discuss more and we will need to be clear who is going to implement it (the reason why the application failed)

Hello everyone,
I totally forgot about this meeting, but I am interested in the subject of Shape repos.
Anyway, you didn’t need me there, as I see that all the right people were present.
I read through the minutes and I agree with everything you said, and encourage you to continue in that direction.
GitHub for storing shapes is probably not a good idea (as NLnet already said), but GitLab, Gitea and Codeberg are easy to set up if we need self-hosted git.
A more specialized website like shaperepo.com, taking inspiration from LOV and schema.org, is a good way to go too.
We also need to think about a place to discuss each Shape, within the community dedicated to each domain of application.
We can bring in app developers from many different backgrounds and domains, to collaborate and decide on Shapes, but we need some organized spaces for that.
I heard that the Solid Community Group on Matrix would be repurposed for the goal of defining ClientClient specs, or DomainShape specs, or whatever they are called.
But I hope we will not use matrix for that.
A structured way to have discussions, by domain, would be the best.
I like the idea of the Lenses too that could convert between shapes or between versions and forks of the same Shape.
I will try to join next call, if there is one.


maybe you can find some interesting links here: GitHub - ozekik/awesome-ontology: A curated list of ontology things


I didn’t comment on this during the call because I didn’t want to go on a tangent, but I do think we should use GitHub :sweat_smile:. If nothing else, because that’s where most people are. Like Cory Doctorow says:

If you’re going to devote yourself to solving the collective action problem to make people-power work against corporations, spend your precious time wisely. As Zephyr Teachout writes in Break 'Em Up, don’t miss the protest march outside the Amazon warehouse because you spent two hours driving around looking for an independent stationery so you could buy the markers and cardboard to make your anti-Amazon sign without shopping on Amazon

Having said that, though, luckily for us git is decentralized :D. So we can totally have the “main” repository in gitlab or wherever, and keep a mirror in GitHub for people to submit issues, PRs or whatever.

On that subject, I appreciate the nice quote from Doctorow.

When you say “Github […] that’s where most people are”, I guess you are talking about user accounts on GitHub. That’s true. Among developers, I think it is safe to say 100% have a GitHub account.

So, the idea in general, and it seems many FOSS people do it this way, is to have your primary git hosted by a decentralized/self-hosted/open-source git facility (Gitlab, Gitea, Codeberg, etc…) and keep a mirror in GitHub as you said, for convenience, and extra backup.
But all the metadata around the git, including the issues, discussions, project management, etc, should be in the primary decentralized git server, and not in GitHub. This is the whole point of moving away from Microsoft, because otherwise, those metadata or “peridata” are stored in proprietary formats, and Microsoft controls them fully.

It is very easy to log in via OIDC with a GitHub account to any alternative open-source git server (Gitea, GitLab, Codeberg), and that’s the way to go. You don’t burden your users with managing another password, they can log in with their GitHub account. But all the data is safe in an open-source and decentralized system.

Especially in our case, if the issues and discussions are going to be used by the communities to make decisions on common Shapes, then we want to have control over that data.

It is easy to install this software and I am sure we won’t be late to the ShapeRepo party, even if we have to set up a Gitea/GitLab/Codeberg on our way.

Personally, I like Gitea, as it is more lightweight than GitLab, and the look & feel is closer to GitHub, while Codeberg’s GUIs seemed a bit off-putting… But any of those would do. For NextGraph I have a self-hosted Gitea, with OIDC from GitHub, and I mirror the git repos to GitHub, but the issues, discussions and project management in GitHub are deactivated.

Maybe one day we will come up with a good Ontology and Shape for Issues and Task/Kanban and then we can fork Gitea to be using a Solid POD for storing all those “peridata” :wink:


While we are at it, I thought it could be interesting to extend the scope of this community group and work on Shapes, by including the ActivityPub/Fediverse dimension.
We know that many apps are in fact Social apps, or have a social component in them.
Thanks to the efforts of ActivityPods and @srosset81 Sebastien to bring ActivityPub to Solid, we are getting closer to the fusion of the 2 worlds. Also, as a reminder, Solid means Social Linked Data… I am sure everybody remembers that.
So… while investigating the shapes, ontologies and schemas out there, we could have a look at what the Fediverse is doing too.
2 generic links :

And in the domain of Events, for example, which is of interest to us I am sure, there has been some previous work from several projects in the fediverse, and the event-federation project is trying to gather those formats. Event Bridge for ActivityPub 1.0.0 – Event Federation
I think for many domains it would be interesting to have the fediverse people involved too.
I know it is more work, but if we want to come up with Shapes and Ontologies that are going to be used widely, maybe this is a good idea?