Aligning efforts in LD schema / ontology design + adoption

aschrijver · June 4, 2020, 5:57pm

When it comes to what is out there on the web in terms of linked data, one can only conclude that it is a big sprawling mess. It is really hard to get good ontologies together in an application, and - after doing so - avoid creating your own unique interpretation of the semantics within.

This problem applies to the entire Solid + linked data world. For linked data to be widely adopted we need to be on the same page as much as possible, not only in terms of spec standardization, but especially on semantic models / ontologies being adopted.

The problem we need to overcome is well-explained by TerminusDB team member Kevin Feeney in this Medium article:

The [linked data] 4 basic principles were:

Use URIs for things

Use HTTP URIs

Make these HTTP URIs dereferenceable, returning useful information about the thing referred to

Include links to other URIs to allow discovery of more things.

We can supplement these 4 principles with a fifth, which was originally defined as a ‘best practice’ but which effectively became a core principle:

“People should use terms from well-known RDF vocabularies such as FOAF, SIOC, SKOS, DOAP, vCard, Dublin Core to make it easier for client applications to process Linked Data”

[…]

However, the big problem is that the well-known ontologies and vocabularies such as foaf and dublin-core that have been reused, cannot really be used as libraries in such a manner. They lack precise and correct definitions and they are full of errors and mutual inconsistencies [1] and they themselves use terms from other ontologies — creating huge and unwieldy dependency trees. If, for example, we want to use the foaf ontology as a library, we need to also include several dozen dependant libraries, some of which no longer exist. So, the linked data approach, in fact, just uses these terms as untyped tags — there is no clear and usable definition of what any of these terms actually mean — people just bung in whatever they want — creating a situation where there are effectively no reliable semantics to any of these terms.
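To make the "dependency tree" point concrete, here is a minimal Python sketch of walking the transitive imports of a vocabulary and spotting the ones that can no longer be fetched. The import graph is made-up illustration data (the `*-like` names are hypothetical and do not describe the real FOAF or Dublin Core dependencies), and `dependency_closure` is just a helper invented for this example.

```python
from collections import deque

# Hypothetical import graph: vocabulary -> vocabularies it reuses terms from.
# None marks a vocabulary that can no longer be retrieved. Not real data.
IMPORTS = {
    "app-schema": ["foaf-like"],
    "foaf-like": ["dc-like", "geo-like"],
    "dc-like": ["skos-like"],
    "skos-like": [],
    "geo-like": None,  # pretend this one has gone offline
}


def dependency_closure(root, imports):
    """Return (reachable, missing) for everything `root` transitively depends on."""
    reachable, missing = set(), set()
    queue = deque([root])
    while queue:
        name = queue.popleft()
        if name in reachable or name in missing:
            continue  # already visited
        deps = imports.get(name)
        if deps is None:
            missing.add(name)  # unknown or no longer available
            continue
        reachable.add(name)
        queue.extend(deps)
    reachable.discard(root)  # the root is not its own dependency
    return reachable, missing


reachable, missing = dependency_closure("app-schema", IMPORTS)
print("pulled in:", sorted(reachable))
print("unavailable:", sorted(missing))
```

Even in this toy graph, reusing one vocabulary pulls in three others and leaves one dangling, which is the kind of situation the quote describes.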

I was in a video call with @pukkamustard the other day (we share a common interest in offering LD-based knowledge to local communities) about the need for a new initiative that takes a modern and fresh approach to collecting schemas / ontologies for practical application in software designs, rather than the academic data research contexts in which you normally find these things. Maybe this should be a new Wikimedia project, or something similar, a big pattern library maybe, idk.

Regarding streamlined app creation, I am interested in exploring a DDD + Linked Data approach, which I just posted about in the TerminusDB community forum:

So I’m interested in looking into combining Domain Driven Design + Linked Data for the fediverse apps I’m elaborating on. This DDD + LD approach is a bit odd, and there is hardly any information in-the-wild on the combination of these two fields. Usually LD brings you to more academic data science sections of the web, while DDD leads to more of enterprise business applications.

This combo is interesting, I think, as a way to make rich semantic models available to the masses in well-designed (clean architecture) applications. Event sourcing, CQRS and DDD have gotten better tool, framework and library support, to the extent that they are now within easy reach of a large part of the developer community. Many large, production-ready projects use ES and CQRS, and DDD is now following along.

Curious what your thoughts are about this subject area…

anon36056958 · June 4, 2020, 10:45pm

Back in the day, there was this idea

but somehow it never happened…

The article you mentioned was right to point out “academic research in which almost all resources are focused on novelty and almost none on infrastructure and maintenance”

anon36056958 · June 4, 2020, 11:00pm

At that time (Nov. '18), a lot of the Solid client infrastructure that now exists was not there.

aschrijver · June 5, 2020, 4:51am

There was a lot of interest at the time, @anon36056958, I see. It is a hard problem to tackle.

On the one hand I would like to have central entry points to dig down into what is available, but on the other hand I don’t like centralized initiatives (in the related topic, CDNs like Cloudflare are mentioned, but these are (becoming) monopolists and/or trackers). There is @tuelsch's experiment with search in WhatTheOntology as an option, and mention of decentralized approaches.

Tangential…

I am interested in terminusdb.com. They support OWL for defining database schemas, which are saved as RDF, can be edited as Turtle and queried as JSON-LD.

They recognize the problems of OWL (which led to its failure in adoption) in Graph Fundamentals Part 3: Graph Schema Languages:

What really killed OWL was the impracticality and idealism of the academics. They wanted a language that was capable of usefully describing an ‘open world’

[…]

Open world reasoning such as this is a very interesting and commendable — and sometimes highly useful — field. However, if I have a RDF graph of my own and I want to control and reason about its contents and structure in isolation from whatever else is out there, this is a decidedly closed world problem. It turns out that it is essentially impossible to do this through open world reasoning. If my database refers to a record that does not exist in my database (i.e a breach of referential integrity) then it does not matter whatever else exists in the world, that reference is wrong and I want to know about it. I most especially do not want my reasoning engine to decide that this record could exist somewhere else in the world — if it is not in my database it does not exist and I know this. If I cannot manage my own database and make sure that it is not full of errors, then it does not matter what else exists in the world because my house is built on a pile of mud. I can’t even control what’s in my own bloody database that I control entirely, who am I kidding that I can start reasoning about the universe beyond.
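The referential integrity point in the quote can be sketched in a few lines of Python. This is only an illustration of the closed-world idea, not how TerminusDB implements it: triples are plain tuples, every `https://example.org/` name is hypothetical, and `dangling_references` is a helper made up for this example. Under a closed-world reading, an object that is never described as a subject in my own data is simply an error.

```python
# Triples as (subject, predicate, object). All example.org names are hypothetical.
EX = "https://example.org/"
TRIPLES = [
    (EX + "alice", EX + "type", EX + "Person"),
    (EX + "alice", EX + "worksFor", EX + "acme"),
    (EX + "acme", EX + "type", EX + "Organization"),
    (EX + "bob", EX + "type", EX + "Person"),
    (EX + "bob", EX + "worksFor", EX + "globex"),  # globex is never described
]

# Predicates whose objects must be records in *this* database (an assumption
# of the sketch; a real schema would declare this per property).
REFERENCE_PREDICATES = {EX + "worksFor"}


def dangling_references(triples, reference_predicates):
    """Closed-world check: report references to records we do not hold."""
    subjects = {s for s, _, _ in triples}
    return [
        (s, p, o)
        for s, p, o in triples
        if p in reference_predicates and o not in subjects
    ]


broken = dangling_references(TRIPLES, REFERENCE_PREDICATES)
for s, p, o in broken:
    print(f"{s} refers to missing record {o}")
```

An open-world reasoner would not flag the `globex` reference, because the record might exist somewhere else; the closed-world check reports it immediately.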

They are not just bashing the efforts that went into all the standardization activity, but take a more down-to-earth approach:

Nevertheless, it is important to recognise that, hidden in all the nonsense, there are some exceptionally good ideas — triples, URL identifiers and OWL itself are all tremendously good ideas in essence and nothing else out there comes close.

From @luke on terminus forum:

For schema design, TerminusDB uses the OWL language with two modifications to make it suitable as a schema language. Namely, we dispense with the open world interpretation and insist on the unique name assumption. This provides us with a rich modelling language which can provide constraints on the allowable shapes in the graph. We really support OWL as it is logically expressive.
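To see what the unique name assumption changes in practice, here is a small Python sketch (my own illustration, not TerminusDB code; the `ex:` names and `check_functional` are hypothetical). Take a property declared functional, i.e. at most one value per subject. Without the unique name assumption, two different values lead standard OWL reasoning to conclude that the two names denote the same thing. With the assumption, different names mean different things, so the same data is a constraint violation.

```python
# Hypothetical data: ex:hasMother is treated as a functional property.
TRIPLES = [
    ("ex:alice", "ex:hasMother", "ex:carol"),
    ("ex:alice", "ex:hasMother", "ex:caroline"),
    ("ex:bob", "ex:hasMother", "ex:dana"),
]
FUNCTIONAL = {"ex:hasMother"}


def check_functional(triples, functional_predicates, unique_names=True):
    """Return (violations, inferred_same) for functional properties.

    unique_names=True  : distinct names are distinct things, so multiple
                         values are reported as violations.
    unique_names=False : multiple values are read as aliases of one thing.
    """
    values = {}
    for s, p, o in triples:
        if p in functional_predicates:
            values.setdefault((s, p), set()).add(o)

    violations, inferred_same = [], []
    for (s, p), objs in sorted(values.items()):
        if len(objs) > 1:
            if unique_names:
                violations.append((s, p, sorted(objs)))
            else:
                inferred_same.append(sorted(objs))
    return violations, inferred_same


schema_view = check_functional(TRIPLES, FUNCTIONAL, unique_names=True)
open_view = check_functional(TRIPLES, FUNCTIONAL, unique_names=False)
print("as schema constraint:", schema_view)
print("as open-world inference:", open_view)
```

The first reading is the one that makes OWL usable as a schema language: bad data gets rejected instead of quietly generating new equalities.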

So if I adopt TerminusDB I’ll (probably) be using Protege to create my own OWL schemas as the basis for application data models, and it is at this level that I would be interested to see what others are using / what the standard constructs are.

In that regard VOWL as mentioned by @pheyvaer looks interesting as a way to visualize, and the (javascript) projects are still maintained (though visualdataweb.org itself isn’t). I like the clarity of the visualization, instead of things that look like this.

Web-VOWL

LD-VOWL

anon36056958 · June 5, 2020, 6:18am

But this is very important I think if your database is the real world. If you want to reason about poverty and why there is poverty, then you can’t control the database. A database on that subject under someone’s control would be less useful. The open world assumption is very important because it is an open world. It’s a muddy messy shared world.
But there are many cases where you do need to deal with pieces of it as a closed world under your control and that’s where data shapes come in. They allow you to work in isolation from the muddy mess when you need to.
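A rough Python sketch of the idea behind data shapes, for readers who have not met them yet. This is a hand-rolled illustration and not ShEx or SHACL syntax; the shape format, the `ex:` names and the `validate` helper are all assumptions made for the example. A shape lists the properties an application relies on, with minimum and maximum counts, and a node either conforms or it does not. Properties that the shape does not mention are left alone, so the rest of the messy shared world can stay as it is.

```python
# Hypothetical shape: property -> (min_count, max_count); None means unbounded.
PERSON_SHAPE = {
    "ex:name": (1, 1),      # exactly one name
    "ex:email": (1, None),  # at least one email
    "ex:knows": (0, None),  # optional
}

TRIPLES = [
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:alice", "ex:favouriteColour", "green"),  # not in the shape, ignored
    ("ex:bob", "ex:name", "Bob"),
    ("ex:bob", "ex:name", "Robert"),
]


def validate(node, triples, shape):
    """Return a list of human-readable errors; empty means the node conforms."""
    counts = {}
    for s, p, _ in triples:
        if s == node:
            counts[p] = counts.get(p, 0) + 1

    errors = []
    for prop, (low, high) in shape.items():
        n = counts.get(prop, 0)
        if n < low or (high is not None and n > high):
            upper = "*" if high is None else high
            errors.append(f"{prop}: found {n}, expected {low}..{upper}")
    return errors


alice_errors = validate("ex:alice", TRIPLES, PERSON_SHAPE)
bob_errors = validate("ex:bob", TRIPLES, PERSON_SHAPE)
print("alice:", alice_errors)
print("bob:", bob_errors)
```

Real shape languages add value types, nested shapes and much more, but the basic contract is the same: a local, checkable statement of what an app expects from a piece of the graph.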

aschrijver · June 5, 2020, 6:37am

Do you have a good low-barrier intro on Data Shapes and how they fit into this concept? Very interested. The transition from closed world to open world will be very important eventually.

But my first and foremost priority is to have my own application domains under control while elaborating functionality: start with a minimum viable product and build out from there. When I mentioned ‘be on the same page as much as possible’, that last bit is important to keep things practical. The ontologies of the muddy, messy world are muddy and messy in their interpretations, and I (and I presume others too) don’t want to be stuck in analysis paralysis or utopian visions of interoperability that are many years away from any form of realization.

anon36056958 · June 5, 2020, 7:03am

This is a great book and it’s online.

https://book.validatingrdf.com/

Chapter 1 is a great intro. I will look for more.

For analysis paralysis, yeah I hear you. Building shared things takes longer but it’s worth it.

anon36056958 · June 5, 2020, 9:24am

This is a gold mine of insights: Design Issues for the Web
https://www.w3.org/DesignIssues/

and from in there you might find this interesting: Cultures and Boundaries
https://www.w3.org/DesignIssues/Culture.html

Here are some links for shapes:

Shape Expressions Primer
https://shex.io/shex-primer/

Shape Expression Vocabulary
https://www.w3.org/ns/shex#

ShapeMap Structure and Language
http://shex.io/shape-map/

RDFShape: RDF Playground
https://shaclex.herokuapp.com/

aschrijver · June 5, 2020, 9:53am

Very nice, thank you @anon36056958! We were about to cross-post… here’s what I was preparing:

As a very good follow-up to the TerminusDB Linked Data article referenced above I highly recommend reading @RubenVerborgh’s paper:

Ultimately, all of above indicates a need to guard ourselves from conducting research in a vacuum. Not all science requires practical purposes, but many of the research problems we study will never actually occur if the Semantic Web does not take off any further, so we should at least consider—for our own sake—prioritizing those urgent problems that are blockers to its adoption. […]

Converting technological research into digestible chunks for developers is considered trivial and outside of our scientific duty […] Yet everything that reeks of engineering is shunned. However, most researchers in our community have not built a single Semantic Web app, so we cannot pretend to understand the insides of the 20% [where practical application occurs]. […]

Not only do many of us lack Semantic Web experience as app developers, our even bigger gap is experience as users. […]

[In conclusion] Turns out that the engineers and developers have moved on and are creating their own solutions, bypassing many of the lessons we already learned, because we stubbornly refused to acknowledge the amount of research needed to turn our theories into practice. As we were not ready for the Web, more pragmatic people started taking over.

So, referring back to the need for a ‘fresh and modern initiative’, I think at least some of the criteria should be:

Targeted towards practical real-world application
Developer-friendly, easily grasped and adopted (without an entire semantic web as precondition)
User-oriented, related to real-world domain models (and preferably bounded contexts thereof)

happybeing · June 5, 2020, 10:28am

Not a Solid app, but Linked Open Vocabularies looks impressive. I plan to use it at some point to help understand what is in datasets based on the ontologies they use:

anon36056958 · June 5, 2020, 11:26am

It looks like they use Java and SPARQL to put ontologies in a MongoDB database and then have an API that returns terms, persons or organizations from the ontologies. It looks like there are 716 ontologies. The code is from 4 years ago, so I don’t know how often or when they update the database, whether they look for new ontologies, or whether the list never changes.

It would be great to have it updated in an adaptable way and the vocabs themselves be in a triple store on a pod or in SemApps. A dynamic cache of ontologies (or shapes or forms) that you could analyze or update in custom ways, and work on collaboratively.

aschrijver · June 5, 2020, 11:51am

Also a nice collection: w3id.org (see the repo for all redirections)

happybeing · June 5, 2020, 12:23pm

They are responsive on Twitter, so you could ask.

anon36056958 · June 5, 2020, 2:25pm

Do you mean the creators (Pierre-Yves Vandenbussche and Bernard Vatant) or where it’s hosted (at UPM), or is there a Twitter account for LOV? I never use Twitter so I’m not very good at searching it.

happybeing · June 5, 2020, 2:53pm

Here you go: https://twitter.com/LOVocabularies

gatemezing · June 5, 2020, 5:08pm

Hello @anon36056958. Thanks for your email. I am one of the maintainers of LOV, with Maria and Pierre-Yves. Nice to be here too.

gatemezing · June 5, 2020, 5:11pm

Yes, the code of LOV is almost stable, except for minor features, because the developers are also busy elsewhere. However, we add an average of 2 or 3 vocabularies per week, after a small review by the curators. How can we help here to get a SolidLoV app? Happy to help if needed.

anon36056958 · June 5, 2020, 5:20pm

Hi @gatemezing!
Happy to see you here! Thanks for your prompt reply! I don’t know if you’re familiar with the Solid project, but it is an effort to redecentralize the web based on Pods: personal storage spaces, somewhat like Linked Data Platforms, that are privacy-protected with Web Access Control settings.
I wrote to see if you would be interested in taking part in a possible revival here of the idea of an ontology hub, that is, an implementation of Linked Open Vocabularies based on Solid.
Or in any other efforts or implementations having to do with Solid that you guys may be interested in.

anon36056958 · June 5, 2020, 5:24pm

That is fantastic! I am hoping also to interest others here, because I think there is a lot of interest in this topic.

anon36056958 · June 5, 2020, 6:03pm

I started a new thread on the topic

continued at Volunteers needed for SolidLoV App
