Ways to represent relationships in RDF


#1

Copied from Solid chat:

Happy new year everyone. I’m starting it with a question about RDF so may the twenties be the decade of Solid and the Semantic Web! :smile:

I’m looking at how to represent relationships (links between entities) and provide information about them. I come with some preconceptions having helped create tools for analysis and visualisation during investigations, and am wondering how this translates into RDF.

One of the things I am struggling with is whether a link is a first class ‘thing’ with its own identity and properties. It doesn’t have to be: you can just have person:A friendOf person:B. But it can be useful to be able to say things about that relationship, such as the source of that knowledge or when the relationship began, and even to give the relationship an identity of its own so that it can itself be the object of a relationship.

Does anyone have thoughts about the latter: using a relationship as the ‘object’ of another statement? Are there ontologies that cater explicitly for this?

Another question then is what are the ‘expected’ ways of using RDF to represent information about relationships?

I see that reification (https://www.w3.org/TR/rdf-primer/#reification) could be used to handle most or all of the things I’m likely to need. Is that right? Are there alternatives that people use and if so why?

For example, a ‘bag container’ can be used to express a group relationship (as here https://www.w3.org/TR/rdf-primer/#figure14).

Or a container could be used to express a composite relationship to a single ‘subject’ similar to using reification, with the container being given properties of the relationship instead of reifying the relationship predicate. I think it best to use reification in this case, but it concerns me that there are multiple ways to handle this same requirement and it’s not obvious from the RDF Primer what’s best and why, so I’m looking for guidance from anyone who is used to handling entity-relationship data about the real world (i.e. people, places, organisations), and any considerations for particular use cases.
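To make the reification option concrete, here is a minimal sketch in plain Python, with triples modelled as tuples rather than through any RDF library. The `rdf:` terms are the standard reification vocabulary; every `ex:` name (the people, the source, the date, the report) is made up for illustration.

```python
# Triples are plain (subject, predicate, object) tuples; all ex: names are hypothetical.
RDF_TYPE = "rdf:type"

def reify(statement_id, triple):
    """Return the four reification triples that describe `triple`."""
    s, p, o = triple
    return {
        (statement_id, RDF_TYPE, "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    }

friendship = ("ex:personA", "ex:friendOf", "ex:personB")

graph = {friendship}
graph |= reify("ex:stmt1", friendship)

# Metadata hangs off the statement node, not off the original triple.
graph.add(("ex:stmt1", "ex:source", "ex:interview42"))
graph.add(("ex:stmt1", "ex:since", "2019-06-01"))

# The statement node can itself be the object of another triple.
graph.add(("ex:report7", "ex:mentions", "ex:stmt1"))
```

The point of the sketch is only that the relationship gets an identifier (`ex:stmt1`) that other statements can point at, at the cost of four extra triples per relationship.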

As my interest is visualisation, my first thoughts are about how existing RDF visualisation tools treat reification of relational predicates and other ways of expressing relationships. I suspect the simplest visual representation might not be the best, but I’ve not looked into this yet.

Ref: https://gitter.im/solid/chat?at=5e0ca068eac8d1511e94614d


#2

Have you looked at this https://www.w3.org/TR/activitystreams-vocabulary/#connections


#3

Thanks, this is exactly the kind of example I’m looking for, even though it is confusing me :wink:. It says the examples are JSON, not JSON-LD although they look a bit like JSON-LD to this untutored eye, so that isn’t helping.

For example, I’m not sure the model is ‘good enough’ (or general enough): in Example 145 the subject, the object and indeed the relationship itself don’t seem to have any identity (URI). But this might be my lack of understanding of JSON-LD (assuming it is JSON-LD!). Maybe it is inferred in some way?

Thanks for the pointer though. Very helpful.


#4

Yeah, I think there is one more level of abstraction here:
Example 146 describes the ‘Create’ activity for a relationship of type foaf, between Sally & Matt, with a start date. This way you can add any property to the relationship. The relationship doesn’t need to be named; it could be a blank node.


#5

My understanding is that what you describe has been around for the entire history of these technologies.

There is recent activity around this:

Definitely view slides 14, 15, 16.

“Reification” of one model / representation within other models / representations (relational, rdf, property graph, etc).


#6

I’m no expert on SKOS, but maybe it would be useful. It defines relations that are themselves SKOS Concepts and can be grouped in collections and also used with a reasoner. Relations can broaden or narrow other relations and can be transitive or not.


#7

@markjspivey you nailed it. That’s fascinating and quite an indictment of RDF.

My question is well answered by that one presentation: RDF does not have an acceptable way to model relationships, but it’s a known problem and people are working to address it, with RDF* / SPARQL* for example as noted by @aveltens in his reply below and explained here.

Slide 17 suggests that property graphs were developed to support more compact and useful representations “in reaction” to the complexities and limitations of RDF (such as in relationship modelling), but I can say that I began working on this kind of modelling in 1994 and that we created a property graph model because it is just the best way to model and work with this. It was nothing to do with RDF, obviously, so how did RDF arrive later without someone understanding this?

I’m shocked that those creating RDF missed this. There were quite a few companies doing this kind of work up to the end of the millennium and it seems many converged on a property graph approach - and there must have been plenty of earlier work because it’s such a general modelling problem.

What we did was take the graph modelling and visualisation to the desktop UI and apply it to particular industry and applications. I wonder what drove RDF towards an incompatible solution.

When XML arrived in the 1990s I hoped it would help create a universal data modelling representation, but was disappointed with take up. Like RDF it proved difficult to work with and adopt, and for graphs it also has similar difficulties due to the hierarchical representation, which makes it awkward to represent and work with graphs in XML.

The RDF Primer & Turtle Primer describe how reification can be used to provide some metadata for a triple (such as who created it), but it is limited and very unsatisfactory.

The Primers are hard to understand on this topic, but they do make it clear that there’s no way to identify instances of a triple, which calls the usefulness of reification into question. You can model this, but only using assumptions or conventions that are outside RDF, making them application specific.

The following article elaborates on this (and various differences between RDF and Property Graphs), and helps in understanding the Primers as well as showing how awkward it is to make use of the different workarounds that it describes (e.g. to update or query the metadata):


#8

I recently discovered RDF* (RDF star) which is exactly about what you describe

http://blog.liu.se/olafhartig/2019/01/10/position-statement-rdf-star-and-sparql-star/

RDF* allows us to represent both the data and the metadata by using a nested triple, as follows: <<bob foaf:age 23>> ex:certainty 0.9 .
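As a rough illustration of the nesting idea (not of any RDF* implementation), the example above can be mimicked in plain Python by allowing a triple to appear in the subject position of another triple. The helper name `annotations_of` is made up for this sketch, and asserting the quoted triple in the graph as well is a choice made here, not something the example itself dictates.

```python
# A triple is a tuple; a nested (quoted) triple is simply a tuple used as a subject.
quoted = ("ex:bob", "foaf:age", 23)
annotation = (quoted, "ex:certainty", 0.9)

# This sketch asserts the triple itself and the statement about it.
graph = {quoted, annotation}

def annotations_of(graph, triple):
    """Return the (predicate, object) pairs whose subject is the given triple."""
    return {(p, o) for (s, p, o) in graph if s == triple}

certainty = annotations_of(graph, quoted)
```

Compared with reification, the metadata attaches to the triple directly, with no intermediate statement node to invent and look up.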


#9

I wonder if ACLs and WebIDs could be used for individual triples that way.


#10

Currently, property graphs (PG) provide an effective, compact, flexible and user-friendly approach to representing knowledge, especially considering graph database solutions such as Neo4j.

However, there are at least two factors we cannot ignore regarding the strengths of the RDF family.

1. Short/Mid-Term Compatibility

When we talk about machine-friendly “data/knowledge”, it’s not only about what is hosted in backend databases, but also about what is expressed through the huge number of web pages in the frontend.

Just think about how we humans communicate and retrieve knowledge: most of the time it’s based on what is expressed and perceived (orally, in writing, or through other symbols and signals) rather than read directly from the source’s mind. In most cases it’s impractical and unreasonable to ask sources to share unreservedly whatever is in their minds, considering privacy, business interests, emotions and other social factors.

The same principle holds for the Internet. The HTML/JSON frontend is hands down the main channel through which agents are willing to share with their audiences. RDBMS/PG is more efficient and technically friendlier to programmers, that’s true. But we cannot expect agents to share their “deep mind” (the database) via SQL/Cypher, for the sociological reasons mentioned above. The common language on the table, the biggest publicly shareable gold mine of knowledge, is and will for a long time remain in the presentation layer of the web. Yes, because it’s the presentation.

The RDF family provides a more formal and rational syntax for this common language. Audiences and their artificial assistants can significantly reduce their cost of retrieving, understanding and learning, just as many SEO practices do today.

2. Long-Term Openness
The presentation from @markjspivey also mentioned the mutual convertibility between RDF (with reification) and PG. So I’m not sure at which point there is “incompatibility”. And why is there “no way to identify instances of a triple”? Would you please elaborate?
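One way to picture that convertibility is the sketch below, in plain Python. It assumes each property-graph edge carries its own identifier and single-valued properties, and that the identifier is reused as the reification node. The function names, the edge layout and all `ex:` values are made up for illustration and are not taken from the presentation.

```python
RESERVED = {"rdf:type", "rdf:subject", "rdf:predicate", "rdf:object"}

def edge_to_triples(edge):
    """Turn one property-graph edge into a triple plus its reification."""
    node = edge["id"]
    triples = {
        (edge["source"], edge["label"], edge["target"]),
        (node, "rdf:type", "rdf:Statement"),
        (node, "rdf:subject", edge["source"]),
        (node, "rdf:predicate", edge["label"]),
        (node, "rdf:object", edge["target"]),
    }
    # Edge properties become triples about the statement node.
    triples |= {(node, key, value) for key, value in edge["properties"].items()}
    return triples

def triples_to_edge(triples, node):
    """Rebuild the edge from the triples whose subject is the statement node."""
    about = {p: o for (s, p, o) in triples if s == node}
    return {
        "id": node,
        "source": about["rdf:subject"],
        "label": about["rdf:predicate"],
        "target": about["rdf:object"],
        "properties": {p: o for p, o in about.items() if p not in RESERVED},
    }

edge = {
    "id": "ex:edge1",
    "source": "ex:sally",
    "label": "ex:friendOf",
    "target": "ex:matt",
    "properties": {"ex:since": "2015-04-21", "ex:source": "ex:survey3"},
}
round_trip = triples_to_edge(edge_to_triples(edge), "ex:edge1")
```

The round trip works here only because both sides agree that `ex:edge1` names the edge; that agreement is a convention of this sketch rather than something the triples themselves enforce.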

Let me assume they are mutually convertible, with no hard limitation of one over the other. This is not the first such technology shift in the internet world. The one we are witnessing now is the shift from native mobile applications to web-hybrid mobile applications. A decade ago we started with ObjC for iOS and Java for Android, backed by Apple and Google respectively. PhoneGap was soon introduced to support JS logic and HTML presentation, but was widely criticized for its complexity and poor performance. What happened then?

Hardware kept improving. It gradually narrowed the performance gap between iOS and Android, and also the gap between web-hybrid and native apps. Once the gap was smaller than what enough people cared about, hybrid applications thrived.

Meanwhile, frameworks such as Angular, React and Vue kept iterating to make development more convenient and friendly. Today more and more projects tend to adopt web-hybrid solutions, although the giants still invest heavily every year in new native solutions like Swift and Kotlin.

Back to RDF/PG. Of course we see a lot of complexity and performance issues today. But for the reasons above, we can still foresee RDF’s strength on its own ground.


#11

@dprat0821 Thanks for your comments. Forgive me if I make any mistakes, as RDF is still new and complex for me, and I’ve always found understanding W3C specs hard.

As I read the reification documentation (RDF Primer), you can’t have multiple instances of a relationship triple. A triple is defined by the values of s/p/o, so if these are identical, no matter how many times you reify it you are referring to the same triple. This seems to be confirmed in “RDF Triple Stores vs. Labeled Property Graphs: What’s the Difference?”, which helps in understanding various differences.

In addition, even though there may be ways to represent or add properties to a relationship, these require external assumptions (noted in the section on reification), which means they are application specific, and that undermines the universality that RDF seeks to provide. This also creates an overhead in terms of implementing creation, querying and editing of those structures, not just per application but also because RDF and its tools are not set up to handle them (so querying, for example, becomes much more complex to achieve).

I think those are the two main problems: only single instances of a relationship, and application specific implementation of relationship properties.
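A small sketch of the first problem as I understand it, using a Python set of tuples to stand in for an RDF graph (a graph is a set of triples, so a repeated s/p/o collapses into one). The `ex:` names, the dates and the helper `described_triples` are made up for illustration.

```python
from collections import defaultdict

REIFICATION_PARTS = ("rdf:subject", "rdf:predicate", "rdf:object")

def described_triples(graph):
    """Map each reified (s, p, o) to the set of statement nodes describing it."""
    parts = defaultdict(dict)
    for s, p, o in graph:
        if p in REIFICATION_PARTS:
            parts[s][p] = o
    by_triple = defaultdict(set)
    for node, found in parts.items():
        if len(found) == 3:  # skip incomplete reifications
            key = tuple(found[name] for name in REIFICATION_PARTS)
            by_triple[key].add(node)
    return dict(by_triple)

call = ("ex:alice", "ex:phoned", "ex:bob")
graph = set()
graph.add(call)
graph.add(call)  # adding the same s/p/o again changes nothing

# Two phone calls on different dates, each reified with its own statement node.
for node, date in (("ex:call1", "2020-01-02"), ("ex:call2", "2020-01-05")):
    graph |= {
        (node, "rdf:subject", call[0]),
        (node, "rdf:predicate", call[1]),
        (node, "rdf:object", call[2]),
        (node, "ex:date", date),
    }

copies_of_call = sum(1 for triple in graph if triple == call)
statements = described_triples(graph)
```

Both statement nodes end up describing the one and only `call` triple. Reading them as two separate calls is a convention the application has to bring, which is the second problem above.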

As noted, RDF* may help, but I can’t judge that myself.