Is RDF "hard"?

Apologies in advance for a somewhat long post; I promise I have a point. I’m certainly not looking to start any wars over this, I’m just trying to understand how it all works.

I’ve started up on this after mostly finishing a recreation of the To Do Solid example application.

When I was working my way through the walkthrough and just following along on the Solid start up pages on vocabularies, it took a lot to try and gain an understanding of RDF in general.

While trying to get a better understanding of how it all works, I found this site

Which seems a bit dated, but I thought the explanation was useful. (Also asking questions on Reddit, which also helped.)

Basically, whenever I want to develop an application, I want to write it as quickly as possible. I’m used to the mental pattern of “File => New Project” (or, if you’re a command line person, “framework new foobar”, e.g. “cargo new project” for Rust or “npm install create-new-project init newprojectname” for npm) and then just writing things out. My data often just takes the shape of some classes that I’ve created, and I’m just interested in storing that data “somewhere”, traditionally for me in a SQL database.

I personally believe that kind of attitude is how we’ve arrived at where we are in application development today: the rise of Object Relational Mappers (ORMs) and various NoSQL implementations. Not that there aren’t valid reasons for those technologies; it’s just that, in my view, one of the factors in creating such tools is often to reduce the development impedance of “figuring out data” when you’re writing an application. I don’t want to sit down and figure out how to design a SQL database for my app, I just want to write my app and let it use “data.”

Which brings me to RDF and the various walkthroughs I’ve used in the last few weeks in trying to understand how to use Solid. The walkthroughs for the most part seem to force the developer to think about data up front. Not only to think about it up front, but to really think about it.

When I want to create an app to manage To Do items, I now have to sit down and figure out if someone has already done this, and find a vocabulary for it. Then when building out an RDF document, I have to think about it in a different shape: whereas I’m used to thinking about things as objects and, loosely, as relational tables, I now have to think about things in triples, which is one of those foundational concepts you really have to understand before you get anywhere. (This last item isn’t really that big of an emotional consequence for me: to me it’s analogous to understanding how columns and rows work in a table in a SQL database - it’s pretty foundational.)
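To make the shift from objects to triples concrete, here is a minimal sketch in plain Java with no RDF library. The `TodoItem` class and the `https://example.org/` IRIs are placeholders invented for illustration; they are not the vocabulary the Solid walkthrough uses. The object becomes the subject, and each field becomes one subject-predicate-object statement.

```java
import java.util.ArrayList;
import java.util.List;

/** One RDF statement. The object is stored already formatted (an <IRI> or a "literal"). */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    /** Renders the statement as one N-Triples line. */
    @Override
    public String toString() {
        return "<" + subject + "> <" + predicate + "> " + object + " .";
    }
}

/** A hypothetical to-do item, modeled as a plain object. */
final class TodoItem {
    final String id;
    final String title;
    final boolean done;

    TodoItem(String id, String title, boolean done) {
        this.id = id;
        this.title = title;
        this.done = done;
    }
}

final class TodoTriples {
    // Placeholder IRIs; a real app would pick terms from a published vocabulary.
    static final String ITEMS = "https://example.org/todos/";
    static final String VOCAB = "https://example.org/vocab#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean";

    /** Turns one object into triples: the object is the subject, each field is one statement. */
    static List<Triple> toTriples(TodoItem item) {
        String subject = ITEMS + item.id;
        List<Triple> triples = new ArrayList<>();
        triples.add(new Triple(subject, RDF_TYPE, "<" + VOCAB + "TodoItem>"));
        triples.add(new Triple(subject, VOCAB + "title", "\"" + escape(item.title) + "\""));
        triples.add(new Triple(subject, VOCAB + "done",
                "\"" + item.done + "\"^^<" + XSD_BOOLEAN + ">"));
        return triples;
    }

    /** Escapes backslashes and double quotes inside a literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        for (Triple t : toTriples(new TodoItem("1", "Buy milk", false))) {
            System.out.println(t);
        }
    }
}
```

Seen this way, a triple is roughly "one cell of a row, plus the row's identity and the column's name", which is why the SQL analogy holds up reasonably well.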

All of this is hard, at least to me. To be clear, I’m not saying these things are wrong; I’m just saying that it’s more work up front mentally.

After I mulled on this for a while, I emotionally came around to the notion that while this is harder than usual, it’s easier for me to accept if I conclude the following things before I write any application:

  • My app really isn’t that special. Specifically, the data in my app really isn’t that special. As an example: there are a million different to do apps: Google Keep, Microsoft To Do, etc. This isn’t a new concept, therefore to me the notion of the data in my app isn’t special, and it’s worth my while to go figure out if someone else has already modeled it.

  • I have to care about data portability up front. If I don’t care about data portability, then I’m not going to use RDF, I’ll just continue to store my data in a SQL database, or a CSV, or whatever. This is one of those things that I, as a developer, have to learn to commit to. If I truly do believe that I’m going to be a good citizen of the internet, I should commit to this whenever I’m writing any app.

All of this is to say: how can I make this process easier for developers? Do we need another command line tool that does: “datamagic new data-structure” and it magically figures out how to produce an RDF structure for us to consume in our app?

Or is it simply a function of accepting that there’s going to be a hard requirement of the above notions, and try to expand that notion to the rest of app devs?

Or am I simply just not educated enough?

If you’ve made it this far, thanks, and my apologies.

9 Likes

Being a huge fan of RDF, I say, plow ahead and you will find the immense benefit of it.

OTOH, many developers are, for reasons similar to what you’ve stated, not as excited about RDF as I am. Therefore there are now many Solid libraries that abstract away from RDF, some of them similar to the ORM tools many are familiar with. The Inrupt Solid-Client library, LD-Flex, and other tools and libraries can be found here.

As for having to think about data (a separate topic from RDF), yes this requires some effort, but it is an absolutely vital thing that needs changing in the computing world - separation of data and apps is key to user control.

6 Likes

I feel that something very crucial is touched upon here.

Since the early days of the Semantic Web I’ve felt - like other proponents of the technology - that linked data would allow applications to be taken to a new, higher level than was possible before. And I was eagerly awaiting all the exciting uses that would become available over time. Though there have been some very prominent and successful applications since then, like Google Knowledge Graph, OpenGraph and a couple of others, until now imho the tech has overpromised and underdelivered. I am still fervently hoping for it to take flight and get more widespread adoption.

One of the problems, I think, is in the realm of ‘productization’. From the very start the technology has not been easily accessible, not developer-friendly, the true benefits not immediately obvious. Always the answer to questions people like @dynamoRando are asking is something like “If only you take the deep-dive, go through the rabbit hole, you will see the light and the world will be your oyster” :wink:

Most applications I’ve seen to date are firmly in the academic world, or they are, for the vast majority, fully tech-focused: UX/UI comes later, and a product website and good documentation after that.

Linked data / semantic applications should “sell” themselves better to the outside world. That so little of that is happening is imho a very inhibiting factor that slows the evolution of the entire field. When technologists don’t get it, they move on to lower-hanging fruit, greener pastures. When diving into linked data, most of what you find stems from the Semantic Web hype that has long passed and where link rot is setting in. There are great developments still - like Solid - but most of the information is at expert level, for insiders of the field, for those who ventured into the rabbit hole before.

This is also the case for the Fediverse, where federated app developers treat ActivityStreams / ActivityPub formats as plain JSON and just throw a @context in at the last moment for good measure and to comply with the spec. With apps evolving like that, the Linked Data parts of the story become a harder sell all the time. And that saddens me, as I believe in its potential.

I’ve said this before on this forum. Much of the Solid Project seems to aim at professional / corporate / commercial adoption first and foremost, and though that strategy might be a good way to come to widespread adoption, my personal impression is that new technologies find adoption fastest if they have a high appeal to the developer community. And that would entail more focus on the community aspects and cross-community collaborations than exists now (as far as I’m aware).

2 Likes

Prefix: I’m going to conflate two different things here: Solid and RDF/Linked Data, even though I know these are two different things - Solid is a spec implementing many open standards such as RDF/Linked Data. (I also know it’s also a suite of libraries, etc, but I’m simplifying here.)

I agree that there’s likely a gap in selling the idea: both of RDF and Solid. And perhaps I’m wrong on the latter, but in my view a lot of people who come to the Solid project already have the sense that things are “wrong” with the internet and need to be fixed. That’s okay - the problem is trying to sell that notion to the rest of the "developer-verse."

In my view the target audience is the rest of the developers who are writing apps without really thinking about the social impact of what they create. Either you have to convince more of the developers in the world to think about these things up front, or you solve this at a technical level and make it easier, such that when they write apps the toolkit they work with has these protections built in, or both. And finally, you have to just continually advertise these ideas and prove them out with demonstrative and impactful products. The goal of the last statement is to make “new” concepts (yes, I know the Semantic Web is not a new idea) less abstract, so that developers get a warm fuzzy feeling of “ok, I see how this works and why I should do this, I can also do this.” Yes, it’s marketing when it all gets boiled down, but to me that’s just the nature of things.

I popped into SolidWorld last month intending to only watch, but when I heard that there’d be an open Q&A session/working group I decided to stick around. I asked if there was an active DevRel effort being made, and I don’t think the question was really understood. That probably was my fault; I had some technical issues at the time, and I’m not really good at explaining myself sometimes.

As I’m sure people who are on this forum are already aware: good ideas are not automatically implemented. And ideas are easy, production is hard.

I’m game for talking more about this, if others are interested.

Finally, I also want to state that this isn’t intended to be critical to what Inrupt and the Solid project are doing right now. I enjoyed the SolidWorld presentation this month and think those things are really great. I’m just of the view that there’s a really good potential to expand in that area: maybe a “Solid Developers” YouTube channel, a Twitter account, etc. And while I have personal problems with those companies, the point is sustainable and accessible channels for “educational advertisement” of the production and usefulness of Solid/open standards.

EDIT: I guess what I’m saying is there needs to be more product champions.

EDIT EDIT: Sorry, the coffee is kicking in. To me, the question becomes for any developer sitting down at their keyboard when starting a project: Why should I use Solid and develop my app to use Solid principles over what I already know? Using the MEAN stack, or SAFE stack, or any other stack really? In my view, if you can convince the general conversation of: I really should be writing my app in a “Solid” manner most of the time, then we’ve reached our goal. (And by Solid, I’m abstracting here to really mean: leveraging socially aware protocols, such as the Semantic Web, etc.)

2 Likes

It is funny. Just now there’s a cool HN thread The Block Protocol | Hacker News where one of the comments says: “Oh look, someone reinvented semantic web again.” and further on it goes into “Why is RDF a bad thing?” with response:

  • Layers upon layers of complexity: Implementing CURIs alone is a non trivial task, although all that’s really needed to describe entities and attributes is 128bit UUIDs.

  • There is no good built-in way for authentication and trust.

  • Description Logic (the foundation of OWL) has a fundamentally prescriptive philosophy, which makes it inappropriate for most practical applications.

  • No good library and tool support in general, due to the complexity.

  • Blank Nodes

  • No good consistency mechanism for distributed data generation, and Quads (having multiple graphs) don’t properly solve this.

  • Using human readable entity and attr id’s leads to more bike-shedding and accidental collisions than it’s worth.

  • High barriers to entry.

  • After years of developer disappointment the earth is pretty much salted.

Some of these refer to what we discuss here.

Yes, I feel the same as you @dynamoRando and also want to stress that there’s no criticism in my post; it’s mostly an encouragement to delve deeper into aspects that maybe deserve more attention.

1 Like

That is funny, and in my view, encouraging. It seems to me that everyone agrees that we need more interoperability, but we can’t all agree on how to solve it. In that sense, it’s probably because the problem is actually hard.

Solid is actually my first exposure ever to RDF. Prior to this I wasn’t even aware of the Semantic Web. This thread has been helpful in my understanding the existing … pain points around it. It makes me feel better with the notion that there is some friction behind leveraging it; at least to the point that it’s not widely adopted by developers.

I was mulling this over yesterday, and while I don’t have the energy, experience, and bandwidth to do this, I was thinking that what I want as a developer working with RDF is something akin to GitHub Copilot.

Basically, if I acknowledge up front that the data in my application is likely not unique, then maybe I can have something magical build the RDF for me if I describe it “well enough.”

For example, building an online retail app is not a new problem. There’s customers, there’s products, there’s orders. I don’t want to sit down and figure out the RDF for all of that - I would just like to describe it and have the data model built for me. (With the understanding that I shouldn’t accept the defaults, but rather validate it.)

More specifically, I’d love it if the AI built for me the code base for a repository for my app to work with, and handled the RDF well enough that I didn’t have to figure out all the needed vocabularies, etc.

Kind of a wild idea, anyway.

1 Like

maybe I can have something magical build the RDF for me if I describe it “well enough.”

I don’t know if this qualifies as the kind of magical build you are talking about. But I have a library work-in-progress that lets you generate high level web page components, apps, and websites from declarative RDF. Basically you use SPARQL or several other methods to define the data you want and then pour it into one of many templates. See the demo and the repo.

Ironically, I produce less coding by using RDF itself declaratively where its human readability becomes a big plus.

2 Likes

That’s pretty interesting! Although I was thinking about it from the other direction, i.e. starting with the data itself.

So, for example, let’s say that I wanted to start building the online retailer app that I was describing earlier. Traditionally, I would say, ok, I need to store Customer information, and maybe model a Customer entity and supporting structures (this is for example purposes only) –

class Customer 
{
   string FirstName;
   string LastName;
   Address ShippingAddress;
   Address BillingAddress;
}

class Address
{
   string StreetName;
   string StateOrProvince;
   string PostalCode;
   string CountryName;
}

And then build some supporting data structures behind it in a data store of my choice (in my case, usually SQL) and just keep going from there.

When I say that “the data in my app is not really unique”; I mean just that: the concept of an Address is pretty well understood. So at least to me, it seems like I should easily be able to build this into an RDF document? This way, I can store it in a pod.

Except that… at the moment, I have no idea how to do that. In the Developers section of the Solid project, there are links to various vocabularies and some well known ones. It’s not immediately obvious to me which I should use. I assume I should use vCard for addresses? What about customer? I’ll need to use FOAF for name, right?

And so on. I admit I’m completely lost on this part, and this is the part that I wish to abstract away - I’d love to just feed a magical box my data entities, and have it figure out the supporting RDF for me, or at least take a first stab with recommendations on alternatives.

I’m definitely open to being educated on this. This is the other part I was referring to: is this confusion natural, or is this just the way it works to get started using linked data? Do we just need to educate more developers on this? And so on.

1 Like

How I would do it. A first stop for common RDF tasks is schema.org. I go there, look up “customer”, and find a number of predicates. I poke around to see if schema.org has enough to cover what I need (it often does for common tasks). If something is missing, I go to the LOV ontology search engine and search for customer. I spend an hour or two poking around looking at vocabularies that cover it. All in all I’ve spent half a day or a day researching ontologies and terms, which sharpens my understanding of the domain. After I’ve done this for several projects, the time it takes drops drastically.
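As a rough illustration of where that research might land for the Customer/Address example, here is one possible mapping of the Address fields onto schema.org terms, in plain Java. The property names (`streetAddress`, `addressRegion`, `postalCode`, `addressCountry`) and the `PostalAddress` type are real schema.org terms, but the choice of which field maps to which term is an assumption, and vCard would be an equally defensible pick. The `SchemaOrgMapping` class and the example IRI are made up for the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SchemaOrgMapping {
    static final String SCHEMA = "https://schema.org/";

    /** Field name from the Address class -> candidate schema.org property (an assumed mapping). */
    static Map<String, String> addressTerms() {
        Map<String, String> terms = new LinkedHashMap<>();
        terms.put("StreetName", SCHEMA + "streetAddress");
        terms.put("StateOrProvince", SCHEMA + "addressRegion");
        terms.put("PostalCode", SCHEMA + "postalCode");
        terms.put("CountryName", SCHEMA + "addressCountry");
        return terms;
    }

    /** Writes one address as Turtle, emitting only the fields that have a value. */
    static String addressToTurtle(String iri, Map<String, String> values) {
        StringBuilder ttl = new StringBuilder();
        ttl.append("<").append(iri).append("> a <").append(SCHEMA).append("PostalAddress>");
        for (Map.Entry<String, String> term : addressTerms().entrySet()) {
            String value = values.get(term.getKey());
            if (value != null) {
                ttl.append(" ;\n  <").append(term.getValue()).append("> \"")
                   .append(escape(value)).append("\"");
            }
        }
        return ttl.append(" .\n").toString();
    }

    /** Escapes backslashes and double quotes inside a literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        Map<String, String> address = new LinkedHashMap<>();
        address.put("StreetName", "1 Main St");
        address.put("PostalCode", "82273");
        address.put("CountryName", "Germany");
        System.out.print(addressToTurtle("https://example.org/address/1", address));
    }
}
```

Keeping the mapping in one table like this makes it cheap to swap a term later if the research turns up a better fit.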

3 Likes

Not really a whole lot different from developing ER diagrams for a database. And one is wise to put a bit of up-front effort into either a database or RDF.

1 Like

As Solid matures, there will be off-the-shelf templates for most common data structures.

1 Like

@jeffz Perfect!

That walk through is the part that I was missing. Oddly, when I was punching in “rdf vocabulary customer” into Google, Schema.org never came up. This is the kind of thing I wanted to be educated on, so thank you for that!

I hope that I too, as I get better booted into this space, will find that the time to compose these things is reduced. That’s the part I’m trying to understand about whether doing this is “hard”: is it a lack of understanding/education, a lack of supporting technology, or just the nature of things? I agree that if something is going to be well done, it usually takes time.

@anon36056958 - That looks interesting! I’ll have to take a look over that when I get a chance.

1 Like

I think this is the terrain that TerminusDB is exploring. From the start they wanted to become the “GitHub for Linked Data”. I posted about them in this forum before in TerminusDB a delightful database for linked data

Since then they have come a long way, and launched their cloud-based TerminusX product:

I haven’t looked at that service in detail yet. The TerminusDB database is open-source, and this service is obviously not. So, if successful, it may become a FOMO + network effects de-facto walled garden like GitHub over time.

A great project @jeffz, thanks for posting. To what extent does this depend on Solid-specific stuff vs. usable directly on any RDF compatible (lower-level) apps?

Wow, indeed. Thanks @anon36056958

1 Like

The issues you mention are some I’ve been thinking about for a while, so here’s my two cents on the topic :).

First, is RDF “hard”? I don’t think so. In fact, I think RDF is easy. I also came from a background similar to yours (I think), because I didn’t know anything about the Semantic Web and I was used to just creating an app with Laravel. But it wasn’t too difficult to understand how RDF worked. In particular, after reading the RDF primer and the RDF Schema spec I just saw RDF as a more general way of declaring an object-oriented mental model I already had. Along the way I’ve been learning more, and I realized that some of my initial assumptions were wrong. But overall, I’d say the mental model I got in the first weeks of learning RDF still applies.

But there’s a caveat to that. RDF is not hard; but choosing a vocabulary for your app is hard, and I think that’s where the issues arise. However, if you don’t even try to be interoperable, it’s very easy to create your own vocabulary. You just create a class and properties like you would in a normal object oriented programming language.
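As a sketch of how little is involved in making up your own vocabulary, the plain-Java snippet below prints an RDF Schema description of one class and its properties as Turtle. The `rdf:`, `rdfs:` and `xsd:` namespaces are the standard ones; the `app:` namespace, the `Movie` class and its properties are invented for the example, and `VocabularySketch` is not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class VocabularySketch {
    /**
     * Describes one class and its properties in RDF Schema, written as Turtle.
     * properties maps a property name to its range (for example "xsd:string").
     */
    static String describeClass(String namespace, String className, Map<String, String> properties) {
        StringBuilder ttl = new StringBuilder();
        ttl.append("@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n");
        ttl.append("@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n");
        ttl.append("@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n");
        ttl.append("@prefix app: <").append(namespace).append("> .\n\n");

        // The class itself, like declaring "class Movie" in an OO language.
        ttl.append("app:").append(className).append(" a rdfs:Class .\n");

        // One rdf:Property per field, tied to the class through rdfs:domain.
        for (Map.Entry<String, String> property : properties.entrySet()) {
            ttl.append("app:").append(property.getKey()).append(" a rdf:Property ;\n")
               .append("  rdfs:domain app:").append(className).append(" ;\n")
               .append("  rdfs:range ").append(property.getValue()).append(" .\n");
        }
        return ttl.toString();
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("title", "xsd:string");
        properties.put("releaseDate", "xsd:date");
        // Hypothetical namespace; in practice it would be an IRI you control.
        System.out.print(describeClass("https://example.org/media#", "Movie", properties));
    }
}
```

One difference from OO classes worth keeping in mind: in RDF Schema the properties are declared on their own and merely point at the class, so other vocabularies can reuse them.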

If I don’t care about data portability, then I’m not going to use RDF, I’ll just continue to store my data in a SQL database, or a CSV, or whatever.

I don’t agree with this part, because RDF/Solid has an inherent advantage over SQL, CSV, or whatever. In Solid, even if your vocabulary is unique to your app, the data is available to users. So even if you made up a vocabulary and no other app uses it, the community will be able to start using your vocabulary, or implement tools to convert from your data to other formats, without your involvement. Using a traditional architecture, the data will be enclosed in your server and you need to implement a custom API if you want to expose it. I wrote a blog post talking about this, maybe you’ll find it interesting: Interoperable serendipity.

Now, having said that, I would agree that it’s more difficult for developers to get started with Solid/RDF. But I don’t think it’s because it’s more difficult to understand; it’s just that the community is smaller and not many people are focusing on developer experience. I can make a comparison to PHP. I remember, years ago, that there was a joke running around that PHP was dead and everybody hated working with it. But then Laravel came around, and now a lot of people love PHP. Sure, the joke is still going around, but Laravel has a thriving community with a lot of happy developers. I think the same could happen with Solid, but we haven’t got our Taylor Otwell (the creator of Laravel) yet.

Personally, I have been working on some tools in that direction. But to be honest, I’m not really doing it to contribute to the community; I’m just working in the open and that’s why I open source my code and document my libraries. But it takes me ages to finish anything because I’m just working on side projects and it’s not my intention to go full time on this at this point (I’ve been working on my latest Solid app for over a year now xD). In case you’re interested though, here’s the library I’ve been working on: Soukai Solid. And you can check out an app using it here: Media Kraken. Eventually, I’d like to publish a framework as well, allowing for the “framework new my-project” workflow that you mention. But it’ll probably take months or even years until that happens.

4 Likes

Hi all,
I have read @aschrijver’s thread on SocialHub and most of this one. And now I would like to separate the things here into different topics. The title is ‘Is RDF “hard”?’, but most of the problems described in this thread are not really RDF problems, are they?

It’s more about finding existing vocabularies, and that’s independent of RDF.
So if you don’t have to worry about interoperability, is RDF still “hard”?

Did I understand correctly:
We are talking about RDF itself, not the form of representation like JSON-LD, right?
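To illustrate that distinction, here is a small plain-Java sketch that writes the same two triples in two different syntaxes, N-Triples and Turtle. The data model (the triples) stays identical; only the text form changes, and JSON-LD would be a third form of the same thing. The `Serializations` class and the example IRI are invented for the sketch, and the Turtle writer assumes every triple shares one subject and has a plain string literal as its object.

```java
final class Serializations {
    static final String SCHEMA = "https://schema.org/";

    // Each row: subject IRI, predicate IRI, plain string literal.
    static final String[][] TRIPLES = {
        {"https://example.org/address/1", SCHEMA + "postalCode", "82273"},
        {"https://example.org/address/1", SCHEMA + "addressLocality", "Munich"},
    };

    /** N-Triples: one full statement per line, no abbreviations. */
    static String toNTriples(String[][] triples) {
        StringBuilder out = new StringBuilder();
        for (String[] t : triples) {
            out.append("<").append(t[0]).append("> <").append(t[1]).append("> \"")
               .append(t[2]).append("\" .\n");
        }
        return out.toString();
    }

    /** Turtle: same statements, with a prefix and a shared subject (assumed identical for all rows). */
    static String toTurtle(String[][] triples) {
        StringBuilder out = new StringBuilder();
        out.append("@prefix schema: <").append(SCHEMA).append("> .\n\n");
        out.append("<").append(triples[0][0]).append(">\n");
        for (int i = 0; i < triples.length; i++) {
            String predicate = triples[i][1].startsWith(SCHEMA)
                    ? "schema:" + triples[i][1].substring(SCHEMA.length())
                    : "<" + triples[i][1] + ">";
            out.append("  ").append(predicate).append(" \"").append(triples[i][2]).append("\"")
               .append(i < triples.length - 1 ? " ;\n" : " .\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNTriples(TRIPLES));
        System.out.println(toTurtle(TRIPLES));
    }
}
```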

3 Likes

Java sample to generate RDF:

	void generateSample() {
		
		Model model = new ModelBuilder()
				.subject("http://example.com/myTestObject")
					.add(RDF.TYPE, SCHEMA_ORG.PostalAddress)
					.add(SCHEMA_ORG.name, literal("John's address"))
					.add(SCHEMA_ORG.postalCode, literal("82273"))
					.add(SCHEMA_ORG.addressLocality, literal("Munich"))
					.add(SCHEMA_ORG.addressCountry, literal("Germany"))
				.build();
		
		Rio.write(model, System.out, RDFFormat.TURTLE);
	}

output:

<http://example.com/myTestObject> a <https://schema.org/PostalAddress>;
  <https://schema.org/name> "John's address";
  <https://schema.org/postalCode> "82273";
  <https://schema.org/addressLocality> "Munich";
  <https://schema.org/addressCountry> "Germany" .

And to create a database and save that model, I have to add:

		// create a database (inMemory)
		SailRepository repository = new SailRepository(new MemoryStore());
		
		// Save the model to the database
		try(RepositoryConnection con = repository.getConnection() ) {
			con.add(model);
		}

So using Java and rdf4j is not so bad.

Ok, but I admit that this is a simple example, and the way to this example was also hard for me ;-) But it was worth it. A good year ago, “semantic web” was the headline of an article I skipped. And I had no idea what RDF was.

And now, about one year later: I love RDF.

2 Likes

maybe one day there will be:

Repository repository = new SolidRepository("https://john.solidcommunity.net/");

:heart_eyes:

2 Likes

@ludwigschubi has been working on something along those lines, I think. You can check out his shex-codegen tool, and there are a couple of threads in this forum talking about it:

3 Likes

A great project @jeffz, thanks for posting. To what extent does this depend on Solid-specific stuff vs. usable directly on any RDF compatible (lower-level) apps?

Currently it can take as a ui:dataSource any RDF from any provenance and is specific to Solid only in the sense that, if you are logged in, you can access private Solid materials. The RDF may be in the form of a Collection, or the library can gather a Collection from a SPARQL query. I am just now finalizing additional specific dataSources for RSS/Atom feeds and Wikidata & Internet Archive searches, from which I munge RDF.

1 Like

That’s a fair point - the title of the thread is misnamed in retrospect due to my misunderstanding of the concepts.

To try and condense what is “hard”, at least for me at the start of this thread, and what I’ve learned so far:

| Item Number | Challenge | Remarks | Conclusions/Resolutions |
|---|---|---|---|
| 1 | When modeling entities in a new application, how do you find the vocabularies for it? | This is a requirement if and only if you want your RDF to be compatible. | Yes, this naturally takes time. Over time though, it may get easier as a function of experience and working with RDF. Toolkits exist and are being made to also help reduce the time to implement. |
| 2 | When coming from a traditional SQL relational model, how do you map to RDF triples? | RDF triples are a foundational concept; foundational in that you really need to understand them, just as you would take the time to understand how SQL tables, keys, columns and rows work. | This only takes time if you are new. The more education and experience that becomes available to developers, the more the time to implement this can be reduced. |
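For item 2, one common way to think about the mapping is: each row becomes a subject, each column becomes a predicate, and each cell becomes an object. The plain-Java sketch below follows that idea, loosely in the spirit of the W3C Direct Mapping of relational data to RDF but much simplified. The `RowToTriples` class, the IRI scheme, and the table and column names are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowToTriples {
    /**
     * Maps one relational row to N-Triples lines.
     * The primary key becomes part of the subject IRI; every other non-NULL cell becomes one triple.
     */
    static List<String> rowToTriples(String baseIri, String table, String keyColumn,
                                     Map<String, String> row) {
        String subject = "<" + baseIri + table + "/" + row.get(keyColumn) + ">";
        List<String> triples = new ArrayList<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (cell.getKey().equals(keyColumn) || cell.getValue() == null) {
                continue; // the key already identifies the subject; SQL NULL means "no statement"
            }
            String predicate = "<" + baseIri + table + "#" + cell.getKey() + ">";
            triples.add(subject + " " + predicate + " \"" + escape(cell.getValue()) + "\" .");
        }
        return triples;
    }

    /** Escapes backslashes and double quotes inside a literal. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // A hypothetical row from a "customer" table.
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "42");
        row.put("first_name", "Ada");
        row.put("last_name", "Lovelace");
        row.put("middle_name", null);
        for (String triple : rowToTriples("https://example.org/db/", "customer", "id", row)) {
            System.out.println(triple);
        }
    }
}
```

A foreign key would follow the same pattern, except that the object would be the IRI of the referenced row rather than a literal, which is where the "linked" part of linked data starts to show.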

To your point @naturzukunftSolid, RDF is not new and there exist plenty of other frameworks for working with it. I’ve been using dotnetRDF as an example for myself in some of the learning projects I’ve been building.

Going back to kind of what I’d like to do (I’ve been evolving this in my head), given again the previous example:

Given a simple model for handling customers in an online retail website:

class Customer
{
   string FirstName;
   string LastName;
   Address ShippingAddress;
   Address BillingAddress;
}

class Address
{
   string StreetName;
   string StateOrProvince;
   string PostalCode;
   string CountryName;
}

Based off of what I’m reading here, if I could tell my code to infer what vocabularies to use (something similar to leveraging a standard library like in C++; a standard or “common” mapping to various vocabularies), and build the corresponding mapping at compile time (in C#, maybe using reflection or source generators), that would be helpful.

So, for example, just as the Solid website points out that there are well known vocabularies for common things; it would be nice if people could create various “bundles” of vocabularies that might fit a data model (online retail, etc.)

So what I’d like to do, code wise, is something like:

string retailerVocabularies = "Bundle of vocabularies to infer from goes here as a link, or something, that might commonly be used in online retail";

var rdfModeler = new RDFModeler().UseVocabularyBundle(retailerVocabularies);
rdfModeler.RegisterType<Customer>();
rdfModeler.RegisterType<Address>();

Essentially, I’m trying to reduce the implementation time of Item #1 in the table of challenges I mentioned above.

The modeler would use the bundle of vocabularies (maybe it defaults to trying to bind objects that seem to fit with common vocabularies like from Schema.org, w3.org, etc.) to build code at compile time that does the mapping for me - essentially producing the code that you just wrote in Java:

void generateSample() {
        Model model = new ModelBuilder()
                .subject("http://example.com/myTestObject")
                    .add(RDF.TYPE, SCHEMA_ORG.PostalAddress)
                    .add(SCHEMA_ORG.name, literal("John's address"))
                    .add(SCHEMA_ORG.postalCode, literal("82273"))
                    .add(SCHEMA_ORG.addressLocality, literal("Munich"))
                    .add(SCHEMA_ORG.addressCountry, literal("Germany"))
                .build();

        Rio.write(model, System.out, RDFFormat.TURTLE);
    }

I’d like to keep my objects as-is, so then if I had a repository object, I could just leverage it to use my RDF modeler. As an example (this code is obviously made up, but trying to build upon your code example):

// configure to save in-memory
// and pass my rdfModeler object from before
// to help it understand how to map things

var retailerRepository = new Repository(new MemoryStore()).ConfigureWith(rdfModeler);

And so I could continue to keep working with my objects as-is:

// skipping the actual initialization of the Customer for this example; just trying to show
// that the customer object works "as-is" in code

retailerRepository.SaveCustomer(new Customer());

And the repository would leverage the rdfModeler to write out the model to the storage location, in this case, in-memory as an RDF document.

There are probably some things I’m overlooking, but that’s kind of where I’ve mentally landed on what would be nice. There’s likely a chance that this theoretical RDFModeler could get a bunch of things wrong, so ideally there’d also be an option to inspect the generated source code, or to decorate your objects to override which vocabularies are used, if needed.
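A minimal sketch of that idea, written in Java to match the rdf4j sample rather than C#, and using runtime reflection instead of compile-time source generation. Everything here is hypothetical: `RdfModeler`, the `@RdfProperty` decoration and the "bundle" map are invented names, not an existing library. The predicate IRIs are real schema.org and vCard terms, but choosing them for these fields is an assumption.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical decoration that pins a field to a specific predicate IRI. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface RdfProperty {
    String value();
}

/** Java rendering of the Address class from the example. */
class Address {
    String streetName;
    String stateOrProvince;
    String postalCode;

    // Explicit override: use the vCard term instead of whatever the bundle suggests.
    @RdfProperty("http://www.w3.org/2006/vcard/ns#country-name")
    String countryName;
}

/** Hypothetical modeler: resolves a predicate IRI for every field of a registered type. */
final class RdfModeler {
    private final Map<String, String> bundle; // lower-cased field name -> predicate IRI

    RdfModeler(Map<String, String> bundle) {
        this.bundle = bundle;
    }

    /** The decoration wins, then the bundle; unmapped fields map to null so they can be reviewed. */
    Map<String, String> mapType(Class<?> type) {
        Map<String, String> mapping = new TreeMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue; // skip compiler-generated fields
            }
            RdfProperty override = field.getAnnotation(RdfProperty.class);
            String predicate = override != null
                    ? override.value()
                    : bundle.get(field.getName().toLowerCase(Locale.ROOT));
            mapping.put(field.getName(), predicate);
        }
        return mapping;
    }

    public static void main(String[] args) {
        // A tiny stand-in for a "retail" vocabulary bundle, using schema.org terms.
        Map<String, String> retailBundle = new HashMap<>();
        retailBundle.put("streetname", "https://schema.org/streetAddress");
        retailBundle.put("postalcode", "https://schema.org/postalCode");

        Map<String, String> mapping = new RdfModeler(retailBundle).mapType(Address.class);
        mapping.forEach((field, predicate) -> System.out.println(
                field + " -> " + (predicate == null ? "(unmapped, needs review)" : predicate)));
    }
}
```

Matching on field names alone is obviously naive; the point of the sketch is only that the bundle supplies defaults, the decoration overrides them, and anything left unmapped is surfaced for a human to decide.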

I haven’t investigated all the links here, but it sounds like maybe others have tried to solve this as well?

I also want to thank everyone for their input and enthusiasm. It keeps me going, and I appreciate the insight into all of this.

2 Likes