Linked data in HTML files

I just saw that with the new ESS updates, when I request my WebID in the new form https://id.inrupt.com/ludwig, I get redirected and an HTML file is returned that embeds my profile data in some HTML.

I get that when one is using a browser and navigating to a WebID, it is nice to get a visual representation of the data or some UI. However, when the WebID is requested from code or an application, shouldn’t it default to returning the raw data? As this is probably the piece of data that is requested most often, and probably first, shouldn’t we make sure that no unneeded bytes are sent in that response?

Maybe I don’t fully understand the background behind the decision to return HTML by default. Does anybody know what considerations went into this decision?

@ludwigschubi good question, I think this is one that @acoburn can answer best.

As another data point, this exact change broke Penny’s compatibility with Pod Spaces. It’s now somewhat fixed if you’re logged in, but you’ll still get HTML if you’re trying to use Penny to view other people’s WebIDs.

I’m expecting (/somewhat hoping :slight_smile: ) that this was just a matter of not knowing in what use cases people would request data without an Accept header. In my ideal world, it’d return HTML if the Accept header says text/html (like browsers do), but default to text/turtle otherwise.
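
To make that "ideal world" behaviour concrete, here is a minimal sketch of how a server might pick the representation. It is not how ESS works; `parseAccept` and `chooseRepresentation` are hypothetical names, and the Accept parsing is deliberately simplified (it ignores wildcards such as `*/*` and `text/*`, so those fall through to Turtle).

```typescript
// Hypothetical helper names; a simplified sketch, not ESS's actual logic.
type Representation = "text/html" | "text/turtle";

// Parse an Accept header into a map of media type -> q-value (q defaults to 1).
function parseAccept(header: string): Map<string, number> {
  const weights = new Map<string, number>();
  for (const part of header.split(",")) {
    const [rawType = "", ...params] = part.split(";").map((s) => s.trim());
    if (!rawType) continue;
    let q = 1;
    for (const param of params) {
      const [name = "", value = ""] = param.split("=").map((s) => s.trim());
      const parsed = Number(value);
      if (name.toLowerCase() === "q" && value !== "" && !Number.isNaN(parsed)) {
        q = parsed;
      }
    }
    weights.set(rawType.toLowerCase(), q);
  }
  return weights;
}

// Return HTML only when the client explicitly prefers text/html;
// default to Turtle for a missing header, wildcards, or anything else.
function chooseRepresentation(acceptHeader: string | null): Representation {
  if (!acceptHeader) return "text/turtle";
  const weights = parseAccept(acceptHeader);
  const htmlQ = weights.get("text/html") ?? 0;
  const turtleQ = weights.get("text/turtle") ?? 0;
  return htmlQ > 0 && htmlQ > turtleQ ? "text/html" : "text/turtle";
}
```

With this sketch, a typical browser header that lists `text/html` first yields HTML, while a request with no Accept header, or with only `*/*`, yields Turtle.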

Meanwhile, if you’re sure that you’re fetching a WebID, adding an Accept header set to text/turtle to your request will work. Unfortunately, you can’t always be sure.
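
As a rough illustration of that workaround, the sketch below requests a WebID with an explicit `Accept: text/turtle` header. The names `TurtleResponse`, `TurtleFetch` and `fetchAsTurtle` are made up for this example, and the fetch function is passed in as a parameter so the snippet does not depend on any particular runtime.

```typescript
// Minimal shapes so the sketch stays self-contained (hypothetical names).
interface TurtleResponse {
  status: number;
  text(): Promise<string>;
}
type TurtleFetch = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<TurtleResponse>;

// Fetch a resource that is assumed to be a WebID, explicitly asking for Turtle.
async function fetchAsTurtle(url: string, fetchImpl: TurtleFetch): Promise<string> {
  const response = await fetchImpl(url, { headers: { Accept: "text/turtle" } });
  if (response.status !== 200) {
    throw new Error(`Unexpected status ${response.status} for ${url}`);
  }
  return response.text();
}
```

In a browser or a recent Node.js version, the built-in `fetch` should fit the `TurtleFetch` shape, so it can be passed as the second argument.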

@Vincent I wonder if you can send an Accept header of

Accept: text/turtle;q=1.0, */*;q=0.5

That is, accept text/turtle with the highest q-factor weighting, then accept everything else at lower weighting?

It looks like that should work:

I could, but what I’m trying to avoid is that Penny is determining the “preferred” serialisation. In other words, if the user has actually uploaded an HTML document to their Pod, I don’t want to get a Turtle description of that document; I’d want them to be able to actually see (and potentially modify) that HTML file. But in the case of a WebID, I’d think the canonical form would be to get the actual data, with an HTML serialisation only being a convenience for those requesting it (e.g. by directly visiting the WebID URL in a web browser).

Is there any server that’d do that? My understanding is that non-Turtle/JSON-LD documents aren’t to be rendered as descriptions of those documents; i.e., something is either a dataset or a non-dataset.

I seem to remember ESS doing that, but it doesn’t seem to do it now. In that case, I’ll give that a shot and see what happens, thanks!

Edit: one edge case is RDFa, i.e. if there’s RDF embedded in HTML, a server might choose to serve that RDF as Turtle with that Accept header. But I can live with that :slight_smile:

Cool, that seems to work, thanks. One thing that’s interesting, though, is that I get 401 Unauthorized when trying to view someone else’s WebID while being logged in with an NSS WebID. When not logged in, or when logged in with another ESS account, it works fine. Feels like that might be an ESS bug?

I think this’ll be due to requesting the WebID with authentication — WebIDs are public documents (per spec) so shouldn’t be requested with authentication. The issue here is that you’re requesting with authentication from NSS against a public document, and the server is saying “I don’t know what authentication that is”

This is where getProfileAll from the @inrupt/solid-client SDK comes in: it explicitly requests the WebID using getSolidDataset(webid), without passing an authenticated fetch via options.

Ah of course; similarly, other public resources will start giving CORS errors when requested with Authorization headers. So I guess the user will have to know in advance whether a given resource (e.g. a WebID) should be viewed authenticated or unauthenticated, and to be able to tell Penny that. Or alternatively, I can try automatically re-fetching it with/without authentication headers if it fails. Thanks!

This seems strange and not spec-compliant:

  • per spec, you are allowed to access both public and non-public things with an authenticated fetch.

  • and a non-authenticated fetch can only access public things.

In HTTP there are two approaches to authentication: pre-emptive authentication and reactive authentication. Pre-emptive authentication works well when there is a pre-existing (out-of-band) relationship between a client and a server. This is a valid method for most traditional web applications.

For Solid, however, we have a highly distributed ecosystem of apps, servers and identity systems. Having a client know the appropriate authentication mechanism out-of-band is not scalable, but this is what you are seeing in this case: a client assumes that a given resource will accept a particular type of authentication token before even probing what is supported.

A reactive approach to authentication is what will ultimately allow Solid to scale, and for this to work, clients need to start by sending requests with no authorization header. If the resource is public, there is nothing else that needs to be done. And in this particular case, you’re done.

If the resource is protected, then the client will receive a 401 response with a WWW-Authenticate header. That header will inform the client how to proceed: whether to use DPoP, or Bearer or UMA or GNAP or whatever. Then the client uses the appropriate mechanism. A keen reader of the Solid specifications will notice that a particular authentication scheme is not mandated. Solid-OIDC only defines how to retrieve an ID token, not that resource servers must support Bearer token-based authentication.

For some more background on this particular case, the preemptive auth that is being performed with an NSS bearer token results in a 401 because the WebID resource does not accept the provided access token. There are important security reasons that a WebID profile is stored outside of a Pod and there are constraints on which authentication mechanism is supported – if any app could write to a WebID profile, that would be a problem. In this case, the NSS access token will never be accepted by the ESS WebID profile resource; it will always return a 401.

The better approach here (and generally with resources in the Solid ecosystem) is to first attempt to fetch the resource with no authentication headers. Generally, a client will either receive a 200 (in which case nothing more need be done) or a 401. In the case of a 401, the client should look at the WWW-Authenticate header, locate a scheme that it supports, and proceed from there.
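
The reactive flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from any Solid SDK: `ProbeFetch`, `HeaderProvider`, `parseSchemes` and `fetchReactively` are hypothetical names, the `supported` map is assumed to use lowercase scheme names as keys, and the WWW-Authenticate parsing is simplified (it would misread a quoted parameter value that itself contains a comma).

```typescript
// Minimal shapes so the sketch stays self-contained (hypothetical names).
interface ProbeResponse {
  status: number;
  headers: { get(name: string): string | null };
}
type ProbeFetch = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<ProbeResponse>;

// Hypothetical callback: given a scheme offered by the server, produce the
// request headers for it (e.g. Authorization, plus a proof header if needed).
type HeaderProvider = (scheme: string) => Promise<Record<string, string>>;

// Simplified extraction of scheme names from a WWW-Authenticate value.
function parseSchemes(header: string): string[] {
  const token = /^[A-Za-z0-9!#$%&'*+.^_`|~-]+$/;
  const schemes: string[] = [];
  for (const part of header.split(",")) {
    const first = part.trim().split(/\s+/)[0] ?? "";
    // Parameters such as realm="x" contain "=" and are rejected by the pattern.
    if (token.test(first)) schemes.push(first);
  }
  return schemes;
}

async function fetchReactively(
  url: string,
  fetchImpl: ProbeFetch,
  supported: Record<string, HeaderProvider>,
): Promise<ProbeResponse> {
  // Step 1: probe without any Authorization header.
  const probe = await fetchImpl(url);
  if (probe.status !== 401) return probe;

  // Step 2: on a 401, pick the first offered scheme the client supports.
  const challenge = probe.headers.get("WWW-Authenticate") ?? "";
  for (const scheme of parseSchemes(challenge)) {
    const provider = supported[scheme.toLowerCase()];
    if (provider) {
      return fetchImpl(url, { headers: await provider(scheme) });
    }
  }
  // No mutually supported scheme: hand the 401 back to the caller.
  return probe;
}
```

For a public resource such as a WebID, only the first request is made; the second request happens only when the server actually asks for authentication.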

Some references:

This does not seem to answer the question:

  • I am authenticated and want to read a public thing,
  • I don’t know it is public
  • do I have to log out, or fetch each thing twice?

Having the profile outside the storage makes sense. Serving the WebID Profile Document as HTML and making clients do something special to read it does not.

  • I am authenticated and want to read a public thing,

As described above, use reactive authentication. This involves sending an HTTP request without an authorization header.

  • I don’t know it is public

You will if you use reactive authentication. This involves first sending an HTTP request without an authorization header.

  • do I have to log out, or fetch each thing twice?

This has nothing to do with logging out. This is about a client application not assuming preemptive authorization. If you are concerned about performing a fetch twice, cache the response (which an app should be doing anyway).

This is not special. This is standard HTTP.

a) I have to send an Accept: text/turtle header if I want to retrieve the profile as RDF; this is unique to ESS. b) I must not use an authenticated fetch to retrieve the profile; that is unique to ESS.

a) I have to send an Accept: text/turtle header if I want to retrieve the profile as RDF; this is unique to ESS.

Why is a linked data client not sending an Accept header?

b) I must not use an authenticated fetch to retrieve the profile; that is unique to ESS.

As mentioned above, this is how we can make decentralized authentication work on the Web. It uses standard HTTP mechanisms. The fact that ESS supports this before other Solid servers does not mean that it is incorrect.

Well, except if you also want to figure out what access the current user has, i.e. an unauthenticated fetch might result in read but not write permissions, while the authenticated fetch might have write permissions. So you’ll always have to do two fetches. And if the contents might differ as well depending on whether the user is authenticated, caching the response isn’t possible either.