Linked data in html files

anon85132706 · July 25, 2022, 3:41pm

I just saw that with the new ESS updates, when I request my WebID in the new form https://id.inrupt.com/ludwig, I get redirected and an HTML file is returned that embeds my profile data.

I get that when one is using a browser and navigating to a WebID, it is nice to get a visual representation of the data / some UI. However, when the WebID is requested from code or an application, shouldn’t it default to returning the raw data? As this is probably the piece of data that is requested most often, and usually first, shouldn’t we make sure that no unneeded bytes are sent in that response?

Maybe I don’t fully understand the background behind the decision to return HTML by default. Does somebody know what considerations went into this decision?

ThisIsMissEm · July 25, 2022, 11:56pm

@anon85132706 good question, I think this is one that @acoburn can answer best.

Vincent · July 26, 2022, 2:18pm

As another data point, this exact change broke Penny’s compatibility with Pod Spaces. It’s now somewhat fixed if you’re logged in, but you’ll still get HTML if you’re trying to use Penny to view other people’s WebIDs.

I’m expecting (or at least hoping) that this was just a matter of not knowing in what use cases people would request data without an Accept header. In my ideal world, it’d return HTML if the Accept header says text/html (like browsers do), but default to text/turtle otherwise.
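
As a rough illustration of that rule, here is a minimal sketch in TypeScript. It is not how ESS is implemented; `pickWebIdRepresentation` and `parseAccept` are hypothetical helpers, and the Accept parsing is deliberately simplified (it reads q-values but ignores other media type parameters). HTML is chosen only when text/html is listed explicitly with a higher weight than whatever applies to Turtle, so a missing header, a bare `*/*`, or a tie all fall back to Turtle.

```typescript
type WebIdMediaType = "text/html" | "text/turtle";

// Parse "type/subtype;q=0.8" entries into a map of media type -> weight.
// Entries without a q parameter get the default weight of 1.
function parseAccept(accept: string): Map<string, number> {
  const weights = new Map<string, number>();
  for (const entry of accept.split(",")) {
    const [mediaType, ...params] = entry
      .split(";")
      .map((part) => part.trim().toLowerCase());
    if (!mediaType) continue;
    const qParam = params.find((p) => p.startsWith("q="));
    const q = qParam === undefined ? 1 : parseFloat(qParam.slice(2));
    weights.set(mediaType, Number.isNaN(q) ? 1 : q);
  }
  return weights;
}

// Hypothetical server-side rule: HTML only on explicit request, Turtle otherwise.
function pickWebIdRepresentation(
  accept: string | null | undefined,
): WebIdMediaType {
  if (!accept) return "text/turtle"; // no Accept header: return the raw data
  const weights = parseAccept(accept);
  // Wildcards do not count as "asking for HTML".
  const htmlWeight = weights.get("text/html") ?? 0;
  // Turtle may be matched explicitly or through a wildcard.
  const turtleWeight =
    weights.get("text/turtle") ?? weights.get("text/*") ?? weights.get("*/*") ?? 0;
  return htmlWeight > turtleWeight ? "text/html" : "text/turtle";
}
```

With this sketch, a typical browser header such as `text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8` selects HTML, while a request with no Accept header gets Turtle.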

Meanwhile, if you’re sure that you’re fetching a WebID, adding an Accept header set to text/turtle to your request will work. Unfortunately, you can’t always be sure.

ThisIsMissEm · July 26, 2022, 10:45pm

@Vincent I wonder if you can send an Accept header of

Accept: text/turtle;q=1.0, */*;q=0.5

That is, accept text/turtle with the highest q-factor weighting, then accept everything else at lower weighting?
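
A minimal sketch of sending that header from a client, in TypeScript. The names `fetchPreferringTurtle`, `FetchLike` and `HttpResponseLike` are made up for this example, and the fetch function is passed in so the sketch does not assume a particular runtime. Whether the server honours the weighting is up to the server, so the sketch checks the Content-Type of the response rather than assuming Turtle came back.

```typescript
// Minimal structural types so the sketch needs no DOM or Node typings.
// The standard fetch function is structurally compatible with FetchLike.
interface HttpResponseLike {
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
}
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<HttpResponseLike>;

// Turtle at the highest weight, anything else at a lower weight.
const PREFER_TURTLE = "text/turtle;q=1.0, */*;q=0.5";

async function fetchPreferringTurtle(url: string, fetchFn: FetchLike) {
  const response = await fetchFn(url, { headers: { Accept: PREFER_TURTLE } });
  const contentType = response.headers.get("Content-Type") ?? "";
  return {
    status: response.status,
    contentType,
    // The server decides what to send, so verify instead of assuming.
    isTurtle: contentType.toLowerCase().startsWith("text/turtle"),
    body: await response.text(),
  };
}
```

In a browser or a recent Node version this could be called as `fetchPreferringTurtle("https://id.inrupt.com/ludwig", fetch)`.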

ThisIsMissEm · July 26, 2022, 10:49pm

It looks like that should work:

Vincent · July 27, 2022, 7:42am

I could, but what I’m trying to avoid is that Penny is determining the “preferred” serialisation. In other words, if the user has actually uploaded an HTML document to their Pod, I don’t want to get a Turtle description of that document; I’d want them to be able to actually see (and potentially modify) that HTML file. But in the case of a WebID, I’d think the canonical form would be to get the actual data, with an HTML serialisation only being a convenience for those requesting it (e.g. by directly visiting the WebID URL in a web browser).

ThisIsMissEm · July 27, 2022, 8:07am

Is there any server that’d do that? My understanding is that non-Turtle/JSON-LD documents aren’t to be rendered as descriptions of those documents; i.e., something is either a dataset or a non-dataset.

Vincent · July 27, 2022, 8:56am

I seem to remember ESS doing that, but it doesn’t seem to do it now. In that case, I’ll give that a shot and see what happens, thanks!

Edit: one edge case is RDFa, i.e. if there’s RDF embedded in HTML, a server might choose to serve that RDF as Turtle with that Accept header. But I can live with that.

Vincent · July 30, 2022, 4:28pm

Cool, that seems to work, thanks. One thing that’s interesting, though, is that I get 401 Unauthorized when trying to view someone else’s WebID while being logged in with an NSS WebID. When not logged in, or when logged in with another ESS account, it works fine. Feels like that might be an ESS bug?

ThisIsMissEm · August 1, 2022, 6:13pm

One thing that’s interesting, though, is that I get 401 Unauthorized when trying to view someone else’s WebID while being logged in with an NSS WebID. When not logged in, or when logged in with another ESS account, it works fine. Feels like that might be an ESS bug?

I think this’ll be due to requesting the WebID with authentication — WebIDs are public documents (per spec) so shouldn’t be requested with authentication. The issue here is that you’re requesting with authentication from NSS against a public document, and the server is saying “I don’t know what authentication that is”

This is where getProfileAll from the @inrupt/solid-client SDK comes in: it explicitly requests the WebID using getSolidDataset(webid), without passing an authenticated fetch via options.

Vincent · August 1, 2022, 6:52pm

Ah of course; similarly, other public resources will start giving CORS errors when requested with Authorization headers. So I guess the user will have to know in advance whether a given resource (e.g. a WebID) should be viewed authenticated or unauthenticated, and to be able to tell Penny that. Or alternatively, I can try automatically re-fetching it with/without authentication headers if it fails. Thanks!
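
A minimal sketch of that automatic re-fetch, in TypeScript. `fetchWithFallback` and the two fetch parameters are hypothetical names; how the authenticated fetch is obtained is out of scope here. The sketch tries the authenticated request first and falls back to an unauthenticated one on a 401 or a rejected request (in browsers, a CORS failure surfaces as a rejected fetch promise). Trying the unauthenticated request first would be an equally valid ordering.

```typescript
// Minimal structural types so the sketch needs no DOM or Node typings.
interface HttpResponseLike {
  status: number;
}
type FetchLike = (url: string) => Promise<HttpResponseLike>;

interface FallbackResult {
  response: HttpResponseLike;
  authenticated: boolean; // which of the two requests produced the response
}

async function fetchWithFallback(
  url: string,
  authenticatedFetch: FetchLike,
  plainFetch: FetchLike,
): Promise<FallbackResult> {
  try {
    const response = await authenticatedFetch(url);
    if (response.status !== 401) {
      return { response, authenticated: true };
    }
    // 401: the server did not accept our credentials; retry without them.
  } catch {
    // Network or CORS failure on the authenticated request; retry without it.
  }
  return { response: await plainFetch(url), authenticated: false };
}
```

The `authenticated` flag lets the caller tell the user which view of the resource they are looking at.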

bourgeoa · August 2, 2022, 10:26am

This seems strange and non-spec-compliant:

Per spec, you are allowed to look at public and non-public things with an authenticated fetch,
and a non-authenticated fetch can only access public things.

acoburn · August 2, 2022, 1:45pm

In HTTP there are two approaches to authentication: pre-emptive authentication and reactive authentication. Pre-emptive authentication works well when there is a pre-existing (out-of-band) relationship between a client and a server. This is a valid method for most traditional web applications.

For Solid, however, we have a highly distributed ecosystem of apps, servers and identity systems. Having a client know the appropriate authentication mechanism out-of-band is not scalable, but this is what you are seeing in this case: a client assumes that a given resource will accept a particular type of authentication token before even probing what is supported.

A reactive approach to authentication is what will ultimately allow Solid to scale, and for this to work, clients need to start by sending requests with no authorization header. If the resource is public, there is nothing else that needs to be done. And in this particular case, you’re done.

If the resource is protected, then the client will receive a 401 response with a WWW-Authenticate header. That header will inform the client how to proceed: whether to use DPoP, or Bearer or UMA or GNAP or whatever. Then the client uses the appropriate mechanism. A keen reader of the Solid specifications will notice that a particular authentication scheme is not mandated. Solid-OIDC only defines how to retrieve an ID token, not that resource servers must support Bearer token-based authentication.

For some more background on this particular case, the preemptive auth that is being performed with an NSS bearer token results in a 401 because the WebID resource does not accept the provided access token. There are important security reasons that a WebID profile is stored outside of a Pod and there are constraints on which authentication mechanism is supported – if any app could write to a WebID profile, that would be a problem. In this case, the NSS access token will never be accepted by the ESS WebID profile resource; it will always return a 401.

The better approach here (and generally with resources in the Solid ecosystem) is to first attempt to fetch the resource with no authentication headers. Generally, a client will either receive a 200 (in which case nothing more need be done) or a 401. In the case of a 401, the client should look at the WWW-Authenticate header, locating a scheme that it supports and proceed from there.
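
A minimal sketch of that flow in TypeScript, under stated assumptions: `fetchReactively`, `challengeSchemes` and `splitOutsideQuotes` are hypothetical helpers, the WWW-Authenticate parsing is simplified (it only extracts scheme names, not their parameters), and the per-scheme authenticated fetch functions are supplied by the caller, since producing DPoP proofs, UMA tickets and so on is out of scope here.

```typescript
// Minimal structural types so the sketch needs no DOM or Node typings.
interface HttpResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}
type FetchLike = (url: string) => Promise<HttpResponseLike>;

// Split a header value on commas that are not inside a quoted string.
function splitOutsideQuotes(header: string): string[] {
  const parts: string[] = [];
  let current = "";
  let inQuotes = false;
  let escaped = false;
  for (const ch of header) {
    if (escaped) {
      current += ch;
      escaped = false;
    } else if (ch === "\\" && inQuotes) {
      current += ch;
      escaped = true;
    } else if (ch === "," && !inQuotes) {
      parts.push(current);
      current = "";
    } else {
      if (ch === '"') inQuotes = !inQuotes;
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

// Simplified: a part starts a new challenge if it begins with a bare token
// that is not followed by "=" (which would make it a parameter instead).
function challengeSchemes(header: string): string[] {
  const schemes: string[] = [];
  for (const part of splitOutsideQuotes(header)) {
    const match = /^([A-Za-z0-9!#$%&'*+.^_|~-]+)(?:\s+(?!=)|$)/.exec(part.trim());
    const scheme = match?.[1];
    if (scheme) schemes.push(scheme);
  }
  return schemes;
}

// authenticatedFetchers is keyed by lowercase scheme name, e.g. "dpop".
async function fetchReactively(
  url: string,
  plainFetch: FetchLike,
  authenticatedFetchers: Map<string, FetchLike>,
): Promise<HttpResponseLike> {
  // Step 1: probe with no Authorization header.
  const first = await plainFetch(url);
  if (first.status !== 401) return first; // public resource or unrelated outcome

  // Step 2: let the server say which schemes it accepts.
  const header = first.headers.get("WWW-Authenticate") ?? "";
  for (const scheme of challengeSchemes(header)) {
    const authenticatedFetch = authenticatedFetchers.get(scheme.toLowerCase());
    if (authenticatedFetch) return authenticatedFetch(url);
  }
  return first; // no mutually supported scheme: hand the 401 back to the caller
}
```

For a public WebID the first request already succeeds, so no credentials are ever sent to a server that would not recognise them.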

Some references:

bourgeoa · August 2, 2022, 3:44pm

This does not seem to answer the question:

I am authenticated and want to read a public thing,
I don’t know it is public
do I have to log out, or fetch each thing twice?

jeffz · August 2, 2022, 4:08pm

Having the profile outside the storage makes sense. Serving the WebID Profile Document as HTML and making clients do something special to read it do not.

acoburn · August 2, 2022, 4:08pm

I am authenticated and want to read a public thing,

As described above, use reactive authentication. This involves sending an HTTP request without an authorization header.

I don’t know it is public

You will if you use reactive authentication. This involves first sending an HTTP request without an authorization header.

do I have to log out, or fetch each thing twice?

This has nothing to do with logging out. This is about a client application not assuming preemptive authorization. If you are concerned about performing a fetch twice, cache the response (which an app should be doing anyway).
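
A minimal sketch of such a cache in TypeScript. `withCache`, `Probe` and `ProbeResult` are hypothetical names for this example, and the cache is a plain in-memory map keyed by URL with no expiry; a real client would more likely respect the HTTP caching headers on the response.

```typescript
// Result of the initial unauthenticated probe of a resource.
interface ProbeResult {
  status: number;
  body: string;
}
type Probe = (url: string) => Promise<ProbeResult>;

// Wrap a probe function so each URL is requested at most once.
// Promises are cached, so concurrent callers share one in-flight request.
function withCache(probe: Probe): Probe {
  const cache = new Map<string, Promise<ProbeResult>>();
  return (url) => {
    let pending = cache.get(url);
    if (!pending) {
      pending = probe(url).catch((error) => {
        cache.delete(url); // do not cache failures; allow a later retry
        throw error;
      });
      cache.set(url, pending);
    }
    return pending;
  };
}
```

Wrapping the unauthenticated probe this way means the "first request" of the reactive flow costs a round trip only the first time a given URL is seen.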

acoburn · August 2, 2022, 4:10pm

This is not special. This is standard HTTP.

jeffz · August 2, 2022, 4:12pm

a) I have to send an Accept: text/turtle header if I want to retrieve the profile as RDF; this is unique to ESS. b) I have to not use an authenticated fetch to retrieve the profile; that is unique to ESS.

acoburn · August 2, 2022, 4:16pm

a) I have to send an Accept: text/turtle header if I want to retrieve the profile as RDF; this is unique to ESS.

Why is a linked data client not sending an accept header?

b) I have to not use an authenticated fetch to retrieve the profile, that is unique to ESS.

As mentioned above, this is how we can make decentralized authentication work on the Web. It uses standard HTTP mechanisms. The fact that ESS supports this before other Solid servers does not mean that it is incorrect.

Vincent · August 2, 2022, 4:16pm

Well, except if you also want to figure out what access the current user has, i.e. an unauthenticated fetch might result in read but not write permissions, while the authenticated fetch might have write permissions. So you’ll always have to do two fetches. And if the contents might differ as well depending on whether the user is authenticated, caching the response isn’t possible either.
