Compression: Possible or preferable to use prefix / aliases with javascript Solid Dataset?

ajp · August 26, 2021, 7:49am

I am saving a large dataset and wondered whether it is good practice to use prefixes or aliases to shorten the .ttl file. An example file is:

@prefix as: <https://www.w3.org/ns/activitystreams#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix acp: <http://www.w3.org/ns/solid/acp#> .

<https://pod.inrupt.com/ajp/data_curator_v1/world_components.ttl>
        rdf:type  ldp:RDFSource .

<https://pod.inrupt.com/ajp/data_curator_v1/world_components.ttl#wc593038181053847>
        <http://datacurator.org/schema/v1/json>  "{\"id\":\"wc593038181053847\",\"created_at\":\"2021-08-25T21:41:11.872Z\",\"title\":\"hello world\",\"description\":\"\",\"type\":\"state\",\"label_ids\":[],\"values\":[]}" ;
        <http://datacurator.org/schema/v1/title>  "hello world" .

<https://pod.inrupt.com/ajp/data_curator_v1/world_components.ttl#wc2>
        <http://datacurator.org/schema/v1/json>  "{\"id\":\"wc2\",\"created_at\":\"2021-08-25T21:41:11.872Z\",\"title\":\"hello world2\",\"description\":\"\",\"type\":\"state\",\"label_ids\":[],\"values\":[]}" ;
        <http://datacurator.org/schema/v1/title>  "hello world2" .

# Many more entries....

Firstly, when this was saved on solidcommunity.net, the Thing id was written as just the id in the form <#...>, e.g. <#wc1>, instead of the full URL shown above.

Secondly, it would seem that repeating <http://datacurator.org/schema/v1/title> and 20 other fields (when I eventually destructure <http://datacurator.org/schema/v1/json> into its component parts) would waste bytes when sending / receiving.

Are there any good practices for this? Is it possible to provide instructions from the client about prefixes / aliases etc.?

Currently the code to save is:

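// Base, SyncError and solid_fetch are defined elsewhere in the app
import {
    addStringNoLocale, createSolidDataset, createThing,
    deleteSolidDataset, saveSolidDatasetAt, setThing,
} from "@inrupt/solid-client"

const _V1 = "http://datacurator.org/schema/v1/"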

export const V1 = {
    title: _V1 + "title",
    json: _V1 + "json",
}


async function save_items <I extends Base & { title: string }> (items_URL: string, items: I[])
{
    let items_dataset = createSolidDataset()
    items.forEach(item =>
    {
        let thing = createThing({ name: item.id })
        thing = addStringNoLocale(thing, V1.title, item.title)
        thing = addStringNoLocale(thing, V1.json, JSON.stringify(item))
        items_dataset = setThing(items_dataset, thing)
    })


    try
    {
        // First delete because pod.inrupt.com is not compliant with documentation
        // 412 is raised if trying to overwrite existing resource
        if (items_URL.includes("pod.inrupt.com/")) await deleteSolidDataset(items_URL, { fetch: solid_fetch })
    }
    catch (err)
    {
        if (!err || (err.statusCode !== 404)) console.error(`Error deleting "${items_URL}"`, err)
    }


    try {
        // console .log("Saving...")
        // Save the SolidDataset
        /* let saved_items_dataset = */ await saveSolidDatasetAt(items_URL, items_dataset, { fetch: solid_fetch })
        //console .log("Saved!")

        return Promise.resolve()
    } catch (err) {
        console.error(`error saving items to "${items_URL}" :`, err)
        const error: SyncError = { type: "general", message: err }
        return Promise.reject(error)
    }
}

Thank you.

Vincent · August 26, 2021, 9:47am

One important thing to realise is that what solid-client calls a SolidDataset isn’t necessarily saved as a file on the server. Solid servers should be able to return that data both as text/turtle and as application/ld+json. While some servers do store the data sent as plain files on a physical file system, others store it in a database and dynamically generate the Turtle or JSON-LD as requested - you can see this on pod.inrupt.com for example.
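To make the two representations concrete, here is a minimal sketch of asking for the same resource as Turtle or as JSON-LD by changing the Accept header. The names `fetch_as`, `build_headers` and `FetchLike` are hypothetical and only for illustration; the fetch function is passed in so that an authenticated fetch (such as the `solid_fetch` in the question) or a stub could be used.

```typescript
// The two serialisations mentioned above
type RdfFormat = "text/turtle" | "application/ld+json"

// Minimal shape of the fetch function this sketch needs (an assumption,
// so that an authenticated fetch or a test stub can be passed in)
type FetchLike = (
    url: string,
    init: { headers: Record<string, string> }
) => Promise<{ text(): Promise<string> }>

function build_headers (format: RdfFormat): Record<string, string>
{
    // The Accept header tells the server which representation is preferred
    return { Accept: format }
}

async function fetch_as (url: string, format: RdfFormat, fetch_fn: FetchLike): Promise<string>
{
    const response = await fetch_fn(url, { headers: build_headers(format) })
    return response.text()
}
```

Under these assumptions, something like `await fetch_as(items_URL, "application/ld+json", solid_fetch)` would ask for the JSON-LD form of the same data that was saved as a SolidDataset.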

As a potentially helpful analogy, imagine that you could store .csv files on a server, but request them either as CSV or as JSON with the data in an array. The server could just save the .csv to disk and convert it to JSON when requested, but it would probably be more efficient for it to use a database, store the rows of the CSV in there, and then serialise that to either CSV or JSON when requested.
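A minimal sketch of that analogy, with an in-memory array standing in for the database. The function names and the sample rows are made up for illustration, and the CSV writer skips quoting and escaping for brevity.

```typescript
// Rows are held once in a neutral form (standing in for a database)
type Row = Record<string, string>

const columns = ["id", "title"]
const rows: Row[] = [
    { id: "wc1", title: "hello world" },
    { id: "wc2", title: "hello world2" },
]

// Serialise on request as CSV (no quoting or escaping in this sketch)
function to_csv (cols: string[], data: Row[]): string
{
    const lines = data.map(row => cols.map(c => row[c] ?? "").join(","))
    return [cols.join(","), ...lines].join("\n")
}

// Serialise on request as a JSON array
function to_json (data: Row[]): string
{
    return JSON.stringify(data)
}
```

Neither output is "the file": both are generated from the same stored rows, which is why the layout of what the client sent does not have to survive on the server.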

ajp · August 26, 2021, 10:00am

Thanks @Vincent. So app developers do not need to take responsibility for optimising file size in how they construct a dataset to send to the server? The client libraries they use should take responsibility for optimising the dataset for sending to the server, and the server should optimise for storage and for any subsequent responses to requesting clients?

Vincent · August 26, 2021, 10:30am

That’s exactly right. Although so far I’m not aware of any client libraries or servers that actually take that into account, and it’s not something I’ve thought about before, but that could be an interesting feature request.

Edit: ah but of course, if the server gzips responses (as they probably do), it’s not much of an issue for responses anyway. And client-side writes probably won’t be that big, usually?
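A rough way to check the gzip point locally is sketched below, assuming Node.js and its built-in zlib module. The `dc1:` prefix, the entry count and the generated titles are arbitrary choices for illustration, and a real server may use a different compression scheme or none at all.

```typescript
import { gzipSync } from "node:zlib"

const TITLE = "http://datacurator.org/schema/v1/title"

// Build `count` entries, writing the predicate either in full or with a prefix
function build_turtle (count: number, use_prefix: boolean): string
{
    const header = use_prefix ? "@prefix dc1: <http://datacurator.org/schema/v1/> .\n" : ""
    const predicate = use_prefix ? "dc1:title" : `<${TITLE}>`
    const entries: string[] = []
    for (let i = 0; i < count; ++i)
    {
        entries.push(`<#wc${i}> ${predicate} "hello world ${i}" .`)
    }
    return header + entries.join("\n")
}

const full = build_turtle(1000, false)
const prefixed = build_turtle(1000, true)

// All characters are ASCII, so string length equals byte length here
const sizes = {
    full_raw: full.length,
    prefixed_raw: prefixed.length,
    full_gzip: gzipSync(full).length,
    prefixed_gzip: gzipSync(prefixed).length,
}
console.log(sizes)
```

Comparing the raw and gzipped sizes should show how much of the saving from prefixes remains once the repeated IRIs have been compressed on the wire.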

jeffz · August 26, 2021, 4:23pm

There is some server-side optimization on NSS. For example if you send

<#A> a <#B>.
<#A> a <#C>.

NSS writes

@prefix : <#>.
:A  a :B, :C.

[Edit : actually, this is, I think, done by the SolidOS databrowser client, rather than by the NSS server. Yes it is, see below]
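The grouping shown above can be sketched roughly as follows. This is not the actual serializer code, only an illustration of the idea: the `Triple` type and `serialize_grouped` are hypothetical names, and the terms are assumed to be abbreviated already (e.g. `:A`).

```typescript
type Triple = { subject: string, predicate: string, object: string }

// Group objects by subject, then by predicate, so each is written only once
function serialize_grouped (triples: Triple[]): string
{
    const by_subject = new Map<string, Map<string, string[]>>()
    for (const { subject, predicate, object } of triples)
    {
        const by_predicate = by_subject.get(subject) ?? new Map<string, string[]>()
        const objects = by_predicate.get(predicate) ?? []
        objects.push(object)
        by_predicate.set(predicate, objects)
        by_subject.set(subject, by_predicate)
    }

    const lines: string[] = ["@prefix : <#>."]
    by_subject.forEach((by_predicate, subject) =>
    {
        const parts: string[] = []
        by_predicate.forEach((objects, predicate) =>
        {
            // Objects sharing a subject and predicate are joined with ","
            parts.push(`${predicate} ${objects.join(", ")}`)
        })
        // Predicates sharing a subject are joined with ";"
        lines.push(`${subject} ${parts.join("; ")}.`)
    })
    return lines.join("\n")
}

const output = serialize_grouped([
    { subject: ":A", predicate: "a", object: ":B" },
    { subject: ":A", predicate: "a", object: ":C" },
])
```

For the two input triples, `output` consists of the prefix line followed by a single `:A a :B, :C.` statement.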

jeffz · August 26, 2021, 4:25pm

And, I believe, it also uses prefixes as aliases in most or all cases.

jeffz · August 26, 2021, 9:10pm

Okay, I checked and the optimization is done in rdflib, part of the SolidOS stack.

So:

- using rdflib, or a client that uses it, optimizes client side as above
- a direct PUT does not optimize
- file modification with the SolidOS databrowser does not optimize
- other client libraries: probably the same as PUT
