Compression: Is it possible or preferable to use prefixes / aliases with a JavaScript SolidDataset?

I am saving a large dataset and wondered whether it is good practice to use prefixes or aliases to shorten the .ttl file. An example file is:

@prefix as: <https://www.w3.org/ns/activitystreams#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix acl: <http://www.w3.org/ns/auth/acl#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix acp: <http://www.w3.org/ns/solid/acp#> .

<https://pod.inrupt.com/ajp/data_curator_v1/world_components.ttl>
        rdf:type  ldp:RDFSource .

<https://pod.inrupt.com/ajp/data_curator_v1/world_components.ttl#wc593038181053847>
        <http://datacurator.org/schema/v1/json>  "{\"id\":\"wc593038181053847\",\"created_at\":\"2021-08-25T21:41:11.872Z\",\"title\":\"hello world\",\"description\":\"\",\"type\":\"state\",\"label_ids\":[],\"values\":[]}" ;
        <http://datacurator.org/schema/v1/title>  "hello world" .

<https://pod.inrupt.com/ajp/data_curator_v1/world_components.ttl#wc2>
        <http://datacurator.org/schema/v1/json>  "{\"id\":\"wc2\",\"created_at\":\"2021-08-25T21:41:11.872Z\",\"title\":\"hello world2\",\"description\":\"\",\"type\":\"state\",\"label_ids\":[],\"values\":[]}" ;
        <http://datacurator.org/schema/v1/title>  "hello world2" .

# Many more entries....

Firstly, when this was saved on solidcommunity.net, each Thing id was written as a relative fragment in the form <#...>, e.g. <#wc1>, whereas the pod.inrupt.com file above repeats the full URL for every Thing.

Secondly, repeating <http://datacurator.org/schema/v1/title> and 20 other fields (once I eventually destructure <http://datacurator.org/schema/v1/json> into its component parts) seems wasteful of the bytes sent and received.

Are there any good practices for this? Is it possible to provide instructions from the client about prefixes, aliases, etc.?

Currently the code to save is:

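import {
    addStringNoLocale,
    createSolidDataset,
    createThing,
    deleteSolidDataset,
    saveSolidDatasetAt,
    setThing,
} from "@inrupt/solid-client"

// Base, SyncError and solid_fetch are defined elsewhere in the app

const _V1 = "http://datacurator.org/schema/v1/"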

export const V1 = {
    title: _V1 + "title",
    json: _V1 + "json",
}


async function save_items <I extends Base & { title: string }> (items_URL: string, items: I[])
{
    let items_dataset = createSolidDataset()
    items.forEach(item =>
    {
        let thing = createThing({ name: item.id })
        thing = addStringNoLocale(thing, V1.title, item.title)
        thing = addStringNoLocale(thing, V1.json, JSON.stringify(item))
        items_dataset = setThing(items_dataset, thing)
    })


    try
    {
        // First delete because pod.inrupt.com is not compliant with documentation
        // 412 is raised if trying to overwrite existing resource
        if (items_URL.includes("pod.inrupt.com/")) await deleteSolidDataset(items_URL, { fetch: solid_fetch })
    }
    catch (err)
    {
        if (!err || (err.statusCode !== 404)) console.error(`Error deleting "${items_URL}"`, err)
    }


    try {
        // console .log("Saving...")
        // Save the SolidDataset
        /* let saved_items_dataset = */ await saveSolidDatasetAt(items_URL, items_dataset, { fetch: solid_fetch })
        //console .log("Saved!")

        return Promise.resolve()
    } catch (err) {
        console.error(`error saving items to "${items_URL}" :`, err)
        const error: SyncError = { type: "general", message: err }
        return Promise.reject(error)
    }
}

Thank you.

One important thing to realise is that what solid-client calls a SolidDataset isn’t necessarily saved as a file on the server. Solid servers should be able to return that data as text/turtle as well as application/ld+json. While some servers do store the data they are sent as plain files on a physical file system, others store it in a database and dynamically generate the Turtle or JSON-LD as requested; you can see this on pod.inrupt.com, for example.

As a potentially helpful analogy, imagine that you could store .csv files on a server, but request them either as CSV or as JSON with the data in an array. The server could just save the .csv to disk and convert it to JSON when requested, but it would probably be more efficient for it to use a database, store the rows of the CSV in there, and then serialise them to either CSV or JSON when requested.
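To make that concrete, here is a minimal sketch of asking a server for the same resource in either serialisation through the HTTP Accept header. The names fetch_as, FetchLike and RdfFormat are made up for this illustration; the fetch function is passed in so that an authenticated one (such as the solid_fetch from the question) could be supplied.

```typescript
// Hypothetical minimal shape of a fetch function, so the sketch stays self-contained
type FetchLike = (
    url: string,
    init?: { headers?: Record<string, string> }
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>

type RdfFormat = "text/turtle" | "application/ld+json"

// Request one resource in the chosen RDF serialisation
async function fetch_as (resource_URL: string, format: RdfFormat, fetch_fn: FetchLike): Promise<string>
{
    // The Accept header tells the server which representation the client wants
    const response = await fetch_fn(resource_URL, { headers: { Accept: format } })
    if (!response.ok)
    {
        throw new Error(`Request for "${resource_URL}" failed with status ${response.status}`)
    }
    return response.text()
}
```

For example, `await fetch_as(items_URL, "application/ld+json", solid_fetch)` would ask for the JSON-LD form of the same data that was written as Turtle, assuming the server supports both.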

Thanks @Vincent. So app developers do not take responsibility for optimising file size in how they construct a dataset to send to the server? Instead, the client libraries they use should take responsibility for optimising the dataset for sending to the server, and the server should optimise for storage and for any subsequent responses to requesting clients?

That’s exactly right, although so far I’m not aware of any client libraries or servers that actually take that into account. It’s not something I’ve thought about before, but it could be an interesting feature request.

Edit: ah, but of course, if the server gzips responses (as most probably do), it’s not much of an issue for responses anyway. And client-side writes probably won’t be that big, usually?
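A rough way to check the gzip point yourself, as a sketch for Node.js: build the same made-up entries once with full IRIs and once with prefixes, then compare raw and gzipped sizes. The v1: prefix, the "type" property and the entry count are assumptions for illustration only, and real data will compress differently.

```typescript
import { gzipSync } from "node:zlib"

const SCHEMA = "http://datacurator.org/schema/v1/"
const BASE = "https://pod.inrupt.com/ajp/data_curator_v1/world_components.ttl"

// Build `count` made-up entries, either with full IRIs or with prefixed names
function build_turtle (count: number, use_prefixes: boolean): string
{
    const lines: string[] = use_prefixes
        ? [`@prefix : <${BASE}#> .`, `@prefix v1: <${SCHEMA}> .`]
        : []

    for (let i = 0; i < count; ++i)
    {
        const subject = use_prefixes ? `:wc${i}` : `<${BASE}#wc${i}>`
        const title = use_prefixes ? "v1:title" : `<${SCHEMA}title>`
        const type = use_prefixes ? "v1:type" : `<${SCHEMA}type>` // hypothetical destructured field
        lines.push(`${subject} ${title} "hello world ${i}" ; ${type} "state" .`)
    }

    return lines.join("\n")
}

// Size in bytes before and after gzip compression
function sizes (turtle: string): { raw: number, gzipped: number }
{
    return { raw: Buffer.byteLength(turtle), gzipped: gzipSync(turtle).length }
}

const full = sizes(build_turtle(500, false))
const prefixed = sizes(build_turtle(500, true))
console.log({ full, prefixed })
```

Because the repeated IRIs are exactly the kind of redundancy gzip removes, the gap between the two gzipped sizes should be far smaller than the gap between the raw sizes. Note this only helps where compression is actually applied, which is an assumption about the server for responses and is not automatic for uploads.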

There is some server-side optimization on NSS (Node Solid Server). For example, if you send

<#A> a <#B>.
<#A> a <#C>.

NSS writes

@prefix : <#>.
:A  a :B, :C.

[Edit: actually, I think this is done by the SolidOS databrowser client rather than by the NSS server. Yes, it is; see below.]

And, I believe, it also uses prefixes as aliases in most or all cases.

Okay, I checked and the optimization is done in rdflib, part of the SolidOS stack.

So:

  • using rdflib, or a client that uses it, optimizes client-side as above
  • a direct PUT does not optimize
  • file modification with the SolidOS databrowser does not optimize
  • other client libraries: probably the same as PUT
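Since a direct PUT sends whatever Turtle the client produced, a client that wants shorter documents can do the abbreviation itself. The following is only a sketch of the idea (prefixes, grouping by subject and predicate, and "a" for rdf:type); it is not rdflib's implementation, and the names Triple, shorten and to_compact_turtle are made up for this example.

```typescript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

interface Triple
{
    subject: string
    predicate: string
    object: { iri: string } | { literal: string }
}

// Abbreviate an IRI with the first matching prefix, else keep it in angle brackets
function shorten (iri: string, prefixes: Record<string, string>): string
{
    for (const [prefix, namespace] of Object.entries(prefixes))
    {
        const local = iri.slice(namespace.length)
        // Deliberately conservative: only abbreviate simple local names
        if (iri.startsWith(namespace) && /^[A-Za-z_][A-Za-z0-9_]*$/.test(local)) return `${prefix}:${local}`
    }
    return `<${iri}>`
}

function to_compact_turtle (triples: Triple[], prefixes: Record<string, string>): string
{
    // Group objects by subject, then by predicate, so each is written only once
    const by_subject = new Map<string, Map<string, string[]>>()
    for (const { subject, predicate, object } of triples)
    {
        const s = shorten(subject, prefixes)
        const p = predicate === RDF_TYPE ? "a" : shorten(predicate, prefixes)
        // Assumption: JSON string escaping is close enough to Turtle escaping for plain strings
        const o = "iri" in object ? shorten(object.iri, prefixes) : JSON.stringify(object.literal)

        const by_predicate = by_subject.get(s) ?? new Map<string, string[]>()
        by_subject.set(s, by_predicate)
        by_predicate.set(p, [...(by_predicate.get(p) ?? []), o])
    }

    const header = Object.entries(prefixes).map(([prefix, namespace]) => `@prefix ${prefix}: <${namespace}> .`)
    const body = Array.from(by_subject.entries()).map(([s, by_predicate]) =>
        `${s} ` + Array.from(by_predicate.entries())
            .map(([p, objects]) => `${p} ${objects.join(", ")}`)
            .join(" ;\n    ") + " .")

    return [...header, ...body].join("\n")
}

// The two-triple example from earlier in the thread
const compact = to_compact_turtle([
    { subject: "#A", predicate: RDF_TYPE, object: { iri: "#B" } },
    { subject: "#A", predicate: RDF_TYPE, object: { iri: "#C" } },
], { "": "#" })
console.log(compact)
```

With the two triples from the example above, this collapses both statements into a single line under the empty prefix, much like the rdflib output quoted earlier. The resulting string could then be sent with a plain PUT and a Content-Type of text/turtle, although, as noted above, the server may well re-serialise it on storage.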