Solid-recursive-copy

I’ve made some changes to @timbl’s solid-recursive-copy. It supports copying an entire directory tree from one Pod location to another. I’d much appreciate user testing before I submit the patches. Please clone or download the PR and run the browserless and/or browser tests on varying data.

When it’s ready for prime time, I’ll include it in solid-file-client and solid-shell, etc.


Nice job.

Tests
I downloaded it and ran some tests in the Chrome browser:

  • using browser.html
  • with node-solid-server at both locations (v4.3 or v4.4, I don’t remember which)
  • case A : bourgeoa.solid.community (read/write for all) to anna.bourgeoa.ga (read only for all)

    1. copy tree to a non-existent folder : Web error: 403 on PUT

    2. copy tree to an existing empty folder : ok
      2.1 redo (copy tree to the existing folder, now full of the same files) : Web error: 403 on PUT

    3. copy one file …/test to an existing folder : load error …/test/ not found --> ok

  • case B : reverse
    4. reverse of 1. to an existing folder full of the same existing files : ok
    4.1 repeat : ok

Conclusion

  • it’s working
  • I am surprised that case 2. did work
  • I was never asked to connect (cleared local storage)

Issues are not available on solid-recursive-copy

Thanks for the testing. I have already begun work on recursive delete, a bit trickier but should be manageable. So in case 2 - is the void folder publicly writeable? Seems like deepCopy should refuse to copy files into a folder that doesn’t allow writing for the given user.

When you say “never asked to connect”, I assume you mean in the browser where that is the expected behavior if you are logged in to an active session… The browserless version should always prompt for credentials unless you’ve stored them in a config file as per the README.

It is the original /public folder
Owner has all rights and everybody read only
Folder has never been used before.

The tree origin is : /public/tiddlers
containing 1 folder /main and no files, with /main containing 6 files

I checked the permissions and confirm that everybody is read only. Maybe it is a bug in v4.4.


Regarding the recursiveCopy method in your rdflib.js fork:

https://github.com/jeff-zucker/rdflib.js/blob/766faddad900591f47ef821b90c6d26335d57ccc/src/fetcher.js#L1156-L1158

I don’t think it is safe to expect the acl link to be relative. Judging from this document it could be either relative to the base URL or absolute (e.g. <https://example.org/.acl>; rel="acl").

Each link-value conveys one target IRI as a URI-Reference (after conversion to one, if necessary; see [RFC3987], Section 3.1) inside angle brackets (“<>”). If the URI-Reference is relative, parsers MUST resolve it as per [RFC3986], Section 5.

Therefore I’d suggest using url.resolve(baseUrl, aclUrl) from the url package or something similar (rdflib.js already has a join function here, maybe that could be used too).
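
As a minimal sketch of that suggestion, assuming an environment that provides the WHATWG URL class (modern browsers and Node.js), relative and absolute link targets can be normalised the same way. The helper name resolveAclUrl is made up for illustration and is not part of rdflib.js:

```javascript
// Resolve the target of a Link header against the URL of the resource it came from.
// `resolveAclUrl` is a hypothetical helper name, not an rdflib.js function.
function resolveAclUrl(baseUrl, aclRef) {
  // The URL constructor resolves a relative reference against the base
  // and leaves an absolute reference unchanged.
  return new URL(aclRef, baseUrl).href;
}

console.log(resolveAclUrl('https://example.org/public/', '.acl'));
// https://example.org/public/.acl
console.log(resolveAclUrl('https://example.org/public/', 'https://example.org/.acl'));
// https://example.org/.acl
```

Either way the caller ends up with an absolute URL, so the rest of the copy code does not need to know which form the server sent.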

And something I’m curious about… Do you know if the * in the specification (section 5.0) means one single white space? I’ve always seen it used as “repeat zero or more times”, but here it seems to indicate a space…?

Link = "Link" ":" #link-value
link-value = "<" URI-Reference ">" *( ";" link-param )


Excellent points, thanks, I’ll look into your suggestions. Could I ask you a big favor? Would you mind looking at my attempt at recursiveDelete in the same repo? It does not work as I’d like it to and I could use your sharp eye on it.

Yes, I will take a look at it later today or tomorrow morning. Can you explain a bit more in detail what exactly doesn’t work? Is it all together, or just one aspect or one edge case?


It has problems deleting containers before they are empty so it will, for example, delete everything except the top level container. Rdflib doesn’t currently support async/await and I’d like to try to do it without using that, but currently it tries to delete the container before the promises emptying it are done.

I think the problem is that Promise.all doesn’t run the promises one after another in the order they are passed; they all run (more or less) at the same time. The order of the promises passed as parameter only determines the order of the resolved values, not the order in which the promises settle.
Here is a short example in case this isn’t yet familiar to you:

function sleep(ms) { return new Promise((resolve) => setTimeout(resolve, ms)); }
Promise.all([
  sleep(2000).then(() => console.log('first')),
  sleep(1000).then(() => console.log('second'))
])
// Will log "second" after one second and "first" after another one. The resolved values will be in order [value of first, value of second].

So in promise-pseudo code I’d suggest a structure like this:

function deleteFolderRecursive(folderUrl) {
  // Return the promise so callers (including the recursive call) can wait for completion
  return fetchFolder(...)
    .then(folder => {
       const contents = getContents(folder);
       // Could be moved to a helper function for easier to read code
       const deleteContentPromises = contents.map(item => isFile(item) ? deleteFileAndAcl(item) : deleteFolderRecursive(item));

       return Promise.all(deleteContentPromises)
         .then(() => deleteFolderAndAcl(folderUrl)) // Now all contents have been deleted so it is safe to delete the folder
         .catch(...)
       // Note: Something that's worth considering is that the delete functions first delete the file/folder and
       // .then the acl file. If it's not done that way, it could happen that access is unintentionally given for a
       // short amount of time (or, when deleting the file/folder fails but deleting the acl succeeds, even permanently)
    }).catch(...)
}

EDIT: Or probably easier to read with chaining promises:

function deleteFolderRecursive(folderUrl) {
  // Return the chain so the recursive calls can be awaited by Promise.all
  return fetchFolder(...)
    .then(deleteFolderContents)
    .then(deleteFolder);
}
function deleteFolderContents(folder /* The results from fetchFolder */) {
  const contents = getContents(folder);
  const promises = contents.map(item => isFile(item) ? deleteFileAndAcl(item) : deleteFolderRecursive(item));
  return Promise.all(promises)
    .then(() => folder); // Resolve with the folder so the next function in the promise chain receives the same argument
}
function deleteFolder(folder) {
  return apiDeleteFolderAndAcl(folder)
    .then(() => folder);
}
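
To try the ordering out without a server, here is a small runnable sketch of the same idea. It is only a simulation: the pod map, deleteResource and the URLs are hypothetical stand-ins for real HTTP requests, and the folder listing is looked up synchronously instead of being fetched.

```javascript
// In-memory stand-in for a Pod: folder URL -> child URLs (a trailing slash marks a folder).
// Everything here is hypothetical; real code would issue HTTP requests instead.
const pod = new Map([
  ['/a/', ['/a/x.ttl', '/a/b/']],
  ['/a/b/', ['/a/b/y.ttl']],
]);
const deleted = []; // records the order in which the simulated deletes complete

const isFolder = (url) => url.endsWith('/');

// Simulates an asynchronous DELETE that takes a random amount of time.
function deleteResource(url) {
  return new Promise((resolve) => setTimeout(() => {
    deleted.push(url);
    resolve(url);
  }, Math.random() * 20));
}

function deleteFolderRecursive(folderUrl) {
  const children = pod.get(folderUrl) || [];
  const childPromises = children.map((child) =>
    isFolder(child) ? deleteFolderRecursive(child) : deleteResource(child));
  // The folder's own delete is only started once every child promise has resolved.
  return Promise.all(childPromises).then(() => deleteResource(folderUrl));
}

const done = deleteFolderRecursive('/a/').then(() => {
  console.log(deleted); // the top-level folder '/a/' is always the last entry
  return deleted;
});
```

The files may finish in any order relative to each other, but each folder is deleted only after everything inside it, which is the property the container delete needs.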

In particular, it seems that the error lies in the following lines:


promises.push(delContainer(target)) will lead to deleting the container at the same time as the contents. The same goes for promises.push(delContainer(repeat)), but I didn’t really understand what that one is for.

If you want help implementing it, just ask. But I thought I’d try to explain the problem a bit more in detail first.


And two gotchas I’ve found in recursiveCopy:


In this line it sends a request to create the container but doesn’t wait for the result before proceeding. So it could happen that it sends both requests and the second one gets handled first (I don’t think there’s any specification preventing this), which means that it tries to create the contents of a folder which hasn’t yet been created. (Actually this should work with PUT requests according to the Solid spec, but if you do it that way I would add a comment to clarify it.)
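
A small simulation of that race, as a sketch only. The put function and the existing set below are hypothetical, and the simulated server is deliberately stricter than a Solid server (it rejects a write whose parent container does not exist yet) so that the ordering problem becomes visible:

```javascript
// Hypothetical in-memory server state: the set of resources that exist.
const existing = new Set(['/']);

// Strip the last path segment to get the parent container URL.
const parentOf = (url) => url.replace(/[^/]+\/?$/, '');

// Simulated PUT handled after delayMs; rejects if the parent container is missing.
function put(url, delayMs) {
  return new Promise((resolve, reject) => setTimeout(() => {
    if (!existing.has(parentOf(url))) {
      return reject(new Error('parent container missing: ' + url));
    }
    existing.add(url);
    resolve(url);
  }, delayMs));
}

// Both requests sent at once: the file happens to be handled before its container.
const unordered = Promise.all([put('/a/', 20), put('/a/file.ttl', 5)])
  .then(() => 'ok', (err) => err.message);

// The file is only sent after the container request has resolved.
const ordered = put('/b/', 20)
  .then(() => put('/b/file.ttl', 5))
  .then(() => 'ok', (err) => err.message);

Promise.all([unordered, ordered]).then((results) => console.log(results));
```

Chaining with .then costs one round trip per level of the tree, but it removes the dependency on how the server orders concurrent requests.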


Here the parameter passed to .then is not a function, but the evaluated result of resolve(""), which means the overall Promise is resolved as soon as it reaches this line. You likely want Promise.all(promises).then(() => resolve("")) or Promise.all(promises).then(resolve). The latter would call resolve with the resolved values of the promises.
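
The difference is easy to reproduce in isolation. This is a self-contained sketch with made-up names (slow, buggy, fixed), not the code from the fork:

```javascript
const log = [];

// Stand-in for the real work: finishes after 50 ms.
const slow = new Promise((resolve) => setTimeout(() => {
  log.push('work done');
  resolve();
}, 50));

// Buggy: resolve('') is called immediately while building the .then() call,
// so the outer promise resolves before the work has finished.
const buggy = new Promise((resolve) => {
  Promise.all([slow]).then(resolve(''));
}).then(() => log.push('buggy resolved'));

// Fixed: a function is passed, so resolve('') only runs after Promise.all settles.
const fixed = new Promise((resolve) => {
  Promise.all([slow]).then(() => resolve(''));
}).then(() => log.push('fixed resolved'));

const finished = Promise.all([buggy, fixed]).then(() => {
  console.log(log); // [ 'buggy resolved', 'work done', 'fixed resolved' ]
  return log;
});
```

Since Promise.all already returns a promise, another option is to return it directly instead of wrapping it in new Promise, which avoids this class of mistake altogether.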

[edit : using this package is not needed, see my post on rdflib below for what does work]

Thanks so much for your help @A_A. On investigation, the problems with parsing the link header are even worse than you pointed out - for example there can be multiple links in a given header. On @konobi’s suggestion, I’m using parse-link-header from npm along with the rdflib join method you pointed out.

#!/usr/bin/env node
const Uri = require('./src/uri')
const linkParse = require('parse-link-header')

let uri = "https://example/"
let rel = ".acl"
let abs = "https://example/.acl"

test( `Link: <${rel}>; rel="acl"` )
test( `Link: <${abs}>; rel="acl"` )
test( `Link: <${rel}>; foo="bar"; rel="acl"` )
test( `Link: <${rel}>; rel="acl", <foo>; rel="bar";` )

function test(str){
   let header = linkParse(str);
   let link = header.acl.url.replace(/Link: </,'')
   let acl =  Uri.join( link, uri )
   if( acl === uri+rel ) console.log("ok")
   else console.log( "fail : "+acl )
}

I’m glad that I was able to help you! And thanks for mentioning that link headers are more complex than expected; I didn’t know until now that something like <https://example.org>; foo="bar"; rel="acl" is possible.

If you want any further help, feel free to ask.


The spec gives this example :

Link: <mailto:timbl@w3.org>; rev="Made"; title="Tim Berners-Lee"

[edit : this is inadequate, see my post on rdflib below for what does work]

For purposes of finding the ACL, this regex I developed seems to work as well as using the parse-link-header package. Can anyone spot potential problems?

let linkHeader = response.headers.get("Link")
let aclDoc = linkHeader.replace(/>;[^>]*rel="acl".*/,'').replace(/^.*</,'')

Would something like <https://example.org>; type="<tags>"; rel="acl" or <https://example.org?x=\<xyz>; rel="acl" be possible? If such things are allowed by the spec, I don’t think this would be possible to solve with regex only (this seems like it would need recursion).
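
To illustrate the quoted-parameter case, here is a rough sketch of a scanner that tracks whether it is inside a quoted string instead of relying on a single regex. findLinkByRel is a made-up name, and the function ignores parts of the Link header grammar (for example extended parameters such as title*), so treat it as an illustration rather than a complete parser:

```javascript
// Return the target of the first link-value whose rel parameter contains `rel`, or null.
// `findLinkByRel` is a hypothetical helper; it is a sketch, not a full parser.
function findLinkByRel(headerValue, rel) {
  let i = 0;
  while (i < headerValue.length) {
    const open = headerValue.indexOf('<', i);
    const close = headerValue.indexOf('>', open + 1);
    if (open === -1 || close === -1) return null;
    const target = headerValue.slice(open + 1, close);

    // Collect this link's parameters, splitting only on unquoted ';' and ','.
    const params = [];
    let current = '';
    let inQuotes = false;
    for (i = close + 1; i < headerValue.length; i++) {
      const ch = headerValue[i];
      if (inQuotes && ch === '\\') {
        current += headerValue[++i] || ''; // backslash-escaped character inside quotes
      } else if (ch === '"') {
        inQuotes = !inQuotes;              // quotes are dropped from the value
      } else if (!inQuotes && (ch === ';' || ch === ',')) {
        params.push(current.trim());
        current = '';
        if (ch === ',') { i++; break; }    // an unquoted comma ends this link-value
      } else {
        current += ch;
      }
    }
    params.push(current.trim());

    for (const param of params) {
      const eq = param.indexOf('=');
      if (eq === -1) continue;
      const name = param.slice(0, eq).trim().toLowerCase();
      const value = param.slice(eq + 1).trim();
      // rel may hold several space-separated relation types
      if (name === 'rel' && value.split(/\s+/).includes(rel)) return target;
    }
  }
  return null;
}

console.log(findLinkByRel('<https://example.org>; type="<tags>"; rel="acl"', 'acl'));
console.log(findLinkByRel('<foo>; rel="bar", <.acl>; rel="acl"', 'acl'));
```

For the first header the two chained replace calls above leave tags>"; rel="acl" behind, because the first pattern cannot match across the > inside the quoted value and the second then strips everything up to the last <.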

Indeed, those appear to be permissible. But, as it turns out, rdflib solves all of this. Whenever rdflib’s fetcher loads a resource, rdflib automatically uses its own parseLinkHeader method to parse the header and store the links as RDF. So we can use RDF to find the ACL for a file like this:

const uri    = $rdf.sym("https://jeffz.solid.community/public/test.ttl")
const aclRel = $rdf.sym('http://www.iana.org/assignments/link-relations/acl')

fetcher.load(uri).then( res => {
    console.log(  store.any(uri,aclRel).value ) // absolute URI of ACL file
})