Solid-recursive-copy

jeffz · January 19, 2019, 5:25pm

I’ve made some changes to @timbl’s solid-recursive-copy. It supports copying an entire directory tree from one Pod location to another. I’d much appreciate user testing before I submit the patches. Please clone or download the PR and run the browserless and/or browser tests on varying data.

When it’s ready for prime time, I’ll include it in solid-file-client and solid-shell, etc.

bourgeoa · January 19, 2019, 10:29pm

Nice job.

Tests
I downloaded it and ran some tests in the Chrome browser:

using browser.html
and node-solid-server at 2 locations (v4.3 or 4.4, I don’t remember which)

case A : bourgeoa.solid.community (read/write for all) to anna.bourgeoa.ga (read only for all)
1. copy tree to a non-existent folder : Web error: 403 on PUT
2. copy tree to an existing empty folder : ok
  2.1 redo (copy tree to the existing folder, now full of the same files) : Web error: 403 on PUT
3. copy one file …/test to an existing folder : load error …/test/ not found --> ok
case B : reverse
4. reverse of 1., to an existing folder full of the same files : ok
  4.1 repeat : ok

Conclusion

It’s working.
I am surprised that case 2 worked.
I was never asked to connect (I had cleared local storage).

bourgeoa · January 19, 2019, 10:33pm

Issues are not available on solid-recursive-copy

jeffz · January 19, 2019, 10:44pm

Thanks for the testing. I have already begun work on recursive delete, a bit trickier but should be manageable. So in case 2 - is the void folder publicly writeable? Seems like deepCopy should refuse to copy files into a folder that doesn’t allow writing for the given user.

jeffz · January 19, 2019, 11:00pm

When you say “never asked to connect”, I assume you mean in the browser where that is the expected behavior if you are logged in to an active session… The browserless version should always prompt for credentials unless you’ve stored them in a config file as per the README.

bourgeoa · January 19, 2019, 11:09pm

It is the original /public folder
Owner has all rights and everybody read only
Folder has never been used before.

The tree origin is : /public/tiddlers
containing 1 folder /main and no files, with /main containing 6 files

I checked the permissions and confirm that everybody is read only. Maybe a bug in v4.4.

A_A · May 4, 2019, 7:49am

Regarding the recursiveCopy method in your rdflib.js fork:

https://github.com/jeff-zucker/rdflib.js/blob/766faddad900591f47ef821b90c6d26335d57ccc/src/fetcher.js#L1156-L1158

I don’t think it is safe to expect the acl link to be relative. Judging from this document, it could be either relative to the base URL or absolute (e.g. <https://example.org/.acl>; rel="acl").

Each link-value conveys one target IRI as a URI-Reference (after conversion to one, if necessary; see [RFC3987], Section 3.1) inside angle brackets (“<>”). If the URI-Reference is relative, parsers MUST resolve it as per [RFC3986], Section 5.

Therefore I’d suggest using url.resolve(baseUrl, aclUrl) from the url package or something similar (rdflib.js already has a join function here, maybe that could be used too).
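As a minimal sketch of that suggestion, the WHATWG URL constructor (a global in browsers and in recent Node versions) can stand in for url.resolve. The helper name resolveAclUrl and the example URLs are hypothetical and not taken from rdflib.js:

```javascript
// Hypothetical helper: resolve the target of an acl Link header value
// against the URL of the resource the header was returned for.
function resolveAclUrl(resourceUrl, aclTarget) {
  // new URL(target, base) leaves an absolute target unchanged and
  // resolves a relative target against the base URL.
  return new URL(aclTarget, resourceUrl).href;
}

// Relative target, resolved against a container URL
console.log(resolveAclUrl("https://example.org/public/", ".acl"));
// Absolute target, returned as-is
console.log(resolveAclUrl("https://example.org/public/", "https://example.org/.acl"));
// Relative target, resolved against a file URL
console.log(resolveAclUrl("https://example.org/public/test.ttl", "test.ttl.acl"));
```

With this approach the calling code does not need to know whether the server sent a relative or an absolute link.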

And something I’m curious about: do you know if the * in the specification (section 5.0) means one single white space? I’ve always seen it used as “repeat zero or more times”, but here it seems to indicate a space…?

Link = "Link" ":" #link-value
link-value = "<" URI-Reference ">" *( ";" link-param )

jeffz · May 4, 2019, 3:59pm

Excellent points, thanks, I’ll look into your suggestions. Could I ask you a big favor? Would you mind looking at my attempt at recursiveDelete in the same repo? It does not work as I’d like it to and I could use your sharp eye on it.

A_A · May 4, 2019, 5:04pm

Yes, I will take a look at it later today or tomorrow morning. Can you explain in a bit more detail what exactly doesn’t work? Is it broken altogether, or is it just one aspect or one edge case?

jeffz · May 4, 2019, 5:08pm

It has problems deleting containers before they are empty so it will, for example, delete everything except the top level container. Rdflib doesn’t currently support async/await and I’d like to try to do it without using that, but currently it tries to delete the container before the promises emptying it are done.

A_A · May 5, 2019, 7:32am

I think the problem is that Promise.all doesn’t run the promises one after another in the order they are passed; they are all (more or less, I guess) running at the same time. So the order of the promises passed as the parameter only determines the order of the resolved values, and doesn’t imply the order in which the promises are resolved.
Here is a short example in case this isn’t familiar to you yet:

function sleep(ms) { return new Promise((resolve) => setTimeout(resolve, ms)); }
Promise.all([
  sleep(2000).then(() => console.log('first')),
  sleep(1000).then(() => console.log('second'))
])
// Will log "second" after one second and "first" after another one. The resolved values will be in order [value of first, value of second].

So in promise-pseudo code I’d suggest a structure like this:

function deleteFolderRecursive(folderUrl) {
  return fetchFolder(...) // Return the promise so callers can wait for the whole deletion
    .then(folder => {
       const contents = getContents(folder);
       const deleteContentPromises = contents.map(item => isFile(item) ? deleteFileAndAcl(item) : deleteFolderRecursive(item)); // Could be moved to a helper function for easier to read code

       return Promise.all(deleteContentPromises)
         .then(() => deleteFolderAndAcl(folderUrl)) // Now all contents have been deleted so it is safe to delete the folder
         .catch(...)
       // Note: Something that's worth considering is that the delete functions should first delete the file/folder and
       // .then the acl file. If it's not done that way, it could happen that access is unintentionally given for a
       // short amount of time (or even permanently, when deleting the file/folder fails but deleting the acl succeeds)
    }).catch(...)
}

EDIT: Or probably easier to read with chaining promises:

function deleteFolderRecursive(folderUrl) {
  return fetchFolder(...) // Return the promise so nested calls can be awaited by Promise.all
    .then(deleteFolderContents)
    .then(deleteFolder);
}
function deleteFolderContents(folder /* The results from fetchFolder */) {
  const folderContents = getContents(folder);
  const promises = folderContents.map(item => isFile(item) ? deleteFileAndAcl(item) : deleteFolderRecursive(item));
  return Promise.all(promises)
    .then(() => folder); // Resolve with the folder so the next function in the promise chain receives the same argument
}
function deleteFolder(folder) {
  return apiDeleteFolderAndAcl(folder)
    .then(() => folder);
}

In particular, it seems that the error lies in the following lines:

jeff-zucker/rdflib.js/blob/master/src/fetcher.js#L1217-L1221


  promises.push(delContainer(target));
},e=>{})
} catch(e){ }
promises.push(delContainer(repeat));
Promise.all(promises).then(res => {

promises.push(delContainer(target)) will lead to deleting the container at the same time as its contents. promises.push(delContainer(repeat)) does the same, though I didn’t really understand what it is for.
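To make the intended ordering concrete, here is a small self-contained sketch that runs without a Pod. It uses an in-memory map and a simulated delete with a random delay; containers, fakeDelete and the paths are all hypothetical stand-ins, and no rdflib or Solid API is involved:

```javascript
// In-memory stand-in for a Pod: container URL -> list of child URLs.
// Everything here is hypothetical; it only illustrates the ordering.
const containers = new Map([
  ["/a/", ["/a/x.ttl", "/a/b/"]],
  ["/a/b/", ["/a/b/y.ttl"]],
]);
const deleted = []; // records the order in which deletes complete

// Simulated DELETE request that completes after a random delay
function fakeDelete(url) {
  return new Promise((resolve) =>
    setTimeout(() => { deleted.push(url); resolve(url); }, Math.random() * 20)
  );
}

function deleteFolderRecursive(folderUrl) {
  const contents = containers.get(folderUrl) || [];
  // Start deleting all children; sub-containers recurse
  const pending = contents.map((item) =>
    item.endsWith("/") ? deleteFolderRecursive(item) : fakeDelete(item)
  );
  // Delete the container itself only after every child delete has resolved
  return Promise.all(pending).then(() => fakeDelete(folderUrl));
}

const done = deleteFolderRecursive("/a/").then(() => {
  console.log(deleted); // "/a/" is always last, "/a/b/" always after "/a/b/y.ttl"
  return deleted;
});
```

The order among siblings varies from run to run, but a container is never deleted before its contents, because its own delete is chained after Promise.all rather than pushed into the same array.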

If you want help implementing it, just ask. But I thought I’d try to explain the problem a bit more in detail first

A_A · May 5, 2019, 7:43am

And two gotchas I’ve found in recursiveCopy:

jeff-zucker/rdflib.js/blob/766faddad900591f47ef821b90c6d26335d57ccc/src/fetcher.js#L1110


dest.uri = (dest.uri.match(/\/$/)) ? dest.uri : dest.uri + "/";
options = options || {}
let promises = []
return new Promise(function(resolve, reject){
  ft.load(src).then(function(response) {
    if (!response.ok) throw new Error(
      'Error reading container ' + src + ' : ' + response.status
    )
    let contents = st.each(src, ns.ldp('contains'))
    promises = []
    copyContainer(src,dest,options) //   HANDLE THE TOP FOLDER 
    for (let i=0; i < contents.length; i++){
      let here = contents[i]
      let there = mapURI(src, dest, here)
      // SEND SUB-FOLDER BACK FOR PROCESSING 
      if (st.holds(here, ns.rdf('type'), ns.ldp('Container'))){
        promises.push(ft.recursiveCopy(here, there, options))
      }
      else { // COPY THE FILE
        if(options.copyACL)fetchAclDoc(here.uri,there.uri).then()
        console.log('Copying file ' + here+"\n  to "+ there+"\n")

The copyContainer call in this excerpt sends a request to create the container but doesn’t wait for the result before proceeding. So it could happen that both requests are sent and the second one gets handled first (I don’t think any specification prevents this), which means it would try to create content in a folder that hasn’t been created yet. (Actually, according to the Solid spec this should work with PUT requests, but if you do it that way I would add a comment to clarify it.)

jeff-zucker/rdflib.js/blob/766faddad900591f47ef821b90c6d26335d57ccc/src/fetcher.js#L1124


      // SEND SUB-FOLDER BACK FOR PROCESSING 
      if (st.holds(here, ns.rdf('type'), ns.ldp('Container'))){
        promises.push(ft.recursiveCopy(here, there, options))
      }
      else { // COPY THE FILE
        if(options.copyACL)fetchAclDoc(here.uri,there.uri).then()
        console.log('Copying file ' + here+"\n  to "+ there+"\n")
        promises.push(ft.webCopy(here,there,{contentType:"text/turtle"}))
      }
    }
    Promise.all(promises).then(resolve(""))
    .catch(function (e) {
      console.log("Overall promise rejected: " + e)
      reject(e)
    })
  })
  .catch( function(error) {
    reject('Load error: ' + error)
  })
})
function mapURI(src, dest, x){

Here the parameter passed to .then is not a function, but the evaluated result of resolve(""), which means the overall Promise is resolved as soon as execution reaches this line. You likely want Promise.all(promises).then(() => resolve("")) or Promise.all(promises).then(resolve). The latter would call resolve with the resolved values of the promises.
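The difference can be seen in a short standalone sketch. The sleep helper is the same as in my earlier example; the log array and its messages are only for illustration:

```javascript
function sleep(ms) { return new Promise((resolve) => setTimeout(resolve, ms)); }

const log = [];

// Buggy: resolve("") is evaluated immediately while the arguments of
// .then() are being built, so the outer promise resolves before the work is done.
const buggy = new Promise((resolve) => {
  const promises = [sleep(50).then(() => log.push("work done (buggy)"))];
  Promise.all(promises).then(resolve(""));
}).then(() => log.push("outer resolved (buggy)"));

// Fixed: .then() receives a function, which is called only after all
// promises in the array have resolved.
const fixed = new Promise((resolve) => {
  const promises = [sleep(50).then(() => log.push("work done (fixed)"))];
  Promise.all(promises).then(() => resolve(""));
}).then(() => log.push("outer resolved (fixed)"));

const finished = Promise.all([buggy, fixed, sleep(100)]).then(() => {
  console.log(log);
  return log;
});
```

In the buggy variant "outer resolved" is logged before "work done"; in the fixed variant it is logged after.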

jeffz · May 15, 2019, 5:18pm

[edit : using this package is not needed, see my post on rdflib below for what does work]

Thanks so much for your help @A_A. On investigation, the problems with parsing the link header are even worse than you pointed out - for example there can be multiple links in a given header. On @konobi’s suggestion, I’m using parse-link-header from npm along with the rdflib join method you pointed out.

#!/usr/bin/env node
const Uri = require('./src/uri')
const linkParse = require('parse-link-header')

let uri = "https://example/"
let rel = ".acl"
let abs = "https://example/.acl"

test( `Link: <${rel}>; rel="acl"` )
test( `Link: <${abs}>; rel="acl"` )
test( `Link: <${rel}>; foo="bar"; rel="acl"` )
test( `Link: <${rel}>; rel="acl", <foo>; rel="bar";` )

function test(str){
   let header = linkParse(str);
   let link = header.acl.url.replace(/Link: </,'')
   let acl =  Uri.join( link, uri )
   if( acl === uri+rel ) console.log("ok")
   else console.log( "fail : "+acl )
}

A_A · May 15, 2019, 6:51pm

I’m glad that I was able to help you! And thanks for mentioning that link headers are more complex than expected; I didn’t know that something like <https://example.org>; foo="bar"; rel="acl" was possible.

If you want any further help, feel free to ask

jeffz · May 15, 2019, 7:12pm

The spec gives this example :

Link: <mailto:timbl@w3.org>; rev="Made"; title="Tim Berners-Lee"

jeffz · May 15, 2019, 10:15pm

[edit : this is inadequate, see my post on rdflib below for what does work]

For purposes of finding the ACL, this regex I developed seems to work as well as using the parse-link-header package. Can anyone spot potential problems?

let linkHeader = response.headers.get("Link")
let aclDoc = linkHeader.replace(/>;[^>]*rel="acl".*/,'').replace(/^.*</,'')

A_A · May 16, 2019, 5:31am

Would something like <https://example.org>; type="<tags>"; rel="acl" or <https://example.org?x=\<xyz>; rel="acl" be possible? If such things are allowed by the spec, I don’t think this would be possible to solve with regex only (this seems like it would need recursion).

jeffz · May 16, 2019, 6:24pm

Indeed, those appear to be permissible. But, as it turns out, rdflib solves all of this. Whenever rdflib’s fetcher loads a resource, rdflib automatically uses its own parseLinkHeader method to parse the header and store it as RDF. So we can use RDF to find the ACL for a file like this:

const uri    = $rdf.sym("https://jeffz.solid.community/public/test.ttl")
const aclRel = $rdf.sym('http://www.iana.org/assignments/link-relations/acl')

fetcher.load(uri).then( res => {
    console.log(  store.any(uri,aclRel).value ) // absolute URI of ACL file
})
