@kjetilk: "There have been a number of issues reported (linked in #882) where it seems that user’s data files have been corrupted. Some of these things should be fixed on the frontend, so that it makes sure invalid data is not sent. However, the backend should also check, e.g. by doing a RDF validation as suggested in #882, but we could also imagine SHACL validation, etc. If Inrupt becomes a large POD provider, it may also come with legal requirements.
Also, validation may not only be a boolean accept or error, but possibly also filtering to accept valid parts. For v.next, we need to have an architectural element that does this, but we may need to address parts of this problem already for 5.0.0."
Consider this a super-issue for discussing what should be in 5.0.0, and whether we should try to make that reusable in v.next. We may also discuss not attempting to solve it on the backend at all and instead leaving it to frontends.
We’ve had a meeting about this here in Ghent between @rubensworks and myself. Since this is a detailed code discussion, it belongs here, but there is also a broader discussion about what exactly we should validate and whether we should also do filtering and/or transformation; that is a discussion we could have on Discourse.
This is what we have arrived at:
We have decided that the architectural implications of building filtering on top of the proposed validation (i.e. accept/reject) framework are small, and have therefore decided to go for validation in the first iteration.
We found that we should use the try/catch system and design a pipeline where an accept (resulting in e.g. a `200` response) is issued if the pipeline doesn’t throw. Since we have not found a typed error library for JS, we figured that modules in the pipeline would add a `type` attribute to the error object declaring that it is a validation error. The error object would also have a message and an error name (matching its class name, and probably exposable as an RDF class).
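Something along these lines, perhaps (a minimal sketch; `ValidationError`, `Validator`, and `runPipeline` are placeholder names for this discussion, not a settled API):

```ts
// Hypothetical sketch only: names and shapes are placeholders, not a settled API.

// A typed validation error: the `type` attribute lets callers distinguish
// validation failures from other thrown errors.
export class ValidationError extends Error {
  public readonly type = 'ValidationError';

  public constructor(message: string) {
    super(message);
    // The error name matches the class name and could later be exposed as an RDF class.
    this.name = 'ValidationError';
  }
}

// Each module in the pipeline checks the incoming representation and
// throws a ValidationError if it is not acceptable.
export interface Validator {
  validate(body: string, contentType: string): Promise<void>;
}

// If no module throws, the request is accepted (e.g. a 200 response);
// a thrown ValidationError is mapped to a 400 by the calling code.
export async function runPipeline(
  validators: Validator[],
  body: string,
  contentType: string,
): Promise<void> {
  for (const validator of validators) {
    await validator.validate(body, contentType);
  }
}
```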
The calling code (e.g. the HTTP handler) would then catch the error and, by checking the `type` attribute, throw a `400` error. It should include a SHACL validation report, with the message as an `sh:message`.
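As a rough sketch of that calling side (using Node’s `http` response type and the placeholder names above; the report serialization is only illustrative, and the exact SHACL vocabulary would still need checking against the spec):

```ts
import type { ServerResponse } from 'http';
// Hypothetical import path; see the pipeline sketch above.
import { runPipeline, Validator } from './validation-pipeline';

// Run the pipeline, accept on success, and turn a ValidationError
// into a 400 carrying a SHACL validation report.
export async function handleWrite(
  res: ServerResponse,
  validators: Validator[],
  body: string,
  contentType: string,
): Promise<void> {
  try {
    await runPipeline(validators, body, contentType);
    res.writeHead(200).end();
  } catch (error: any) {
    if (error?.type !== 'ValidationError') {
      throw error; // Not a validation problem; let other error handling deal with it.
    }
    // Minimal report carrying the error message; property names such as
    // sh:message vs. sh:resultMessage would need verifying against the spec.
    const report = `@prefix sh: <http://www.w3.org/ns/shacl#> .
[] a sh:ValidationReport ;
  sh:conforms false ;
  sh:result [ a sh:ValidationResult ; sh:message ${JSON.stringify(error.message)} ] .`;
    res.writeHead(400, { 'Content-Type': 'text/turtle' }).end(report);
  }
}
```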
In the first iteration, the only validation class in the pipeline would be an RDF syntax checker as done in #882. I haven’t studied SHACL very deeply, but it seems that, for the RDF syntax checker for example, we could report errors in that way.
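A sketch of what that first module could look like, assuming the `n3` parser as a stand-in for whatever approach #882 takes, together with the placeholder `Validator`/`ValidationError` interfaces above:

```ts
import { Parser } from 'n3';
// Hypothetical import path; see the pipeline sketch above.
import { ValidationError, Validator } from './validation-pipeline';

// Hypothetical first pipeline module: an RDF syntax check that turns
// parser errors into typed validation errors.
export class RdfSyntaxValidator implements Validator {
  public validate(body: string, contentType: string): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      // n3 accepts a MIME type as the format; only the formats n3 understands
      // (Turtle, TriG, N-Triples, N-Quads, N3) would be covered this way.
      const parser = new Parser({ format: contentType });
      parser.parse(body, (error, quad) => {
        if (error) {
          // This message would end up as the report message in the 400 response.
          reject(new ValidationError(`Invalid RDF: ${error.message}`));
        } else if (!quad) {
          // The callback is invoked once without a quad when parsing has finished.
          resolve();
        }
      });
    });
  }
}
```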
What we haven’t yet decided is how to configure the pipeline, but that’s on a different abstraction layer.