In ‘Fractal in detail: What information is in a file-format identification report?’ I describe the different ways of dissecting the information in a file-format identification report.
A file-format identification report is a data-rich artifact created during the processing of digital collections.
I had the idea of using this type of report to attach a checksum to an archival collection ‘as-a-whole’, and all of the directories in a collection as well. This is done using methods that I had been reading about in my developing understanding of blockchain and also source control systems such as Git.
The tool I created is called sumfolder1.
I feel the approach may be useful in proving the integrity of collections of files and helping to assert the existence or non-existence of files (or directory groupings) in a hierarchy of objects.
The approach is purely content-based, so collections can be analysed even if file or directory names have been changed. This is one of the benefits of checksums for digital files. Directories, however, don’t have any inherent payload, so a heuristic must be used: a directory’s checksum is calculated from the checksums of its contents, or from a stand-in value for the absence of contents in the case of empty directories.
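To make the idea concrete, here is a minimal sketch of one way such a heuristic could look in Python. It is not the algorithm sumfolder1 itself uses (the README describes that); the function names, the choice of SHA-256, the sorting of child checksums, and the `EMPTY_DIR_SENTINEL` value are all my assumptions for illustration. Unlike sumfolder1, which works from a report, this sketch reads the file system directly.

```python
import hashlib
from pathlib import Path

# Hypothetical stand-in payload for a directory with no contents.
EMPTY_DIR_SENTINEL = b"empty-directory"


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes (its name is ignored)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def directory_checksum(path: Path) -> str:
    """Return a checksum for a directory derived only from its contents."""
    child_digests = []
    for child in path.iterdir():
        if child.is_symlink():
            # Assumption: skip symlinks to avoid loops and double counting.
            continue
        if child.is_dir():
            child_digests.append(directory_checksum(child))
        elif child.is_file():
            child_digests.append(file_checksum(child))
    if not child_digests:
        # Nothing to sum, so hash the sentinel instead.
        return hashlib.sha256(EMPTY_DIR_SENTINEL).hexdigest()
    digest = hashlib.sha256()
    # Sorting makes the result independent of names and listing order.
    for item in sorted(child_digests):
        digest.update(item.encode("ascii"))
    return digest.hexdigest()
```

Calling `directory_checksum(Path("my-collection"))` (a hypothetical folder) gives a single value for the collection as a whole. Because only content checksums feed into it, renaming a file or folder leaves the value unchanged, while changing a single byte anywhere in the hierarchy changes it.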
It was a fun tool to write and, practically, the tool, or even just the heuristic, can be used immediately to provide directory and top-level checksums for collections in archival software. I’d love to see folks use it, but while it’s something I am proud to have developed and made available, I’m not sure the world is calling out for it just yet.
NB. While the tool uses file-format identification reports, it can also be extended to use the output of tools such as sha256deep, md5sum, and so on.
I have written extensively about the tool in an Open Preservation Foundation (OPF) blog: https://openpreservation.org/blogs/what-is-the-checksum-of-a-directory/
And the README for the tool is also chock-full of info: https://github.com/ross-spencer/sumfolder1
Let me know if you like the concept or how it goes for you if you give sumfolder1 a whirl.