Wikidata is a good service; Wikibase (on which Wikidata is built) is a better platform.
I have spoken before about its potential to be added into the file-format registry ecosystem in a federated model.
If we are to use it as a registry that can perhaps complement the pipelines going into PRONOM, e.g. in vendors’ digital preservation platforms such as the Rosetta Format Library, Wikidata should be able to output different serializations of a signature file for tools such as Siegfried, DROID or FIDO.
In 2019 I was staying in Hamburg for a short time while waiting for my Canadian visa. It was around the time of the iPRES digital preservation conference and I was only a few hours away. While I hoped my work would send me, it was not going to be my year. I then hoped I could take myself there anyway by writing a computer game, inspired by the Board Games sessions in 2017 and 2018. Alas, that didn’t work out either, but I did end up with a piece of work I am still very proud of.
Thanks to sponsorship from Archives New Zealand, and to Richard Lehane for his great coding expertise and collaboration, Demystify Lite has a new feature: Siegfried!
Richard recently posted about this work on LinkedIn, but let’s look at this effort in more detail below.
In my 2012 post, Genesis of a File Format, I created a new file format – the Eyeglass file format. The format provides a mechanism to persist information about a patient’s eye health following a checkup at an optician’s. Today, in 2023, we can use the format to learn how Kaitai Struct can help us understand file formats.
Given the disclaimer that I am not actually an optician and that the format is purely illustrative, let’s look at the Eyeglass format again below.
I have been working on a Python template repository as part of my day job at Orcfax.
It is based on the popular pypa sample project and adds important tooling that supports the quality assurance of projects that many developers are expected to engage with.
In my template repository I add editor defaults and linting, and prepare the repository for unit tests and then deployment.
I have migrated a copy of the template I created for Orcfax to a new file format organisation I have created to capture work I am doing around tools such as ffdev.info (the PRONOM signature development utility).
I want to talk about how this tooling can help you understand legacy or new code that you are about to work with, and in particular how linting can be useful for learning and understanding.
A file-format identification report is a data-rich artifact created during the processing of digital collections.
I had the idea of using this type of report to attach a checksum to an archival collection (files and directories) as a whole. This is done using methods akin to a Merkle tree, similar to those in source control systems such as Git, and Web3 blockchain projects like Bitcoin.
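To make the Merkle-style idea concrete, here is a minimal sketch in Python. It is not the implementation from the post. It assumes the identification report has already been reduced to a mapping of file path to hex digest, and the function names (merkle_root, collection_checksum) are hypothetical. It also builds a flat binary tree over all files, rather than the per-directory tree that Git uses.

```python
import hashlib
from typing import Iterable, Mapping


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaf_digests: Iterable[str]) -> str:
    """Combine hex digests pairwise until a single root digest remains."""
    # Sort the leaves so the order of rows in the report does not matter.
    level = sorted(leaf_digests)
    if not level:
        # Assumption: an empty collection hashes to the digest of no data.
        return sha256_hex(b"")
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Odd number of nodes: pair the last node with itself.
            level.append(level[-1])
        level = [
            sha256_hex((level[i] + level[i + 1]).encode("ascii"))
            for i in range(0, len(level), 2)
        ]
    return level[0]


def collection_checksum(report: Mapping[str, str]) -> str:
    """Compute one checksum for a whole collection.

    report is a hypothetical, simplified view of an identification
    report: it maps each file path to that file's hex digest.
    """
    # Bind each path to its digest so that renaming a file changes the root.
    leaves = [
        sha256_hex(f"{path}\0{digest}".encode("utf-8"))
        for path, digest in report.items()
    ]
    return merkle_root(leaves)
```

Calling collection_checksum on two reports that list the same files in a different order gives the same value, while changing any single path or file digest changes the result. Binding the path into each leaf is a design choice: without it, two collections with the same file contents but different directory layouts would share a checksum.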