A disused railway line at the Don River Valley in Toronto

Published: Archives and Manuscripts, Binary Trees

This week (beginning 7 August 2017) marks the publication of my second solo peer-reviewed paper: Binary trees? Automatically identifying the links between born-digital records. I invite everyone to have a read and let me know what you think.

The paper won the Sigrid McCausland Emerging Writers Award late in 2018.

Read the paper and additional thoughts below.

Archives and Manuscripts Preprint: Binary Trees by Ross Spencer

The background to this paper was a difficult one, following the 2016 Kaikoura Earthquake, the first of its magnitude I’ve experienced. Still, I am reasonably happy with the outcome, and it resulted in a new suite of tools to advance parts of the paper further.

NB. the photo for this blog was taken in Toronto when I visited to present a poster about the HTTPreserve suite at Web Archiving and Digital Libraries (WADL2017: Poster).

I presented extracts from the eventual paper at the ASA Forging Links Conference in 2016; you can see my talk here: https://www.youtube.com/watch?v=Ked9GRmKlRw .

Following the conference, it took a while to turn a very rough draft into something more substantial, but we got there.

It was good to get the experience of trying to write academically. I want to thank colleagues at Archives New Zealand (Helen and Talei especially) for helping me to shape the final work, as well as the Archives and Manuscripts editors. The review comments also helped push the paper in the right direction.

You can read some of my early thoughts here.

If you have any thoughts on the outcome I’d love to hear them. If you’d like to explore turning some of the ideas into something more, I’d love to work with you. Thanks in advance for reading!


Update 2028-09-15: Late in 2018 I was informed the paper had won the 2018 Sigrid McCausland Emerging Writers Award with the following note from the Journal’s Editor:

This innovative article is widely relevant to the archival profession addressing a major challenge of digital records and their archival management and uses.

In this article, Spencer looks at the practical challenges facing archivists in the struggle to capture and preserve authentic, reliable, and unique records as sources of evidence. To identify and protect records in a world awash in duplication, the author outlines eight key record-to-record relationships that archivists must address and explains clearly and concisely several different technologies that may be applied to identify and describe those relationships. This article makes plain the complexity of digital archives and offers pragmatic approaches to support effective preservation.

This article helps to build capacity in the archival profession by communicating simple approaches to identifying links between items, and its focus is on open source tools.

It does a great job explaining the computational terminology and techniques that could be valuable to archivists to enhance the accessibility of digital records.

It feels unreal to receive such recognition for my work but I appreciate the comments of the editor and the interest taken in this paper.


Update 2024-12-06: Check out this great article in Preservation, Digital Technology & Culture from St John Karp: The Interconnectedness of All Things: Understanding Digital Collections Through File Similarity. It takes a similar approach, adding perceptual hashing to the conversation, and points to some useful-looking tooling developed for the publication that lets digital archivists work with fuzzy hashing to find connections between objects.

Update 2024-12-06: Also check out Tim Allison’s work Embedded Files: Risks, Challenges and Options. Tim Allison is Tika’s primary maintainer and developer, and the presentation covers the risks and challenges of embedded objects in digital files. Tika is especially well placed to extract these objects, which gives us a way to examine and think about them.
