Binary Trees? Automatically Identifying the Links Between Born-digital Records

At the annual conference of the Australian Society of Archivists, themed Forging Links, I recently presented a summary of a paper I am trying to see published in the society's journal, Archives and Manuscripts.

The paper asks how archives can take small steps toward better archival description of born-digital records. The techniques discussed also have implications for sentencing, discovery by the end user, and digital preservation.

The abstract reads as follows:

The sheer volume of records that government organisations, and thusly government archives, work with on a daily basis means that there is a chance that relationships between individual records will not easily be captured and recorded. This paper begins by suggesting that the relationships described in archival catalogues will remain at the highest levels of abstraction unless relationships can be extracted using automated methods. Relationships that can be generated automatically are described in this paper but will likely be less canonical than archivists are traditionally used to working with. For example a so-called ‘fuzzy matching’ technique is discussed that may reveal the ‘points’ similarity between two records. Extensible databases will be needed to store new links; flexible interfaces will be required to display them. This paper discusses some of the techniques that may currently be available for automatically identifying links between born-digital records by looking at what can be found in the data stream and the relationships digital formats inherently describe. The mechanisms described may be useful for sentencing as well as cataloguing and description. While one size will not fit all, some collections may benefit. The paper concludes by discussing briefly what this will eventually mean to the end-user.
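To give a flavour of what a 'points' similarity between two records might look like, here is a minimal sketch. It is not the method from the paper: the abstract does not name a specific algorithm, so this stands in Python's standard-library `difflib.SequenceMatcher` and scales its ratio to a score out of 100. The function names `similarity_points` and `find_links`, the threshold of 70, and the sample records are all hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity_points(a: bytes, b: bytes) -> int:
    """Return a 0-100 similarity score between two byte streams.

    Assumption: difflib's ratio is used as a stand-in for whatever
    fuzzy matching technique an archive might actually choose.
    """
    # autojunk=False stops difflib from ignoring frequently occurring bytes.
    ratio = SequenceMatcher(None, a, b, autojunk=False).ratio()
    return round(ratio * 100)


def find_links(records: dict[str, bytes], threshold: int = 70) -> list[tuple[str, str, int]]:
    """Compare every pair of records and keep pairs scoring at or above threshold."""
    names = sorted(records)
    links = []
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            score = similarity_points(records[left], records[right])
            if score >= threshold:
                links.append((left, right, score))
    return links


if __name__ == "__main__":
    # Hypothetical records: two near-identical drafts and one unrelated stream.
    sample = {
        "draft-1.txt": b"minutes of the meeting held 1 May",
        "draft-2.txt": b"minutes of the meeting held 2 May",
        "image.bin": b"\x00\x01\x02\x03",
    }
    for left, right, score in find_links(sample):
        print(f"{left} <-> {right}: {score} points")
```

Comparing every pair is quadratic in the number of records and `SequenceMatcher` is slow on large inputs, so this approach only suits small sets. The point is simply that such a link is a score rather than a yes/no relationship, which is why it is less canonical than the links archivists usually record.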

Presenting to an audience of talented archivists was a useful experience, and it gave me a lot to think about for improving the final draft of my paper before submission. I submitted the paper a handful of weeks ago, and I hope those improvements will help see it through to publication. We'll see how the results look in the coming months.

Before then, the slides summarising the paper can be found on SlideShare:

And a video of the presentation here:

