Skyline from a train station in Sydney, Australia

Binary Trees? Automatically Identifying the Links Between Born-digital Records

I recently presented a summary of a paper at the Australian Society of Archivists' annual conference, whose theme was Forging Links. I am hoping to see the paper published in the Society's journal, Archives and Manuscripts.

The paper asks how Archives can take small steps toward better archival description of born-digital records, but the techniques discussed also have implications for sentencing, discovery by the end user, and digital preservation.

The abstract reads as follows:

The sheer volume of records that government organisations, and thusly government archives, work with on a daily basis means that there is a chance that relationships between individual records will not easily be captured and recorded. This paper begins by suggesting that the relationships described in archival catalogues will remain at the highest levels of abstraction unless relationships can be extracted using automated methods. Relationships that can be generated automatically are described in this paper but will likely be less canonical than archivists are traditionally used to working with. For example a so-called ‘fuzzy matching’ technique is discussed that may reveal the ‘points’ similarity between two records. Extensible databases will be needed to store new links; flexible interfaces will be required to display them. This paper discusses some of the techniques that may currently be available for automatically identifying links between born-digital records by looking at what can be found in the data stream and the relationships digital formats inherently describe. The mechanisms described may be useful for sentencing as well as cataloguing and description. While one size will not fit all, some collections may benefit. The paper concludes by discussing briefly what this will eventually mean to the end-user.
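To give a feel for the ‘points’ similarity idea mentioned in the abstract, here is a minimal sketch in Python. It is not the technique from the paper, which this post does not describe in detail. It swaps in the standard library's difflib.SequenceMatcher as a stand-in fuzzy matcher, and the names similarity_points and link_records, the 0 to 100 scale, and the threshold of 80 are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity_points(a: bytes, b: bytes) -> int:
    """Score two byte streams from 0 (nothing shared) to 100 (identical).

    Hypothetical helper: SequenceMatcher stands in for whatever fuzzy
    matching technique an archive might actually choose.
    """
    ratio = SequenceMatcher(None, a, b, autojunk=False).ratio()
    return round(ratio * 100)


def link_records(records: dict[str, bytes], threshold: int = 80) -> list[tuple[str, str, int]]:
    """Return (name, name, score) for each pair scoring at or above threshold.

    The threshold is an assumed cut-off, not a recommended value.
    """
    links = []
    for first, second in combinations(sorted(records), 2):
        score = similarity_points(records[first], records[second])
        if score >= threshold:
            links.append((first, second, score))
    return links


if __name__ == "__main__":
    # Made-up sample data: two near-identical drafts and one unrelated record.
    sample = {
        "draft-1.txt": b"minutes of the meeting draft 1",
        "draft-2.txt": b"minutes of the meeting draft 2",
        "other.bin": b"zzzz",
    }
    for first, second, score in link_records(sample):
        print(f"{first} <-> {second}: {score} points")
```

Comparing every pair like this grows quadratically with the number of records, so it would only suit small collections. It does show the kind of non-canonical, scored link that an extensible database would need to store.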

It was a useful experience presenting to an audience of talented archivists, and it gave me a lot to think about for improving the final draft of my paper before submission. I submitted it a handful of weeks ago and hope those improvements will help see it through to publication. We’ll see how the results look in the coming months.

Before then, the slides summarising the paper can be found on SlideShare: http://www.slideshare.net/RossSpencer/binary-trees-automatically-identifying-the-links-between-borndigital-records

And a video of the presentation here: https://www.youtube.com/watch?v=Ked9GRmKlRw&t=614s

Feedback and comments appreciated below.

For more information on the Australian Society of Archivists, check out their website here: https://www.archivists.org.au/

4 thoughts on “Binary Trees? Automatically Identifying the Links Between Born-digital Records”

  1. Interesting presentation, thanks Ross. More automated processing of digital records to enable better discovery is so needed. Haven’t made it through the whole thing yet, but it looks like you may not be aware of this tool I led the development of at Archives NZ: http://openpreservation.org/blog/2012/02/01/dependency-discovery-tool-office-files-code-published/

    “The “Office Dependency Discovery Tool” searches through binary office files (.doc, .xls and .ppt) and tries to find any documents or files that are linked to the document.”

    Cheers,

    Euan

  2. Interesting. Looking forward to reading the improved paper.

    Twitter: @redundanton
