The Skeleton Test Corpus

The second post on my new and somewhat humble blog collects some of the initial bits and pieces I am working on in digital preservation, namely The Skeleton Test Corpus.

There isn’t a lot to say here, as much of it would repeat the contents of the GitHub repository that exists for this project. Instead, I’ll post the intro from the wiki pages there and let them do the talking. [NOTE: At time of writing (17/10/12), the wiki pages are still a work in progress.]

The skeleton test suite provides a mechanism for creating file format ‘shells’, or skeleton files. These files test the matching algorithm of the DROID format identification tool, and they test the integrity and discreteness of DROID-compatible format signatures, that is, they help ensure a one-to-one (1:1) relationship between a signature and the ‘file-format’ it matches.

Copy and paste the sequence CA FE BA BE into a hex editor and save it, preferably with a meaningful name and a .class extension. The file created will identify in DROID as Java Compiled Object Code, x-fmt/415 (DROID signature file V63)…
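
For anyone without a hex editor to hand, the same four-byte skeleton file can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not the Skeleton Test Suite Generator itself; the function name write_skeleton and the output filename are assumptions made for the example.

```python
from pathlib import Path

# Byte sequence quoted in the post: CA FE BA BE.
JAVA_CLASS_MAGIC = bytes.fromhex("CA FE BA BE")


def write_skeleton(path, signature):
    """Write only the signature bytes to `path` and return it as a Path.

    Hypothetical helper for illustration: the resulting file holds the
    signature and nothing else, so it is a 'shell', not a valid class file.
    """
    target = Path(path)
    target.write_bytes(signature)
    return target


if __name__ == "__main__":
    # The filename is an arbitrary choice; the .class extension follows the post.
    skeleton = write_skeleton("skeleton-x-fmt-415.class", JAVA_CLASS_MAGIC)
    print(skeleton, skeleton.read_bytes().hex(" ").upper())
```

Running the script leaves a four-byte file in the working directory that can then be dropped into DROID to see how it identifies.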

I’ve written a formal explanation of the genesis of, and rationale behind, The Skeleton Test Corpus for an upcoming conference. Papers are still being judged, so I can’t really say much more, but once I can, the full write-up will appear on this blog whether it is accepted or not.

I also wrote a little Storify based on tweets about the announcement of the Skeleton Test Suite Generator. As the Storify suggests, comments from everyone are welcome. This blog post seems like an ideal place to collect initial considerations, and hopefully, as the work progresses, more posts will appear describing its evolution.