The Format Registry

As the story goes, I got home last Thursday, 14 November 2013, with a tweet from the previous day still on my mind:

Will be easier with Linked Registries…

Yes. A lot will be easier with Linked Registries… But it has been a year and two months since I departed The National Archives, UK, and a little longer since I blogged about Linky, The Linked Data Hedgehog. Looking back on the progress outlined on The National Archives: Labs Pages, it has been two years, give or take, since I, along with a good colleague, helped to establish an infrastructure that might easily have become a ‘Linked Registry’… So where is it? Where are they?

Avoiding the elephant in the room, the greatest genuine achievement in the world of format registries this past year has been the Archive Team initiative:

The project began in November 2012, and at the time of writing this blog, the Just Solve the File Format Problem wiki has 2,272 entries created by the community in just one year’s effort, and it’s still going strong.

But it’s not a ‘linked registry’ per se.

So, between 14 November and work beginning on 18 November, four days in all, I started with a blank page in a text editor and a fresh code base, and wrote a linked registry: The Format Registry.

Four days.

There’ll be more content on this blog in the coming days/weeks looking at the architecture and what the site does and what it can provide users.

Because I want to polish this work properly and make it sustainable and persistent, the disclaimer on the site reads thus:

While this notice is up the persistence of the links cannot be guaranteed. There are mechanisms still to be built which will provide guarantee. At present the application reads the data from source, and maps it, but cannot guarantee ‘subject’ mapping. Thus far the work has focused on export, vocabulary, and link redirection.

So, given the other efforts, and given that this is another project that ‘isn’t quite there’, why should you still care?


1. It’s open source and ready for you to play with the code and contribute if you like. Or just download it and host something for yourself!


2. The introductory paragraphs to the project outline this as a challenge: what can actually be delivered in short order, without the burden of the frameworks that one (or others) may often have to work within?

Welcome to The Format Registry: A linked data file format registry.

The work is the result of a four-day hack during November 2013. Its goal is to challenge the status quo, to influence the rapid development of further format registries and other linked open data initiatives within the digital preservation community.

The focus of this project will be on the data and the augmenting of what is currently available.

3. Your comments matter… up to a point. I’m going to call for comments on the work (hint: this is that call): vocabulary, infrastructure, further requirements. Otherwise I’ll pretty much keep developing and hope it satisfies requirements. Ideally I’d like to satisfy actual requirements, noted here, via Twitter or by email. I’ll have to keep developing anyway, so prompt commentary is appreciated.

What else do you need to know?

Data: Currently the site maps the PRONOM dataset, which is made available by The National Archives under the Open Government Licence (OGL). The plan is to look at augmenting this work with UDFR and the Archive Team registry information where possible.

Mapping: Only a subset of PRONOM is mapped. It will probably remain a subset when it is fully curated. At the time of writing, the site uses 15 different predicates/properties.

Linking: The site is minimally linked. That is, there are a handful of seeAlso links where appropriate.
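To give a rough feel for what mapping and minimal linking can look like, here is a small sketch in Python. It is not the site's code. The base URI, the `puid` property, and the sample record are all made up for illustration; only `rdfs:label` and `rdfs:seeAlso` are real RDF Schema terms. It turns one record into N-Triples lines, with each external reference becoming a seeAlso link.

```python
# Hypothetical base URI and vocabulary namespace (assumptions for this sketch;
# the real site's URIs and predicates may well differ).
BASE = "http://example.org/format/"
VOCAB = "http://example.org/vocab#"
# The RDF Schema namespace is real.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def escape_literal(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def record_to_ntriples(record: dict) -> list[str]:
    """Map one registry record (a plain dict) to a list of N-Triples lines."""
    subject = f"<{BASE}{record['puid']}>"
    name = escape_literal(record["name"])
    puid = escape_literal(record["puid"])
    triples = [
        f'{subject} <{RDFS}label> "{name}" .',
        f'{subject} <{VOCAB}puid> "{puid}" .',
    ]
    # Each external reference becomes one rdfs:seeAlso link.
    for url in record.get("see_also", []):
        triples.append(f"{subject} <{RDFS}seeAlso> <{url}> .")
    return triples


# An invented record, not real PRONOM data.
sample = {
    "puid": "fmt/0",
    "name": "Example Format",
    "see_also": ["http://example.org/wiki/Example_Format"],
}

for line in record_to_ntriples(sample):
    print(line)
```

Plain string formatting keeps the sketch free of dependencies. A real exporter would more likely lean on an RDF library and would need to validate the URIs it emits.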

Hosting: I am currently piggybacking on my regular hosting service. The domain is paid up for two years, so links, once static, can persist for at least that period.

TODO: A non-exhaustive list of immediate priorities is kept on the GitHub hosting for the project.

Other than that, welcome to The Format Registry. Take a look at the intro page. Navigate around a bit. Let me know what you think.