Open Data Archives - ross spencer :: exponentialdecay.digipres

Image from July 2021 depicting the fields around Ravensburg in Southern Germany. There is a sign with some graffiti on which depicts a car sliding off the road, presumably because there is very little curb and it's likely you will career into the grass if you're not careful.

When you can’t pay for things the currency of payment is psychic…

Contributing back to the commons in digital preservation hasn’t been for everyone.

We know the famous XKCD that touches on the underappreciated work of maintainers in obscurity. When you, or your institutions, or services are using free and open source software, or other information and data in the commons, and you’re not contributing back, you’re perpetuating this, and what’s more, there’s a virtuous cycle that we’re missing out on.

I read something the other day and it felt like a red flag.

The Painter Goblin becomes corporeal by having its prints converted from digital to canvas in real life. In this image, the Painter Goblin canvases are bathed in sunlight provided by a west-facing window around sunset. The grid used to display the Painter Goblin in a salon style shadowed by the window frame onto the wall. The light in this image has been enhanced to increase its saturation to mirror the vibrancy of The Painter Goblin's original image.

The Painter Goblin: Becoming Corporeal

When you move country you have to be prepared to change quite a lot about your life. Back at the end of 2020, apart from literally everything else going on, my partner and I also moved from Canada to Germany.

For me, this was my fifth or so international move (including shorter temporary stays) in as many years.

Being able to pick up sticks and move like that means living a drastically minimized life. Most of the things you have fit in a suitcase. Most of the things you have are small, and largely not overly whimsical. Sure, you can fit a few treasures into your bag, but you learn to value small ones, not things you might otherwise use to decorate an entire apartment!!

So, what do you do when you do have an apartment to decorate?

You ask the best known painter in your family to conjure some magic, The Painter Goblin!

Turning NASA Wake-up Calls into data

For a while back then I was into space flight again. Scientists, science communicators, and engineers were all excited for a new era of rocket launches and the potential unification of the human race as we look towards the future.

During that time I discovered Colin Fries’ work in the NASA History Division to document all NASA “Wake-up calls”. A wake-up call is simply a piece of music used to wake astronauts on missions, a different piece of music, daily, for the duration of the flight.

Take, for example, the last Space Shuttle mission (Space Transportation System) STS-135; it was in flight for 13 days, and the wake-up call on day one was Coldplay’s Viva la Vida, while on day 13 it was Kate Smith singing God Bless America.

As a huge music buff who has the radio or music television on 18 hours a day, I really wanted to delve into this further. While Colin’s work is great, it’s just a PDF file (@wtfpdf). A PDF is not an ideal file format for querying data and gleaning new insights. So, while I wanted to explore it, I first decided to turn it into a true dataset. The result was a set of resources: a website, a JSON file, a CSV file, and an SQLite database, each of which is more functional and more maintainable over time.
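As a rough illustration of why those formats are easier to work with than a PDF, here is a minimal Python sketch that takes the two STS-135 entries mentioned above and serializes them as JSON, CSV, and SQLite. The field names (`mission`, `flight_day`, `artist`, `song`) and the table name are assumptions made for this example, not the schema of the published dataset.

```python
import csv
import io
import json
import sqlite3

# The two STS-135 wake-up calls mentioned in the text.
# Field names are illustrative assumptions, not the real dataset schema.
WAKEUP_CALLS = [
    {"mission": "STS-135", "flight_day": 1, "artist": "Coldplay", "song": "Viva la Vida"},
    {"mission": "STS-135", "flight_day": 13, "artist": "Kate Smith", "song": "God Bless America"},
]


def to_json(rows):
    """Serialize the rows as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_sqlite(rows):
    """Load the rows into an in-memory SQLite database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wakeup_calls "
        "(mission TEXT, flight_day INTEGER, artist TEXT, song TEXT)"
    )
    conn.executemany(
        "INSERT INTO wakeup_calls VALUES (:mission, :flight_day, :artist, :song)",
        rows,
    )
    conn.commit()
    return conn


def songs_for_mission(conn, mission):
    """Return (flight_day, artist, song) tuples for one mission, in flight order."""
    cursor = conn.execute(
        "SELECT flight_day, artist, song FROM wakeup_calls "
        "WHERE mission = ? ORDER BY flight_day",
        (mission,),
    )
    return cursor.fetchall()
```

Once the data is in SQLite, a question like "what was played on each day of STS-135?" becomes a single call to `songs_for_mission(to_sqlite(WAKEUP_CALLS), "STS-135")`, which is the kind of query a PDF cannot answer directly.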

Let's take a look at the results and https://nasawakeupcalls.github.io below!

Image of the foundations of a new building being erected in Wellington New Zealand, circa 2017.

File format building blocks: primitives in digital preservation

A primitive in software development can be described as:

a fundamental data type or code that can be used to build more complex software programs or interfaces.

– via https://www.capterra.com/glossary/primitive/ (also Wiki: language primitives)

Like bricks and mortar in the building industry, or oil and acrylic for a painter, a primitive helps a software developer to create increasingly more complex software, from your shell scripts, to entire digital preservation systems.

Primitives also help us to create file formats. As we’ve seen with the Eyeglass example I have presented previously, a file format is, at its most fundamental level, a representation of a data structure as a binary stream: it can be written out of the data structure onto disk, and likewise read from disk back into a data structure in code.

For the file format developer we have at our disposal all of the primitives that the software developer has, and like them, we also have “file formats” (as we tend to understand them in digital preservation terms) that serve as our primitives as well.
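To make the idea of "a data structure as a binary stream" concrete, here is a minimal Python sketch using the standard `struct` module. The format is entirely hypothetical (a `DEMO` magic number, a version, and a length-prefixed payload) and is not the Eyeglass format; it only shows primitive types being laid out as bytes and read back again.

```python
import struct
from dataclasses import dataclass

# Hypothetical header layout, big-endian with no padding:
#   4 bytes  magic number
#   2 bytes  unsigned version
#   4 bytes  unsigned payload length
HEADER = struct.Struct(">4sHI")
MAGIC = b"DEMO"


@dataclass
class Record:
    """The in-memory data structure the file format represents."""
    version: int
    payload: bytes


def to_bytes(record):
    """Write the data structure out as a binary stream."""
    header = HEADER.pack(MAGIC, record.version, len(record.payload))
    return header + record.payload


def from_bytes(data):
    """Read a binary stream back into the data structure."""
    magic, version, length = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a DEMO stream")
    start = HEADER.size
    payload = data[start:start + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return Record(version, payload)
```

The magic number at offset zero is also the sort of byte sequence a signature-based identification tool would look for, which is one reason these low-level choices matter for preservation.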

"Bei der Buche", a landscape architectural installation by landscape architect and photographer Karina Raeck. Created in 1993 in the Wartberg area north-east of Stuttgart.

wikidata + mediawiki = wikidata + provenance == wikiprov

Today I want to showcase a Wikidata proof of concept that I developed as part of my work integrating Siegfried and Wikidata.

That work is wikiprov, a utility to augment Wikidata results in JSON with the Wikidata revision history.

For siegfried it means that we can showcase the source of the results being returned by an identification without having to go directly back to Wikidata. This might mean more exposure for individuals contributing to Wikidata. We also provide access to a standard permalink where records contributing to a format identification are fixed at their last edit. Because Wikidata is more mutable than a resource like PRONOM, this gives us the best chance of understanding differences in results if we are comparing siegfried+Wikidata results side-by-side.
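The general idea can be sketched in Python using the standard MediaWiki revisions query and `oldid` permalinks. This is not wikiprov's actual implementation or output format: the function names and the `provenance` key are assumptions made for illustration, and the sketch only builds URLs and merges data rather than calling the network.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
WIKIDATA_INDEX = "https://www.wikidata.org/w/index.php"


def revisions_query_url(qid, limit=5):
    """Build a MediaWiki API URL asking for the latest revisions of an item."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": qid,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def permalink(qid, revision_id):
    """Build a link that pins an item to one specific revision."""
    return f"{WIKIDATA_INDEX}?{urlencode({'title': qid, 'oldid': revision_id})}"


def attach_provenance(result, revisions):
    """Return a copy of a result dict with revision history attached.

    `result` needs an "id" key holding the QID. `revisions` is a list of
    revision dicts as returned by the API, newest first. The "provenance"
    key is a hypothetical structure, not wikiprov's real schema.
    """
    latest = revisions[0]
    return {
        **result,
        "provenance": {
            "permalink": permalink(result["id"], latest["revid"]),
            "history": [
                {"user": r["user"], "timestamp": r["timestamp"]} for r in revisions
            ],
        },
    }
```

Pinning the permalink to the latest revision ID is what makes two identification runs comparable later, even if the Wikidata record has been edited in between.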

I am interested to hear your thoughts on the results of the work. Let's go into more detail below.

René Magritte's The Lovers, Paris 1928 (Photographed at MoMA, NYC in 2017)

Unrealized ideas: Unintentional Secrecy in the Era of Openness

Tyler recently posted this quote:

“History unprocessed is opportunity unrealized”

It reminds me of an unrealized article I wasn’t able to get written and into the wild, but it’s an important thought I would like to share nonetheless.

Proposed for James Lowry’s ACARM Symposium in 2015, I wanted to discuss when government is unable to adequately fund day-to-day effort, and research and development in the archive sector, leading to inefficient and potentially ineffective processing pipelines for records of archival value accessioned from government agencies and commissions.

It was just an abstract, but maybe folks have thoughts about this? Have we moved on since the early to mid-2010s? What modern metrics do we have available to us today to see the progress? What do the advent of the new US administration and increasing worldwide authoritarianism mean for issues like this?

Making DROID work with Wikidata

Wikidata is a good service, Wikibase (on which Wikidata is built) is a better platform.

I have spoken before about its potential to be added into the file-format registry ecosystem in a federated model.

If we are to use it as a registry that can perhaps complement the pipelines going into PRONOM, e.g. in vendors’ digital preservation platforms such as the Rosetta Format Library, Wikidata should be able to output different serializations of a signature file for tools such as Siegfried, DROID or FIDO.

Siegfried ✅: https://github.com/richardlehane/siegfried/wiki/Wikidata-identifier
Fido ❌: I’ll need to revisit this!

And what about DROID?

PRONOM release statistics

My contribution to PRONOM research week 2023 (held in November 2023) is a PRONOM summary website and Application Programming Interface…

Using a custom Wikibase with Siegfried

In March I was invited by the LD4 Wikidata Affinity Group to talk about my experiences using Wikibase with Siegfried, the file format identification tool. I don’t think I’ve talked about that work on here before, but you can find links to my iPRES talk on my ORCID page.

Let’s look at the abstract and the content of the talk below.

Published: The Hedgehog Review! But not like you nor I imagined…

May 31 I woke to a surprise, an unanticipated message about one of my images from Leann Davis Alspaugh, the…
