Coding Archives - Page 2 of 3 - ross spencer :: exponentialdecay.digipres

Making DROID work with Wikidata

Wikidata is a good service; Wikibase (on which Wikidata is built) is a better platform.

I have spoken before about its potential to be added into the file-format registry ecosystem in a federated model.

If we are to use it as a registry that can perhaps complement the pipelines going into PRONOM, e.g. in vendors’ digital preservation platforms such as the Rosetta Format Library, Wikidata should be able to output different serializations of a signature file for tools such as Siegfried, DROID or FIDO.

Siegfried ✅: https://github.com/richardlehane/siegfried/wiki/Wikidata-identifier
FIDO ❌: I’ll need to revisit this!

And what about DROID?

Client-side file format identification and reporting pipeline with Siegfried and Demystify Lite

With thanks to Archives New Zealand for their sponsorship, and to Richard Lehane for his great coding expertise and collaboration, Demystify Lite has a new feature: Siegfried!!

Richard recently posted about this work on LinkedIn, but let’s look at this effort in more detail below.

Not your first paper from iPRES2024: Design patterns in Digital Preservation: Declarative software for digital preservationists

Well folks, my paper for iPRES2024 was rejected, but the good news is that you get to read it here…

PRONOM release statistics

My contribution to PRONOM research week 2023 (held in November 2023) is a PRONOM summary website and Application Programming Interface…

Shattering the eyeglass: Using Kaitai Structs to dissect the eyeglass’ contents

In my 2012 post, Genesis of a File Format, I created a new file format – the Eyeglass file format. The format provides a mechanism to persist information about a patient’s eye health following a checkup at an optician’s. Today, in 2023, we can use the format to learn how to make use of Kaitai Structs for understanding file formats.

Given the disclaimer that I am not actually an optician and that the format is purely illustrative, let’s look at the eyeglass again below.

Linting as understanding

I have been working on a Python template repository as part of my day-job at Orcfax.

It is based on the popular pypa sample project and adds important tooling that supports the quality assurance of projects that many developers are expected to engage with.

In my template repository I add editor defaults and linting, and prepare the repository for unit tests and then deployment.

I have migrated a copy of the template I created for Orcfax to a new file format organisation I have created to capture work I am doing around tools such as ffdev.info (the PRONOM signature development utility).

The new template repository can be found here: ffdev-info/template.py.

I want to talk about how this tooling can be used as a way of understanding legacy or new code that you are going to be looking at, and how linting can be useful for learning and understanding.

Moonshine: a small part of the file format analyst’s toolkit

Today I released Moonshine 2.0.0. Moonshine is a file format discovery tool I developed a few years ago. A…

Programming things: Giving up… (or at least getting bitten by semver and Golang’s unforgiving nature, and wanting to!)

There are good days, and there are bad days when coding, and you never stop learning. Today was not a…

The Painter Goblin: Part 4, Putting it all together…

Following the previous posts, bringing this all together meant three different applications.

paintergoblin.py – creates the images, can be run standalone
wikigoblin.py – retrieves data to tweet from the Wikidata SPARQL services
twittergoblin.py – tweets for us! Either a random Wikidata image or one from an existing Wikidata link

We create Tweetable information using the wikigoblin. We perform the Tweet using twittergoblin. In between the paintergoblin has to create his art!
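As a rough sketch of how the three goblins hand work to one another, the flow might look something like the following. Every function name, the `TweetData` type, and the sample values here are hypothetical stand-ins, not the code from the actual scripts: the real wikigoblin queries Wikidata, and the real twittergoblin posts to Twitter.

```python
from dataclasses import dataclass


@dataclass
class TweetData:
    """Hypothetical container for what the wikigoblin hands on."""

    label: str
    image_url: str


def wikigoblin_fetch() -> TweetData:
    # Stand-in only: the real script retrieves this from the Wikidata
    # SPARQL service. The values below are made up for illustration.
    return TweetData(
        label="Example painting",
        image_url="http://example.com/image/0001",
    )


def paintergoblin_paint(data: TweetData) -> str:
    # Stand-in only: pretend to create the artwork and return the
    # filename it would be saved under.
    return f"goblin-{data.label.lower().replace(' ', '-')}.png"


def twittergoblin_tweet(data: TweetData, artwork_path: str) -> str:
    # Stand-in only: build the status text rather than posting anything.
    return f"{data.label} ({artwork_path})"


def run_pipeline() -> str:
    data = wikigoblin_fetch()              # 1. create Tweetable information
    artwork = paintergoblin_paint(data)    # 2. the goblin creates his art
    return twittergoblin_tweet(data, artwork)  # 3. perform the Tweet


status = run_pipeline()
print(status)
```

Keeping each stage as its own function mirrors the three separate applications, so the painting step can still be run standalone.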

We’ve seen examples of the images from the original zine.

How do we turn this concept into something real, and automated?

The Painter Goblin: Part 3, Data Sources

One thing that held the Painter Goblin project back was finding a data source to get images from.

There are potentially hundreds of sources out there, but! The path of least resistance means that:

Any source needs either hackable URIs** (uniform resource identifier) or a randomizing function.
Ideally, a data source doesn’t link to yet-another-page, e.g. portal-like websites that point to others’ collections.
Ideally the data source links directly to an image to download.
Data can be easily selected by category, e.g. just paintings, or posters, not just ‘art’.

** A hackable URI is a URI pattern that can be cycled through using computational techniques, even if the underlying data isn’t entirely well-known. E.g. http://example.com/image/0001, http://example.com/image/0002, for subsequent pages, for lack of a more concrete example.
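To make the idea of cycling through a hackable URI a little more concrete, here is a minimal sketch. The function name `hackable_uris` and its parameters are my own for illustration, and it assumes the pattern is simply a zero-padded counter appended to a base address, as in the example.com URIs above.

```python
from typing import Iterator


def hackable_uris(base: str, start: int, stop: int, width: int = 4) -> Iterator[str]:
    """Yield URIs by appending a zero-padded counter to a base pattern.

    Assumes the source numbers its pages sequentially; a real source may
    have gaps, so each URI would still need checking when fetched.
    """
    for number in range(start, stop + 1):
        yield f"{base}/{number:0{width}d}"


uris = list(hackable_uris("http://example.com/image", 1, 3))
for uri in uris:
    print(uri)
```

A generator is used so that a large range can be walked lazily, one request at a time, rather than building the whole list up front.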

I wanted to explore heritage sources such as Europeana, TROVE, DPLA. I struggled to search these effectively though, and struggled to see how I might automate using them. I recognise they have APIs. I’ll revisit them in the future as I look to expand the Painter Goblin’s corpus.

Enter Wikidata.
