Not long after my first Code4Lib article I had another idea to run by the team there, and elected to see if my paper looking at events in the PREMIS metadata standard would be of interest to them and the readership.
My paper, PREMIS Events Through an Event-sourced Lens, was published in April this year.
The prologue to the paper probably describes it best:
The PREMIS metadata standard is widely adopted in the digital preservation community. Repository software often include fully compliant implementations or assert some level of conformance. Within PREMIS we have four semantic units, but “Events”, the topic of this paper, are particularly interesting as they describe “actions performed within or outside the repository that affects its capability to preserve Objects over the long term.” Events can help us to observe interactions with digital objects and understand where and when something may have gone wrong with them. Events in PREMIS, however, are slightly different to events in software development paradigms, specifically event driven software development – though similar, the design of PREMIS event logs does not promote their “being complete” nor their consumption and reuse; and so, learning from logs in event driven software development, may help us to simplify the PREMIS data model; plug identified gaps in implementations; and improve the ability to migrate digital content in future repositories.
It was inspired by my interest in event-driven software practices. While it’s not a paradigm I have had the opportunity to implement yet, it is a paradigm I have studied in previous roles and still consider a potentially important way of describing the life of digital objects.
Event sourcing is a software development methodology that provides a mechanism of persisting data that focuses on recording what has happened rather than how things are; rather than storing current state, event sourcing maintains “state mutations” as separate records called “events” — Alexey Zimarev https://www.eventstore.com/blog/what-is-event-sourcing
In the software development methodology, events are stored on a rolling basis and are first-class citizens of the systems that implement them. Events only describe what has happened, so analyzing impact is usually done by processing the events at a later date, e.g. to understand the number of widgets in stock on a sales platform you would process all the events one by one, and increment or decrement the total for each replenishing or sales event that you come across.
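The widget example can be sketched in a few lines of Python. This is only an illustration of replaying a log to derive state; the `StockEvent` class, the event names "replenished" and "sold", and the quantities are all hypothetical and not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockEvent:
    """One immutable record of something that happened (hypothetical shape)."""
    kind: str       # "replenished" or "sold" (assumed event names)
    quantity: int


def widgets_in_stock(events):
    """Derive the current stock level by replaying every event in order."""
    total = 0
    for event in events:
        if event.kind == "replenished":
            total += event.quantity
        elif event.kind == "sold":
            total -= event.quantity
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return total


# The log only ever grows; nothing is updated in place.
event_log = [
    StockEvent("replenished", 10),
    StockEvent("sold", 3),
    StockEvent("sold", 2),
    StockEvent("replenished", 5),
]

current_stock = widgets_in_stock(event_log)
print(current_stock)
```

The stock level is never stored anywhere; it is recomputed from the log whenever it is needed, which is the distinction from a CRUD-style record that holds only the current total.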
PREMIS (events) sit somewhere between the event-sourced paradigm and CRUD (Create, Read, Update, Delete). They feel much more static in the systems that use them and their representations are also fairly static (XML is about as structurally fixed as you can imagine). “Events” have to be consciously written into processing routines, and deciding when to output an event is an act of design (and requires its existence in the PREMIS data dictionary). PREMIS tends to be stored and is rarely looked at after a file has been processed into some sort of archival package.
With a few changes to how the PREMIS standard is implemented I feel it would be possible to make the standard much more dynamic.
The paper was fun to think about and write, and I hope you all get the opportunity to read through it.
There hasn’t been a huge wave of comment on the paper yet, and I’d love to see and engage in a bit more.
The paper is the result of a good number of years spent looking at preservation metadata and how it is recorded, trying to find ways of simplifying it so that there are more incentives to make it complete, and looking for compelling ways of consuming and representing it for the end user.
While some of the suggestions in the paper require some quite big changes there are also some suggestions that I think help modernize how PREMIS achieves its goal – the discussion around conformance, for example, was an important one for me to reflect on. Conformance in PREMIS means capturing event data, but at the highest level of conformance this data needs to be in a PREMIS “schema” that doesn’t require further “mapping or conversion”.
In a world where data can be represented in many different ways and, indeed, many developers want to be able to access data in many different representations, aligning structure with conformance is a mistake. Emphasis should be placed on incentivising collecting and storing event data and ensuring data can be output in a PREMIS compatible way, but not on forcing the PREMIS representation to exist. Forcing it to exist at the very least doubles the data footprint, or doubles the processing effort if developers want to convert PREMIS into something else.
A similar description exists in a paper by Rob Sharpe of Tessella from 2013: https://www.dpconline.org/docs/miscellaneous/events/850-premismets-sharpe/file and I think we’re fairly aligned, but I would go further and not impose PREMIS at the database schema level at all and instead create a logical schema to be applied when data is output — the same way content negotiation is used in linked open data. You could imagine a cURL request for all information about a collection ‘X’:
curl -H 'Accept: application/premis+json; \
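On the server side, applying a logical schema at output time could look something like the following Python sketch. It is an assumption-heavy illustration, not an existing implementation: the internal record shape, the `negotiate` function and the renderer names are hypothetical, `application/premis+json` is used only as the illustrative media type from the request above, and the JSON layout simply borrows PREMIS semantic unit names (`eventIdentifier`, `eventType`, `eventDateTime`) rather than following any published serialization. Quality values in the Accept header are ignored for brevity.

```python
import json

# Events as they might be stored internally: a hypothetical shape, not PREMIS.
stored_events = [
    {"id": "e1", "type": "ingestion", "when": "2023-04-01T10:00:00Z"},
]


def as_premis_json(events):
    """Map internal records onto PREMIS-style names only when asked for them."""
    return json.dumps([
        {
            "eventIdentifier": {
                "eventIdentifierType": "local",
                "eventIdentifierValue": e["id"],
            },
            "eventType": e["type"],
            "eventDateTime": e["when"],
        }
        for e in events
    ])


def as_plain_json(events):
    """Return the internal records unchanged."""
    return json.dumps(events)


# Media type -> renderer; adding a representation means adding an entry here.
RENDERERS = {
    "application/premis+json": as_premis_json,
    "application/json": as_plain_json,
}


def negotiate(accept_header, events):
    """Pick the first listed media type we can render (q-values ignored)."""
    for part in accept_header.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in RENDERERS:
            return media_type, RENDERERS[media_type](events)
    raise LookupError(f"no supported representation for: {accept_header}")


media_type, body = negotiate("application/premis+json", stored_events)
print(media_type)
print(body)
```

The point of the sketch is that the PREMIS representation only exists at the moment it is requested, so nothing PREMIS-shaped has to be stored alongside the underlying event data.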
I want to thank Rob for his 2013 thoughts as they have always been in my mind when thinking about the different repository solutions I have worked on.
I am sure there are some opinions about my Code4Lib article on PREMIS events through an event-sourced lens. I’d love to hear them! Part of my process of improving the solutions I work on is putting this type of publication out there and listening to and learning from the comments to understand what type of course corrections I may need to make.
Read more in Code4Lib issue 56: https://journal.code4lib.org/articles/17264
There’s one challenge that the paper doesn’t make explicit, and that is the question, “how dynamic is digital preservation?” I attempt to describe a very dynamic approach to maintenance of digital objects using event sourcing and the PREMIS model. I look at events as living, constantly happening things but what if our field is less dynamic than that? It may very well be, but then, I don’t think that conversation is being had anywhere. If digital preservation is not a dynamic field then I think we can start to look at simplifying repository software and our records about digital objects in different ways. That’s another conversation I’d love to have somewhere down the line.