Don’t implement PREMIS (re)present it
I got a response to my paper PREMIS Events Through an Event-Source Lens.
There are two strange choices made by this response. I’ll touch on the more personal one at the end, but first, what does the response say?
It’s not entirely clear.
If the response says that “it is a choice to implement PREMIS”, that “PREMIS can be implemented in different ways”, and that “it’s technology agnostic”, then yes, 100%. That’s basically the driver for my original paper, and once you read it holistically, instead of dissecting it and cherry-picking points, you will probably read it that way as well.
As I wrote in my first blog response to the publication of my paper in 2023, Tessella’s Rob Sharpe’s 2013 presentation was an important reference point for me, and we’ll revisit it below. Rob labors the point that PREMIS is technology agnostic and can be represented in other formats. Since 2013 I haven’t seen enough conversation or discussion about that, and I wanted to amplify that message by looking at PREMIS in an event-sourced model as an aggregation.
If there’s something more substantive in the PREMIS Editorial Committee’s (EC) response, then I feel it’s lost in its own stylistic choices (to focus on what I might have been saying rather than taking a show, don’t tell approach to clarifying their more salient points).
I wonder if it might have been handled differently? I am pretty easy to find these days, so reaching out to clarify any of my thinking might have been one way. Perhaps there was a way to collaborate on a response. Perhaps most of the EC’s concerns (if there are any) could have been handled with a joint editorial note in the original paper, clarifying that my words are not an authoritative source on PREMIS; rather, PREMIS (events) were largely a vehicle to describe the benefits of an event-sourced architecture, and you still need to consider and interpret the PREMIS documentation and guidance for yourself before implementing it in your own solutions.
Going a different direction
The essence of the original paper is this: (from my perspective) PREMIS is not a schema to be implemented in the back-end of any digital preservation system. Should it still be deemed a relevant technology, it might be studied in your requirements analysis, and you would make sure that your own system is not lossy in any way that would affect PREMIS “conformance”. But you would not match your “schema” to PREMIS; you would ensure that you can output it, “present it”. That is, it would become one representation of data that can be generated from your system out of many. One view or, as I clearly point out, an aggregation, in the case where we have chosen an event-based architecture.
This is not at odds with the (so-called) corrections that have been provided to me in the Code4Lib journal article from the PREMIS EC.
That being said, a further thesis is that PREMIS events are often a lossy, stateful representation of data in a digital preservation system. PREMIS represents one-dimensional state (or slices of state) over a period of time. In the modern engineering world, we have at our disposal methods of capturing, greedily, all events in the life of a digital object and doing that will create a richer view of the life of that object, and, as a representation of that data, a richer PREMIS view of an object and its events over time if so desired.
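As a rough illustration of that idea (a sketch under stated assumptions, not a reference implementation and not a conformant PREMIS serialisation): the system keeps an append-only log of everything that happens to an object, and a PREMIS-like event list is just one projection over it. The `StoredEvent` type, the internal event kinds, the `KIND_TO_PREMIS_TYPE` mapping and the sample log are all hypothetical; the output keys borrow PREMIS semantic unit names purely for readability.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class StoredEvent:
    """One immutable entry in a hypothetical append-only event log."""
    sequence: int
    object_id: str
    kind: str                      # internal event name, not a PREMIS term
    occurred_at: datetime
    payload: dict = field(default_factory=dict)


# Assumed mapping from internal event kinds to PREMIS-style eventType values.
# Kinds without an entry stay in the log but are left out of the PREMIS view.
KIND_TO_PREMIS_TYPE = {
    "file_received": "ingestion",
    "checksum_verified": "fixity check",
    "format_identified": "format identification",
}


def premis_view(log, object_id):
    """Project the log for one object into a list of PREMIS-like event records."""
    view = []
    for ev in sorted(log, key=lambda e: e.sequence):
        if ev.object_id != object_id or ev.kind not in KIND_TO_PREMIS_TYPE:
            continue
        view.append({
            "eventIdentifier": f"{ev.object_id}/{ev.sequence}",
            "eventType": KIND_TO_PREMIS_TYPE[ev.kind],
            "eventDateTime": ev.occurred_at.isoformat(),
            "eventOutcome": ev.payload.get("outcome", ""),
            "linkingObjectIdentifier": ev.object_id,
        })
    return view


def _at(hour):
    """Helper for sample timestamps on an arbitrary example day."""
    return datetime(2024, 1, 1, hour, tzinfo=timezone.utc)


# Sample data only: five stored events, one of which has no PREMIS equivalent.
LOG = [
    StoredEvent(1, "obj-1", "file_received", _at(9), {"outcome": "success"}),
    StoredEvent(2, "obj-1", "checksum_verified", _at(10), {"outcome": "success"}),
    StoredEvent(3, "obj-1", "storage_tier_changed", _at(11), {"tier": "cold"}),
    StoredEvent(4, "obj-1", "format_identified", _at(12), {"outcome": "success"}),
    StoredEvent(5, "obj-2", "file_received", _at(13), {"outcome": "success"}),
]

PREMIS_EVENTS = premis_view(LOG, "obj-1")
```

The log keeps the `storage_tier_changed` event even though the PREMIS-like view ignores it, which is the sense in which the stored data can be richer than any single representation generated from it.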
The authors of the EC response labor heavily on their perception of a misunderstanding on my part about PREMIS, and they can choose to do that, but what may look like a misunderstanding of PREMIS is not a misunderstanding of technology:
Conformance, in general, is defined as:
> how well something, such as a product, service or a system, meets a specified standard
And the PREMIS EC have decided to attach levels to conformance (also graduated levels, and degrees) to “quantify(ing) the degree to which PREMIS has been implemented”, three of which are anchored in implementation; apparently, three distinct implementations:
- Mapping, indirect or otherwise,
- Export,
- Direct implementation,
I write:
> PREMIS conformance should be separate from representation. If we acknowledge PREMIS is at least one important representation of preservation metadata, i.e. for its ability to act as an interface to those looking to interpret preservation metadata, then whether it exists logically on disk, or is generated through an event sourced projection, is irrelevant. How a representation complies with the PREMIS data model remains of greater importance, but this is measured from the same eventual view, whatever intermediate abstraction it sits within.
The PREMIS EC can choose to have three graduated levels of implementation to quantify degree of implementation. They can also make it clear that level three (internal representation) is not necessarily the final goal, but that it might benefit you. But if you’re not the PREMIS EC, don’t go near it; there’s no need.
I posit that conformance is only how well you can map to PREMIS or access something PREMIS-like that satisfies its data model. Your goal is to look at PREMIS as one interface you can potentially satisfy (you still need to describe objects uniquely; you need to describe agents engaging with them; rights need to sit somewhere), and once you can satisfy that interface you can access it in many different ways, and conformance should be measured against that, if PREMIS conformance is deemed valuable.
Put simply, conformance does not require levels. Levels may simply be the wrong word; these are just guides you might follow to demonstrate conformance (or ways that someone might audit a system to determine conformance).
The EC clipped this from one of the points they responded to:
Is level three (internal implementation) reasonable in today’s software development world, is it reasonable in today’s environmental climate?
Do we sacrifice the potential to store and access other different, richer, more-complex, (or less-complex), representations about other cross-sections of our data at the expense of putting PREMIS at the core of our digital preservation system? – No. We can make it an output of many, and use its schema and data dictionary to output it, but we don’t build around it, we essentially report around it.
They argue:
> there are also benefits in choosing to take an internationally defined and agreed data model and use that as the basis of your system.
Well, if it’s internationally defined and agreed, let’s just do that! 🤷
The benefits of not implementing an external data model are broadly around increased control and flexibility, however the trade-off to consider is the likely loss of easy interoperability and exchange with other systems.
If you re-frame PREMIS as an interchange-format and you can prove that as useful, you absolutely have my buy-in and I will have designed you a system that doesn’t preclude a PREMIS-like output, i.e. a way of aggregating more detailed information in your system and outputting PREMIS as a representation (a format) for others to understand.
The resurgence of OAIS?
From the EC:
> There are two responses to this, the first is to note that access has always been considered a part of Digital Preservation, to the point that one of the functional areas of the OAIS model is Access.
Who had OAIS on their World Digital Preservation Day (WDPD) Bingo Card?
But also, no. This is a misleading read and deserves more context.
Access when it is considered part of digital preservation is when access is used as a measure of success of digital preservation (or indicator of the potential obsolescence of an object) – it is an intrinsic property of digital preservation.
But the access function in OAIS is not that. And even if you’re crafty, and build an access component to a system that provides a feedback loop to digital preservation functions, it’s not that part of OAIS.
Now, PREMIS does have some nice features that support access, BUT we’re talking “events”, and information that supports digital preservation. Even though there may be a way to encode events that provide a feedback loop to measure the success of preservation, e.g. {"event": "access", "detail": "tried to open PSD in GIMP", "outcome": "FAIL"}, true access goes well beyond the scope of my article and the spirit in which it was written.
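To make the feedback-loop idea a little more concrete, here is a minimal sketch that reads access events shaped like the example above and reports how often access fails per format. It is only an illustration: the `format` key, the sample records and the `access_failure_rates` function are assumptions added for the sketch, and a real indicator of obsolescence would need far more care than a simple ratio.

```python
import json
from collections import defaultdict

# Sample access events, one JSON object per line. The "format" key is an
# assumption added for this sketch; it is not part of the example above.
RAW_EVENTS = """\
{"event": "access", "format": "PSD", "detail": "tried to open PSD in GIMP", "outcome": "FAIL"}
{"event": "access", "format": "PSD", "detail": "opened PSD in original software", "outcome": "OK"}
{"event": "access", "format": "PDF", "detail": "opened PDF in browser", "outcome": "OK"}
{"event": "ingest", "format": "PDF", "detail": "received file", "outcome": "OK"}
"""


def access_failure_rates(raw):
    """Return {format: share of access events whose outcome was FAIL}."""
    attempts = defaultdict(int)
    failures = defaultdict(int)
    for line in raw.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record["event"] != "access":
            continue  # only access events feed this indicator
        attempts[record["format"]] += 1
        if record["outcome"] == "FAIL":
            failures[record["format"]] += 1
    return {fmt: failures[fmt] / attempts[fmt] for fmt in attempts}


RATES = access_failure_rates(RAW_EVENTS)
```

With the sample data, half of the PSD access attempts fail and none of the PDF ones do, which is the kind of signal that could be fed back into preservation planning.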
We need to evolve
The EC presents a somewhat dogmatic and institutionalised response. As a flaneur in the field, as someone who has worked implementing PREMIS in one of the most PREMIS-heavy digital preservation systems out there, and who has been involved too in efforts to minimise PREMIS verbosity, including my own event-like approaches, I revisited Sharpe’s paper in 2022/2023. I did this asking: why don’t we talk about it more? Why do I see projects today that still treat XML as the end goal of PREMIS?
- https://github.com/bishbashbackup/premissh
- https://github.com/rochester-rcl/premis-generator (also JSON which is really nice!)
My view is that a 20-year-old standard, a 2015 specification (last revision), a 2016 reference implementation in an out-of-date technology (XML), and a very institutional PREMIS EC, with roots at the Library of Congress, all have influence, and some of the points I do see appearing from their response are being buried in their desire to hold onto authority.
The biggest point being buried, technological agnosticism, appears in the EC’s response to me five times, “technology independent” once, and in the official data dictionary once (unrelated), and it appears in the official 2015 conformance statement zero times (although you can bend the verbosity of the conformance statement into words that read like “technologically agnostic”). But make it explicit; don’t write it five times to me and not put it in the docs. Make new reference implementations, or borrow them from your implementers. Use plain language, and just make it explicit.
Better still, let’s evolve the presentation of the PREMIS standard (away from separate PDFs), and use a modern documentation framework (e.g. Diataxis), and put it into public versioned source control, and give us a way that we can help write the documentation with you to make things like this clearer.
While the EC’s response to me labors on the idea that I have missed the fact that PREMIS is technology agnostic, I wrote the original paper to amplify previous conversations and keep them relevant because they were formative for me, and I hope that they will be formative for others.
I also wrote the original paper as more of a technology paper than a PREMIS paper (honouring PREMIS of course) but I make a very clear conclusion that is very much inclusive of PREMIS:
> It is this paper’s assertion that we can store more, and “do more” by taking an event-sourced approach to storing events associated with the “objects” described in the PREMIS data dictionary.
I can nuance this further:
- Store events about your digital objects and try to make sure some of those events can be aligned with PREMIS,
- Store events because events happen on a continuum; don’t fall into the trap of storing state,
- Create representations of your data: PREMIS might be one, access reports and logs might be another, feature analyses might be another. Don’t limit yourself to one schema; use many.
My paper is about trying to fit older trusted paradigms into modern development practices. It’s about moving away from dogmatic adherence to the past while honouring something that exists.
We can do PREMIS exactly the same as we do it now, as long as we don’t put it front and centre of our implementation.
How to respond to a “well-actually”?
Well-actually… https://www.recurse.com/social-rules#no-well-actuallys
There are some editorial quirks in my paper, the one I am most embarrassed by is when my writing conflated the data model with the events in the Library of Congress controlled vocabulary (what other controlled vocabularies have other folks been using in the last decade? Next PREMIS revision, please, put those listings in there or open the editorial process to modern practices). Conflating these two things in one paragraph should hardly be the thread that untangles the entire piece.
The PREMIS EC didn’t reach out to me before publication, or after, yet as I point out, they all know where to find me (I wasn’t able to make the PREMIS birds-of-a-feather at iPRES, probably a good thing while this seems to have been in the air, but I was at the conference). Their response, though, does something strange: it directs its efforts at things I might not have understood, or may seemingly be getting at, or it points out what I am “really saying here”. It is a patronising approach. For the gaps they filled in on my behalf, I would happily have provided clarity. That would have offered me the opportunity to respond in a less reactive way, or perhaps all of us a chance to collaborate.
Their response appeals to authority, and its two references are my article and the PREMIS data dictionary. I am sure there was a more neutral, reflective, and holistic way to approach this work by focusing on the entirety of the article and its spirit, and giving the benefit of the doubt to what is perceived as the author’s “mistakes” or “misreadings”. A show don’t tell approach might have helped, and would certainly be valuable, e.g. spending more time implementing examples that lent themselves to updating future revisions of the data dictionary and conformance statements.
¯\_(ツ)_/¯
Anyway folks. ¯\_(ツ)_/¯ Interpretation is tricky? I imagine that the PREMIS EC will find fault with the above text, but to try to avoid another article on the subject of my misinterpretation: The PREMIS EC aren’t foisting the standard on you and I most definitely am not. Read their docs if you do choose PREMIS. Technology changes and so do standards. I feel we have an obligation to modernise (and demonstrate modernisation) with those changes. I feel we have an obligation to question, and evaluate as time moves on; especially when technology is front and centre of how we support our archivists and librarians.
Hopefully people reading this can continue to read the original paper for what it is. There may be some potentially interesting ideas and conclusions that a pure PREMIS discussion distracts from, including what event-sourced data might mean for activating information supporting digital preservation.
Hopefully too, from this engagement, the PREMIS EC will take an opportunity to fold some of their own response into their own documentation and guidance.
Thanks for reading.
PREMIS conformance statement (2015): https://www.loc.gov/standards/premis/premis-conformance-20150429.pdf
PREMIS data dictionary (Version 3.0 (2015)): https://www.loc.gov/standards/premis/v3/premis-3-0-final.pdf
@exponentialdecay hey ross, I got very confused because the link in your post for "response" goes to your original code4lib article, not to the response!
@richardlehane @exponentialdecay oops! Fixed it, thanks Richard! (also it was here: https://journal.code4lib.org/articles/18203)
@beet_keeper @exponentialdecay I enjoyed the spicy takes 🍿We should make an event-sourced (or log based https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying) digi pres system sometime
@richardlehane @exponentialdecay 💯
@richardlehane @exponentialdecay I always think about ERMS when I start to think about this, so I get a bit stuck about where to start but I do think it'd be cool!
@beet_keeper @exponentialdecay that doesn't sound like a bad place to get stuck, digi pres systems should be recordkeeping systems