{"id":2186,"date":"2025-01-28T09:29:23","date_gmt":"2025-01-28T09:29:23","guid":{"rendered":"https:\/\/exponentialdecay.co.uk\/blog\/?p=2186"},"modified":"2025-12-01T16:59:51","modified_gmt":"2025-12-01T16:59:51","slug":"wikidata-mediawiki-wikidata-provenance-wikiprov","status":"publish","type":"post","link":"https:\/\/exponentialdecay.co.uk\/blog\/wikidata-mediawiki-wikidata-provenance-wikiprov\/","title":{"rendered":"wikidata + mediawiki = wikidata + provenance == wikiprov"},"content":{"rendered":"<p>Today I want to showcase a Wikidata proof of concept that I developed as part of my work integrating <a href=\"https:\/\/exponentialdecay.co.uk\/blog\/talk-using-a-custom-wikibase-with-siegfried\/\" target=\"_blank\" rel=\"noopener\">Siegfried and Wikidata<\/a>.<\/p>\n<p>That work is <a href=\"https:\/\/github.com\/ross-spencer\/wikiprov\" target=\"_blank\" rel=\"noopener\">wikiprov<\/a> a utility to augment Wikidata results in JSON with the Wikidata revision history.<\/p>\n<p>For siegfried it means that we can showcase the source of the results being returned by an identification without having to go directly back to Wikidata, this might mean more exposure for individuals contributing to Wikidata. We also provide access to a standard permalink where records contributing to a format identification are fixed at their last edit. Because Wikidata is more mutable than a resource like PRONOM this gives us the best chance of understanding differences in results if we are comparing siegfried+Wikidata results side-by-side.<\/p>\n<p>I am interested to hear your thoughts on the results of the work. 
Let&#8217;s go into more detail below.<\/p>\n<p><!--more--><\/p>\n<h2>A (go)lang and winding road&#8230;<\/h2>\n<p>A lot of the work I do flies under the radar, and that is especially true here: my work on two tools, spargo and then wikiprov, was specifically about supporting my work on <a href=\"https:\/\/github.com\/richardlehane\/siegfried\" target=\"_blank\" rel=\"noopener\">siegfried<\/a>, a tool for format identification developed by Richard Lehane.<\/p>\n<p>We&#8217;ve gone into the work before, but Yale University Library asked whether Richard or I could add a Wikidata identifier to the tool. We could, and so we did!<\/p>\n<p>Because siegfried uses no external dependencies, relying wholly on the Golang standard library whether in siegfried itself or in the dependencies created by Richard, I didn&#8217;t want to be the first to add anything external to the tool that couldn&#8217;t be directly maintained, and so I created two SPARQL packages: spargo and wikiprov. While both libraries sit in my repositories, they only use standard library features and they can always be handed to Richard in the future.<\/p>\n<p><a href=\"https:\/\/github.com\/ross-spencer\/spargo\" target=\"_blank\" rel=\"noopener\">spargo<\/a> and <a href=\"https:\/\/github.com\/ross-spencer\/wikiprov\" target=\"_blank\" rel=\"noopener\">wikiprov<\/a> are both packages created for querying SPARQL endpoints in Golang. spargo is a generic library that can potentially be adopted in most SPARQL use cases. wikiprov, the subject of this blog, adds functionality specific to Wikidata (specifically MediaWiki+Wikidata) and is, I hope, far more interesting.<\/p>\n<h2>wikidata + provenance<\/h2>\n<p>We know that there is a complex history of edits that goes into making resources like Wikipedia. 
In fact, developers have created ways to visualize or even listen to these edits, such as this by <a href=\"http:\/\/github.com\/slaporte\" target=\"_blank\" rel=\"noopener\">Stephen LaPorte<\/a> and <a href=\"http:\/\/github.com\/mahmoud\" target=\"_blank\" rel=\"noopener\">Mahmoud Hashemi<\/a>.<\/p>\n<ul>\n<li><a href=\"http:\/\/listen.hatnote.com\/\" target=\"_blank\" rel=\"noopener\">Listen to Wikipedia<\/a>.<\/li>\n<\/ul>\n<p>It is no different with Wikidata.<\/p>\n<p>Wikidata isn&#8217;t that much different from Wikipedia. In fact, both sit on a technology called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Special:Version\" target=\"_blank\" rel=\"noopener\">MediaWiki<\/a>. Wikidata extends MediaWiki to create a graph of the underlying data as <a href=\"https:\/\/www.w3.org\/wiki\/LinkedData\" target=\"_blank\" rel=\"noopener\">linked open data<\/a>; this extension is also called Wikibase and gives users the ability to stand up their own Wikidata-like service with their own knowledge graph.<\/p>\n<p>The linked open data in Wikidata is easy to access via its <a href=\"https:\/\/w.wiki\/CpoM\" target=\"_blank\" rel=\"noopener\">query service<\/a> (Wikidata Query Service, or WDQS). Its history is less easy to access via query. Where <a href=\"https:\/\/en.wikipedia.org\/wiki\/Semantic_triple\" target=\"_blank\" rel=\"noopener\">triples<\/a> are used to represent data as it is now, as a graph, a concept called <a href=\"https:\/\/patterns.dataincubator.org\/book\/named-graphs.html\" target=\"_blank\" rel=\"noopener\">named graphs<\/a> exists in other linked open data models and promotes detailed provenance and versioning by extending linked data statements over a second dimension (a separate graph). The named graph can be queried like the original graph, but it provides entirely meta information, allowing us to understand the source of the data.<\/p>\n<p>For different reasons Wikidata doesn&#8217;t offer this functionality. 
Although <a href=\"https:\/\/aidanhogan.com\/docs\/reification-wikidata-rdf-sparql.pdf\" target=\"_blank\" rel=\"noopener\">clever folks<\/a> are looking at it, I wanted a practical approach that had meaning in the context of siegfried+Wikidata, and so I found a different way around the problem.<\/p>\n<h3>wikidata as a snapshot<\/h3>\n<p>Using a named graph suggests to me some sort of dynamic querying, i.e. I am exploring a graph, and I want to retrieve different properties about that graph as I query. I might be querying live, or asking to update a query when I access another web resource, or something like that.<\/p>\n<p>siegfried, like DROID, needs to access signature definitions that enable it to identify file formats. For siegfried, we really want to download a set of definitions once and then continue to access those as we identify our collections. As Wikidata is updated, another version of those signatures may be downloaded with new definitions, and we can use those. Just as DROID uses PRONOM, which has <a href=\"https:\/\/www.nationalarchives.gov.uk\/aboutapps\/pronom\/droid-signature-files.htm\" target=\"_blank\" rel=\"noopener\">119 versions<\/a> of its own signature definitions, we&#8217;re doing something less dynamic with Wikidata: something more analogous to taking a snapshot in time, a version, albeit a slightly more granular one depending on how often this &#8216;snapshot&#8217; is taken.<\/p>\n<p>This frees us.<\/p>\n<p>We can take this snapshot out of the endpoint, or the WDQS, and dissect the contents of the file as data. For the purposes of this blog, I think of it as a document &#8212; a <a href=\"https:\/\/en.wikipedia.org\/wiki\/JSON#:~:text=JSON%20(JavaScript%20Object%20Notation%2C%20pronounced,(or%20other%20serializable%20values).\" target=\"_blank\" rel=\"noopener\">JSON<\/a> document &#8212; and we can create our own rules about how to parse and understand that document.<\/p>\n<h3>MediaWiki, notre je ne sais quoi<\/h3>\n<p>I have mentioned that Wikidata extends MediaWiki. The Wikidata extension provides the ability to create and query triples as linked open data.<\/p>\n<p>MediaWiki, on the other hand, provides the underlying capability to create articles and, alongside those articles, record edit history.<\/p>\n<p>Take, for example, my own file format <a href=\"https:\/\/www.wikidata.org\/wiki\/Q105858419\" target=\"_blank\" rel=\"noopener\">Eyeglass<\/a> (<code>*.eygl<\/code>): when a user clicks on view history, they can see the history here: <a href=\"https:\/\/www.wikidata.org\/w\/index.php?title=Q105858419&amp;action=history\" target=\"_blank\" rel=\"noopener\">https:\/\/www.wikidata.org\/w\/index.php?title=Q105858419&amp;action=history<\/a><\/p>\n<p>It is possible to access this history programmatically; here the output is piped into <code>jq<\/code>:<\/p>\n<pre>curl \\\r\n-s \"https:\/\/www.wikidata.org\/w\/api.php?action=query&amp;format=json&amp;prop=revisions&amp;titles=Q105858419&amp;rvlimit=5&amp;rvprop=ids|user|comment|timestamp|sha1\" \\\r\n| jq<\/pre>\n<p>The results:<\/p>\n<pre>{\r\n\u00a0 \"continue\": {\r\n\u00a0 \u00a0 \"rvcontinue\": \"20210412143356|1400469290\",\r\n\u00a0 \u00a0 \"continue\": \"||\"\r\n\u00a0 },\r\n\u00a0 \"query\": {\r\n\u00a0 \u00a0 \"pages\": {\r\n\u00a0 \u00a0 \u00a0 \"101213417\": {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"pageid\": 101213417,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"ns\": 0,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"title\": \"Q105858419\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \"revisions\": [\r\n\u00a0 \u00a0 \u00a0 \u00a0 {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"revid\": 1866845623,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"parentid\": 1757596329,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"user\": \"Renamerr\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"timestamp\": \"2023-04-02T16:29:43Z\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"sha1\": \"3f12a3371bbd490bb74dd4402283e8a897411e91\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"comment\": \"\/* wbsetdescription-add:1|uk *\/ 
\u0444\u043e\u0440\u043c\u0430\u0442 \u0444\u0430\u0439\u043b\u0443, [[:toollabs:quickstatements\/#\/batch\/151018|batch #151018]]\"\r\n\u00a0 \u00a0 \u00a0 \u00a0 },\r\n\u00a0 \u00a0 \u00a0 \u00a0 {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"revid\": 1757596329,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"parentid\": 1533276464,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"user\": \"A particle for world to form\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"timestamp\": \"2022-10-25T04:53:04Z\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"sha1\": \"31182afa67fd562b5c138ca5e0a41f865c643f3a\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"comment\": \"\/* wbeditentity-update-languages-short:0||ru *\/ \u0444\u043e\u0440\u043c\u0430\u0442 \u0444\u0430\u0439\u043b\u0430\"\r\n\u00a0 \u00a0 \u00a0 \u00a0 },\r\n\u00a0 \u00a0 \u00a0 \u00a0 {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"revid\": 1533276464,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"parentid\": 1533275566,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"user\": \"Beet keeper\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"timestamp\": \"2021-11-24T13:40:17Z\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"sha1\": \"f8bc9eec0e7d14910d11784b13ea0e1f464d5735\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"comment\": \"\/* wbsetclaim-create:2||1 *\/ [[Property:P973]]: https:\/\/exponentialdecay.co.uk\/blog\/genesis-of-a-file-format\/\"\r\n\u00a0 \u00a0 \u00a0 \u00a0 },\r\n\u00a0 \u00a0 \u00a0 \u00a0 {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"revid\": 1533275566,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"parentid\": 1423307021,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"user\": \"Beet keeper\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"timestamp\": \"2021-11-24T13:37:33Z\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"sha1\": \"e02a27d006766652038c91c758c148fe6534d875\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"comment\": \"\/* wbmergeitems-from:0||Q28600778 *\/\"\r\n\u00a0 \u00a0 \u00a0 \u00a0 },\r\n\u00a0 \u00a0 \u00a0 \u00a0 {\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"revid\": 
1423307021,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"parentid\": 1400469290,\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"user\": \"Edoderoobot\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"timestamp\": \"2021-05-18T08:14:57Z\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"sha1\": \"45521c5e1bd6bc3a8dfc70e7e0506d946bb48df7\",\r\n\u00a0 \u00a0 \u00a0 \u00a0 \u00a0 \"comment\": \"\/* wbeditentity-update-languages-short:0||nl *\/ nl-description, [[User:Edoderoobot\/Set-nl-description|python code]] - fileformat\"\r\n\u00a0 \u00a0 \u00a0 \u00a0 }\r\n\u00a0 \u00a0 \u00a0 \u00a0 ]\r\n\u00a0 \u00a0 \u00a0 }\r\n\u00a0 \u00a0 }\r\n\u00a0 }\r\n}<\/pre>\n<p>We can see the users who last edited this record, and we have some indication of what those edits were.<\/p>\n<p>Given programmatic access to this data, as well as programmatic access to the triple data via the query service, we can begin to see opportunities to combine the two data sets.<\/p>\n<h3>wikiprov file format<\/h3>\n<p>I describe this in more detail in the wikiprov <a href=\"https:\/\/github.com\/ross-spencer\/wikiprov\/blob\/main\/README.md\" target=\"_blank\" rel=\"noopener\">README<\/a>.<\/p>\n<p>The standard response from any SPARQL endpoint looks something like the following in JSON:<\/p>\n<pre>{\r\n  <span class=\"pl-ent\">\"head\"<\/span>: {},\r\n  <span class=\"pl-ent\">\"results\"<\/span>: {\r\n    <span class=\"pl-ent\">\"bindings\"<\/span>: [{}]\r\n  }\r\n}<\/pre>\n<p>The <code>head<\/code> describes the different parameters requested in the query. 
The <code>results<\/code> object contains the different values of the triples that are returned.<\/p>\n<p>I assumed most users will access these values using the keys <code>head<\/code> and <code>results<\/code>. With no desire to break compatibility, I felt that, instead of adding provenance somewhere within the existing results format, I could safely add another key to this structure, <code>provenance<\/code>, creating:<\/p>\n<pre>{\r\n  <span class=\"pl-ent\">\"head\"<\/span>: {},\r\n  <span class=\"pl-ent\">\"results\"<\/span>: {\r\n    <span class=\"pl-ent\">\"bindings\"<\/span>: [{}]\r\n  },\r\n  <span class=\"pl-ent\">\"provenance\"<\/span>: {}\r\n}<\/pre>\n<p>Where <code>provenance<\/code> could now hold revision history for each of the objects returned in any given SPARQL query.<\/p>\n<p>In siegfried, the bindings might describe format <code>Q105858419<\/code>, and in the provenance array we end up with a snippet of revision history for the format as follows:<\/p>\n<pre>{\r\n  \"Title\": \"Q105858419\",\r\n  \"Entity\": \"http:\/\/wikidata.org\/entity\/Q105858419\",\r\n  \"Revision\": 1866845623,\r\n  \"Modified\": \"2023-04-02T16:29:43Z\",\r\n  \"Permalink\": \"https:\/\/www.wikidata.org\/w\/index.php?oldid=1866845623&amp;title=Q105858419\",\r\n  \"History\": [\r\n    \"2023-04-02T16:29:43Z (oldid: 1866845623): 'Renamerr' edited: '\/* wbsetdescription-add:1|uk *\/ \u0444\u043e\u0440\u043c\u0430\u0442 \u0444\u0430\u0439\u043b\u0443, [[:toollabs:quickstatements\/#\/batch\/151018|batch #151018]]'\",\r\n    \"2022-10-25T04:53:04Z (oldid: 1757596329): 'A particle for world to form' edited: '\/* wbeditentity-update-languages-short:0||ru *\/ \u0444\u043e\u0440\u043c\u0430\u0442 \u0444\u0430\u0439\u043b\u0430'\",\r\n    \"2021-11-24T13:40:17Z (oldid: 1533276464): 'Beet keeper' edited: '\/* wbsetclaim-create:2||1 *\/ [[Property:P973]]: https:\/\/exponentialdecay.co.uk\/blog\/genesis-of-a-file-format\/'\",\r\n    \"2021-11-24T13:37:33Z (oldid: 1533275566): 'Beet keeper' edited: '\/* 
wbmergeitems-from:0||Q28600778 *\/'\",\r\n    \"2021-05-18T08:14:57Z (oldid: 1423307021): 'Edoderoobot' edited: '\/* wbeditentity-update-languages-short:0||nl *\/ nl-description, [[User:Edoderoobot\/Set-nl-description|python code]] - fileformat'\"\r\n  ]\r\n}<\/pre>\n<p>And this will look pretty consistent for each file format returned by our <a href=\"https:\/\/github.com\/richardlehane\/siegfried\/blob\/063951c5773ce164f5bd1dadd504bdc8c22d8946\/pkg\/config\/internal\/wikidatasparql\/sparql.go#L37-L60\" target=\"_blank\" rel=\"noopener\">in-built query<\/a>.<\/p>\n<p>Each time we get a result from siegfried we also get access to a permalink to the last update of the record providing the source data for the signature file. In the case of <code>eygl<\/code> today:<\/p>\n<pre>filename : 'example.eygl'\r\nfilesize : 14\r\nmodified : 2025-01-24T13:59:31+01:00\r\nerrors : \r\nmatches :\r\n- ns : 'wikidata'\r\nid : 'Q105858419'\r\nformat : 'Eyeglass format'\r\nURI : 'http:\/\/www.wikidata.org\/entity\/Q105858419'\r\npermalink : 'https:\/\/www.wikidata.org\/w\/index.php?oldid=1866845623&amp;title=Q105858419'\r\nmime : 'application\/octet-stream'\r\nbasis : 'extension match eygl; byte match at 0, 14 (Wikidata reference is empty)'\r\nwarning :<\/pre>\n<h2>wikiprov the package and command<\/h2>\n<p>While wikiprov was created for siegfried it can be used for any Wikidata (or Wikibase) query.<\/p>\n<h3>wikiprov package<\/h3>\n<p>The package is <a href=\"https:\/\/pkg.go.dev\/github.com\/ross-spencer\/wikiprov\/pkg\/wikiprov\" target=\"_blank\" rel=\"noopener\">documented<\/a> using Go best practices.<\/p>\n<p><a href=\"https:\/\/exponentialdecay.co.uk\/blog\/wp-content\/uploads\/2025\/01\/wikiprov-godoc.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-2171\" src=\"https:\/\/exponentialdecay.co.uk\/blog\/wp-content\/uploads\/2025\/01\/wikiprov-godoc.png\" alt=\"Screenshot of the Wikiprov documentation within Godoc\" width=\"1844\" 
height=\"927\" srcset=\"https:\/\/exponentialdecay.co.uk\/blog\/wp-content\/uploads\/2025\/01\/wikiprov-godoc.png 1844w, https:\/\/exponentialdecay.co.uk\/blog\/wp-content\/uploads\/2025\/01\/wikiprov-godoc-500x251.png 500w, https:\/\/exponentialdecay.co.uk\/blog\/wp-content\/uploads\/2025\/01\/wikiprov-godoc-1024x515.png 1024w, https:\/\/exponentialdecay.co.uk\/blog\/wp-content\/uploads\/2025\/01\/wikiprov-godoc-768x386.png 768w, https:\/\/exponentialdecay.co.uk\/blog\/wp-content\/uploads\/2025\/01\/wikiprov-godoc-1536x772.png 1536w\" sizes=\"auto, (max-width: 1844px) 100vw, 1844px\" \/><\/a><\/p>\n<p>A runnable example might look as follows:<\/p>\n<pre><span class=\"pl-k\">package<\/span> main\r\n\r\n<span class=\"pl-k\">import<\/span> (\r\n\t<span class=\"pl-s\">\"fmt\"<\/span>\r\n\r\n\t<span class=\"pl-s\">\"github.com\/ross-spencer\/wikiprov\/pkg\/wikiprov\"<\/span>\r\n)\r\n\r\n<span class=\"pl-k\">func<\/span> <span class=\"pl-s1\">main<\/span>() {\r\n\t<span class=\"pl-k\">var<\/span> <span class=\"pl-s1\">qid<\/span> <span class=\"pl-c1\">=<\/span> <span class=\"pl-s\">\"Q105858419\"<\/span>\r\n\t<span class=\"pl-s1\">res<\/span>, <span class=\"pl-s1\">err<\/span> <span class=\"pl-c1\">:=<\/span> <span class=\"pl-s1\">wikiprov<\/span>.<span class=\"pl-c1\">GetWikidataProvenance<\/span>(<span class=\"pl-s1\">qid<\/span>, <span class=\"pl-c1\">10<\/span>)\r\n\t<span class=\"pl-k\">if<\/span> <span class=\"pl-s1\">err<\/span> <span class=\"pl-c1\">!=<\/span> <span class=\"pl-c1\">nil<\/span> {\r\n\t\t<span class=\"pl-s1\">panic<\/span>(<span class=\"pl-s1\">err<\/span>)\r\n\t}\r\n\t<span class=\"pl-s1\">fmt<\/span>.<span class=\"pl-c1\">Println<\/span>(<span class=\"pl-s1\">res<\/span>)\r\n}<\/pre>\n<h3>Command line<\/h3>\n<p>I had a little fun creating the command line apps for this.<\/p>\n<p>There are two apps including a provenance enhanced version of <code>spargo<\/code> discussed below.<\/p>\n<p>I created a new text-based executable format for SPARQL 
queries. The format utilizes a Unix-style <a href=\"https:\/\/en.wikipedia.org\/wiki\/Shebang_(Unix)#:~:text=In%20computing%2C%20a%20shebang%20is,the%20beginning%20of%20a%20script.\" target=\"_blank\" rel=\"noopener\">shebang<\/a> so that the file can be interpreted when the <code>spargo<\/code> executable is in a suitable place on the path.<\/p>\n<p>An example of a file that can be used to query Wikidata and return provenance:<\/p>\n<pre>#!\/usr\/bin\/spargo\r\n\r\nENDPOINT=https:\/\/query.wikidata.org\/sparql\r\nWIKIBASEURL=https:\/\/www.wikidata.org\/\r\nHISTORY=3\r\n\r\n# subject, predicate, or object can all be used here. I have elected for\r\n# ?subject as it outputs more information.\r\nSUBJECTPARAM=?subject\r\n\r\n# Describe JPEG2000 in the Wikidata database.\r\ndescribe wd:Q931783<\/pre>\n<p>A simplified version can be called without provenance by removing the <code>WIKIBASEURL<\/code> and <code>HISTORY<\/code> fields.<\/p>\n<p>Another simple example:<\/p>\n<pre>#!\/usr\/bin\/spargo\r\n\r\nENDPOINT=https:\/\/query.wikidata.org\/sparql\r\nWIKIBASEURL=https:\/\/www.wikidata.org\/\r\nHISTORY=5\r\nSUBJECTPARAM=?item\r\n\r\n# Default query example on Wikidata:\r\nSELECT ?item ?itemLabel\r\nWHERE\r\n{\r\n?item wdt:P31 wd:Q146.\r\nSERVICE wikibase:label { bd:serviceParam wikibase:language \"[AUTO_LANGUAGE],en\". 
}\r\n}\r\nlimit 1<\/pre>\n<p>With <code>\/usr\/bin\/spargo<\/code> correctly set up, these files can be run with <code>.\/path\/to\/file.sparql<\/code>; results will be output to the command line and can be parsed with <code>jq<\/code>.<\/p>\n<p>The file format is <a href=\"https:\/\/github.com\/ross-spencer\/wikiprov?tab=readme-ov-file#with-provenance\" target=\"_blank\" rel=\"noopener\">documented<\/a> and there are examples in the wikiprov <a href=\"https:\/\/github.com\/ross-spencer\/wikiprov\/tree\/main\/cmd\/spargo\" target=\"_blank\" rel=\"noopener\"><code>spargo<\/code><\/a> directory.<\/p>\n<p>As an alternative to adding executable permissions to your queries, you can also pipe the query into the app; with <code>spargo<\/code> on the <code>$PATH<\/code> you can do something like:<\/p>\n<pre>cat \/path\/to\/my\/query.sparql | spargo<\/pre>\n<p>And the results will be output to the terminal.<\/p>\n<h4>wikiprov command line<\/h4>\n<p>The <code>wikiprov<\/code> command line utility is a simple tool for returning the latest information about a given Wikidata ID (QID). It can be studied as a brief example of how to call the wikiprov library.<\/p>\n<p>Example CLI options:<\/p>\n<pre>wikiprov: return info about a QID from Wikidata\r\nusage: wikiprov &lt;QID e.g. 
Q27229608&gt; {options} \r\n                                     OPTIONAL: [-history] ...\r\n                                     OPTIONAL: [-version]\r\n\r\noutput: [JSON] {wikidataProvenace}\r\noutput: [STRING] 'wikiprov\/0.0.0 (https:\/\/github.com\/ross-spencer\/wikiprov; all.along.the.watchtower+github@gmail.com) ...'\r\n\r\nUsage of .\/wikiprov:\r\n  -demo\r\n    Run the tool with a demo value and all provenance\r\n  -history int\r\n    length of history to return (default 10)\r\n  -qid string\r\n    QID to look up provenance for\r\n  -version\r\n    Return version<\/pre>\n<h4>Command line releases<\/h4>\n<p>The command line tools can be found on GitHub under <a href=\"https:\/\/github.com\/ross-spencer\/wikiprov\/releases\/\" target=\"_blank\" rel=\"noopener\">releases<\/a>.<\/p>\n<hr \/>\n<h2>Inspecting siegfried<\/h2>\n<p>Richard has documented the <code>$HOME<\/code> folder for siegfried <a href=\"https:\/\/github.com\/richardlehane\/siegfried\/wiki\/Building-a-signature-file-with-ROY#home-directory\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p>If you have built a Wikidata identifier, or you have run <code>sf -update wikidata<\/code> or <code>sf -update deluxe<\/code>, you should be able to inspect the Wikidata results using tools like jq or your own libraries.<\/p>\n<p>My Wikidata definitions, for example, are at: <code>\/home\/ross-spencer\/.local\/share\/siegfried\/wikidata<\/code>.<\/p>\n<div>\n<hr \/>\n<h2>Improvements to the format<\/h2>\n<\/div>\n<div><\/div>\n<div>Currently, the snippet of revision history returned from MediaWiki is not formatted in any way. 
Were there interest, it might be worth writing a pretty-printing routine for this data so that it can be understood more easily.<\/div>\n<div><\/div>\n<hr \/>\n<h2>Trade-offs with complexity<\/h2>\n<p>Ethan Gates first <a href=\"https:\/\/github.com\/richardlehane\/siegfried\/issues\/183\" target=\"_blank\" rel=\"noopener\">reported an issue<\/a> in 2022, and Tyler has also been suffering from the same issue until now.<\/p>\n<p>Because we&#8217;re not just going out to a single endpoint but to two, we have two potential sources of failure. In fact, my first response to Ethan assumed this was going to be a Wikidata query problem &#8212; I worked on this premise for a good while, even going so far as to start writing a mirror service to make results available from sources other than the WDQS. None of my ideas worked.<\/p>\n<p>It turned out the problem was the revision history from MediaWiki. Essentially, both services return the same error when a process takes too long, e.g. when requesting 8000+ records, but my own experience had taught me Wikidata was more likely to take too long. That may once have been the case, but now I was seeing the same with MediaWiki.<\/p>\n<p>My analysis in January 2025 taught me to be kinder to the MediaWiki API. Their instructions include a directive to honor the <code>Retry-After<\/code> value in their HTTP response when the server is overloaded or busy. I ignored this the first time around, but I have implemented it now. I have also made sure that siegfried can use a <code>-noprov<\/code> flag when downloading Wikidata definitions so that testing is never impacted by the inability to download revision history. 
We still get revision history by default, and this should pretty much always work now, but still, it&#8217;s good to have both options out there.<\/p>\n<h3>A note of thanks<\/h3>\n<p>I also want to thank both Ethan and Tyler for reporting the issue and for giving me the excuse today to write up this part of the siegfried+Wikidata process in more detail for those who may be interested. While I wish I had realized the problem sooner, I am grateful to have been able to make my libraries and tooling more robust as a result.<\/p>\n<hr \/>\n<h2>Bei der Buche<\/h2>\n<p>Today&#8217;s image is one I photographed in 2021 at the Wartberg in Stuttgart.<\/p>\n<p>The image was selected to conjure the idea of a trail (a provenance trail); it turns out it also represents a false impression &#8212; I labored under the idea that Stuttgart had ancient ruins that folks could still visit, but I found out that it was instead a vast architectural art installation, built as part of the International Horticultural Exhibition in 1993.<\/p>\n<p>The name <a href=\"https:\/\/de.wikipedia.org\/wiki\/Bei_der_Buche\" target=\"_blank\" rel=\"noopener\">Bei der Buche<\/a> translates roughly to &#8220;At the Beech&#8221;; it is an installation incorporating a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Beech\" target=\"_blank\" rel=\"noopener\">Beech Tree<\/a> (the mother among the trees) by the architect and photographer <a href=\"https:\/\/de.wikipedia.org\/wiki\/Karina_Raeck\" target=\"_blank\" rel=\"noopener\">Karina Raeck<\/a>.<\/p>\n<p>More about the Beech Tree: <a href=\"https:\/\/www.europeanbeechforests.org\/world-heritage-beech-forests\/germany\" target=\"_blank\" rel=\"noopener\">https:\/\/www.europeanbeechforests.org\/world-heritage-beech-forests\/germany<\/a>.<\/p>\n<div class=\"pvc_clear\"><\/div>\n<p id=\"pvc_stats_2186\" class=\"pvc_stats total_only  \" data-element-id=\"2186\" style=\"\"><i class=\"pvc-stats-icon small\" aria-hidden=\"true\"><svg 
aria-hidden=\"true\" focusable=\"false\" data-prefix=\"far\" data-icon=\"chart-bar\" role=\"img\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" viewBox=\"0 0 512 512\" class=\"svg-inline--fa fa-chart-bar fa-w-16 fa-2x\"><path fill=\"currentColor\" d=\"M396.8 352h22.4c6.4 0 12.8-6.4 12.8-12.8V108.8c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v230.4c0 6.4 6.4 12.8 12.8 12.8zm-192 0h22.4c6.4 0 12.8-6.4 12.8-12.8V140.8c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v198.4c0 6.4 6.4 12.8 12.8 12.8zm96 0h22.4c6.4 0 12.8-6.4 12.8-12.8V204.8c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v134.4c0 6.4 6.4 12.8 12.8 12.8zM496 400H48V80c0-8.84-7.16-16-16-16H16C7.16 64 0 71.16 0 80v336c0 17.67 14.33 32 32 32h464c8.84 0 16-7.16 16-16v-16c0-8.84-7.16-16-16-16zm-387.2-48h22.4c6.4 0 12.8-6.4 12.8-12.8v-70.4c0-6.4-6.4-12.8-12.8-12.8h-22.4c-6.4 0-12.8 6.4-12.8 12.8v70.4c0 6.4 6.4 12.8 12.8 12.8z\" class=\"\"><\/path><\/svg><\/i> <img loading=\"lazy\" decoding=\"async\" width=\"16\" height=\"16\" alt=\"Loading\" src=\"https:\/\/exponentialdecay.co.uk\/blog\/wp-content\/plugins\/page-views-count\/ajax-loader-2x.gif\" border=0 \/><\/p>\n<div class=\"pvc_clear\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Today I want to showcase a Wikidata proof of concept that I developed as part of my work integrating <a href=\"https:\/\/exponentialdecay.co.uk\/blog\/talk-using-a-custom-wikibase-with-siegfried\/\" target=\"_blank\" rel=\"noopener\">Siegfried and Wikidata<\/a>.<\/p>\n<p>That work is <a href=\"https:\/\/github.com\/ross-spencer\/wikiprov\" target=\"_blank\" rel=\"noopener\">wikiprov<\/a> a utility to augment Wikidata results in JSON with the Wikidata revision history.<\/p>\n<p>For siegfried it means that we can showcase the source of the results being returned by an identification without having to go directly back to Wikidata, this might mean more exposure for individuals contributing to Wikidata. 