For those who know me, I have been writing Golang for just over a year now. Richard Lehane introduced me to it in October 2014, and it has piqued my interest for my current work in digital preservation. It has the following benefits for that field:
- Compiles programs to a single executable file so it is easy to distribute.
- Cross-platform compilation is a feature of the language, so it is easy to distribute across many platforms.
- Unit testing is a feature of the language, so it supports better software development practices.
- Has many of the usability features of scripting languages like Python or Ruby but will perform better.
- Is not Java!
A key feature that we need in any language we work with is a persistent data store. There isn’t yet a de-facto standard for Golang.
My most useful project in Python relies heavily on SQLite, and this is not (yet) implemented natively in Golang.
Using C bindings from Go is a problem for code maintenance and for cross-compilation over time. Realistically, for the benefit of software sustainability and digital preservation, we need to stick to native Golang alternatives.
Enter BoltDB! But more importantly, as a high-level abstraction for interacting with databases like BoltDB, enter the Key Value Access Language (KVAL).
Bolt is a ‘Pure Golang key/value database’. At time of writing it is used by 669 Golang projects. It was recommended to me, and it stands out as a good compromise for persistent storage of data. It also looks like it might become one of those ‘de-facto’ standards.
Some of its features:
- A database object is a single binary file like SQLite.
- The project seems less complex than SQL or NoSQL options.
- Is a native Golang project.
The project has a simple API for getting and setting values only. It describes itself as being for use at a low level of abstraction.
When I started working with BoltDB to see if it would match my needs, I decided that it may, perhaps, be too low-level. Every transaction needed to be programmed and checked in different ways. The logical result for most people, it seems, would be to write boilerplate libraries for themselves. These boilerplate libraries would all do the same thing:
- Open/deferred closure of a database object
- Validate and write data you want to write
- Get and validate data you want to retrieve
And if you’re not careful implementing this functionality, you only reduce the programming complexity a little, and there may still be a lot of work left for other callers of your new boilerplate code to do.
I thought about what I needed to do with BoltDB, and how I could express this in a simpler way. I also thought about what I liked about how I might work with SQLite in other languages.
In Python for example, most of the interactions with SQLite can be done on a pointer to your database (cursor) by running an execute function:
cursor.execute("SELECT * FROM TABLE WHERE X = Y")
This makes it easy to work with SQLite, and any SQL database for that matter, and so I decided to try and create a language specification for accessing Key Value Stores.
The language specification is here: https://github.com/kval-access-language/kval/blob/master/README.md
- It uses the concept of arbitrary numbers of ‘buckets’ from Bolt to store key value pairs
- A bucket may also be a key
- There are five read/write keywords: INS, GET, LIS, REN, DEL
- There are a handful of operators to describe the relationship between buckets and key value pairs: >>, >>>>, ::, =>
An example INS (Insert) to create an arbitrary number of buckets and a single key/value pair is as follows:
INS bucket one >> bucket two >>>> key name :: value
And to retrieve that:
GET bucket one >> bucket two >>>> key name
If you have lost track of what you called your key, read all of the bucket’s contents:
GET bucket one >> bucket two
And that’s it. Simply call Query, and keep working with your data. This small demo of the Bolt binding can be compiled as a starting point for your own work.
Other language features to try are described in the specification above.
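To make the meaning of the INS and GET examples concrete, here is a minimal sketch that runs them against nested Go maps instead of a real Bolt database. Everything in it (the `bucket` and `query` types, `splitQuery`, `run`) is a hypothetical name invented for illustration and is not the API of the KVAL bindings. It only handles INS and GET, does no escaping, and keeps buckets and keys in separate maps, which is a simplification of how Bolt stores them.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// bucket is a hypothetical in-memory stand-in for a Bolt bucket. It holds
// key/value pairs and any number of nested buckets.
type bucket struct {
	values  map[string]string
	buckets map[string]*bucket
}

func newBucket() *bucket {
	return &bucket{values: map[string]string{}, buckets: map[string]*bucket{}}
}

// query is the machine-readable form of one KVAL statement.
type query struct {
	keyword string
	path    []string // bucket names, outermost first
	key     string
	value   string
}

// splitQuery breaks "KEYWORD b1 >> b2 >>>> key :: value" into its parts.
func splitQuery(q string) (query, error) {
	keyword, rest, ok := strings.Cut(strings.TrimSpace(q), " ")
	if !ok {
		return query{}, errors.New("query needs a keyword and at least one bucket")
	}
	out := query{keyword: keyword}
	// Everything after ">>>>" names a key, optionally followed by ":: value".
	bucketPart, keyPart, hasKey := strings.Cut(rest, ">>>>")
	if hasKey {
		k, v, _ := strings.Cut(keyPart, "::")
		out.key, out.value = strings.TrimSpace(k), strings.TrimSpace(v)
	}
	for _, name := range strings.Split(bucketPart, ">>") {
		name = strings.TrimSpace(name)
		if name == "" {
			return query{}, errors.New("empty bucket name")
		}
		out.path = append(out.path, name)
	}
	return out, nil
}

// run applies an INS or GET query to root and returns the pairs it touched.
func run(root *bucket, q string) (map[string]string, error) {
	parsed, err := splitQuery(q)
	if err != nil {
		return nil, err
	}
	b := root
	for _, name := range parsed.path {
		next, ok := b.buckets[name]
		if !ok {
			if parsed.keyword != "INS" {
				return nil, fmt.Errorf("bucket %q not found", name)
			}
			next = newBucket() // INS creates missing buckets on the way down
			b.buckets[name] = next
		}
		b = next
	}
	switch parsed.keyword {
	case "INS":
		if parsed.key == "" {
			return map[string]string{}, nil // buckets only, no key to write
		}
		b.values[parsed.key] = parsed.value
		return map[string]string{parsed.key: parsed.value}, nil
	case "GET":
		if parsed.key == "" {
			// No key given: read every pair in the bucket.
			all := make(map[string]string, len(b.values))
			for k, v := range b.values {
				all[k] = v
			}
			return all, nil
		}
		v, ok := b.values[parsed.key]
		if !ok {
			return nil, fmt.Errorf("key %q not found", parsed.key)
		}
		return map[string]string{parsed.key: v}, nil
	}
	return nil, fmt.Errorf("keyword %q is not handled in this sketch", parsed.keyword)
}

func main() {
	root := newBucket()
	if _, err := run(root, "INS bucket one >> bucket two >>>> key name :: value"); err != nil {
		panic(err)
	}
	fmt.Println(run(root, "GET bucket one >> bucket two >>>> key name"))
	fmt.Println(run(root, "GET bucket one >> bucket two"))
}
```

The point of the sketch is the shape of the interaction: one string in, one result out, with bucket creation and lookups hidden behind a single call.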
I discovered that there are three components to any binding.
1. Token scanner: This breaks a query string into valid component parts
2. Parser: Allows us to validate the query string and format it in a machine readable manner
3. Binding: Once 1. and 2. are written, a binding will take the result output by the Parser and implement the language’s capabilities according to what the underlying library can do.
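As an illustration of the first component, here is a minimal sketch of a token scanner for KVAL-style queries. The token names (`KEYWORD`, `LITERAL`, `BUCBUC`, `BUCKEY`, `KEYVAL`, `ARROW`), the `Lexeme` type and the `Scan` function are assumptions made for this example, not the identifiers used in the real KVAL code. It treats the first word as the keyword, matches operators longest first so that `>>>>` is never read as two `>>`, and ignores escaping entirely.

```go
package main

import (
	"fmt"
	"strings"
)

// Token identifies the kind of lexeme found in a query string.
type Token int

const (
	ILLEGAL Token = iota
	KEYWORD       // INS, GET, LIS, REN, DEL
	LITERAL       // a bucket name, key name or value
	BUCBUC        // >>   between a bucket and a nested bucket
	BUCKEY        // >>>> between the last bucket and a key
	KEYVAL        // ::   between a key and its value
	ARROW         // =>
)

// Lexeme pairs a token kind with the text it was scanned from.
type Lexeme struct {
	Tok  Token
	Text string
}

// operators are listed longest first so ">>>>" wins over ">>".
var operators = []Lexeme{
	{Tok: BUCKEY, Text: ">>>>"},
	{Tok: BUCBUC, Text: ">>"},
	{Tok: KEYVAL, Text: "::"},
	{Tok: ARROW, Text: "=>"},
}

var keywords = map[string]bool{"INS": true, "GET": true, "LIS": true, "REN": true, "DEL": true}

// Scan breaks a query string into its component lexemes.
func Scan(query string) []Lexeme {
	var out []Lexeme
	var lit strings.Builder

	// flush emits any pending literal, trimmed of surrounding spaces.
	flush := func() {
		text := strings.TrimSpace(lit.String())
		lit.Reset()
		if text != "" {
			out = append(out, Lexeme{Tok: LITERAL, Text: text})
		}
	}

	// The first word must be one of the known keywords.
	word, rest, _ := strings.Cut(strings.TrimSpace(query), " ")
	if keywords[word] {
		out = append(out, Lexeme{Tok: KEYWORD, Text: word})
	} else {
		out = append(out, Lexeme{Tok: ILLEGAL, Text: word})
	}

scan:
	for len(rest) > 0 {
		for _, op := range operators {
			if strings.HasPrefix(rest, op.Text) {
				flush()
				out = append(out, op)
				rest = rest[len(op.Text):]
				continue scan
			}
		}
		// Not an operator: the byte belongs to the current literal.
		lit.WriteByte(rest[0])
		rest = rest[1:]
	}
	flush()
	return out
}

func main() {
	for _, lex := range Scan("INS bucket one >> bucket two >>>> key name :: value") {
		fmt.Printf("%d %q\n", lex.Tok, lex.Text)
	}
}
```

A parser would then walk this slice and check that the tokens appear in an order the specification allows before handing the result to a binding.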
For the first two, I came across a tutorial by one of the BoltDB implementers, Ben B Johnson: https://blog.gopheracademy.com/advent-2014/parsers-lexers/
This tutorial provided the bones to work from and was the best resource I could have wished for to implement this work.
The KVAL-BoltDB bindings I created on top of 1. and 2. can be found here: https://github.com/kval-access-language/kval-boltdb
Creating each new capability simply required systematically walking through the KVAL language and then implementing it using Bolt’s own features. The basic skeleton of this work was implemented in a couple of weeks.
I identified unit testing as the most important feature of the binding; this took somewhat longer. For users to have confidence in this work, it means testing every scenario permitted by the language specification, and more.
The tests are still quite unwieldy as I learn Go, and better unit testing, as I work. They are nonetheless a good place to look for an understanding of the different command variants and library capabilities.
Some things I really wanted to make sure worked as well as possible included the storage of big strings (e.g. blog text), Unicode strings, and Base64-encoded binary information (blobs).
Take a look!
Godoc is one of the coolest features of Golang and in releasing this work as a ‘Package’ (a library that other Go users can import and use), the capabilities are documented online for everyone to access via this resource: https://godoc.org/github.com/kval-access-language/kval-boltdb
Getting the documentation right has been a rewarding part of the development and I’m keen to improve it further as users discover the language.
It’s also a great place to look at what else you can do with this work.
This work is inspired by the axiom, “everything should be made as simple as possible, but not simpler”.
I hope that beginner and intermediate Golang/BoltDB users, at least, will benefit from using this library. I’ll be using it in my own production code soon.
It is also an ambition that folks look at the KVAL specification itself and consider its application as a bona fide standard providing simple, fluid access to key/value-like structures.
As a working draft there is lots of room for discussion about what is presented in this blog and in these various implementations.
I’d like to work on other bindings. NOMS is another Golang data store that may have even greater application in my other work. I’ll investigate if this is feasible.
All of the work surrounding this will be collected in the KVAL organisation’s repositories: https://github.com/kval-access-language
And finally, that demo link again: https://github.com/kval-access-language/kval-boltdb-demo