NASA Wake-up Calls banner featuring a sunrise over the Earth's horizon, glowing on the right-hand side of the image, with the project logo on the left.

Turning NASA Wake-up Calls into data

For a while back then I was into space flight again. Scientists, science communicators, and engineers were all excited for a new era of rocket launches and the potential unification of the human race as we look towards the future.

During that time I discovered Colin Fries’ work in the NASA History Division to document all NASA “Wake-up calls”. A wake-up call is simply a piece of music used to wake astronauts on missions, a different piece of music, daily, for the duration of the flight.

Take, for example, the last Space Shuttle mission (Space Transportation System) STS-135; it was in flight for 13 days, and the wake-up call on day one was Coldplay’s Viva la Vida, while on day 13 it was Kate Smith singing God Bless America.

As a huge music buff who has the radio or music television on 18 hours a day, I really wanted to delve into this further. While Colin’s work is great, it’s just a PDF file (@wtfpdf). A PDF is not an ideal file format for querying data and gleaning new insights. So, while I wanted to explore it, I first decided to turn it into a true dataset. The result was a set of resources: a website, a JSON file, a CSV file, and an SQLite database, each of which is more functional and more maintainable over time.

Let’s take a look at the results and https://nasawakeupcalls.github.io below!

All systems go!

I first discovered Fries’ work in 2017 when I was staying in Houston briefly (not a bad synchronicity!), but it wasn’t until January 2019 that I finally had time to commit to doing more with this work.

I created a GitHub organization and created a source repository to work on the data.

The primary question was: how do I extract information from the PDF as data?

The PDF was a mixture of narrative and table-like data formatted somewhat inconsistently across the years, and mixed with unstructured metadata about table content, e.g. describing a specific reason for playing a certain wake-up call.

Ideally I would extract missions, mission dates, song dates, song titles, song artists, and song comments. I knew I would work with Apache Tika to extract text, use Python to perform some data cleanup, and then the rest we would see.

Instinctively I also knew this was likely to follow the pattern of the Pareto Principle whereby I could get close to 80% of the results I wanted with just a small amount of effort, and then look more closely at the work that was remaining, potentially doing it by hand.

I’ve learned about the application of the Pareto Principle for software problems more concretely by watching Greg Young recently:

To paraphrase Greg:

How many iterations can I use to automate data extraction and transformation to cover 80%, 85%, or 90% of the work? Is the final 10% an acceptable amount of remaining work to do manually? Does the balance between the effort to automate and the reward for automating shift toward handling the remaining effort by hand?

Working through the data

With some additional iteration to understand different data cleanup problems, I created three scripts in total to take me close to a usable dataset.

Including the initial text extraction with Apache Tika, the overall process covers the following steps:

  1. extract the text using Tika.
  2. combine data into sentences using a tokenizer made available for the Natural Language Toolkit (NLTK).
  3. extract song information into semi-structured data describing mission, date, song, artist, metadata comment, and so on.
  4. convert semi-structured data to JSON, performing minor data transformation along the way, e.g. converting dates to ISO 8601 format.
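To make the last step a little more concrete, here is a minimal sketch of what converting one semi-structured record to JSON might look like. It is not the code from the repository: the record, its field names, and the "Month D, YYYY" input date format are all assumptions made up for illustration.

```python
import json
from datetime import datetime

# Hypothetical record as it might look after step 3. The values, field
# names, and date format are illustrative assumptions, not real data.
raw_record = {
    "mission": "EXAMPLE-1",
    "date": " July 9, 2011 ",
    "song": "Example Song",
    "artist": "Example Artist",
    "comment": "",
}


def to_iso_date(text: str) -> str:
    """Convert a 'Month D, YYYY' string to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text.strip(), "%B %d, %Y").date().isoformat()


def to_json_record(record: dict) -> str:
    """Trim whitespace, convert the date, and serialize the record as JSON."""
    cleaned = {key: value.strip() for key, value in record.items()}
    cleaned["date"] = to_iso_date(cleaned["date"])
    return json.dumps(cleaned, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    print(to_json_record(raw_record))
```

Anything that fails to parse raises a `ValueError`, which is a handy way of finding the rows that need fixing by hand.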

The first JSON output gave me something I could continue to clean up manually; at least, I hoped that I could!

Switching to manual

After writing the first three scripts it was time to start looking at the data as an actual dataset, albeit a rough first version.

Taking the data from v1 to v2 meant shaping it into something that was going to have long-term use. For the final product I added a metadata envelope and created a more structured schema for the low-level data that would be consistent across each mission. Part of the remaining work therefore involved remapping the JSON from version one into that schema and correcting issues as I went. The majority of the data issues fell under the following categories:

  • Fixing up date formats,
  • Correcting basic spelling errors,
  • Splitting data where it hadn’t been split correctly previously,
  • Making artist and song data consistent.
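As a rough illustration of the envelope idea and of making artist names consistent, something like the following would do the job. This is only a sketch: the envelope keys, the alias table, and the function names are hypothetical and are not taken from the published dataset.

```python
from datetime import date

# Hypothetical table of spelling variants mapped to one canonical name.
ARTIST_ALIASES = {
    "cold play": "Coldplay",
    "coldplay": "Coldplay",
}


def normalize_artist(name: str) -> str:
    """Collapse whitespace and map known variants to a canonical spelling."""
    tidy = " ".join(name.split())
    return ARTIST_ALIASES.get(tidy.lower(), tidy)


def wrap_in_envelope(records: list, version: str) -> dict:
    """Wrap the records in a metadata envelope (keys are assumptions)."""
    data = [{**rec, "artist": normalize_artist(rec["artist"])} for rec in records]
    return {
        "metadata": {
            "version": version,
            "generated": date.today().isoformat(),
            "count": len(data),
        },
        "data": data,
    }
```

Names that are not in the alias table pass through with only their whitespace tidied, so the table can grow as new variants turn up during manual review.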

One set of date issues that was quite challenging involved the Mars lander missions, which were dated using Sols. I had to work these out relative to the mission’s start date. My recollection is that I used a spreadsheet to calculate these and then transcribed them directly, and so now we have ISO-formatted dates for the Mars lander wake-up calls as well.
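I did this in a spreadsheet, but the same arithmetic can be sketched in a few lines of Python. The assumptions here are mine: that the mission's first Sol is numbered 1, that it starts at midnight UTC on the start date, and that a Sol lasts 24 hours, 39 minutes and 35.244 seconds (the mean Martian solar day). The start date in the usage example is purely illustrative.

```python
from datetime import date, datetime, timedelta

# Mean length of a Martian solar day: 24 h 39 min 35.244 s, in seconds.
SOL_SECONDS = 88775.244


def sol_to_earth_date(start: date, sol: int, first_sol: int = 1) -> date:
    """Estimate the Earth date on which a given Sol begins.

    Assumes Sol `first_sol` begins at midnight UTC on `start`; both are
    simplifying assumptions, so results can be off by a day near midnight.
    """
    begin = datetime(start.year, start.month, start.day)
    elapsed = timedelta(seconds=(sol - first_sol) * SOL_SECONDS)
    return (begin + elapsed).date()


if __name__ == "__main__":
    # Illustrative start date only; check it against the mission record.
    print(sol_to_earth_date(date(2004, 1, 4), 38).isoformat())
```

Because each Sol is a little longer than an Earth day, the Sol count and the Earth day count slowly drift apart, which is why simply adding the Sol number to the start date stops working after a few weeks.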

NB. Yes, the tradition continued for the landers!! 🤖 The Chronology has these snippets of text:

The function of the Wake-Up Music is as it would suggest, to ‘wake-up’ the mission team and get them focused on the days activities. As we will see the selection is often a little bit tongue-in-cheek. The music was originally played for the team around the time that the “sweep” was transmitted to the Rover shortly after it woke up. This sweeps the transmit frequency so the Rover’s receiver can lock onto it. This occurs at around 8.45AM local time (at the landing site). However this meant the song was being played before many of the team members came-in, so the time was subsequently shifted to 10AM local time. By this time most of the crew are on station and ready to begin the Sol’s activities. Sol is the term for a Martian ‘Day’. One Martian Sol is 39.5 minutes longer than one Earth Day.

NASA Gets Into the Groove! From Above Top Secret News Network, February 21, 2004

The eclectic playlist is Mars rover Spirit mission manager Mark Adler’s way of waking exhausted engineers and scientists who are working and sleeping on Mars time and dealing with a sometimes temperamental rover millions of miles away… Below is the Spirit playlist, along with some explanations by Adler for why the songs were chosen.

Mars Rocks! Eclectic Music Moves Rover Mission. By Robert Roy Britt February 26, 2004, Space.com and Jet Propulsion Laboratory Rover Daily Updates.

Houston we have data!

The first signs in the GitHub repository that things were ready to be published were around Feb 5. I remember having quite a lot of free time on the outskirts of Prague in early 2019, and it’s the type of work that is very easy for me to hyperfocus on.

The dataset wound up being a JSON file, a CSV file, and, though I needn’t go into too much detail on this part, a website! The website was easy enough to generate automatically from a dataset like this using templates in Jekyll, and results in a page per song with front matter dated to the actual dates of the missions. A website also lends itself well to browsing, searching, and randomizing, so folks can visit the site, look up dates or songs that are important to them, or simply ask for a random page to enjoy.
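For anyone curious how a page per song can fall out of a dataset, here is a small sketch that turns one record into a Jekyll post. Jekyll expects posts named `YYYY-MM-DD-title.md` with YAML front matter between `---` lines; everything else here (the record fields, the `post` layout, the extra front matter keys) is an assumption for illustration and not necessarily how the real site is built.

```python
import json
import re


def slugify(text: str) -> str:
    """Lower-case the text and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def jekyll_post(record: dict) -> tuple:
    """Return (filename, file content) for one wake-up call record."""
    filename = f"_posts/{record['date']}-{slugify(record['song'])}.md"
    lines = [
        "---",
        "layout: post",  # assumes the site defines a layout named 'post'
        # json.dumps yields a double-quoted string, which is also valid YAML
        f"title: {json.dumps(record['song'])}",
        f"artist: {json.dumps(record['artist'])}",
        f"mission: {json.dumps(record['mission'])}",
        f"date: {record['date']}",
        "---",
        "",
    ]
    return filename, "\n".join(lines)
```

Looping over the dataset and writing each returned file into the site's `_posts` directory gives Jekyll one dated page per wake-up call.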

You can access each of the datasets here: nasawakeupcalls.github.io/data. I also list them in their repository form below.

Data sources

CSV

  • NASA Wake-up Calls: CSV.

JSON

  • NASA Wake-up Calls: JSON.

Datasette-lite

Sometime around February 2023 I was working with Datasette-lite and thought it’d be cool to turn the data into an SQLite database for folks to explore that way. It’s always online for you to look at.
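If you would rather build a database like this yourself, Python's built-in `sqlite3` module is enough. The sketch below is not how the published database was produced; the table name, columns, and example rows are placeholders I made up to show the shape of the idea.

```python
import sqlite3

# Placeholder rows (mission, date, song, artist); not real wake-up calls.
EXAMPLE_ROWS = [
    ("EXAMPLE-1", "2011-07-09", "Example Song", "Example Artist"),
    ("EXAMPLE-1", "2011-07-10", "Another Song", "Example Artist"),
    ("EXAMPLE-2", "2012-01-01", "Third Song", "Other Artist"),
]


def build_database(rows, path=":memory:"):
    """Create a wakeup_calls table and load the rows into it."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS wakeup_calls "
        "(mission TEXT, date TEXT, song TEXT, artist TEXT)"
    )
    conn.executemany("INSERT INTO wakeup_calls VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def songs_per_artist(conn):
    """Count wake-up calls per artist, most played first."""
    query = (
        "SELECT artist, COUNT(*) AS plays FROM wakeup_calls "
        "GROUP BY artist ORDER BY plays DESC, artist"
    )
    return conn.execute(query).fetchall()
```

Storing the dates as ISO 8601 text pays off here, because plain string comparison in SQL then sorts and filters them chronologically.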

Website

As mentioned, the site can be searched, it can be browsed, and it can return random pages. I also thought at the time (2019) it might be nice to have a Twitter integration and an Amazon integration. The Amazon integration was not so much for monetization as for folks to be able to find tangible products related to the artists and songs that they liked. Replacements for either would be great today if anyone has any ideas.

Twitter

Many of us close to digital humanities had a different relationship with Twitter in the past. Publishing there wasn’t about creating a dataset, but the end result, when viewed in aggregate through the lens of the Twitter API, was very much a dataset. It previously enabled so much, but of course, we know this has since been taken away.

Anyway. NASA Wake-up Calls had a Twitter account and you can look at that as well.

Reception

The website didn’t receive much attention, but I have to thank Matt Allinson for their interest in the site. I think they wanted to do something similar but saw I had done lots of the work already. Matt added Discogs genre information from the Discogs API, which is pretty neat. I had hoped that one of the outcomes of this work might be people attempting to augment the data to bring new contexts.

Matt also told me they had interviewed a real-life astronaut, Tracy Caldwell Dyson (who also performed in Max Q, a band whose songs were also aired as Wake-up Calls!), who had called the project “cool” 🤗.

I tried reaching out to other sources of space and science communication who might be interested in the work, but I don’t think many people, if any, have looked into it.

What about NASA?

I haven’t directly reached out to NASA. I have tried indirectly, but at the time it didn’t feel straightforward given the U.S. administration, and then, I moved on to other projects, really only coming back to this today while reflecting on some nice outputs from my past.

It would be great if NASA were to pick this effort up and make this dataset the foundation of their future efforts recording wake-up calls. It could make their task easier. It might also be a good foundation for uploading this as data somewhere else, e.g. Wikidata.

Next steps

So, at this point, I haven’t too many plans for the work. Maybe others have ideas that they’d like to see implemented?

Since February 5, 2019, I have added data where I have found information during my reading, listening, or viewing, such as when I managed to glean some more information about the first song played on the moon while watching a Quincy Jones documentary. I keep track of some of the other work still to be done via the GitHub Issues.

Outro: or should I say returning to earth?

Today, space feels like a terrible extension of the heavier sociopolitical issues that we’re going through (although who am I to talk? Scott-Heron saw deeper problems in 1970).

The human race is further apart from each other than ever before and not everyone has the luxury to think about space flight. There are an elite few taking it to the extremes, using space as their own playground whether using it as an extension of influence culture or to role-play as a rocket scientist (exactly how much money do you need to pay people to also pretend you are a rocket scientist?).

Maybe we will be able to enjoy space flight again one day.

Until then, maybe you can at least enjoy this musical lens on some of the achievements of the past.


Public Service Broadcasting

If you follow this blog but haven’t heard of Public Service Broadcasting, then please do check them out!

Public Service Broadcasting take creative commons audio material and weave it into concept albums. Previous albums have included the story of the Titanic, the Welsh mining strikes, and, especially relevant to this blog, the race for space. More information about them and their album about space can be found on Wikipedia.


Harkive

I wanted to amplify the work of an archiving project I discovered thanks to simply taking the time to take on a project like NASA Wake-up Calls.

Harkive was a project by Dr Craig Hamilton that intended to archive a day of music and music listening habits each year from about 2013 to 2021/22.

Welcome to Harkive 2022. Thanks for visiting the website and for showing an interest in the project. This post should tell you everything you may need to know about the project. What is Harkive? Harkive is an annual online research project that gathers stories about How, Where and Why people listen to music across a single day.

You can do so simply by adding the hashtag to your music-related posts on Twitter, Instagram and Tumblr. Alternatively, if you want to write something a little longer, you can email it to us, or send it via this online form. Stories are also accepted as posts on the Harkive Facebook wall.

What a cool idea?! (oh, and there’s Twitter as a dataset again! 😉 )

It reminds me of an idea my colleague Andrew Fetherston wanted to follow up on to do the same within government agencies. Of course we don’t do that, but what a fascinating cross-section of information we might get!

Read more on Dr Hamilton’s blog.

And the archived project pages are available on the Internet Archive.


Miscellaneous

I don’t know if NASA Wake-up Calls is the internet’s number 1 most talked about subject, but folks do like them!

Some links I have referenced during my own research, and that you might find interesting as well!

 


Top 5 for this blog

I made a top five playlist for this blog. Enjoy!
