Musings of a geoarchivist

All of the geoscience data, I go nom nom nom!

I sometimes ponder what title my line of work should be grouped under. I often say “Earth scientist” because it is the most generic. But is there a better way to describe my work? I sometimes dabble in geophysics, examining the deformation of the Earth due to shifts of mass in the water on the surface. I dabble in glaciology, by looking at how ice sheets grow and evolve. I dabble in field geology by looking at coastal and glacial deposits. I dabble in geoscience model development to create the tools I need to make models of the geological past. I dabble in geochemistry by examining how changes in the chemistry of fossils and sediments record environmental fluctuations. I dabble in geochronology to see the order in which geological events happened. I dabble in geomorphology to deduce surficial processes. I even occasionally look at petrology to understand the relationship between different rock types and those processes.

I think the best way to describe my work is as a “geoarchivist”. I take a lot of information from different sources and put it together to make a big-picture overview of geological history. Creating a broad-based overview of the Earth’s history requires summarizing large volumes of information and knowing what to trust. This is perhaps best exemplified by a paper I published last year showing a reconstruction of paleotopography and ice sheets for the past 80,000 years. This paper, though it drew on my work on model development and geophysics, is really the result of years of reading papers and collecting data. To give you an idea, my papers folder currently has over 4000 papers in it! I read many papers each week, and remember details that I can use to improve my reconstructions. Truly, I like to “nom nom nom” on any data I can get my hands on. 😉

During my master’s degree, a large chunk of what I needed to do was collect data related to the retreat of the southwestern Cordilleran Ice Sheet on the west coast of North America. This was my first experience doing a deep dive into searching the literature for data, and assessing their quality and how they paint a picture of the evolution of the ice sheet. I continued this in my PhD, and even published a paper on it. I also did deep dives into data related to sea level and lake level change, and figured out the ideal ways to use the data to improve my models.

During my post-PhD work, I often strayed from this kind of archival work, with only middling success. I think the thing that really turned things around was a chance meeting with Alessio Rovere, an expert in paleo-sea level. The first time I met him, I told him of my desire to work on sea level data archiving. As luck would have it, this was the very thing that he was also interested in.

Not long after our first meeting, I was given the opportunity to work with Alessio on this topic. He is the mastermind of a database known as the World Atlas of Last Interglacial Shorelines (WALIS). The last interglacial (roughly 120,000 years ago) was a period of time when global sea level was higher than present, so it is of interest for studying what sea level might look like in the future. The intention of the database is to bring together proxies of sea level during that time in a standardized way. I was tasked with working on the southeastern South America component of this database, which was published last year in Earth System Science Data. This was a challenge, since much of the literature was published in Spanish and German, two languages I do not know. Still, it was a lot of fun tracking down some very obscure papers to include in the database. One of the benefits of this exercise is shining a light on papers that would otherwise remain out of view.

Later on, when discussing with fellow geoarchivist April Dalton what would be a good way to collaborate, I suggested we fill in a major gap in WALIS by archiving the sea level proxies in places that were covered by ice sheets prior to the last interglacial. This paper came out last month. This truly was a collaborative effort, with April and the other coauthors putting together the paper and finding data sources that have last interglacial sea level proxies, while I entered the data into the database.

These data are essential to infer the size of the ice sheets prior to the last interglacial, which in turn is needed to infer the size of the remaining ice in Greenland and Antarctica. The eventual goal of WALIS is to create a paleotopography reconstruction of the last interglacial, and figure out how sensitive these ice sheets are to globally warmer temperatures. For me, the sea level data archive is the starting point, and you must also look at geomorphology and other climate proxy data to fully get a grasp on what the Earth’s surface looked like.

So what does it mean to be a geoarchivist? Although this term is not something that I am aware of being used to describe Earth scientists, I think there are many of us already. I think a geoarchivist should be considered the ultimate “big picture” scientist. The role requires proficiency in a number of skills and an awareness of relationships with other members of the community. Here are some thoughts:

  • A geoarchivist takes data from a large number of sources, and puts it together to make a coherent story.
  • They can assess which data are trustworthy, and make judgements on uncertainties.
  • Putting together a geological story is like putting together a 1000 piece puzzle with only 100 pieces, and 20 of those pieces might be from another puzzle. Geoarchivists need patience, and must acknowledge the limitations on what we can know.
  • Standards have become much higher in recent years, but a geoarchivist needs to accept the limitations of legacy data and also make use of it when possible.
  • Geoarchivists must acknowledge the source of all data, including those that might not fit in your story. For example, a single radiocarbon date can cost hundreds of dollars/Euros, so it is important that the original source is credited to show your appreciation.
  • A geoarchivist should work towards making their data collections open source. I do this with my archive of sea level data. There are so many tools like GitHub that make this easy. There is nothing more frustrating than seeing a data collection, but then needing to manually type the data or digitize from a grainy scan of a figure.
  • Working together with the wider community is a must. As a big jack-of-all-trades, there are many aspects to the data that are unknown to me, but are second nature to a more specialized Earth scientist.
  • Literature searching is one of the biggest skills a geoarchivist must learn. There are all kinds of papers being published every day that may have some relevance to what you do. I personally check Nature, Science, Quaternary Science Reviews, Boreas and Journal of Quaternary Science once a week to see if there is anything new that is relevant. I also get alerts from Google Scholar on potentially relevant papers, and check when a new paper cites one of my published papers. This doesn’t always capture everything, so I will occasionally do a keyword search and dive deeply into the results.
  • Government agency reports often do not come up in academic search engines, which is truly a shame. These often contain some of the most valuable data. For my work on the Laurentide Ice Sheet, I often check the Geological Survey of Canada publications database to see if anything new has shown up.
  • Sometimes the way you interpret the data may be different than the original interpretation. Tread lightly because not everyone will be happy with this.
  • One of the most important parts of geoarchiving is to put the data in a format that is easily accessible. For big geospatial datasets, the answer is obvious – NetCDF or shapefiles. Some data is better in a spreadsheet format. But I would warn against the exclusive use of Microsoft Excel formatted spreadsheets, which are often not readable in other programs. You cannot go wrong with a text CSV file.
  • For a long time I resisted it, but I would say that learning Python is an essential skill for working with data. It allows you to easily do simple data analysis and manipulation. On that note, if you are creating a database, it is a good idea to also include the scripts and programs to use the dataset.
  • There are so many great visualization options, for instance in Python. I use Generic Mapping Tools, which is ideal for making nice looking maps, as well as making a large number of plots in a script. The maintainers of Generic Mapping Tools are also very wonderful about adding new features!
  • Inevitably, a geoarchivist will need to make use of a geographic information system (GIS) program. I use QGIS often. One of the important things is to learn about map projections and how they will bias the data presentation. Although tempting, I think it is best to make the plots using another program rather than via the GIS software.
  • Always take the time to sit in on Earth science presentations in your department, even if it is not in your field of specialty. You never know what might be relevant. As an example, I remember sitting through so many talks on the dating of zircons when I was a PhD student, which seemed rather removed from what I was doing. But you know what, the information I gained from that helped me when I contributed to a paper on the sources of dust in northern China.
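
To make the points above about plain-text CSV files and Python a little more concrete, here is a minimal sketch of the kind of simple manipulation I have in mind: reading a small CSV table with Python's standard `csv` module and keeping only the records with a small stated uncertainty. The column names, the site names, the numbers and the 2 m threshold are all invented placeholders for illustration, not the layout of WALIS or any other real database.

```python
import csv
import io

# Hypothetical example records: the column names and values below are
# invented placeholders, not entries from any real database.
CSV_TEXT = """site,latitude,longitude,elevation_m,uncertainty_m,source
Site A,-38.5,-58.2,6.0,1.5,Author et al. (year)
Site B,-41.0,-62.8,8.0,4.0,Author et al. (year)
Site C,-45.3,-67.1,5.5,0.8,Author et al. (year)
"""


def load_records(text):
    """Parse CSV text into a list of dicts, converting numeric columns to float."""
    numeric = ("latitude", "longitude", "elevation_m", "uncertainty_m")
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for key in numeric:
            row[key] = float(row[key])
        records.append(row)
    return records


def filter_by_uncertainty(records, max_uncertainty_m):
    """Keep only records whose stated uncertainty is at or below the threshold."""
    return [r for r in records if r["uncertainty_m"] <= max_uncertainty_m]


records = load_records(CSV_TEXT)
well_constrained = filter_by_uncertainty(records, max_uncertainty_m=2.0)
for r in well_constrained:
    # The source column travels with each record, so the credit is never lost.
    print(f'{r["site"]}: {r["elevation_m"]} +/- {r["uncertainty_m"]} m ({r["source"]})')
```

Keeping the original reference in its own column means every filtered subset still carries the credit for whoever collected the data. For a real file on disk, the same `csv.DictReader` call works on a file opened with `open(path, newline="")` instead of the in-memory string used here.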

So, there are some of my thoughts on being a geoarchivist. It has been sometimes hard to fit this in with my other responsibilities (since there are few/no actual positions for people who exclusively archive data), but I do hope that I can eventually make this my main focus.