First things first:

/assets/daft-punk-around-the-world.mp3?loop=1

This post is an attempt to visualize the travels of ships from the 1700s and 1800s, based on their own logbooks. The data comes from the CLIWOC project, an EU-funded project that built a climatological database from the weather observations logged by ships sailing around the world in that period. We’ll disregard the climate part completely and focus on what I find more interesting: mapping the ships’ travels as a way to broaden our historical understanding of the period.

Visualizing these travels was also a way for me to learn about data visualization techniques, specifically how to plot geographical data.

An overview of the dataset is available here. In a nutshell, it includes multiple logbook-style observations from ships that were sailing roughly between 1750 and 1850; each observation includes position data, which will be our primary focus, as well as other very interesting data such as ship nationality, departure and destination cities and notes on cargo being transported.

In total there are 280280 logged observations, 246214 after removing duplicates - entries for the same ship on the same day - and those without a VoyageIni field. Of those, 9.5% don’t have latitude or longitude measurements, and a few of them are also missing either departure or destination place, so cleaning for those leaves us with 226441 total observations from 4339 unique journeys.
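
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code actually used for this post. `VoyageIni`, `Lat3` and `Lon3` are fields named in the dataset; `ShipName`, `Year`, `Month`, `Day`, `VoyageFrom` and `VoyageTo` are assumed column names, and each row is assumed to be a dict of strings such as `csv.DictReader` would produce.

```python
def is_blank(value):
    """Treat None, empty strings and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def clean_observations(rows):
    """Drop duplicate and incomplete log entries.

    `rows` is an iterable of dicts, one per logged observation.
    Column names other than VoyageIni, Lat3 and Lon3 are assumptions.
    """
    seen = set()
    cleaned = []
    for row in rows:
        # Entries without a VoyageIni field cannot be tied to a journey.
        if is_blank(row.get("VoyageIni")):
            continue
        # A duplicate is a second entry for the same ship on the same day.
        key = (row.get("ShipName"), row.get("Year"), row.get("Month"), row.get("Day"))
        if key in seen:
            continue
        seen.add(key)
        # A position and both endpoints of the journey are required.
        required = ("Lat3", "Lon3", "VoyageFrom", "VoyageTo")
        if any(is_blank(row.get(col)) for col in required):
            continue
        cleaned.append(row)
    return cleaned
```

Deduplication runs before the position check, mirroring the order in which the counts above are reported.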

The earliest journey on record started on 15 January 1662, and the latest one on 21 January 1855. The dataset includes logs from ships of 8 different nationalities, in the following proportions:

Figure 1: Distribution of number of logged travels by nationality.

There are a large number of logged travels per year until around 1800, followed by a sharp drop and a slow climb back to pre-1800 levels. It’s hard to say whether this reflects the historical context - were there really fewer ships active around this time? - or is merely the result of missing data, either because logs from that time didn’t survive to make it into this database or because sailors stopped keeping logs for a while. But given that this steep decrease in active ships comes during the Napoleonic Wars, in which the 4 main sources of our logs - British, Dutch, Spanish and French - were heavily involved, it does seem plausible that there was an actual policy shift resulting in less sea activity.

Figure 2: Distribution of number of logged travels by year.
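
The counts behind both distributions could be computed along these lines. This is a hedged sketch, not the post's actual code: it assumes that a journey can be identified by the pair (`ShipName`, `VoyageIni`), that `VoyageIni` starts with a four-digit year, and that the nationality is stored in a column called `Nationality`.

```python
from collections import Counter


def journeys_by(rows, key_func):
    """Count unique journeys grouped by key_func(row).

    A journey is identified here by (ShipName, VoyageIni), which is an
    assumption; each journey is counted once, using its first observation.
    """
    seen = set()
    counts = Counter()
    for row in rows:
        journey = (row["ShipName"], row["VoyageIni"])
        if journey in seen:
            continue
        seen.add(journey)
        counts[key_func(row)] += 1
    return counts


def start_year(row):
    # Assumes VoyageIni begins with a four-digit year, e.g. "1780011512".
    return int(str(row["VoyageIni"])[:4])


def nationality(row):
    # Assumes the nationality column is named "Nationality".
    return row["Nationality"]
```

Calling `journeys_by(rows, nationality)` gives the per-nationality counts, and `journeys_by(rows, start_year)` the per-year ones; dividing each count by the total number of journeys turns them into proportions.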

Of course this dataset has a few problems - I find it incredible that it even exists, to be honest - that we should keep in mind. Three are worth pointing out:

  1. Latitude and longitude values are determined by multiple methods depending on the original records and will not always be very accurate. We always use the Lat3 and Lon3 columns, described as those “derived from the best position available”, but make no distinction between how those values were arrived at.
  2. Free-form fields such as CargoMemo, WarsAndFightsMemo, etc. are usually empty. In the particular case of the CargoMemo field, there are only entries from Dutch logs (and a single funny British one).
  3. Locations usually have hard-to-recognize names, either because they’re written in foreign-language form or because they’re historical names no longer in use. This is not really a problem; it just means that we have to search a bit to find what on earth Batavia is.
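
A quick way to check how sparsely a free-form field is filled in, as in the second point above, is to count its non-empty entries per nationality. The sketch below is illustrative only and assumes a `Nationality` column and rows represented as dicts of strings.

```python
from collections import Counter


def non_empty_by_nationality(rows, field):
    """Count rows where `field` is filled in, grouped by nationality.

    The "Nationality" column name is an assumption.
    """
    counts = Counter()
    for row in rows:
        value = row.get(field)
        # Skip missing, empty and whitespace-only values.
        if value is not None and str(value).strip():
            counts[row.get("Nationality")] += 1
    return counts
```

For example, `non_empty_by_nationality(rows, "CargoMemo")` would show which nationalities actually recorded cargo notes.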

Ports of the world

A first interesting look at the data is to map all the places referenced in these logs, which are listed in the Geodata table.