Throwing numbers at a map

Since I'll be undertaking a research trip to the UK this November or so, I need to think about exactly what I'm going to do there. Giving a paper at the AHA is part of that process. That will hopefully help me formulate my approach or at least identify potential approaches to comparing airship, spy and invasion scares in the First World War. But I also need to nail down where I am going to go in a very physical and literal sense. This is because I want to get out of London for at least a week, to look at scares in a provincial area, and raid the local archives for civil defence files or personal diaries and so on (which of course I can supplement in the London archives). This is partly because it'd be nice to avoid the London-centric perspective for a change, but also because I suspect that such fears could be as intense or even more intense in outlying areas -- particularly on the eastern coast facing Germany. I had been thinking of somewhere like Hull, which was raided by Zeppelins on multiple occasions, or East Anglia, which is the closest part to Germany and so an obvious (at least in the folk sense) place for a German invasion or raid. Both areas also had notable phantom airship sightings in 1913. So maybe there. Or maybe somewhere else.

I wondered if there was perhaps a systematic way of gauging fears along the invasion coast, something better than throwing darts at a map. And it occurred to me that I might be able to use the British Newspaper Archive (BNA) for this. We're all used to n-grams by now, which are great for tracking the varying usage of words over time. Tim Sherratt's QueryPic does this for Australian newspapers based on the Trove Newspapers corpus; though there's nothing similar for BNA that I know of, you can manually extract the data yourself without it getting too tedious. What I am thinking of might be termed an n-map: an n-gram across space instead of across time. It's a very obvious thing to do, but I don't think I've seen it done for the databases I'm used to using. It's really just GIS (without an actual map). Or distant (newspaper and map) reading.

There's no publicly-available BNA API to make it possible to do this in an automatic way, but again it is actually not too difficult to use the BNA interface manually. This is because BNA has a very fine level of geographic discrimination: all newspapers in the database are allocated a place (e.g. Hull), a county (e.g. East Riding of Yorkshire) and a region (e.g. Yorkshire and the Humber). These appear as filters when you do a search, and listed beside each filter is the number of issues the search has thrown up for it. So you can just copy down the numbers into a spreadsheet to construct your own low-tech n-map (or n-gram, for that matter).

So now the question is, what keywords do I use? This is not completely straightforward, though neither does it have to be airtight. This is just back-of-the-envelope stuff, after all. After some experimentation, I ended up going with 'zeppelin', 'invasion' and 'spy'. (BNA automatically searches on plurals as well.) Here is the number of articles in the BNA for each keyword for each region, for the period 4 August 1914 to 11 November 1918.

| region | zeppelin | invasion | spy |
| --- | --- | --- | --- |
| Borders, Scotland | 105 | 92 | 103 |
| East Midlands, England | 2699 | 1297 | 2657 |
| East, England | 530 | 395 | 354 |
| Grampian, Scotland | 2710 | 1840 | 3429 |
| London, England | 20 | 41 | 48 |
| Lothian, Scotland | 661 | 432 | 968 |
| North East, England | 1569 | 1164 | 1690 |
| North West, England | 5104 | 3408 | 6854 |
| South East, England | 629 | 569 | 656 |
| South West, England | 4777 | 3960 | 4917 |
| Strathclyde, Scotland | 224 | 207 | 349 |
| Tayside, Scotland | 2361 | 1608 | 3849 |
| West Midlands, England | 8522 | 4785 | 6552 |
| Yorkshire and the Humber, England | 5988 | 3075 | 5575 |

Two regions are in the top three for each keyword, both in England: the North West and the West Midlands.

But I'm only interested in local fears, and these words will appear in other contexts as well: Zeppelin raids on London, the invasion of Romania, the trial of Mata Hari. That is to say, in reference to actual Zeppelins, invasions and spies. So I decided to include another keyword with each search: 'rumour'. This should hopefully correlate better with localised fears than with real war news.

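| region | zeppelin+rumour | invasion+rumour | spy+rumour |
| --- | --- | --- | --- |
| Borders, Scotland | 8 | 4 | 2 |
| East Midlands, England | 198 | 115 | 121 |
| East, England | 24 | 17 | 24 |
| Grampian, Scotland | 148 | 127 | 239 |
| London, England | 0 | 1 | 2 |
| Lothian, Scotland | 24 | 15 | 18 |
| North East, England | 100 | 96 | 133 |
| North West, England | 304 | 219 | 284 |
| South East, England | 19 | 20 | 47 |
| South West, England | 313 | 299 | 311 |
| Strathclyde, Scotland | 27 | 24 | 36 |
| Tayside, Scotland | 114 | 124 | 185 |
| West Midlands, England | 404 | 305 | 261 |
| Yorkshire and the Humber, England | 288 | 196 | 185 |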

Now the top three regions are the same for each keyword, again all in England: the North West and the West Midlands once more, and the South West.
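For anyone who would rather not sort by hand, the ranking can be reproduced in a few lines of Python. This is only a minimal sketch: the counts are copied from the 'rumour' table above, and the names `RUMOUR_COUNTS` and `top_regions` are made up for illustration, not anything BNA provides.

```python
# Article counts copied by hand from the "+rumour" table:
# region -> (zeppelin+rumour, invasion+rumour, spy+rumour)
RUMOUR_COUNTS = {
    "Borders, Scotland": (8, 4, 2),
    "East Midlands, England": (198, 115, 121),
    "East, England": (24, 17, 24),
    "Grampian, Scotland": (148, 127, 239),
    "London, England": (0, 1, 2),
    "Lothian, Scotland": (24, 15, 18),
    "North East, England": (100, 96, 133),
    "North West, England": (304, 219, 284),
    "South East, England": (19, 20, 47),
    "South West, England": (313, 299, 311),
    "Strathclyde, Scotland": (27, 24, 36),
    "Tayside, Scotland": (114, 124, 185),
    "West Midlands, England": (404, 305, 261),
    "Yorkshire and the Humber, England": (288, 196, 185),
}
KEYWORDS = ("zeppelin", "invasion", "spy")


def top_regions(counts, keyword_index, n=3):
    """Return the n regions with the highest count for one keyword column."""
    ranked = sorted(counts, key=lambda region: counts[region][keyword_index],
                    reverse=True)
    return ranked[:n]


for index, keyword in enumerate(KEYWORDS):
    print(keyword, top_regions(RUMOUR_COUNTS, index))
```

The order within the top three differs from keyword to keyword, but the same three regions turn up each time.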

However, these are absolute numbers of articles. This can be misleading. The more newspapers BNA has from a particular region, the more hits would be expected, in general (and vice versa). So the numbers need to be normalised against the total volume of newspapers in order to show which regions were relatively more interested in airship, invasion and spy rumours. Again this is easy to find out with BNA, just by doing an empty search. So here's the total number of issues in BNA for each region during the war, and then the ratio of the data from the previous table to that number for each keyword -- in other words, the average number of articles per issue that mention each keyword.

| region | issues | zeppelin+rumour/issues | invasion+rumour/issues | spy+rumour/issues |
| --- | --- | --- | --- | --- |
| Borders, Scotland | 170 | 0.05 | 0.02 | 0.01 |
| East Midlands, England | 2868 | 0.07 | 0.04 | 0.04 |
| East, England | 690 | 0.03 | 0.02 | 0.03 |
| Grampian, Scotland | 2676 | 0.06 | 0.05 | 0.09 |
| London, England | 92 | 0.00 | 0.01 | 0.02 |
| Lothian, Scotland | 790 | 0.03 | 0.02 | 0.02 |
| North East, England | 1222 | 0.08 | 0.08 | 0.11 |
| North West, England | 5006 | 0.06 | 0.04 | 0.06 |
| South East, England | 1203 | 0.02 | 0.02 | 0.04 |
| South West, England | 6030 | 0.05 | 0.05 | 0.05 |
| Strathclyde, Scotland | 246 | 0.11 | 0.10 | 0.15 |
| Tayside, Scotland | 2812 | 0.04 | 0.04 | 0.07 |
| West Midlands, England | 6944 | 0.06 | 0.04 | 0.04 |
| Yorkshire and the Humber, England | 5144 | 0.06 | 0.04 | 0.04 |

This changes the result dramatically. Now there are two regions in the top three for each keyword: Strathclyde in Scotland and the North East of England, or, in terms of major population centres, Glasgow and Newcastle.
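The normalisation itself is just a division, and a short Python sketch makes the step explicit. The figures are the wartime issue totals and 'rumour' counts quoted above; the names `DATA` and `rates_per_issue` are hypothetical, chosen only for this example.

```python
# region -> (issues, zeppelin+rumour, invasion+rumour, spy+rumour),
# copied by hand from the tables above.
DATA = {
    "Borders, Scotland": (170, 8, 4, 2),
    "East Midlands, England": (2868, 198, 115, 121),
    "East, England": (690, 24, 17, 24),
    "Grampian, Scotland": (2676, 148, 127, 239),
    "London, England": (92, 0, 1, 2),
    "Lothian, Scotland": (790, 24, 15, 18),
    "North East, England": (1222, 100, 96, 133),
    "North West, England": (5006, 304, 219, 284),
    "South East, England": (1203, 19, 20, 47),
    "South West, England": (6030, 313, 299, 311),
    "Strathclyde, Scotland": (246, 27, 24, 36),
    "Tayside, Scotland": (2812, 114, 124, 185),
    "West Midlands, England": (6944, 404, 305, 261),
    "Yorkshire and the Humber, England": (5144, 288, 196, 185),
}
KEYWORDS = ("zeppelin", "invasion", "spy")


def rates_per_issue(data):
    """Divide each keyword count by the region's total number of issues."""
    return {
        region: tuple(count / issues for count in counts)
        for region, (issues, *counts) in data.items()
    }


rates = rates_per_issue(DATA)
for index, keyword in enumerate(KEYWORDS):
    # Rank regions by articles per issue rather than by raw article count.
    ranked = sorted(rates, key=lambda region: rates[region][index], reverse=True)
    print(keyword, [(region, round(rates[region][index], 2)) for region in ranked[:3]])
```

Dividing by issues rather than by titles or by population is itself a choice; it measures how often a rumour turned up in the paper, not how many readers saw it.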

Do I believe these results? BNA's wartime run for Strathclyde amounts to only 246 issues, comprising two newspaper titles. So it's possible that this high result is not representative of concerns among the people generally; it could be due to an idiosyncratic editor. It does seem surprising that there would be particularly grave concerns about invasion and so on, since this is at the opposite end of Great Britain from Germany. (Then again, as a major shipbuilding centre it may have been felt that Glasgow was a prime target for German intrigues; and the 1913 phantom airship scare provides many examples of scares in non-obvious locations.) Newcastle is at least on the east coast, facing Germany, so that seems to make more sense. On the other hand, newspapers in the areas which I had initially picked as being most scared, so to speak (in BNA's terminology, Yorkshire and the Humber, and East), don't seem to be particularly interested in rumours about Zeppelins, invasions or spies. In the case of East, or East Anglia, this may be because that area is underrepresented in BNA, which has only 690 wartime issues from there out of 36592 in total, or 1.9%. According to the 1911 census, East Anglia had a population of 2.95 million, while all the areas covered by BNA in 1914-18 (which most notably exclude Ireland and Wales) had a population in 1911 of 33.8 million, so East Anglia accounted for 8.7% of the population covered. So either East Anglia had very few newspapers relative to its population, or else it is just poorly represented in the BNA database.
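The under-representation argument is simple arithmetic, spelled out here as a small Python sketch using only the figures quoted above (the variable names are mine):

```python
# Figures quoted in the text above.
east_issues = 690                     # wartime BNA issues from the East region
total_issues = 36592                  # all wartime BNA issues
east_population_millions = 2.95       # East Anglia, 1911 census
covered_population_millions = 33.8    # all areas covered by BNA, 1911 census

issue_share = east_issues / total_issues
population_share = east_population_millions / covered_population_millions

print(f"share of issues:     {issue_share:.1%}")
print(f"share of population: {population_share:.1%}")
# How well represented the region is relative to its population
# (1.0 would mean issues exactly in proportion to people).
print(f"representation ratio: {issue_share / population_share:.2f}")
```

A ratio well below 1 says the region contributes far fewer issues than its population would suggest, though it cannot say whether that is down to the local press or to what BNA has digitised.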

In the end I'm not entirely convinced that I've solved my problem here, but I'll certainly be looking more closely at Newcastle. One interesting thing: the three keywords seem to correlate with each other. That is, where one appears more (or less) frequently, the other two tend to as well. That could be evidence to support my suggestion that the scares reinforced each other. Or maybe it's just a spurious correlation.
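One rough way to put a number on that impression is a Pearson correlation coefficient between each pair of per-issue rates. The sketch below assumes the regional figures from the tables above and uses made-up names (`DATA`, `pearson`); with only fourteen regions, some of them represented by a handful of titles, it can show that the rates move together but not why.

```python
from itertools import combinations
from math import sqrt

# region -> (issues, zeppelin+rumour, invasion+rumour, spy+rumour),
# copied by hand from the tables above.
DATA = {
    "Borders, Scotland": (170, 8, 4, 2),
    "East Midlands, England": (2868, 198, 115, 121),
    "East, England": (690, 24, 17, 24),
    "Grampian, Scotland": (2676, 148, 127, 239),
    "London, England": (92, 0, 1, 2),
    "Lothian, Scotland": (790, 24, 15, 18),
    "North East, England": (1222, 100, 96, 133),
    "North West, England": (5006, 304, 219, 284),
    "South East, England": (1203, 19, 20, 47),
    "South West, England": (6030, 313, 299, 311),
    "Strathclyde, Scotland": (246, 27, 24, 36),
    "Tayside, Scotland": (2812, 114, 124, 185),
    "West Midlands, England": (6944, 404, 305, 261),
    "Yorkshire and the Humber, England": (5144, 288, 196, 185),
}
KEYWORDS = ("zeppelin", "invasion", "spy")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# One list of per-issue rates per keyword, in the same region order.
rates = {
    keyword: [row[index + 1] / row[0] for row in DATA.values()]
    for index, keyword in enumerate(KEYWORDS)
}

correlations = {
    (a, b): pearson(rates[a], rates[b]) for a, b in combinations(KEYWORDS, 2)
}
for (a, b), r in correlations.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

A high coefficient here would also be what you'd see if some newspapers simply printed more rumour of every kind, so it is no proof that the scares reinforced each other.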

CC BY-NC-ND 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Permissions beyond the scope of this license may be available at http://airminded.org/copyright/.
