Finally, something to justify the existence of the Internet. The Google Ngram Viewer takes the corpus of words formed by the Google Books dataset (i.e. books, journals, magazines, but not newspapers) and lets you plot the changes in frequency of selected ones over time. There are all sorts of interesting questions you could (in principle) answer with this tool, so let's give it a whirl.
Here's a pretty basic one. Blue is aeroplane, red is airplane, the period is 1890-2000. (The smoothing in all these plots is 3 years.) Aeroplane was initially the more popular term, but airplane has predominated since about 1925. Note the peaks during the world wars -- airplane was 5 times more likely to be used in the Second World War than in the 1990s.
But we don't have to use the English corpus: there's also American English and British English. Here's the American version.
Okay, it's very much the same and so not very interesting. Aeroplane was more common initially but was replaced by airplane in the early 1920s. Here's the British version:
This is very much not the same. Aeroplane has easily been the more popular choice throughout the period, only succumbing to airplane in the late 1990s. So I now have an empirical basis for preferring aeroplane. (Incidentally, that the English and American English plots are so similar tells us that American English dominates Google's English corpus.)
Let's try something else. Here's the ngram for bomber, 1900-1950. (From now on I'll stick to British English.)
So it would seem that not many people were talking about bombers until the late 1930s, and then bang! The war starts and everybody is. But that's to be expected. We want (or at least I do) to filter out that peak and see if there is anything meaningful in the pre-1939 data.
So there are three distinct periods here. First, an initial but low level of usage from 1905 to 1912. Then an upswing in the First World War. Then, from a post-war low in about 1924, there is an (almost) continuous climb until 1939. This last period, however, has an inflection point in about 1934, after which the word frequency rises much more rapidly.
What historical reality do these correspond to? The first one is a bit puzzling. Bombers in 1905? That seems a bit early. But since the corpus is drawn from Google Books, we can search that to get an idea of where the word is occurring and in what contexts. (There are links to Google Books at the bottom of each Google Ngrams search result, which makes this easier, though you may need to modify the years searched by hand.) And here we start to see some problems. Searching Google Books for bomber between 1900 and 1910 yields such sources as Flight, the Official Year Book of the Commonwealth of Australia and the American Society of Mechanical Engineers, which seem respectable enough. Except the excerpts show that the particular issues in which bomber appears are clearly not from the period 1900-10. For example, the Australian Year Book excerpt begins 'Up till June, 1952 [...]'; Volume 5 of A History of the Azores Islands talks about the Battle of Britain. These journals and volumes are drawn into this search because they began publishing in the period 1900-10, not because they actually used the word bomber in that period. It seems that Google sometimes isn't able to index each issue or even each volume separately; they're all lumped in together. This is not good. Being able to exclude periodicals from the Ngram corpus would be one workaround.
Other than that, the explanation for the First World War bump is self-evident (though note that a bomber was also a soldier who 'bombed' enemy positions with grenades, and this sense would also appear in postwar memoirs and histories). Taking into account the 3-year smoothing, the postwar climb begins exactly where I would expect it, shortly after 1922 when P. R. C. Groves published his Times articles. The 1934 upswing follows the failure of the Disarmament Conference and the arrival of Hitler. So overall it makes sense.
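Why does the smoothing matter for dating turning points? As I understand Google's description, a smoothing of 3 means each plotted year is the average of that year and up to 3 years on either side, so a sudden jump in usage starts showing up in the smoothed curve a few years before it actually happens. Here is a minimal sketch in Python with made-up numbers. The `smooth` function, the step data and the way the window is truncated at the ends of the series are my own assumptions, not Google's code. A word whose raw frequency jumps in 1923 appears to start rising in 1920.

```python
def smooth(series, k=3):
    """Centered moving average (hypothetical stand-in for the Viewer's smoothing).

    Each value is averaged with up to k values on either side; near the ends of
    the series the window is simply truncated (an assumption on my part).
    """
    n = len(series)
    out = []
    for i in range(n):
        lo = max(0, i - k)          # first index in the window
        hi = min(n, i + k + 1)      # one past the last index in the window
        window = series[lo:hi]
        out.append(sum(window) / len(window))
    return out


years = list(range(1915, 1932))
# Made-up raw frequencies: flat at 1.0 up to 1922, then a jump to 5.0 from 1923.
raw = [1.0 if y <= 1922 else 5.0 for y in years]
smoothed = smooth(raw, k=3)

# First year in which the smoothed curve rises above the pre-1923 baseline.
first_rise = next(y for y, s in zip(years, smoothed) if s > 1.0)
print(first_rise)  # 1920
```

So a smoothed curve that starts climbing in about 1920 is consistent with a real change in usage a couple of years later, which is the allowance I'm making above.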
Another phrase associated with the shadow of the bomber is poison gas. Here it is plotted over 1910-45:
Most of the discussion in the first peak would be about battlefield use of gas. I would expect much of the second peak to be about its use against civilians (it peaks in about 1938, which would be right). But there's no way to check correlations of terms (e.g. do they occur on the same page?), so I can only guess. Either way, it's interesting that the phrase poison gas was used more often in the Second World War than in the First. But is it believable? It's hard to think so; it just wasn't something people would have written about as much. This suggests another bias of some kind in the sources. Perhaps more technical/medical journals are in Google Books for 1939-45 than for 1914-8 -- more places where doctors and engineers might discuss responses to poison gas in (unnecessary, as it turned out) anticipation of its use? A quick look in Google Books for 1942-5 bears this out: there's the Archives of Otolaryngology and the Journal of Chemical Education. Also, of course, civil defence publications and things like Popular Science. But note again that these are American sources: I don't know if you can search Google Books for British English books alone (but I'd love it if you can!).
We can also use the Ngram Viewer to revisit some old friends. Here's the rise of luftwaffe, 1933-45:
Well, it's not so much a rise as a plain. There's nothing at all. Aha -- it turns out that Ngram Viewer is case-sensitive. Putting in Luftwaffe instead of luftwaffe gives a much more sensible result:
Still, it's odd that searching on luftwaffe in the American English corpus yields a non-zero result. I can't believe that British writers and editors were uniformly strict about capitalisation where American ones weren't.
There are more oddities when we look at Coventrate: it does not appear at all in a British English search. Nor does it appear in an American English search. But it does appear in a simple English search! (The period is 1935-2000.)
Even if English is not simply American English plus British English, I can't see any reason why Coventrate would be used in other English variants and not the two major ones (especially since British English would have been the one to use it at all). So here is another glitch. Still, the fact that the graph is zero before the war shows that the OCR is pretty good; I would have expected some confusion with concentrate, but it doesn't look like that has happened.
Finally, to fulfil the promise of autogyros made in the post title. The rise and fall and rise and fall of autogyro between 1920 and 2000 is clearest in American English (read: I'm cherrypicking my data to fit the conclusion -- the second rise and fall is not at all evident in British English, and only just perceptible in English):
The initial rise and fall is the heyday of the autogyro, perhaps sustained after the war by hopes that it would make practical personal aviation a reality. But what's behind the renewed popularity of autogyro from the mid-1970s to the mid-1990s?
Getting back to the Google Ngram Viewer, it's not actually useful enough to justify the existence of the Internet: there are currently too many problems with it for it to be a killer app. But it is only a test product, and it is useful enough as it stands. It's certainly something to keep an eye on.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. Permissions beyond the scope of this license may be available at http://airminded.org/copyright/.
Jakob
This is a great, simple idea, Brett - I'm going to have to steal it! I'm surprised to see 'airplane' beating out 'aeroplane' in contemporary British English. 'Airplane' grates on me, so I'm sensitive to spotting it in UK usage, and I haven't noticed it much.
As for autogyros, I can think of one reason popularity might have picked up in the late 60s/early 70s, but the data seems to lag slightly...
(As an aside - speaking of the fall of autogyros - the rotorcraft expert at IC always insisted that you'd never get him in one, having assisted on one accident investigation too many...)
Ross
Yeah, I like this. Just put in Aerial Bombing vs. Strategic Bombing and it comes out with an interesting change in the terminology.
It is interesting to note that it does not pick anything up for Leigh-Mallory, but if you put in Hugh Dowding it is a different matter.
Pingback:
The Rise and Fall of… « Thoughts on Military History
Chris Williams
'gyrodyne' shows that the Google corpus is pretty granular in the 1920s, less so later on. 'Rotodyne' is a bit sad.
I started with 'policeman' and 'constable' as you might expect.
Errolwi
Interesting podcast with a gyrocopter instructor
http://www.flyingpodcast.co.uk/?p=75
The lag on the Bond is partially explained by the delay inherent in using 'books, journals, magazines, but not newspapers', plus 3-year averaging?
Ross
Brett, have you plotted Airmindedness yet? It fits in nicely with what you have written. Proves the point, as it were.
I should note that I have now re-done mine and if you remove the hyphen from Leigh-Mallory and do it as Leigh Mallory then you get hits.
Christopher
The search on bomber is very interesting and does, to some extent, support my earlier speculations that most people weren't too concerned about the bomber threat until later.
Dan
'Hun' and 'Prussian' 1910-1950 also makes for an interesting graph. Could spend hours thinking of things to put into this.
Brett Holman
Some other interesting searches -- 'Zeppelins,bombers,fighters', 'bombs', 'bomber will always get through'.
Note that not only is it case-sensitive, but 'bomber' gives different results to 'bombers'. Also as Ross notes, searches with hyphenated phrases fail, but if you replace the hyphen with a space it works. Not sure what that's all about.
Dan:
Yes, you could spend hours putting words in -- but should you? :) I do want to emphasise that there are serious problems with the Ngrams Tool. See Language Log and the Chronicle of Higher Ed for further discussion. But see also the Culturomics website (yes, that's what this 'new' field is supposed to be called) for the views of the creators of the Viewer.
Christopher:
Well, yes, if by 'later' you mean the later 1930s, which I don't think is news.
Roger Horky
'Aeroplane' was originally a four-syllable word when it was coined in the late 19th century--the initial 'a' and 'e' were pronounced separately, as a diaeresis, reflecting the word's French roots ('aéroplane'). The transformation to the diphthong apparently occurred around the turn of the (20th) century.
Brett Holman
Yes, you sometimes see it written 'aëroplane' in early sources, along with 'aëronautics' and the like (the 'steampunk diaeresis'). But to write it like that now would be taking affectation too far, even for me :)
Ross
Or you could write it as Airoplane. That is how it appears in Douglas Haig's diary.
Brett Holman
So did Lord Robert Cecil in a proposal put forward at the League of Nations in 1922! I don't think this was ever a popular alternative, but the ngram viewer shows some odd peaks, eg as late as 1980. Small number statistics (or even typos/confusions between airplane/aeroplane), I would guess. http://t.co/5nv0pJ4
Pingback:
Query Online