Tools and methods


A while back, The National Archives made all Cabinet papers from 1915 to 1980 freely available for download. Now TNA Labs have created a visualisation tool for said papers, allowing you to see clouds of the 25 most frequent words and contributors for any year (month in wartime) or, using the 'flexible querying' mode, any period you specify (up to ten years). Hovering over each result gives the actual count and links to the relevant DocumentsOnline entries. It's something of a toy at the moment (though they encourage you to download the XML dataset it is based upon and play with it yourself). For blogging purposes, it's annoying that there's no export function: I've had to grab some screenshots to show the results. And it's not possible to search for specific words or change the stop word list. But the potential is easy to see.

Cabinet Minutes word frequency, 1931-1940

When looking at the lifetime of the National Government (1931-1940, spanning three prime ministers: Ramsay MacDonald, Stanley Baldwin, and Neville Chamberlain) one word inevitably caught my eye: air. At 1970 mentions over the decade, it's the fourth most common word after war (2537), foreign (2125) and meeting (2059). Air could be used in a number of contexts, of course: the Secretary of State for Air (a Cabinet position at this time) or Air Ministry, Royal Air Force, German air force, air routes, air raids, air raid precautions, air defence, air attack and so on. (I assume the tool is sophisticated enough to match only whole words and not just substrings.) But it suggests that the National Government spent a great deal of its time talking about the air, that it was, so to speak, airminded. (Naval, which admittedly has a somewhat narrower compass, is the only similar term and was used only 1204 times.)
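The difference between whole-word and substring matching is easy to show. What follows is only a minimal sketch, not the TNA Labs implementation: the stop word list, the `top_words` function and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stop word list; the real tool's list is not reproduced here.
STOP_WORDS = {"the", "of", "and", "to", "a", "in", "for", "that", "was", "on"}

def top_words(text, n=25, stop_words=STOP_WORDS):
    """Return the n most frequent whole words, ignoring case and stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words).most_common(n)

# Invented sample text, not taken from the Cabinet papers.
sample = "The Air Ministry discussed air raids; the chairman repaired to the airfield."

whole = dict(top_words(sample))["air"]    # counts only the word 'air'
substring = sample.lower().count("air")   # also counts chairman, repaired, airfield
print(whole, substring)
```

A naive substring count would inflate the figure for air with every chairman and repair in the minutes, which is why the whole-word assumption matters.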

A tweet from William J. Turkel alerted me to the possibility of using 18th century-style fonts in LaTeX. The most noticeable difference from modern typesetting is the long s, but there are different ligatures too. There are a number of ways to do it but the easiest way is with the inbuilt Kepler Fonts package. (The Fell Types are far prettier, but look difficult, or at least tedious, to install. Font management is one of LaTeX's biggest weaknesses.) Just insert the following in your preamble and you're done:

\usepackage[fullveryoldstyle]{kpfonts}

Well, almost. This simply replaces every s with a long s, which is not right. Most importantly, long s is generally not used at the end of a word, so you need to type each word-final s as 's=' to get the ordinary round s. Here's what the first paragraph of my thesis looks like when done this way:

I wish I'd known about this before submitting it.
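Typing 's=' by hand across a whole thesis would be tedious, so the word-final rule could be scripted. This is a rough sketch only: the function name `round_final_s` is hypothetical, it handles just the end-of-word rule (historical practice had other exceptions), and it leaves LaTeX control sequences such as \ss untouched but knows nothing else about LaTeX syntax.

```python
import re

# Match either a LaTeX control sequence (kept as-is) or an 's' at the end of a word.
_TOKEN = re.compile(r"\\[A-Za-z]+|s\b")

def round_final_s(text):
    """Rewrite word-final 's' as 's=' so kpfonts sets a round s there."""
    def fix(match):
        token = match.group(0)
        # Control sequences start with a backslash and must not be altered.
        return token if token.startswith("\\") else "s="
    return _TOKEN.sub(fix, text)

print(round_final_s("This is his thesis"))
```

Anything an s can legitimately end inside braces or environment names (for example a label or a file name) would still need checking by hand.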

Finally, something to justify the existence of the Internet. The Google Ngram Viewer takes the corpus of words formed by the Google Books dataset (i.e. books, journals, magazines, but not newspapers) and lets you plot the changes in frequency of selected ones over time. There are all sorts of interesting questions you could (in principle) answer with this tool, so let's give it a whirl.

aeroplane, airplane, 1890-2000

Here's a pretty basic one. Blue is aeroplane, red is airplane, the period is 1890-2000. (The smoothing in all these plots is 3 years.) Aeroplane was initially the more popular term, but airplane has predominated since about 1925. Note the peaks during the world wars -- airplane was 5 times more likely to be used in the Second World War than in the 1990s.
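For what the smoothing setting does, here is a small sketch. It assumes (as the Ngram Viewer's own help text describes it, as far as I can tell) that a smoothing of 3 means each year is averaged with up to three years on either side; the function name `smooth` and the toy numbers are made up, not real ngram frequencies.

```python
def smooth(values, k=3):
    """Centred moving average: each point is averaged with up to k neighbours per side."""
    out = []
    for i in range(len(values)):
        # The window is truncated at either end of the series.
        window = values[max(0, i - k): i + k + 1]
        out.append(sum(window) / len(window))
    return out

# Toy series with a single spike, smoothed with k=1 to keep the arithmetic visible.
print(smooth([0, 0, 9, 0, 0], k=1))
```

The spike gets spread over its neighbours, so sharp wartime peaks in the plots are, if anything, flattened by the smoothing rather than exaggerated.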

But we don't have to use the English corpus: there's also American English and British English. Here's the American version.

[Cross-posted at Cliopatria.]

I know. Writing about Wikipedia is so 2006. And yes, finding errors in Wikipedia articles is not exactly difficult. But I have a bee in my bonnet which needs releasing.

Wikipedia's page on the Blitz has a section entitled 'Commencement on September 6'. This is how it currently reads (sans hyperlinks and superscripts):

There is a misconception that the Blitz started on September 7, 1940. Bombs began dropping the night of September 6 and continued for the full day of the 7th and on into the morning of the 8th. Saturday 7th was the first full day and has officially and erroneously become known as the day the Blitz started. Hermann Göring launched bombers and the first bombs caused damage the night of September 6.

Quoted in the The Manchester Guardian is Göring's communiqué:

Attacks of our Air Force on objectives of special military and economical value in London, which began during the night of September 6, were continued during the day and night of September 7 with exceptionally strong forces using bombs of the heaviest caliber.

A witness recalled the evening of Friday September 6, 1940:

My name is John Davey. I was born on December 27th 1924 in South Moltom [sic - Molton] Road, Custom House, West Ham, and a couple of miles from the Royal Docks. In September 1940, on the Friday evening of the weekend the docks were first blitzed, I was sitting with my friend in his house. At about 7 p.m. there was a series of explosions and the shattering of glass. We ran into the road and saw at the end a flame that shot into the sky, seeming to light up the whole area. My friend and I and lots of others ran towards the fire.
—BBC, WW2 People's War

The first damage to property on September 7 was recorded at eight minutes past midnight, a grocer’s shop at 43 Southwark Park Road, SE16.

It has long been the accepted, but erroneous, view that the London Blitz lasted 57 consecutive nights starting on September 7 1940 and ending November 1. In actuality September 6 makes 57 nights and not September 7. The historian AJP Taylor wrote of such an error:

… it is the fault of previous legends which have been repeated by historians without examination. These legends have a long life.

This is really quite silly. Yes, it's true that the accepted date of 7 September 1940 as the start of the London Blitz is a bit misleading, since there was a non-trivial amount of bombing before that date (e.g. see here). Judging from contemporary press accounts, 7 September certainly seemed to mark an important change in German bombing strategy, but more one of quantity than quality -- almost more an inflection point than a turning point. In retrospect we tend not to see it that way, which is fine. But we could recognise that -- leaving aside the eventual reification involved in the name 'the Blitz' itself -- the 'start of the Blitz' was less clearly defined then than it seems now.

Reprisals: all mentions, 1939-1945

The word 'reprisals' popped up during my 1940 post-blogging quite frequently. After one post I had the idea of checking whether it could be used as an index of British attitudes towards the bombing of Germany throughout the rest of the war. The short answer is: not really. But it was still worth trying.

With The Times and the Manchester Guardian/Observer databases I can luckily do this in a semi-automated fashion. Automated because I can do keyword searches on the full text of the newspapers, semi because the interfaces are crude and require manually stepping through the date range to bin the data. For example, searching for the word 'reprisals' in The Times database between 1 and 31 July 1940 gives 16 articles; doing the same between 1 and 31 August 1940 gives 18 articles; and so on. I then put these numbers together and plot the results.
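The manual stepping amounts to generating one date range per calendar month and recording the hit count for each. A minimal sketch of that bookkeeping follows; the function name `month_bins` is hypothetical, and the searches themselves still have to be run by hand in each database's interface.

```python
import calendar
from datetime import date

def month_bins(start_year, start_month, end_year, end_month):
    """Yield (first_day, last_day) for each calendar month in the range, inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        last = calendar.monthrange(year, month)[1]  # number of days in this month
        yield date(year, month, 1), date(year, month, last)
        month += 1
        if month > 12:
            year, month = year + 1, 1

bins = list(month_bins(1940, 7, 1940, 8))
# Counts typed in by hand after each search; these two are the figures quoted above.
counts = {bins[0][0]: 16, bins[1][0]: 18}
for first, last in bins:
    print(first, last, counts[first])
```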

I've put the series of posts I did a couple of years ago on the Sudeten crisis into one big PDF file called, rather grandiosely, Post-blogging the Sudeten Crisis: The British Press, August-October 1938 (147 pages, 5.6 MB). It's freely available for download under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. It's very bloggy in style, but I've also added a basic index and put in internal links between the chapters (posts). My Sudeten posts are probably the best thing I've done with this blog, and they've been linked to from a few educational sites as well as Wikipedia. So by putting them into this format I hope they'll be made accessible to a wider audience. (I've been inspired in this by the work Evangeline Holland has been doing over at Edwardian Promenade.)

The conversion was done using a nifty tool called WPTEX. This is some PHP which hooks into WordPress's functions and reads out and formats your posts into LaTeX format. It didn't quite do what I wanted but with some PHP and LaTeX hackery I think it turned out pretty nice in the end.

This post is part of an experiment in post-blogging the Sudeten crisis of August-October 1938. See here for an introduction to the series, and here for a conclusion. The entire series can be downloaded in EPUB, MOBI or PDF format.

[Cross-posted at Cliopatria.]

Since graduating I've become what they call an 'independent scholar', meaning I currently have no academic job but still have the irrational desire to do research. I'd certainly like to be a dependent scholar, but it turns out they don't hand out jobs with your testamur. Who knew?

So there are things I need to do. One is to keep an eye out for jobs. In Australia, we don't have anything like the AHA interview-fests, which sound like a slightly terrifying (if hopefully worthwhile) experience for recent/almost graduates. Nor does Britain, as far as I know. So job-hunting is presumably less seasonal. We do have the usual job search sites, such as UniJobs.com.au and jobs.ac.uk.

Once into the job application and interview process, one useful site to keep an eye on is the Academic Jobs Wiki, especially the history section. There are also places to share good and bad interview experiences, or simply to vent. The entries are mostly about North American universities, but it being a wiki there's no reason why that can't change.

The other thing to keep doing is writing and publishing. Part of that is knowing which journal to submit to, and part of that is knowing how long it takes for them to get your article through the review process. It's not something journals advertise on their websites (and understandably so), so the only data seems to be anecdotal. Which is why I was glad to stumble across the History Journal Response Times wiki. It might have saved me some grief had I known of it earlier!

Finally, an inspiring blog I recently discovered is Nicholas Evan Sarantakes' In the Service of Clio, which is aimed at providing advice to history graduate students on the subject of career management. It's all there, from choosing a university, to conference strategies, to having a life. For me, the best posts are the numerous guest blogs from people who got their PhDs and then got jobs, mostly outside traditional academia. So it can happen.

I'd be glad to know of any similar resources I might have missed.

One quite inadequate response to the paywalling of bibliographies is to set up your own, which I've made a start at here. It's a little narrower in focus than the RHS bibliography, being limited to works relating to the history of British aviation up to 1941 which I looked at in the course of my PhD research. However, it also includes primary sources. I'm still pruning it -- there might be some things in there which don't have 'significant' aviation content, for example.

It's running on WIKINDX, a content management system specifically designed with bibliographies in mind. (Thanks to Alun for the tip!) It was pretty easy to set up; most of the work I did was playing around with the templates and CSS to make it look a bit like Airminded. As a LaTeX user I was pleased to find that I could import my BibTeX bibliography files fairly painlessly, but if I was working in the EndNote world WIKINDX can handle that too. Just as importantly, it can export bibliographies in both formats, along with RTF, RIS and HTML. There are plenty of other bells and whistles, including an integrated word processor which I can't ever see myself using.

There are a few different ways to view the database. One is to just list all the resources (i.e. books or articles), sorted by creator (author) or year, perhaps. Another is to browse the creators, which is done via a combined heat map and cloud. Or there's a quick search and a ridiculously capable power search. It talks to Zotero, and there's an RSS feed for recently-added resources. And so on.

What is the point of this? Is it going to be actually useful to anyone? Should I keep control of it myself, or open it up to others to edit? Should I be using citeulike or Mendeley instead? I don't know! But I'm already thinking about putting up a future war fiction bibliography ...

[Cross-posted at Cliopatria.]

The Royal Historical Society has for some years maintained an online bibliography of British and Irish history, updated three times a year. It currently has over 460,000 records. It's a fantastic resource for scholars interested in any aspect of the history of the British Isles, not least because it's free. But from 1 January 2010 it won't be: it will be rebranded as the Bibliography of British and Irish History which will be sold by Brepols, with subscriptions available for institutions and individuals.

This is a shame, of course. A resource which was freely available to anyone with an internet connection will now only be open to those who can afford to pay. Presumably that includes big universities and libraries (although even librarians at Yale, of all places, are complaining that digital resources are getting too expensive, according to this H-Albion post), but what about smaller universities, local libraries, schools, independent researchers? There is the individual subscription, but there's no information about pricing yet and it seems unlikely to be cheap.

The reason for this move is the end of government funding for the bibliography. That's understandable; the money has to come from somewhere. The fact that it has been funded by British taxpayers does raise the question of why a commercial entity should be allowed to profit from that expenditure. But as I'm not a British taxpayer it could equally well be asked why I should benefit from that expenditure. So I don't really have a basis for moral outrage here. It's just ... a shame.

But it seems to me that there must be some other way to do this -- crowdsourcing, scraping, some combination of both? There are some sites which show the potential of crowdsourcing by way of people uploading and updating their own bibliographies, such as LibraryThing, or in a more academic context, CiteULike and Mendeley. Given a critical mass of users, a crowdsourced bibliography would be close to up to date. Scraping could be used to automatically feed in journal articles via RSS (books would be harder -- though maybe not). There are many difficulties inherent in such an approach, but I'd rather see something like this be the future than an ever-increasing array of paywalls.
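To give a rough idea of the scraping half, here is a sketch of turning a journal's RSS feed into bibliography records. Everything in it is an assumption for illustration: the feed is an invented inline sample (a real one would be fetched from the journal's site), the journal, titles and URLs are hypothetical, and real feeds vary a good deal in which fields they supply.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 sample standing in for a real journal feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Hypothetical Journal of Aviation History</title>
  <item>
    <title>Airmindedness in the 1930s</title>
    <link>https://example.org/a1</link>
    <pubDate>Mon, 04 Jan 2010 00:00:00 GMT</pubDate>
  </item>
  <item>
    <title>Air raid precautions revisited</title>
    <link>https://example.org/a2</link>
    <pubDate>Tue, 05 Jan 2010 00:00:00 GMT</pubDate>
  </item>
</channel></rss>"""

def feed_to_records(xml_text):
    """Turn each <item> in an RSS 2.0 feed into a simple bibliography record."""
    channel = ET.fromstring(xml_text).find("channel")
    journal = channel.findtext("title")
    return [
        {
            "journal": journal,
            "title": item.findtext("title"),
            "url": item.findtext("link"),
            "date": item.findtext("pubDate"),
        }
        for item in channel.iter("item")
    ]

records = feed_to_records(SAMPLE_FEED)
for record in records:
    print(record["journal"], "|", record["title"], "|", record["date"])
```

Note what is missing: feeds like this rarely carry authors, volume or page numbers in any consistent way, which is one of the difficulties such an approach would have to deal with.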

[Cross-posted at Cliopatria.]

Recently, I followed Gavin Robinson's lead and tried out the British Library's EThOS beta. EThOS stands for Electronic Theses (dissertations) Online Service, and it's just what you'd expect from that -- an electronic thesis delivery service. There's not too much new about that, but EThOS does have some very impressive features. First is the scope: nearly all British universities are participating (with two very major exceptions, unfortunately: Cambridge and Oxford). What's more, any thesis ever accepted in Britain is eligible for inclusion in the database, possibly going back to the 1600s, according to the FAQ. This could become a rich vein of primary source material for intellectual historians. Second is the fact that the theses have been OCRed, not just scanned. This means that you can do keyword searches on the PDFs, for example. Third is the fact that they are free! Mostly, anyway. If you only want an electronic copy, it's free (hardcopy costs, obviously). If the thesis you want hasn't been scanned yet, then you may be asked to contribute towards the cost of that, but in most cases, not. And it doesn't appear to matter whether you are in the UK or not (which is good, because I'm not).

As for the cryptic title above, one of the theses I downloaded was one I've long wanted to read but have never seen until now: Howard Roy Moon, The Invasion of the United Kingdom: Public Controversy and Official Planning 1888-1918 (London University, 1968). It's quite widely cited and I wondered why it hadn't been published. Now I know why: it's 735 pages long! I am suddenly feeling rather inadequate. Clearly, historians back then possessed superhuman powers. Or at least very strong arms, and hands adapted for furious typing and scribbling.
