Tools and methods

10 Comments

Sydney Gazette

Recently, the National Library of Australia opened up Australian Newspapers Beta to the public, free of charge (though whether free as in speech or free as in beer is unclear). This is part of the Australian Newspapers Digitisation Program and promises to be a fantastic resource. They are digitising newspapers from every state from 1803 (when the first newspaper was published in Australia -- see above) through to 1954 (after which everything is still under copyright). Both images and text are available, and it's very easy to zero in on particular articles of interest, by date, newspaper, category, state or keyword search. Indeed, the interface is very attractive. The articles can be downloaded as JPG or PDF (though whole pages can only be saved as the latter, for some reason). Bearing in mind that this is only a beta, there are some problems. Coverage is very limited (most of the First World War is available, for example, but only the end of the Second), and the OCRing looks pretty dodgy from what I've seen. But this is where it gets clever: users can correct the mistakes made by the OCR software, either anonymously (verifying sapience by way of captcha) or by signing up for an account. They can also leave comments on individual articles or tag them. Well done, that library.

Web 2.0 seems to be something Antipodean institutions are latching onto fast. The NLA is very up-to-date: it also runs Picture Australia, which aggregates a number of image archives and hooks into Flickr. The Powerhouse Museum in Sydney is also a bit of a web 2.0, well, powerhouse. The Australian War Memorial is not quite as advanced, perhaps, but it blogs, and I think there are things happening behind the scenes. And as I've previously noted, New Zealand has long been digitising its newspaper archives, with completely free access. It seems like comparable British institutions aren't embracing web 2.0 in the same way -- the British Library, the British Museum, the Imperial War Museum are all looking pretty staid in comparison. I'd love to be proven wrong though!

5 Comments

[Cross-posted at Revise and Dissent.]

I stumbled across this by accident: a pilot digitisation of Hansard, funded and operated by Parliament. What an excellent thing! It's functional, but based only on a subset of 20th-century Hansard material:

What's on this site? This site is generated from a sample of information from Hansard, the Official Report of Parliament. It is not a complete nor an official record. Material from this site should not be used as a reference to or cited as Hansard. The material on this site cannot be held to be authoritative.

This warning should be heeded -- it's only a prototype and should not be relied upon for any purpose. It's easy to find omissions, such as Baldwin's 'the bomber will always get through' speech, even though there's quite a number of entries for the day in question. The text itself appears remarkably uncorrupt, given the volume of data that's been OCRed: I've only found a few errors (most amusing one: the Marquees of Londonderry -- I guess it must rain there a lot). There are certainly a few minor problems -- for example, once I managed to get the search engine to tell me that a debate in 1958 happened earlier than one in 1944. At present there's no disambiguation between different people with the same name -- so the earliest utterance recorded for Mr. Winston Churchill is on 19 March 1941, and the latest on 11 March 1997 -- nor combinations between (possibly) the same person with different names -- such as Churchill, Mr. Churchill, Mr. Churchill (by private notice), Mr. Churchill (Stretford) and so on. It's all experimental at this stage, so these issues will presumably be addressed in future. (LibraryThing lets its users do a lot of the work for similar problems, but I doubt a HansardThing would ever reach the critical mass needed for that to work.)

6 Comments

For the last couple of weeks I've been using the free preview of Things, a task management application for OS X. I've just entered the final year of my PhD -- or rather the final year of my PhD scholarship, which may not be the same thing -- and so keeping track of everything I need to do is going to be critical. I've been looking for something like Things for ages, actually. Nearly all software of this type seems to be based on Getting Things Done (GTD), a system for task management which is hugely popular, at least among techie types. But I've never been able to wrap my head around it: it seems too strict and hierarchical. The applications designed to help you follow it seem just as bad -- you're forced to fill in a bunch of text boxes or select from drop-down menus or whatever, and it's all just too annoying for me.

That's why I like Things, so far -- you can fill out as little or as much info as you want for each task. The organisation of tasks is logical (at least to me), the interface is polished but unobtrusive and the program lightweight. It just gets out of the way and lets you get on with things. Apparently it does actually conform to GTD principles, but doesn't force you to follow it if you don't want to. The data is stored in an XML file so you can retrieve it if something happens. Tags are used throughout, which is a nice touch. Tasks can be organised by time priority (eg 'Today', 'Next') or as part of a larger project. When you've completed a task, you tick a box on its pane and it will eventually vanish out of sight into a log of completed tasks. It's probably not the place for detailed notes (I use VoodooPad for that) but works well for jotting down things you need to do, when you think of them.

Things is only a time-crippled beta at the moment, but I've found it to be completely stable (there are features which aren't implemented yet, however, such as collaboration with other Things users). I'll almost certainly be buying the full version when it's released; but I have to say the price seems a little steep at US$49 for what, after all, is not a huge program. Being able to get things done is probably worth that; but I'd rather pay US$39, which is the price you can get it for if you sign up to their newsletter before 31 January (which I did a while back and haven't received a single email yet). Hopefully this doesn't sound like an ad (NB: I am not connected with Cultured Code in any way), but perhaps there are some Mac users out there who need task management as much as I do right now -- if so, Things is worth looking at.

2 Comments

I've just found the solution to a little LaTeX problem that has been bugging me for a while. To format my bibliography, I'm using the jox.bst (i.e. Oxford) style of the jurabib package. For the most part, this does exactly what I need it to do. But there are a few glitches. The most annoying one is when I have a BibTeX entry with a corporate author, for example War Office. Jurabib treats this as a personal name and so when it comes to alphabetically sorting the bibliography entries, it sorts on 'Office' and not on 'War'. This puts 'War Office' after 'Noel Baker' in the bibliography instead of after 'Turner', which is where it should be. (Yes, this is the sort of trivia you have to worry about when writing a thesis!)

Actually, that's not really the problem, or at least, it's one that all BibTeX styles share. There's a standard solution, though: put the author name in braces in the BibTeX entry: {War Office} instead of War Office. This tells BibTeX not to break the author name, to treat it as a single token. And jurabib does generally understand this -- but not if you use the jox.bst style! If you try to do this with jox.bst, you get an error like this:

! Argument of \jb@lbibitem has an extra }.
<inserted text>
                \par
l.1461 \bibitem[{{W}r Office}\jbdy {1922}}
                                          %
?
Runaway argument?

While it does eventually compile, it does so by mangling the bibliography, so that's not very useful. It would seem to be a bug in jurabib, or at least jox.bst -- and as of April 2007, jurabib is no longer under development.[1] So it's not going to be fixed. Periodically, I've looked for a workaround (as have others), but nothing has worked for me[2] -- until now.

The answer: enclose the spaces between the words of the corporate name in braces! So, War{ }Office instead of War Office. That's all there is to it, and it works perfectly. I don't understand why, but I don't much care either! My thanks go to Carsten Ziegert who posted this solution on the jurabib list.
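To make the difference concrete, here is a minimal sketch of the two forms of the author field, wrapped in a filecontents environment so that it stays plain LaTeX. The file name, entry keys and title are invented for illustration; only the author name and the year (taken from the error message above) come from the post.

```latex
% Hypothetical entries: file name, keys and title are assumptions.
% Place this before \documentclass so that it works with older LaTeX kernels.
\begin{filecontents}{corporate.bib}
@book{waroffice:braced,
  % The usual BibTeX idiom: inner braces make the name a single token.
  % Per the post, this is the form that triggers the error with jox.bst.
  author = {{War Office}},
  title  = {An Example Manual},
  year   = {1922},
}
@book{waroffice:workaround,
  % The workaround described above: brace only the space between the words.
  author = {War{ }Office},
  title  = {An Example Manual},
  year   = {1922},
}
\end{filecontents}
```

With the second form the entry should sort under 'War' rather than 'Office', which is the behaviour reported above.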

  1. Its developer suggests biblatex as an alternative, though it seems that it's not yet stable. It does look powerful though; and I see that one historian is already using it.
  2. Double quotation marks are also supposed to work, but don't.

8 Comments

While I'm on the topic of Things to Come, I should correct a mistake I made in the talk I gave at the summer school. I said that Things to Come didn't do particularly well at the box office. I still haven't found any actual figures for that, but I've found what may be better, a ranking of its popularity out of all films shown in Britain in 1936. It turns out it was the 9th most popular film that year, out of over a hundred shown, so obviously it should actually be counted as a success. (Given that it was also an expensive film to make, it may not have turned much of a profit, if any, and that may have been what I was thinking of.)

This information comes from a very interesting exercise in quantitative history, John Sedgwick's Popular Filmgoing in 1930s Britain: A Choice of Pleasures (Exeter: University of Exeter Press, 2000). What Sedgwick did was take a sample of cinemas and go through their programmes to see how many weeks each feature film was shown for, and whether it had first or second billing, to be used as a weight. He also came up with a weighting for each cinema, based on its capacity to earn revenue (more seats and/or higher ticket prices means more weight). The number of weeks a film was shown for at a given cinema is then multiplied by the billing weight and the cinema weight, and this product is summed across all the cinemas the film was shown at, to arrive at a popularity statistic, POPSTAT, for the film. Just in case that explanation failed to confuse you, here's the equation defining POPSTAT, from p. 71 of his book:[1]

POPSTAT equation
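As a rough sketch, the verbal description above could be typeset along the following lines. The symbols are assumptions chosen for illustration and need not match the notation printed on p. 71.

```latex
% Sketch only: symbol names are assumptions, not necessarily Sedgwick's.
\[
  \mathrm{POPSTAT}_{i} \;=\; \sum_{j=1}^{n} a_{j}\, b_{ij}\, l_{ij}
\]
% i    : the film
% j    : a cinema in the sample (n cinemas in all)
% a_j  : weight of cinema j (its capacity to earn revenue)
% b_ij : billing weight of film i at cinema j (first or second billing)
% l_ij : number of weeks film i was shown at cinema j
```

A film never shown at cinema j simply contributes zero weeks to the sum for that cinema.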

To the extent that POPSTAT actually means something, I suppose it is the potential total earnings of a film, and this in turn reflects the judgement of cinema managers as to whether cinema patrons would actually come to see the film, which in its turn would have been based upon how well the film was actually doing (ie, is it worth keeping it on for another week?). So in the end, assuming that cinema managers were responding to market forces, POPSTAT does indirectly measure something of a film's popularity.[2] For the record, Things to Come has a POPSTAT of 40.65, just behind Fred Astaire and Ginger Rogers in Swing Time (40.95 -- so close as makes no difference) but comfortably ahead of the Dickens adaptation, A Tale of Two Cities (34.18). The most popular film of the year was Charlie Chaplin's Modern Times (83.26). Most films in the top 100 had POPSTATs in the teens. (The results for 1934-6 are actually online as an appendix to a seminar given by Sedgwick.)

And if you don't trust all that number-crunching, then here's one data point Sedgwick mentions, relating specifically to Things to Come: its run at the Leicester Square Theatre (where it premiered, as it happens) was 9 weeks, with the longest run for that cinema in 1932-7 being 11 weeks. So, I think it can safely be said that it wasn't a flop (contra me). I stand by my other point, however, which was that Things to Come is actually very singular, at least in British feature films: there are very few depictions of a city being turned to rubble by air attack, as in the clip in the previous post. In fact, I don't know of any. So however successful Things to Come actually was -- and it should be remembered that this may have been due more to the visually stunning scenes set in 2036 than the more depressing scenes set in 1940 -- it's not something film producers rushed out to emulate.

  1. You can create your own using a LaTeX-based generator. Try it, it's fun!
  2. The exact numbers should be taken with a grain of salt -- I doubt four significant figures can be meaningful with such a dataset. One important caveat is the cinema sample. Not every cinema in Britain is used but only a selection of West End and first-run provincial cinemas. But unless films were markedly more popular in their second runs, I don't think this would matter too much.

1 Comment

New Popular Edition Maps is an attempt to produce a copyright-free database of British postcodes. It does this by asking people to hunt around on a clickable, zoomable map of the UK for places for which they know the postcode (e.g. their home), and then enter that postcode at that spot. It's a bit like a stripped-down Google Maps; and you can search the map by placename or postcode. But what's interesting about this is that the maps used are out-of-copyright Ordnance Survey maps (1 mile to the inch) from the 1940s and early 1950s, which could be useful for historians or teachers, though these are obviously not the intended audience. Unfortunately Northern Ireland and most of Scotland are missing. (The National Library of Scotland has the OS maps of Scotland from the 1920s.)

Finding this inspired me to do a bit of a search for other online historical maps of Britain which similarly attempt to cover the whole country. (There's a useful list of out-of-copyright maps here.) Old-maps.co.uk has been around a while and uses OS maps from the late 19th century. Vision of Britain (which site has lots of historical statistics which you can slice various ways, and which I must explore more thoroughly one day) is more sophisticated, and has a neat trick of switching between different maps depending upon the zoom level: for example going from a 1921 large-scale map to a 1904 OS one to a NPE map. It also has 19th-century maps and a 1930s land utilisation map. But possibly the most interesting is Old Ordnance Survey Maps, which is based upon OS maps from the 1910s, 1920s, 1930s and 1940s. The coverage is very much incomplete; but it uses the Google Maps API, which means that it has a familiar interface for users, and could be used for mashups. It already overlays the regular Google Maps satellite and street maps. There are also handy links to take you to the same location at old-maps.co.uk and Vision of Britain. I can think of some improvements (for example, printing the publication date on each map) but this approach has tremendous potential.

7 Comments

Last year I was playing with a plotting program for Mac OS X, which was pretty good, but not quite satisfactory. I've found a better one, Plot, which is free (as in beer), fairly easy to use, and very customisable. It has its own idiosyncrasies, but I like it a lot. Here's an example plot, showing how the top speed of British combat aircraft increased up to the end of the Second World War.

Maximum speed of British combat aircraft, 1912-1945

The data are drawn from John W. R. Taylor, Combat Aircraft of the World From 1909 to the Present (New York: Paragon, 1979). This excludes aircraft which never saw service as well as those not intended for combat (though not all actually saw combat). The year is that in which the aircraft entered service (usually with the RAF), or if this wasn't given, the year when the prototype first flew. (Some aircraft unfortunately had neither, and so were omitted.) The maximum speeds, in miles per hour, are not necessarily comparable, because they were often obtained at different heights; also, they may not have been sustainable under normal conditions. But they should be broadly indicative of real-world maximums. I've classified each aircraft as either a fighter (red) or a bomber (blue), based upon its actual use. However, that's fairly arbitrary for the period up to 1915, which is when aircraft adapted for specialised roles began to appear. I haven't included seaplanes but I have included carrier-borne aircraft. Generally, I have only included data for the most representative version (eg not for each of the innumerable marks of Spitfire). Because of these caveats and inconsistencies, the plot should not be taken too seriously -- it's just for illustrative purposes.


3 Comments

zeppelin and hendon

My laptop is my primary workhorse, and I've just upgraded -- a very exciting time in any computer geek's life! On the left, my old 12" 1.0 GHz G4 PowerBook, "zeppelin"; on the right, my new 13" 2.0 GHz Core Duo MacBook, "hendon". Zeppelin has been a rock-solid little machine for me these last couple of years, but it was starting to lose pace with my needs. Switching over to hendon has been a very smooth process (other than getting Instiki to work again), and it's just so nice and fast -- it's better in every way (except for the size: I prefer the smaller form factor). It should see me through the rest of the PhD in style.

33 Comments

A few months back, I posted about my decision to use LaTeX for writing my thesis, in preference to Word or something of that ilk. I seem to get a few Google hits from other people interested in using LaTeX in the humanities, so I will occasionally post useful things I've gleaned, even though it will be of no interest to most readers ...

So here's one. In theses (and monographs), historians generally separate their bibliographies into different sections for the different types of sources -- for example, "Primary sources" and "Secondary sources". It wasn't obvious to me how to do this in standard LaTeX/BibTeX, which just puts all of your references into a single bibliography.[1] So, last night while procrastinating, I went looking for the answer and found it. There are several options listed there, but the only one I tried was the multibib package, and it works just fine for me.

It works like this: in the preamble, after calling the package,[2] specify the name of each bibliography you need, along with a unique (and preferably, short) identifying key. For example, to make separate bibliographies for primary and for secondary sources, you might do the following:

\usepackage{multibib}
\newcites{pri}{Primary sources}
\newcites{sec}{Secondary sources}

The \newcites command takes the existing citation commands (eg \cite) and defines an equivalent of each for every one of your bibliographies (in this case, \citepri and \citesec). You then use these instead of the standard citation commands:

This is a sentence about a primary source.\footcitepri{aston:1914} And this one refers to some specific pages in a secondary source.\footcitesec[1-5]{bialer:1980}

\newcites does the same thing for the bibliography commands, so at the end of the document (or wherever you want to place them) you would have something like this:

\bibliographystylepri{jox.bst}
\bibliographypri{all.bib}
\bibliographystylesec{jox.bst}
\bibliographysec{all.bib}

Then you run bibtex, as you would normally do, but now you have to run it once for each bibliography, eg:

% bibtex pri
% bibtex sec

Then latex it up again a couple of times to get the references right (again, as you normally would) and voila:

multibib

Shiny.
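Putting the pieces above together, a minimal complete document might look something like the sketch below. The file name thesis.tex and the article class are assumptions, and the file extensions on the style and database names are left off here, as is conventional. The citation keys are the ones used earlier and must exist in your own .bib file.

```latex
% thesis.tex -- a minimal sketch; file name and document class are assumptions
\documentclass{article}
\usepackage{jurabib}    % loaded before multibib, as the footnote advises
\usepackage{multibib}

% One \newcites line per bibliography: {key}{heading}
\newcites{pri}{Primary sources}
\newcites{sec}{Secondary sources}

\begin{document}

This is a sentence about a primary source.\footcitepri{aston:1914}
And this one refers to some specific pages in a secondary
source.\footcitesec[1-5]{bialer:1980}

% Each bibliography gets its own style and database commands
\bibliographystylepri{jox}
\bibliographypri{all}
\bibliographystylesec{jox}
\bibliographysec{all}

\end{document}

% Build sequence from the command line (nothing after \end{document} is typeset):
%   latex thesis
%   bibtex pri
%   bibtex sec
%   latex thesis
%   latex thesis
```

The bibtex runs take the keys given to \newcites (pri and sec) rather than the name of the main file, since multibib writes a separate .aux file for each bibliography.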

With a standard bibliography TeXShop can bibtex it for you, but it appears not to know about multibib, so you have to do it from the command line (not a big deal for me as I always have several Terminal windows open anyway). Apparently iTeXmac does do multibib, and a lot more besides, but for the moment I am happy with TeXShop so I haven't tried this yet.

More information about multibib can be found here.

  1. For that matter, I'm not sure how to do it in Word/Endnote either; I usually ended up cutting and pasting by hand. I'm sure there must be a better way!
  2. Note that if you also use the jurabib package (and if you are writing in the humanities, you almost certainly are, or should be), you need to call that first, before multibib.