After programming an elementary set of functions to count place names in a collection of texts, I have the results of some shrewdly automated counting!
Though when I say shrewdly, I mean it in the sense one uses when calling someone “clever” without much substance. Extracting place names is a tedious business, a sentiment perhaps shared by those at Stanford NLP working on the Named Entity Recognizer. One of the reasons I refrained from using the tools they provide was so that I could see what exactly that process entails.
There were two immediate lessons made clear to me:
1) Text analytics has a difficult time parsing homonyms. There are times when sentence structure provides enough context clues to inform a program’s estimate of a word’s meaning, but in many instances this amounts to shrewd guesswork. For example, I noticed that several of the texts in my corpus contain the word Washington. Well wait, are we referring to the state, the District of Columbia, or a person who has that name? I suspect that Stanford NLP uses typed dependencies in some capacity to ascertain which meaning is invoked in their named entity recognizer, but I have not refined my counts to this extent yet.
2) It’s going to take a lot of time to develop a corpus that understands place in terms of metonymy. For example, several of the texts within the corpus refer to national parks and geographical landmarks. While I would like to count these references as associated with a particular state or region, one would have to tag every instance of a place name and associate it with a particular location. This problem is probably one of degree rather than kind, but I would like to think that missing enough of these references will misinform computational endeavors to validate scholarly conceptions of regionalism (informed by opening a copy of The White Heron).
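To make both lessons concrete, here is a minimal sketch in Python of the kind of naive counting described above. It is not the code I actually ran: the `GAZETTEER` table, the `count_regions` function, and the sample sentence are all hypothetical, and crediting Yellowstone to Wyoming alone is a deliberate simplification.

```python
import re
from collections import Counter

# Hypothetical gazetteer: surface form -> region that gets the credit.
# These entries are illustrative assumptions, not my actual list.
GAZETTEER = {
    "Yellowstone": "Wyoming",       # simplification: the park spans three states
    "Grand Canyon": "Arizona",
    "Mount Rainier": "Washington",
    "Wyoming": "Wyoming",
    "Arizona": "Arizona",
    "Washington": "Washington",     # ambiguous: state, D.C., or a person
}


def count_regions(text, gazetteer=GAZETTEER):
    """Count gazetteer matches in text, credited to their mapped region."""
    counts = Counter()
    # Put longer names first so multi-word names win over shorter ones.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    for match in pattern.finditer(text):
        counts[gazetteer[match.group(1)]] += 1
    return counts


sample = ("George Washington never saw Yellowstone, "
          "but Mount Rainier towers over Washington.")
print(count_regions(sample))
```

On the sample sentence, the state of Washington is credited three times: once correctly, once through the landmark, and once because a president happens to share its name. The landmark only counts at all because someone entered it into the table by hand.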
This is not to say that such experiments are not fruitful. After all, without running into these analytical issues, I would probably not have asked myself: is our conception of regionalism and place composed of metonymic structures, and how is this qualitatively different from an understanding of place derived from personification?
At the end of this, and as provided in an Excel spreadsheet at the beginning of the post, I did manage to produce some counts, though largely misinformed ones. Furthermore, and where I sympathize with Wilkens the most, I’m not convinced I know what I’m looking at. This is, in part, due to the fact that I don’t use Excel very often and have little experience with the common courtesies of that form. Even if I did, or even if I plotted these on a graph, I am not sure I would know how these quantities translate into a geographic imagination.
What will follow in the next post is my early foray into ArcGIS, now that I am sure I don’t know how to count.