Readings for March 2013

Better late than never.

Periodicals

  • Journal of Biblical Literature 131:2 – I cannot recall any standout articles, but I do have a general sense of being convinced by several and thinking relatively few were silly.
  • Scientific American November 2012 - R. Ewan Fordyce and Daniel T. Ksepka reveal some surprising history about the ancestors of modern penguins. My favorite tidbit: some were huge!

Zone One by Colson Whitehead

I first encountered Colson Whitehead when I found The Intuitionist at the Portland Wordstock book fair. I have read it twice since, along with all of Whitehead’s other novels, enjoying them all immensely. From time to time when I am at the bookstore I browse the “W” section to see if I am in for a surprise, and I sure was this last visit to Powell’s.

Colson Whitehead wrote a zombie novel. And it is good, very good. I have not read a whole lot of zombie lit, but I would hazard to say that Zone One is the best example of this current rash of zombie novels. He of course brings some fresh perspectives to the zombie metaphor, but has some fun along the way as well. I laughed out loud fairly frequently whilst reading it.

So let’s just say that with Zone One, this latter day pop culture fixation on zombies is over. Whitehead has taken it to the zenith, and is therefore to be recommended.

Readings for February 2013

We welcomed our second son into the world this month. I celebrated with lighter reading.

“The Wasteland” and Other Poems by T.S. Eliot

In my occasional pursuit to read some of the classics of English-language literature, I picked up this slim volume from a local bookstore. I had not read T.S. Eliot before, so I was helped by some introductory materials. Yet I’ll admit that I probably read through the works a bit too quickly for it to count as a serious treatment.

As with most poetry I was somewhat engaged. I found Eliot’s early poetry in this volume to be more accessible. I am by no means dedicated to the silly proposition that “real poetry has meter.” However, after reading a lot of Eliot, I cannot help but be impressed by the poets of old who evoked strong feelings and did it in time. Though I guess with poetry of Eliot’s era, the medium is (part of) the message.

Periodicals

  • Harper’s January 2013 – I bet you thought Harper’s was too sophisticated to cover arena football. Well, you are wrong, and Nathaniel Rich’s article on the league is fantastic.
  • Scientific American October 2012 – Stephen S. Hall informs us of what some have guessed, that our DNA is not mostly “junk” and that scientists are working on discovering the purpose of the unknown bits.
  • Harper’s March 2013 – In some sort of disaster, I misplaced the February 2013 issue. Luckily this is the 21st century, and I’ll catch up online soon. In the meantime, Richard Manning’s article on the “fracking” boom in North Dakota is a good read, and comes at a time when exposés on the ills of fracking are quite prevalent (see National Geographic, et al.).
  • Tin House #54 – I love the “Lost & Found” section, the review of old books. Alexander Chee introduced me to Julian May’s science fiction, which I may have to read.

A categorized, tagged Greek New Testament corpus

I have published a categorized, tagged Greek New Testament useful for natural language processing. I am calling it sblgnt-corpus. The text comes from the SBLGNT and the morphological tags come from the MorphGNT project.

The text is broken up with one book per file. Each file has one or more categories (e.g. gospel and pauline). In the files there is one sentence (not verse) per line. Sentences are demarcated by punctuation . ; and ·. This makes it easy to tokenize sentences by splitting on newlines. Each word is accompanied by the morphological tag in the word/tag format (NLTK will automatically split word and tag on the slash). The part of speech tag is separated from the parsing information with a hyphen, which enables the use of the simplify tags function in NLTK.

Here is an example:

εὐθυμεῖ/V-3PAIS τις/RI-NSM ;/;
ψαλλέτω/V-3PADS ./.
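
To see what this format amounts to without NLTK, here is a minimal pure-Python sketch. The helper name parse_tagged_line is hypothetical (it is not part of the corpus or of NLTK), and it assumes that the last slash in each token separates word from tag and that the first hyphen separates part of speech from parsing.

```python
def parse_tagged_line(line):
    """Split one corpus line into (word, part_of_speech, parsing) triples.

    Hypothetical helper for illustration only; NLTK's tagged corpus
    reader performs the word/tag split for you.
    """
    triples = []
    for token in line.split():
        # Split on the last slash so punctuation tokens such as ';/;' work.
        word, _, tag = token.rpartition('/')
        # Part of speech comes before the first hyphen; the rest is parsing.
        pos, _, parsing = tag.partition('-')
        triples.append((word, pos, parsing))
    return triples


line = 'εὐθυμεῖ/V-3PAIS τις/RI-NSM ;/;'
for word, pos, parsing in parse_tagged_line(line):
    print(word, pos, parsing)
```

Punctuation tokens carry no parsing information, so their third field comes back as an empty string.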

Here follows an example of how to load this corpus into NLTK:

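from nltk.corpus.reader import CategorizedTaggedCorpusReader

def simplify_tag(tag):
    # Keep only the part of speech, e.g. 'V-3PAIS' becomes 'V'.
    try:
        if '-' in tag:
            tag = tag.split('-')[0]
        return tag
    except TypeError:
        # Non-string tags (e.g. None) pass through unchanged.
        return tag

# The raw string keeps \d intact as a regular expression escape.
sblgnt = CategorizedTaggedCorpusReader('sblgnt-corpus/',
    r'\d{2}-.*', encoding=u'utf8',
    tag_mapping_function=simplify_tag,
    cat_file='cats.txt')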

Now through the sblgnt object you have access to tagged words – sblgnt.tagged_words(), simplified tags – sblgnt.tagged_words(simplify_tags=True), tagged sentences – sblgnt.tagged_sents(), and textual categories – sblgnt.words(categories='gospel').

That should be enough to kickstart the exploration of the Greek New Testament with natural language processing.

 

Better tokenization of the SBLGNT

In my previous post on this topic I mentioned that the default NLTK tokenizer was erroneously treating elisions as separate tokens. In my opinion, they should be grouped with the word to which they are attached. I decided today to look into this and fix the problem.

The SBLGNT uses Unicode character 0x2019 (“right single quotation mark”) for elisions. The default tokenizer for the NLTK PlaintextCorpusReader is apparently the wordpunct_tokenize function. This uses the following regular expression for matching tokens:
\w+|[^\w\s]+

That essentially means: match any sequence of alphanumeric characters (\w+), or (|) any sequence composed of neither alphanumeric characters nor whitespace ([^\w\s]+) – e.g. punctuation. The problem is that in Python’s implementation of Unicode, 0x2019 is not considered an alphanumeric character, so it is getting tokenized on its own by the latter expression meant to catch punctuation.

So I crafted a new regular expression to alter this behavior:
\w+\u2019?|[^\w\s\u2019]+

So now for each sequence of alphanumeric characters, there can optionally be a 0x2019 at the end to catch elisions (I also explicitly exclude 0x2019 from the latter expression, though I am not entirely sure this is necessary). So now to actually use this:
tokens = nltk.tokenize.regexp.regexp_tokenize(text, u'\w+\u2019?|[^\w\s\u2019]+')

Using the custom regexp_tokenize function we can tokenize a text using any old regular expression our heart desires. I put a full example of this in the same repo with the name load-sblgnt.py. It should be run after the sblgnt-nltk.py script has run to download and prep the data. The load script provides an example workflow for getting an NLTK text object and then running collocations() and generate() as an example. Enjoy!
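
The difference between the two patterns can be checked without NLTK, using only the standard library. This is a sketch in Python 3 rather than the code from the repo; the tokenize helper and the sample phrase are assumptions made for illustration.

```python
import re
import unicodedata

DEFAULT_PATTERN = r"\w+|[^\w\s]+"                 # wordpunct-style pattern
ELISION_PATTERN = r"\w+\u2019?|[^\w\s\u2019]+"    # keeps a trailing U+2019 with its word


def tokenize(text, pattern):
    """Hypothetical helper: return every non-overlapping match of pattern."""
    # Normalize to NFC so accented letters are precomposed where possible.
    text = unicodedata.normalize("NFC", text)
    return re.findall(pattern, text)


# Sample phrase written with escapes to avoid encoding surprises: δι’ αὐτοῦ .
sample = "\u03b4\u03b9\u2019 \u03b1\u1f50\u03c4\u03bf\u1fe6 ."

print(tokenize(sample, DEFAULT_PATTERN))
print(tokenize(sample, ELISION_PATTERN))
```

With the first pattern the elision mark comes out as its own token; with the second it stays attached to the preceding word.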

Dorner as the paragon of our violent culture

A cop who feels he was wrongly fired to cover up brutality in the LAPD goes on a murderous rampage, targeting cops and their loved ones in an act of revenge and to bring light to the corruption of the force. Sounds like a Hollywood plot, right? It is of course the true story of Christopher Dorner, which played out dramatically in the media earlier this month.

But in a way it is a Hollywood plot. A one-man army going outside the law to seek justice is a common trope in action flicks, and Dorner’s saga generated comparisons with Rambo and Falling Down, among others, in the media. He was the so-called “chaotic good” agent, doing what was necessary to confront the corrupt powers-that-be. So it was a tragedy that was almost bound to happen due to how our culture celebrates violence.

Clearly the LAPD and big-city police forces in general have an image problem. When the public was exposed to Dorner’s claim that he was fired in retaliation for reporting police brutality, it was widely accepted as probable. People were commenting that for once, the madman’s manifesto actually made some sense.

In the course of the manhunt police lived up to the caricature, twice shooting at innocent people who happened to be driving pickup trucks, and deploying their increasingly-militarized arsenal against Dorner, including aerial drones. In the inevitable final shoot-out, Dorner took his own life rather than suffer the flames ignited by the police’s incendiary grenades.

With Dorner appealing to cultural hero narratives and the police fulfilling a cartoonish expectation of brutality, it was no surprise that we started seeing the following headline: “Dorner has supporters in social media.” That is, many people had come to root for Dorner and were expressing those sentiments in public on the internet. Now, some people, I think, were just expressing sympathy over Dorner’s firing, saying that they found his story credible. But still others seemed to support the rampage itself.

Dorner was the worst sort of criminal – a cold-blooded killer. His attacks targeted not only police officers, but their family members as well. So there should be no respect for his actions, whatsoever. To me it is insane to think that a shooting rampage is a just protest against police brutality. I know many, if not most, people in the US would agree with that.

Yet in our culture, violence is portrayed as the ultimate embodiment of justice. In Hollywood works, and law enforcement, and politics, and foreign policy, it is the redemptive force which brings about good in the end. In so many cases it is the climactic gunshot or fist fight or cruise missile which wraps the story and gives closure to the plot.

So when the socially legitimate violence of the police is undermined, I am not surprised that some people would view Dorner’s violence as justified. After all, violence is necessary to achieve good, and if the police are abusing it, somebody should set things right with a gun, right?

Of course not. Escalating police violence is a real problem in this country. We need to decrease the militarized nature of the police, lower overall violence, and increase consequences for the improper use of force. But those need to be achieved through peaceful and lawful means, not through a psychotic rampage. If Dorner has any legacy, it should be to show that our culture is too sympathetic to violence, and that this needs to be corrected.

Prep the SBLGNT for use as an NLTK corpus

The SBLGNT is available as a plain-text download, which is my personal favorite format for text processing. I have been wanting to put the SBLGNT into a Natural Language Toolkit corpus for ease in text processing for quite some time, and decided to get around to it yesterday.

First of all, the plain text of the SBLGNT has a few undesirable features for this task. First, each verse is prefixed with the verse number and the tab character, which is great for many applications but not for corpus linguistics. Second, the text contains Windows-style linebreaks and other extraneous whitespace. Third, the text contains text-critical signs.

So I wrote a script to download the plaintext archive, extract the text, and normalize it for use in NLTK.
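
The three cleanup steps listed above can be sketched in a few lines of Python. This is not the actual script, only an illustration: the function name normalize_verse is hypothetical, the verse prefix is assumed to be everything before the first tab, and the text-critical signs are assumed to be the Unicode code points U+2E00 through U+2E05.

```python
import re

# Assumption: the text-critical signs fall in the range U+2E00..U+2E05.
# Adjust this character class to whatever the downloaded files really contain.
CRITICAL_SIGNS = re.compile("[\u2e00-\u2e05]")


def normalize_verse(line):
    """Hypothetical helper: clean one verse line of the plain-text download."""
    # Drop the carriage returns left over from Windows-style line breaks.
    line = line.replace("\r", "")
    # Drop the verse reference, assumed to be everything before the first tab.
    _, tab, rest = line.partition("\t")
    text = rest if tab else line
    # Remove the text-critical signs.
    text = CRITICAL_SIGNS.sub("", text)
    # Collapse runs of whitespace and trim the ends.
    return " ".join(text.split())


print(normalize_verse("Matt 1:1\t\u2e00Βίβλος  γενέσεως \r\n"))
```

Lines without a tab are passed through the remaining steps unchanged, which keeps the sketch from silently discarding text if the prefix assumption is wrong.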

First, download and extract or check out the repo. To install requirements:

$ pip install -r requirements.txt

Next, run the script:

$ python sblgnt-nltk.py

Now you have a collection of text files, one for each book of the New Testament, in a directory called “out”. You can now use these with NLTK. For example:

>>> import nltk
>>> sblgnt = nltk.corpus.PlaintextCorpusReader('out','.*',encoding='utf-8')
>>> sblgnt_text = nltk.text.Text([w.encode('utf-8') for w in sblgnt.words()])

You end up with sblgnt as an NLTK corpus object and sblgnt_text as an NLTK text object. You can refer to the NLTK documentation for the various uses of these. Please take note of the encodings. If you don’t pay attention, you’ll get lots of encoding errors when working with a unicode text and NLTK.

One thing you can do is run the collocations method on sblgnt_text:
>>> sblgnt_text.collocations()
Building collocations list
τοῦ θεοῦ; ἐν τῷ; ἀλλ ’; ἐν τῇ; ὁ
Ἰησοῦς; δι ’; ἐπ ’; ὁ θεὸς; μετ ’;
εἰς τὴν; ἀπ ’; τῆς γῆς; λέγω ὑμῖν;
Ἰησοῦ Χριστοῦ; ἐκ τοῦ; τῷ θεῷ; τοῦ
κυρίου; κατ ’; εἰς τὸ; οὐκ ἔστιν

I’ll have to look into tweaking the NLTK tokenizer, because, as you can see, it is treating elisions as tokens, which may or may not be grammatically correct (I’ll have to think about that and ask around). Another cool trick, the generate method:

>>> sblgnt_text.generate(50)
Building ngram index...
ΠΡΟΣ ΚΟΡΙΝΘΙΟΥΣ Α Παῦλος ἀπόστολος
Χριστοῦ Ἰησοῦ καὶ τοῖς βουνοῖς ·
Καλύψατε ἡμᾶς · πολλοὶ ἐλεύσονται
ἐπὶ τῷ λόγῳ διὰ τῆς στενῆς θύρας ,
ὅτι τὸ μωρὸν τοῦ θεοῦ . Καὶ ἐγένετο
ἐν τῷ βυθῷ πεποίηκα · ὁδοιπορίαις
πολλάκις , ἐν κόποις , ἐλπίδα δὲ
ἔχοντες αὐξανομένης τῆς πίστεως ,

So that’s that. At some point I’ll attempt to make a tagged text based on the MorphGNT (which is being re-based off SBLGNT).

A review of Harper’s classified ads

Harper’s Magazine is the second-oldest continuously published magazine in the United States. Its combination of political commentary, general reporting, poetry, short fiction, art, and book reviews is enjoyed by just north of 200,000 subscribers. Based on the advertising in its pages I surmise that the readership is generally older adults, wealthy, well-educated and intellectual, and mostly in the northeast. And it is this demographic which makes the contents of its classified ads so mystifying and amusing.

The Predictable

There are a few ads which meet my expectations about what sort should be in Harper’s. Take for example an ad for the international tea importing business. This strikes me as right up the readership’s alley, and it has been present in the classifieds as long as I have subscribed. Strangely the same ad actually appears twice in the current issue, in the outer columns of facing pages, such that when the page is closed, the ads would almost touch.

Then you have the ad for “holistic organic” skin care products – exactly what older folks with disposable income are supposed to be buying. And the ad for the “documentaries on demand” service, which confusingly places a red heart glyph (♥) in the printed URL.

The final ad in this category is another long-standing one – “European Beret $14.” Just what every American intellectual needs! My favorite part of this one is the rather goofy accompanying picture of a middle-aged man articulating a point whilst wearing a beret. Now, is this a picture of a genuine European, or is it meant to convey what an American who purchases such a hat can achieve? As it happens the vendor for these berets is within walking distance of my work, so I may have to stop by for a fitting.

Aspiring Writers

There are a couple of ads for writers. As it happens, this is a rather new trend in Harper’s classifieds.

One introduces the reader to book one of a trilogy, and confusingly (there’s that word again) asks the reader to “buy it and book 3.” I guess book two is dispensable.

And a very audience-aware ad seeks a “literary patron from the 1%.” I assure you that no classified ad could more obtusely attempt to capture the zeitgeist of Harper’s Magazine than this one. I wish the author well.

“Romance”

Would you like some “unorthodox” reading material? Or perhaps something “tasteful”? These ads are a mainstay. And for whatever reason, their publishers all seem to be based in New Jersey.

Moving to the higher brow, there is an ad for a Quaker dating website, but it appears that the core value shared is caring about “social issues” rather than religion. And then there is the website which allows you to “date accomplished people” who have attended certain prestigious universities. Who said anything about this being a classless society?

Pseudoscience

This next ad actually blurs the lines between pseudoscience and “romance.” Is your love life struggling? Try pheromones. Just check this testimonial:

My wife completely changed her reaction to me. Where before she was straining to be affectionate now she is flirty . . . I can hardly believe she is the same woman.

There’s nothing like subconscious biological coercion to sweeten a relationship! I am not sure what editorial criteria there are for classified ads in Harper’s, but in my opinion this one should be excluded on the grounds that it is either bunk, or (if it really works) disturbing.

There’s also a paranoid sounding ad about “water scams” and some crazy tripe about forecasting future events. Really, is Harper’s that hard up for a few dollars?

Conclusion

Honestly the Harper’s readership does not seem like it would be interested in European hats, pseudoscience, or dating websites. But as I mentioned, some of these ads are long-standing, so I can only imagine that they work. And perhaps this is an indictment of the Harper’s Magazine readership. No matter how sophisticated we perceive ourselves to be, we are still in the market for silliness.

Readings for January 2013

This month saw me finish up a Sanderson series and knock out a few more periodicals.

The Hero of Ages by Brandon Sanderson

This is of course the final installment in Sanderson’s Mistborn trilogy. Like all of his novels, I found this one to be an engaging, quick read. Just the thing for when you want a fix of good fantasy. Add to this that it is the series finale, and you’ve got a real page turner on your hands.

I will offer this one critique: Sanderson’s plots in this installment seem a bit overwrought. There is a lot of complexity in the story, and though the series comes to a satisfying conclusion, I feel it could have done so without so much extra expository effort on Mr. Sanderson’s part. Still recommended for any diehard Sanderson fans.

The Alloy of Law by Brandon Sanderson

I could hardly resist the novelty of a short Sanderson novel. All of his other novels I am aware of weigh in at 600 pages or more in paperback, so these 325 pages seem slim. And since The Alloy of Law follows in the Mistborn universe, it seemed an appropriate follow-up to The Hero of Ages. I was not disappointed.

Sanderson does a good job refreshing the magical lore of the previous series by imposing some changes on the magic system itself, as well as introducing a new technological milieu (read: guns). I also found that it pulled off the steampunk feeling without being overly self-conscious.

Somehow I got the impression that this was a “stand-alone” novel, but the book definitely sets the reader up for subsequent installments. Rather than a short novel, it might be Sanderson’s long prologue to a new series in the Mistborn universe. I am not sure where that all will fit in to the author’s writing schedule, since he seems to have quite a few novels in the hopper from other series. Recommended, but maybe wait for the other shoe to drop.

Periodicals

  • Scientific American, September 2012: If you have a sleepwalking spouse and would like to be unsettled, read James Vlahos’ account of sleep crime. It is a very fascinating read on the neurology of shut-eye.
  • Harper’s December 2012: The short story “Christmas Party” by Russell Banks is quite simply the best I have read lately. The author really got my pulse running and my heart engaged in the short format, and that is a rare feat.

2600, vol 29, no. 4

I have long been fascinated with 2600, the Hacker Quarterly (read here if you need a remedial lesson in the classical meaning of the word “hacker”). Incredibly it is carried on the newsstand of a major national chain bookstore. After flirting with it a few times in the past, I finally bought one while my wife attended a book signing of a local author and friend.

Overall I was disappointed with 2600. The information was just plain old. Tor, openvpn, ssh tunneling, and proxy servers? Old news. There were a few fascinating nuggets, but for the most part, if you want to learn about new computer security technologies, look elsewhere.

One of my favorite quirks is the letters section. First, you have to love a magazine whose three personal ads are all published by incarcerated people looking for pen pals or debate partners. Second, the letters section is the largest in the magazine, making up maybe 40% of the pages. Third, all kinds of zany topics are covered, because they have a fairly loose editorial policy (in keeping with the hacker spirit).

It is what it is. I hear that 2600 meetups are fun. But I won’t be buying any more issues of the magazine.

How we need science fiction

As an information technology worker I can assure you that my field completely lacks a sense of metaphor. Technology is amazing and advancing, but there’s just nothing more there. We never have a sense of wonder when working with computer systems. More than that, science and technology tend to be destructive of our traditional metaphors (e.g., we can no longer say that Jesus is “light from light”, since we now know that light is comprised of photons).

So here is part of why I appreciate science fiction so much, especially when it deals with artificial intelligence and related topics: it injects the metaphor back into technology. It helps us to find something deeper to ponder in the technology surrounding us. And it helps us to consider the trade-offs of technology by pushing the current trends to their limits. As a result, I find the dismissal of sci-fi as “genre fiction” by literary critics to be silly and short-sighted.

It is best not to question bumper stickers

If you live in the United States, you may have seen this bumper sticker:

What does that even mean?