The lack of adoption of Unicode in Biblical studies

It is 2012, people. It is unacceptable not to use Unicode Greek and Hebrew in your publications. If you are still resorting to transliteration (PDF warning), you need to get that fixed. There are many resources on the web for getting started with Unicode Greek and Hebrew, and I am willing to help personally.

(The linked article is a great review of interpretations of “baptism for the dead” in 1 Corinthians 15 by Joel R. White. I recommend it. I am not sure whether or not it was the author or editor who was the source of the transliteration.)

Fun with Subtitles

The Natural Language Processing course by Dan Jurafsky and Chris Manning has commenced. The class consists of readings, video lectures, problems, and code examples. I have been working through the video lectures and was pleased to see that each one has English subtitles. This being a natural language processing course, I decided it was obligatory that I process these subtitles.

So I downloaded each subtitle file, concatenated them, normalized the text, tokenized it, and removed stopwords. I then sorted the resulting list by frequency of occurrence. Here’s the top 50.

word – 113
we’re – 107
one – 86
two – 78
like – 76
words – 75
distance – 65
let’s – 57
gonna – 56
it’s – 56
we’ll – 54
that’s – 49
string – 48
there’s – 47
look – 46
end – 40
sentence – 40
example – 39
things – 39
inaudible – 38
might – 36
use – 36
going – 35
cost – 34
here’s – 34
capital – 31
kind – 31
match – 31
algorithm – 30
see – 30
alignment – 29
could – 29
get – 29
text – 28
three – 28
e – 27
n – 27
regular – 27
different – 26
processing – 26
strings – 26
period – 25
case – 24
character – 24
language – 24
little – 24
characters – 23
means – 23
sound – 23
us – 23

At first I was thrown off by the presence of “e” and “n” in the list, thinking I had a bug in my tokenizer. But it turns out that the instructors say many individual letters in the course of their discussions. Comments or email with feedback are appreciated.
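For anyone who wants to try something similar, the steps described above (concatenate, normalize, tokenize, remove stopwords, sort by frequency) might be sketched roughly as follows. This is not the script I actually ran; the stopword list, the tokenizer regex, and the function names are all illustrative assumptions, and the subtitle files are simply treated as plain text.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {
    "a", "an", "and", "at", "for", "i", "in", "is", "it",
    "of", "on", "so", "that", "the", "this", "to", "we", "you",
}

# A token is a run of letters, optionally with internal apostrophes ("we're").
# Digits and timestamp punctuation in subtitle files never match.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def normalize(text):
    """Lowercase the text and map curly apostrophes to straight ones."""
    return text.lower().replace("\u2019", "'")


def tokenize(text):
    """Split normalized text into word tokens."""
    return TOKEN_RE.findall(text)


def top_words(texts, n=50, stopwords=STOPWORDS):
    """Count non-stopword tokens across all texts; return the n most frequent."""
    counts = Counter()
    for text in texts:
        for token in tokenize(normalize(text)):
            if token not in stopwords:
                counts[token] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample lines standing in for the contents of subtitle files.
    sample = [
        "So we\u2019re going to look at the word E and the word N.",
        "The word distance is a string distance.",
    ]
    for word, count in top_words(sample, n=5):
        print(word, "-", count)
```

Because single letters are valid tokens under this regex, a spoken "e" or "n" ends up in the counts, which is the same effect I saw in the list above.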

Biblical languages reception debrief

This past week my alma mater Multnomah University celebrated its 75th anniversary. As part of the festivities several receptions (or reunions) were planned. Most of these meetings were arranged by class year, but the big exception was the one for Biblical languages. I decided that the prospect of a meet-up with faculty, classmates, and current students was a can’t-miss event, so I even delayed the start of my vacation to attend.

I arrived at the reception room a bit before the scheduled start time. The room was empty and the lights were off. However, people started streaming in shortly thereafter. I was pleased when I recognized many of the people coming in the door. Overall there were four faculty members (under three of whom I studied), three alumni including myself, and perhaps a dozen current students.

It felt good to be recognized by old faculty, and apparently my thesis Short Goliath is still remembered. I was saddened that the alumni turnout was so low, but not particularly surprised. The faculty asked us to share a bit about the post-college experience of Biblical languages with the students. I recommended that folks get involved with online communities centered on Greek and Hebrew.

After the brief introductions, we split up to socialize. I think our group had the most fun of all the reunions. The Biblical Languages room was the most full, and certainly the loudest (with laughter ringing out almost non-stop). It was fun, and it felt like family. The shared experience of learning languages leads to a lot of laughter and bonding. I am glad I went.

Free Stanford Natural Language Processing class

Stanford is offering several free online courses starting this month. Of particular interest is the Natural Language Processing class. I’ll be making time to participate.

Vocabulary Analysis

While reading Ehrman’s Jesus, Interrupted I got the idea to look into vocabulary studies. You know, the ones where linguists catalog all the words used by a particular author and use the data to compare various works by (or purportedly by) that author. Does anyone know of a publication which lays out the basic methodology for doing this? I might try to write a script to help with the first step. I’m also interested in applying these methodologies outside of the biblical texts to see what they might yield. For example, how much does the vocabulary base of authors change across genre, time, etc.? I’ve never read anything which attempts such a study.
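As a starting point for that first step, here is a minimal sketch of cataloging the words in two texts and comparing the resulting vocabularies. It is only my guess at what such a script could look like, not an established methodology: the function names are made up, it counts surface forms rather than lemmas, and the overlap measure (Jaccard similarity) is just one simple option I picked for illustration.

```python
import re
import unicodedata
from collections import Counter

# Runs of Unicode letters (no digits or underscores), so Greek text works too.
WORD_RE = re.compile(r"[^\W\d_]+")


def vocabulary(text):
    """Return a Counter mapping each word form to its frequency."""
    # NFC keeps accented letters (e.g. polytonic Greek) as single code points
    # where possible, so a word is not split at a combining accent.
    normalized = unicodedata.normalize("NFC", text).lower()
    return Counter(WORD_RE.findall(normalized))


def compare(text_a, text_b):
    """Summarize how the vocabularies of two texts overlap."""
    vocab_a, vocab_b = vocabulary(text_a), vocabulary(text_b)
    types_a, types_b = set(vocab_a), set(vocab_b)
    shared = types_a & types_b
    union = types_a | types_b
    return {
        "types_a": len(types_a),
        "types_b": len(types_b),
        "shared": len(shared),
        # Share of all distinct words that appear in both texts.
        "jaccard": len(shared) / len(union) if union else 0.0,
        "only_a": sorted(types_a - types_b),
        "only_b": sorted(types_b - types_a),
        # Words that occur exactly once in the first text.
        "hapax_a": sorted(w for w, c in vocab_a.items() if c == 1),
    }


if __name__ == "__main__":
    # Made-up sample strings, not quotations from any real work.
    report = compare("the word of the lord", "the word of wisdom")
    for key, value in report.items():
        print(key, value)
```

A serious study would presumably lemmatize first, so that inflected forms of the same word are counted together, and would have to account for differences in text length. This sketch does neither.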

Verbosity in translation

In the midst of all the recent discussion of varying “translation philosophies,” I came across an article by Karen H. Jobes on bilingual quotation (like what they do at the UN). The article itself is very interesting. Toward the end there is a discussion on verbosity in Bible translation. The bottom line is that a certain popular translation is more verbose than another certain popular translation. Jobes insists she is not trying to say anything bad about the first or good about the second, just that word counts are not necessarily a good indicator of the “literalness” of a translation. I agree on the latter point. Thomas did an interesting workup to show how verbosity correlates to the spectrum of translations. The bottom line: it doesn’t, just as Jobes said. More “dynamic” approaches can lead to more or fewer words, it seems. Perhaps the whole point is moot, because I don’t know of anyone who counts isomorphism as a positive characteristic in translations.

Midwest informant

An anecdote of gender-neutral idiom in the mouth of my aunt from Indiana:

So-and-so called to let us know that their husband got a job.

Notice how she used the generic “their” pronoun even though the person was obviously female in context (“husband”).

Committee translations

All of the principal translations of the Bible into English are done by translation committees. There are some notable exceptions, of course, including the Message, the Living Bible, Weymouth, Phillips, etc. However, these are considered secondary due to their being completed by an individual. Moreover, I have sensed a general sentiment that such translations by individuals are not considered as trustworthy as committee translations because it is thought that personal bias would be allowed to shine through in an individual’s work. A committee is also useful for imposing a standard style on a work which is actually composed by many different scholars. I am sure there are more reasons for the committee trend. What I began noticing in college is that committee translations are pretty rare outside of the Bible (I cannot think of one off the top of my head). Instead, translations are typically done by an individual or small team. This is the case for works of antiquity, textbooks, novels, etc. Why is it that committees are so common for biblical translation but so uncommon for everything else? Is the quality of the end product affected by the decision to use a committee or not?