How many unique words are there in the Greek New Testament? Well, that depends on how you count.
I am doing some research and experimentation on indexing the Greek NT (and Koine Greek in general). One crucial aspect of indexing is normalizing the text so that search matches are not missed because of punctuation, capitalization, contextual accentuation (such as a final acute written as a grave before another word), and so on.
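To make the idea concrete, here is a simplified sketch of such a normalization in Python. It uses only the standard `unicodedata` module. It is my own illustration, not the procedure behind MorphGNT's normalized column. The function name `normalize` is hypothetical, and I assume elision is marked with U+2019 (’), which the sketch keeps. It folds grave accents to acute, drops punctuation, and lowercases. It does not handle everything: for example, it does not remove the extra acute that an enclitic throws back onto the preceding word (ἄνθρωπός τις).

```python
import unicodedata

GRAVE = "\u0300"
ACUTE = "\u0301"
ELISION = "\u2019"  # assumption: elided forms such as δι’ use U+2019


def normalize(token: str) -> str:
    """Simplified search normalization for a Greek token (illustrative only)."""
    # Decompose so accents and breathings become separate combining marks.
    decomposed = unicodedata.normalize("NFD", token)
    chars = []
    for ch in decomposed:
        if ch == GRAVE:
            # Contextual accent: fold the grave back to the acute it stands for.
            chars.append(ACUTE)
        elif unicodedata.category(ch).startswith("P") and ch != ELISION:
            # Drop punctuation (comma, ano teleia, Greek question mark, ...).
            continue
        else:
            chars.append(ch)
    # str.lower() (unlike casefold()) keeps final sigma ς distinct from σ.
    return unicodedata.normalize("NFC", "".join(chars).lower())


print(normalize("Λόγος,"), normalize("καὶ"))
```

Using `lower()` rather than `casefold()` is deliberate here: `casefold()` maps ς to σ, which would change the spelling of every word ending in final sigma.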
At the same time, some words share the same normalized form but should nonetheless be counted as different words, for example when morphology overlaps or when different lemmas are inflected to the same form.
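A small illustration of why the grouping key matters, using rows copied from the sample output in this post. Grouping on the normalized form alone merges the masculine and neuter genitive plurals of ἅγιος (both spelled ἁγίων). Keying on the whole (lemma, normalized form, part of speech, parsing code) tuple keeps them apart. Treating the part-of-speech column as part of the key is my assumption, based on the columns shown in the sample.

```python
from collections import defaultdict

# Rows copied from the sample output:
# (lemma, normalized form, part of speech, parsing code)
rows = [
    ("ἅγιος", "ἁγίων", "A-", "----GPM-"),
    ("ἅγιος", "ἁγίων", "A-", "----GPN-"),
    ("ἁγιότης", "ἁγιότητι", "N-", "----DSF-"),
]

# Grouping on the normalized form alone merges distinct parsings.
by_form = defaultdict(set)
for lemma, norm, pos, parse in rows:
    by_form[norm].add((lemma, pos, parse))

# Grouping on the full tuple keeps each parsing as its own entry.
by_full_key = set(rows)

print(len(by_form), len(by_full_key))  # 2 3
```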
So I set out to analyze the Greek NT and find how many unique words it contains, where words are grouped together if they share the same lemma, normal form, and parsing. To begin, I used MorphGNT, which is based on the SBLGNT. MorphGNT contains a column for the normal form of each word as well as the parsing information, so it is just the ticket.
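A minimal sketch of the counting step follows. It is an illustrative reconstruction, not the original script. It assumes MorphGNT's plain-text layout of seven space-separated columns per word: book/chapter/verse, part of speech, parsing code, text, word, normalized word, lemma. It also assumes one file per book in a local directory. The function names and the directory argument are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Set, Tuple

# (lemma, normalized form, part of speech, parsing code)
Key = Tuple[str, str, str, str]


def parse_morphgnt_line(line: str) -> Key:
    # Assumed MorphGNT columns: book/chapter/verse, part of speech,
    # parsing code, text (with punctuation), word, normalized word, lemma.
    _bcv, pos, parsing, _text, _word, norm, lemma = line.split()
    return (lemma, norm, pos, parsing)


def unique_keys(lines: Iterable[str]) -> Set[Key]:
    # A set keeps one entry per distinct key; blank lines are skipped.
    return {parse_morphgnt_line(line) for line in lines if line.strip()}


def unique_keys_in_dir(directory: str) -> Set[Key]:
    # Hypothetical helper: merge the keys from every *.txt book file.
    keys: Set[Key] = set()
    for path in sorted(Path(directory).glob("*.txt")):
        with path.open(encoding="utf-8") as f:
            keys |= unique_keys(f)
    return keys


# Three words from John 1:1 in MorphGNT's layout, typed by hand for illustration.
sample = [
    "040101 N- ----NSM- λόγος, λόγος λόγος λόγος",
    "040101 C- -------- καὶ καὶ καί καί",
    "040101 N- ----NSM- λόγος λόγος λόγος λόγος",
]
print(len(unique_keys(sample)))  # 2
```

The two occurrences of λόγος differ only in the text column (one carries a comma), so they collapse to a single key. The total for the whole corpus would be `len(unique_keys_in_dir(...))` pointed at a local copy of the MorphGNT files.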
I used Python to find all unique combinations of lemma, normal form, and parsing information, then used James Tauber's pyuca module (a Python implementation of the Unicode Collation Algorithm) to sort the results. You can find them in a compressed file here, sorted by lemma.
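Plain code-point order sorts Greek poorly. Letters with breathings and accents live in the Greek Extended block (U+1F00 and up), so a word like ἅγιος would sort after words beginning with a bare ω. A collation library avoids that. The sketch below uses pyuca's `Collator().sort_key` when pyuca is installed. As an assumption of my own for environments without pyuca, it falls back to a crude diacritic-stripping key, which is not the Unicode Collation Algorithm. Each entry is assumed to be a (lemma, normalized form, part of speech, parsing code) tuple, matching the columns of the sample output. `sort_by_lemma` is a hypothetical name.

```python
import unicodedata

try:
    # James Tauber's pyuca: a Unicode Collation Algorithm implementation.
    from pyuca import Collator

    sort_key = Collator().sort_key
except ImportError:
    def sort_key(s: str):
        # Crude fallback (NOT the UCA): compare base letters with accents
        # and breathings stripped, then break ties on the raw string.
        base = "".join(
            ch for ch in unicodedata.normalize("NFD", s)
            if not unicodedata.combining(ch)
        )
        return (base.lower(), s)


def sort_by_lemma(entries):
    # Sort by lemma, then normalized form, then part of speech and parsing.
    return sorted(
        entries,
        key=lambda e: (sort_key(e[0]), sort_key(e[1]), e[2], e[3]),
    )


# Rows from the sample output, deliberately shuffled.
sample = [
    ("ἀγκάλη", "ἀγκάλας", "N-", "----APF-"),
    ("ἁγιότης", "ἁγιότητι", "N-", "----DSF-"),
    ("ἅγιος", "ἁγίων", "A-", "----GPN-"),
    ("ἁγιωσύνη", "ἁγιωσύνην", "N-", "----ASF-"),
    ("ἅγιος", "ἁγίων", "A-", "----GPM-"),
]
for entry in sort_by_lemma(sample):
    print(*entry)
```

Each sorted entry prints on one line with space-separated columns, in the same layout as the sample output.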
Using this methodology, I found 18,873 unique words in the Greek New Testament.
Here is a sample of the output (columns: lemma, normal form, part of speech, parsing code):
ἅγιος ἁγίων A- ----GPM-
ἅγιος ἁγίων A- ----GPN-
ἅγιος ἁγιωτάτῃ A- ----DSFS
ἁγιότης ἁγιότητι N- ----DSF-
ἁγιότης ἁγιότητος N- ----GSF-
ἁγιωσύνη ἁγιωσύνῃ N- ----DSF-
ἁγιωσύνη ἁγιωσύνην N- ----ASF-
ἁγιωσύνη ἁγιωσύνης N- ----GSF-
ἀγκάλη ἀγκάλας N- ----APF-
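The last column is MorphGNT's 8-character parsing code, read position by position. The sketch below decodes it. The position meanings and letter values follow my reading of the MorphGNT documentation and should be checked against it. The names `FIELDS` and `decode_parsing` are hypothetical.

```python
# Assumed meaning of each position in MorphGNT's 8-character parsing code.
FIELDS = [
    ("person", {"1": "1st", "2": "2nd", "3": "3rd"}),
    ("tense", {"P": "present", "I": "imperfect", "F": "future",
               "A": "aorist", "X": "perfect", "Y": "pluperfect"}),
    ("voice", {"A": "active", "M": "middle", "P": "passive"}),
    ("mood", {"I": "indicative", "D": "imperative", "S": "subjunctive",
              "O": "optative", "N": "infinitive", "P": "participle"}),
    ("case", {"N": "nominative", "G": "genitive", "D": "dative",
              "A": "accusative", "V": "vocative"}),
    ("number", {"S": "singular", "P": "plural"}),
    ("gender", {"M": "masculine", "F": "feminine", "N": "neuter"}),
    ("degree", {"C": "comparative", "S": "superlative"}),
]


def decode_parsing(code: str) -> dict:
    if len(code) != len(FIELDS):
        raise ValueError(f"expected 8 characters, got {code!r}")
    decoded = {}
    for ch, (name, values) in zip(code, FIELDS):
        if ch != "-":  # "-" means the category does not apply
            decoded[name] = values.get(ch, ch)  # unknown letters pass through
    return decoded


print(decode_parsing("----DSFS"))
# {'case': 'dative', 'number': 'singular', 'gender': 'feminine', 'degree': 'superlative'}
```

So ἁγιωτάτῃ with `----DSFS` is the dative singular feminine superlative of ἅγιος.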
Anyway, I hope to have more to share on this front later, but this just tickled my fancy.