I had been looking for a morphologically-tagged LXX for research and
came across the CATSS LXXM text. The one thing lacking for my use of
this text was that it was in betacode and not in unicode.
By searching I have found that many people have taken this text and
converted it to unicode for embedding in web sites, but to my knowledge
nobody is publishing the equivalent plain text files. The Unbound Bible
comes closest, but it publishes the text and the morphological analysis
in two separate files, which is suboptimal. So I decided to embark on
converting the LXXM to unicode.
Luckily James Tauber has shared a Greek betacode to unicode
script which took care of most of the hard work for me. Using this,
I was able to convert all of the texts from betacode to unicode. I am
sharing the result as a git archive: lxxmorph-unicode.
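
To give a sense of what the conversion involves, here is a minimal sketch in Python. It is not James Tauber's script, just an illustration of the general idea: map each betacode letter to its Greek base letter, map each betacode diacritic to a Unicode combining mark, and let NFC normalization compose the result. The function name `beta_to_unicode` is my own, and the sketch assumes one word at a time and does not handle capitals (the `*` prefix) or punctuation.

```python
import unicodedata

# Betacode letter -> lowercase Greek base letter
LETTERS = {
    "A": "α", "B": "β", "G": "γ", "D": "δ", "E": "ε", "Z": "ζ",
    "H": "η", "Q": "θ", "I": "ι", "K": "κ", "L": "λ", "M": "μ",
    "N": "ν", "C": "ξ", "O": "ο", "P": "π", "R": "ρ", "S": "σ",
    "T": "τ", "U": "υ", "F": "φ", "X": "χ", "Y": "ψ", "W": "ω",
}

# Betacode diacritic -> Unicode combining mark
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(word):
    """Convert a single lowercase betacode word to precomposed Unicode.

    Simplified sketch: capitals ("*") and punctuation are passed
    through unchanged rather than converted.
    """
    out = []
    for ch in word.upper():
        if ch in LETTERS:
            out.append(LETTERS[ch])
        elif ch in MARKS:
            out.append(MARKS[ch])
        else:
            out.append(ch)
    text = "".join(out)
    # A sigma at the very end of the word takes its final form.
    if text.endswith("σ"):
        text = text[:-1] + "ς"
    # Compose base letters and combining marks where possible.
    return unicodedata.normalize("NFC", text)


print(beta_to_unicode("E)N"))
print(beta_to_unicode("LO/GOS"))
```

A real converter needs more care than this (capitals, apostrophes, sigma before punctuation), which is why reusing an existing, tested script was the better route.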
The texts differ from the originals in the following ways:
- Several corrections have been applied.
- The betacode text has been converted to unicode.
- The files are now whitespace-separated rather than fixed-width.
- The second column, containing the POS and parsing information, has had its internal whitespace replaced with hyphens so that it remains a single field in the whitespace-separated format.
- The split files of Genesis, Psalms, Isaiah, Jeremiah, and Ezekiel have been combined, and all the files have been renumbered.
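
The change from fixed-width to whitespace-separated lines can be sketched roughly as follows. The column widths (`word_width`, `parse_width`) and the sample line are illustrative assumptions, not values taken from the LXXM documentation, and `convert_line` is a name I made up for this example.

```python
def convert_line(line, word_width=25, parse_width=11):
    """Turn one fixed-width line into a whitespace-separated line.

    word_width and parse_width are hypothetical column widths; check
    them against the actual files before relying on this.
    """
    word = line[:word_width].strip()
    parse = line[word_width:word_width + parse_width].strip()
    rest = line[word_width + parse_width:].split()
    # Join the parts of the parsing column with hyphens so the column
    # stays a single field once the line is split on whitespace.
    parse = "-".join(parse.split())
    return " ".join([word, parse] + rest)


# Made-up sample line, padded to the assumed column widths.
sample = "E)GE/NETO".ljust(25) + "VBI AMI3S".ljust(11) + "GI/GNOMAI"
print(convert_line(sample))
```

With the parsing column hyphenated, each output line can be split on whitespace without the parse code breaking into two fields.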
Please note that this resource has a rather novel license which
requires users to fill out a user declaration and send it in to the
CCAT program at the University of Pennsylvania (see
0-user-declaration.txt in the repo). As far as I can tell, my redistribution of the unicode version complies with the license. I have contacted Robert Kraft (the former steward) and Bernard Taylor (the current steward) with the corrections I've found.
(link to the original announcement on the Open Scriptures mailing list)