The Library Basement
Reading under ground

Rigaudon: Polytonic Greek OCR

I came across a very exciting project recently: Rigaudon. This is a polytonic Greek OCR system which has already been used on 532 texts of antiquity. The results are CC-BY-SA licensed, and the code is GPL v2 and available in a git repo. Bruce Robertson, one of the collaborators behind the project, also has other repositories, including one for a web-based interactive syntax tree editor. Check them out.

Transcription is the great boundary between the source texts and boundless application in the digital realm. A good polytonic Greek OCR system will unlock many texts which have never been digitized. This has a dual benefit: a "clean" transcription process can lead to permissive licensing for public domain works, and as a result, we'll all have a lot more texts for research.

The system is not perfect, and some manual editing will still be required, but it is a work in progress and improvements can be made. Even so, these OCR results are the best I have seen for polytonic Greek. The potential reward is so vast that I cannot help but get excited and get involved. There is already some correspondence circulating about collaborating around a particular text, which could then lead to morphological tagging and syntactic analysis, and maybe more.

In 2011 I wrote that the future is bright for copyright issues in Christianity. This is just one example of how that is so. Free software licenses for code and permissive licenses for content are becoming the norm at the cutting edge of the field. This is good for everyone, but there is still a lot of work to do (and maybe more than ever).