The Library Basement
Reading under ground

Strong's Dictionary in sqlite3

Someone asked the Open Scriptures mailing list about getting the Strong's Dictionary data into a sqlite3 database. Challenge accepted. And it was quite the challenge.

The Strong's repo for the Open Scriptures project contains an XHTML version of the Strong's dictionary. I would have used that data as a source, but for two problems: 1. it lacked transliterations, and 2. some of the Unicode lemmas for the Hebrew portion were missing. Thankfully the repo also contains the XML sources for the Greek and Hebrew. I decided to unleash Python with xml.sax.
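For anyone who has not used xml.sax before, here is a minimal sketch of the general approach: a SAX handler collects one row per entry, and the rows are then written to sqlite3. This is not the script itself. The XML layout (`entries`, `entry`, `lemma`, `translit`), the table name `strongs`, and the `load` helper are all made up for illustration; the real Strong's sources use their own element names.

```python
import sqlite3
import xml.sax

# Hypothetical, simplified XML. The real Strong's sources are structured differently.
SAMPLE_XML = """<entries>
  <entry number="2316"><lemma>θεός</lemma><translit>theos</translit></entry>
  <entry number="3056"><lemma>λόγος</lemma><translit>logos</translit></entry>
</entries>""".encode("utf-8")


class EntryHandler(xml.sax.ContentHandler):
    """Collects (number, lemma, translit) tuples from the toy format above."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._number = None
        self._field = None   # name of the text element currently open, if any
        self._text = {}

    def startElement(self, name, attrs):
        if name == "entry":
            self._number = int(attrs["number"])
            self._text = {"lemma": "", "translit": ""}
        elif name in ("lemma", "translit"):
            self._field = name

    def characters(self, content):
        # SAX may deliver text in several chunks, so append rather than assign.
        if self._field is not None:
            self._text[self._field] += content

    def endElement(self, name):
        if name in ("lemma", "translit"):
            self._field = None
        elif name == "entry":
            self.rows.append(
                (self._number, self._text["lemma"], self._text["translit"])
            )


def load(xml_bytes, conn):
    """Parse the XML and insert every entry; returns the number of rows."""
    handler = EntryHandler()
    xml.sax.parseString(xml_bytes, handler)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS strongs "
        "(number INTEGER PRIMARY KEY, lemma TEXT, translit TEXT)"
    )
    conn.executemany("INSERT INTO strongs VALUES (?, ?, ?)", handler.rows)
    conn.commit()
    return len(handler.rows)


conn = sqlite3.connect(":memory:")
count = load(SAMPLE_XML, conn)
```

SAX suits this kind of job because it streams the file and never holds the whole document in memory, at the cost of tracking state by hand in the handler.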

Unfortunately those XML sources use two different formats, so I had to write two different parsers. Also, the Greek portion contains self-references that give only the Strong's number, not the Unicode string, so I had to write a second-pass parser to fill in the missing lemmas. It also turns out that some of these self-references point to Strong's numbers which are not part of the dataset, which has me a bit perplexed (I'll be following up on that soon).
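The second pass boils down to a lookup once every entry has been read. The sketch below shows the idea on toy data, including how references to numbers outside the dataset can be set aside instead of crashing the import. The function name `resolve_references` and the data shapes are assumptions for illustration, not what the script actually uses.

```python
def resolve_references(entries, references):
    """Fill in lemmas for number-only references.

    entries:    dict mapping Strong's number -> lemma (built in the first pass)
    references: list of (source_number, target_number) pairs
    Returns (resolved, dangling), where resolved holds
    (source, target, target_lemma) and dangling holds the pairs whose
    target number is not in the dataset.
    """
    resolved = []
    dangling = []
    for source, target in references:
        lemma = entries.get(target)
        if lemma is None:
            dangling.append((source, target))
        else:
            resolved.append((source, target, lemma))
    return resolved, dangling


# Toy data: the second reference points at a number that is not in the dataset.
entries = {25: "ἀγαπάω", 26: "ἀγάπη"}
references = [(26, 25), (26, 9999)]

resolved, dangling = resolve_references(entries, references)
for source, target in dangling:
    print(f"entry {source} refers to missing number {target}")
```

Two passes are needed because a reference can point forward to an entry that has not been parsed yet, so the full number-to-lemma table has to exist before any lookup is reliable.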

After changing my mind a few times about how I wanted to approach the "description" part of each entry (and some accompanying refactors of the code), I finally got a working product. You can find it in my Biblical Studies git repo. I put it under the MIT license so people can do whatever they need to do with it.

I am not really sure if there are any other open source Strong's-to-SQL importers out there. Maybe someone can take my script and give it support for other databases (or even frameworks, like Django).

Edit: And Darrell Smith provided code for doing it with regex in PHP. Technology can provide many paths. Glad to see there are so many helpers on the Open Scriptures mailing list.

Update: I've updated the script to use version 1.5 of the Strong's Greek XML, and it also downloads the source files automatically, so you don't have to check out the Open Scriptures git repo if you don't need it otherwise.

Update 2 (March 10, 2012): The MorphGNT site was moved to GitHub, so I've updated the link to the Strong's Greek database in the script. Also, here is a compressed copy of the sqlite3 database which results from the script.
