The Library Basement
Reading under ground

Tag software

Moved some git repositories

I have become enamored of Gogs, a self-hosting solution for git repositories, so I've moved most of my personal repositories from a certain large centralized git service provider to my own instance. Check it out:

I understand this may require collaborators to actually use git in the manner in which it was designed, namely as decentralized version control. If you'd like to submit a patch to one of my projects, you'll need to craft a git pull request (for example with git request-pull) and email it to me.

On web analytics

Some time ago I began questioning the value of tracking viewership of this blog and my various other websites. Additionally, I felt that tracking code was a bit invasive of my readers' privacy. So I disabled the Google Analytics tracking plugin and accepted that I would just be ignorant about my readership outside of comments. This sentiment coincides with a general trend of mine to stop relying on Google and other free service providers and roll my own services where possible. Yet I still wanted to know some general information about my visitors.

Enter Piwik, the free software analytics system. You can host it on a standard PHP/MySQL stack, so it is easy to roll your own. The data it collects stays with you. You can also configure it to provide better privacy for your readers, including anonymizing IP addresses and providing an opt-out feature. I installed it, loved it, and added it to all my sites. I am not storing the last two bytes of IP addresses, meaning the best I can do is narrow a visitor down to a /16 network (what used to be called a class B network), which covers 65,536 addresses.
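To make the two-byte masking concrete, here is a minimal sketch in Python. It is not Piwik's own code (Piwik is written in PHP), and the function name `anonymize_ipv4` and its `masked_bytes` parameter are my own illustration of the idea, assuming IPv4 addresses only.

```python
import ipaddress


def anonymize_ipv4(address: str, masked_bytes: int = 2) -> str:
    """Zero out the trailing `masked_bytes` bytes of an IPv4 address.

    Hypothetical helper for illustration; not part of Piwik.
    """
    if not 0 <= masked_bytes <= 4:
        raise ValueError("masked_bytes must be between 0 and 4")
    # Masking two bytes keeps a 16-bit prefix, i.e. a /16 network.
    prefix_len = 32 - 8 * masked_bytes
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    # 203.0.113.57 is from a range reserved for documentation.
    print(anonymize_ipv4("203.0.113.57"))
```

With the default of two masked bytes, every visitor from the same /16 network is stored under the same address, so individual hosts can no longer be told apart.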

So, yes, I am back on the analytics bandwagon.

Category: meta Tags: software

Jesus' vocabulary

A friend of mine asked if I had a list of all of Jesus' words, sorted by frequency, with common words like "the" removed. I did not have such a list at hand, but I took it as a challenge.

Thanks to software, most of the work to create a sorted list of Jesus' vocabulary is trivial. I can easily make a frequency list of his words and remove common stopwords. The most challenging part for me was finding a source of the gospels from which it was easy to extract just Jesus' words. I asked around, and found that the World English Bible XML contains a <wj> (i.e. "words of Jesus") tag which delimits exactly what I need. So after a bit of processing, and thanks to NLTK, I was able to provide a basic list of Jesus' most common words:

  1. one - 221
  2. father - 211
  3. tell - 210
  4. man - 196
  5. God - 163
  6. things - 163
  7. come - 158
  8. son - 149
  9. go - 123
  10. also - 113
  11. know - 111
  12. may - 111
  13. kingdom - 104
  14. see - 102
  15. lord - 97
  16. said - 96
  17. therefore - 94
  18. give - 93
  19. heaven - 86

Based on the top of the list, I'd say Jesus was primarily talking about the good news.

I've shared the code.
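For readers who just want the gist, here is a minimal sketch of the approach. It is not the shared code itself: it uses only the Python standard library, the tiny `STOPWORDS` set is a stand-in for a real stopword list such as NLTK's, and `SAMPLE_XML` is an invented snippet (including the `<book>` and `<v>` tags) rather than actual World English Bible markup. The only thing taken from the real source is the <wj> tag.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a tiny stand-in for a real stopword list (e.g. NLTK's).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "he", "said"}

# Assumption: invented sample text; only the <wj> tag mirrors the real XML.
SAMPLE_XML = """
<book>
  <v>He said, <wj>Come and see the kingdom.</wj></v>
  <v>The crowd asked. <wj>The kingdom is near; come.</wj></v>
  <v>A narrator line mentions the kingdom outside any tag.</v>
</book>
"""


def words_of_jesus_frequencies(xml_text, stopwords=STOPWORDS):
    """Count non-stopword tokens found inside <wj> elements."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter("wj"):
        # itertext() gathers text inside the element, including any children.
        text = "".join(element.itertext())
        # Lowercase and split on anything that is not a letter or apostrophe.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(token for token in tokens if token not in stopwords)
    return counts


if __name__ == "__main__":
    for word, count in words_of_jesus_frequencies(SAMPLE_XML).most_common():
        print(f"{word} - {count}")
```

Text outside a <wj> element, like the narrator line in the sample, is never counted. A real run would also need to decide how to treat inflected forms (tell, tells, told), which this sketch leaves alone.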

Bible software galore

We are experiencing a downpour of new Bible software offerings. Here are just three which have come to my attention lately:

  • Sofia (or Bible Web App v. 2) is an advanced web application which can work without internet access in any browser (useful for restricted countries).
  • Verity, a desktop application for Windows and Linux.
  • MetaV ("the meta-version"), including a web UI called MetaV Explorer, which lets you browse the Bible by time and location.


Natural Language Processing with Python

I was browsing through a local bookshop's computer section recently and saw a title which instantly grabbed my attention: Natural Language Processing with Python. It was a bit more expensive than I wanted to pay at that moment, but I thought I might save up.

As it happily turns out, the entire book is available online under a Creative Commons license (BY-NC-ND). This is the sort of thing which makes me really happy. I am going to be checking it out, and if it is useful enough, I may buy a paper copy to thank the authors and O'Reilly for publishing such a great book.

The book is focused mostly on the Natural Language Toolkit (nltk) Python module, which is available under an Apache license. I had never used it before, but it looks fairly capable. I must admit I was somewhat surprised that Google finds relatively few pertinent results when searching for "nltk new testament greek" or "nltk biblical studies." The library seems well suited to the field, so I am surprised it is not more popular among Bible scholars. If nltk is any good, I intend to change that.