December 15

Ruby Natural Language Parser

I have been working on a little side project, a MUD (multi-user dungeon), mostly because I really love the text-based adventure game concept. For me it’s a bit like reading a book compared to watching a movie: the overall feeling is more immersive and lets the imagination play a greater role in the experience.

One of the key areas in a MUD is taking textual input from the player and parsing it for the commands the player wants to perform. There is a well-known MUD project in the Ruby world called FaerieMUD, which has been going for many years and which several people seem to have been working on bit by bit. Michael Granger, one of the main developers, must also have come across the need to parse and make sense of text from a player, which is why he created Ruby Linguistics: basically a set of string functions that can do things like turn a singular word into its plural, and many other things.
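To make the parsing problem concrete, here is a minimal sketch of the naive approach: treat the first word as the verb and whatever is left as the object. The `parse_command` helper and its stop-word list are hypothetical names made up for illustration; they are not taken from FaerieMUD or Linguistics.

```ruby
# Hypothetical stop-word list: words that carry no meaning for the command.
STOP_WORDS = %w[the a an].freeze

# Hypothetical helper: split raw player input into a verb and an object phrase.
def parse_command(input)
  words = input.downcase.scan(/[a-z']+/)   # keep only word-like tokens
  verb = words.shift                       # first word is treated as the verb
  object = words.reject { |w| STOP_WORDS.include?(w) }.join(' ')
  { :verb => verb, :object => (object.empty? ? nil : object) }
end

command = parse_command("Take the rusty sword")
puts "verb: #{command[:verb]}, object: #{command[:object]}"
```

This falls over quickly on input like "give the sword to the guard", where everything after the verb ends up lumped into one object phrase, and that is exactly where a real sentence parser starts to pay off.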

What interested me most were two of the plugin modules for Linguistics:

  • link parser
  • wordnet

The link parser makes it possible to parse a sentence and find the subject, verb and object, plus other cool stuff. WordNet is a database a bit like a dictionary and thesaurus; I had never heard of it before but it seems pretty cool. So I decided to get this up and running. It was a bit of a hunt to get everything I needed, so I have detailed the steps I took to install it on my Mac Pro running OS X Lion.
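To see why a thesaurus is handy in a MUD, think of players typing "grab" or "get" when the game only knows "take". Here is a rough sketch with a hand-written synonym table standing in for WordNet. The table and the `canonical_verb` helper are made up for illustration and are not part of the WordNet or Linguistics API.

```ruby
# Hand-written stand-in for a thesaurus; a real lookup would come from WordNet.
SYNONYMS = {
  'take' => %w[take grab get],
  'look' => %w[look examine inspect]
}.freeze

# Hypothetical helper: map the verb a player typed onto a command the game knows.
# Returns nil when the word is not in the table.
def canonical_verb(word)
  match = SYNONYMS.find { |_command, alternatives| alternatives.include?(word.downcase) }
  match && match.first
end

puts canonical_verb('Grab')
puts canonical_verb('dance').inspect
```

Maintaining a table like this by hand gets old fast, which is the appeal of having a proper lexical database do the work.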

Step 1

Download all the required packages:

  • ruby linkparser – wget http://deveiate.org/code/linkparser-1.1.0.gem
  • link grammar – wget http://www.abisource.com/downloads/link-grammar/4.7.4/link-grammar-4.7.4.tar.gz
  • ruby wordnet – wget http://deveiate.org/code/wordnet-0.0.5.gem
  • ruby wordnet source – wget http://deveiate.org/code/wordnet-0.0.5.tar.gz
  • wordnet – wget http://wordnetcode.princeton.edu/3.0/WordNet-3.0.tar.gz
  • ruby linguistics – sudo gem install linguistics
  • Berkeley DB – download Berkeley DB from http://www.oracle.com/technetwork/database/berkeleydb/downloads/index.html
  • ruby bdb – https://github.com/knu/ruby-bdb
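Since there are quite a few pieces, a small script can confirm that nothing is missing before building. This is only a sketch: the file names are taken from the download list above, `db-5.2.36.tar.gz` is the Berkeley DB archive name used in Step 2, and `missing_downloads` is a hypothetical helper name. ruby-bdb is left out because it is cloned with git rather than downloaded as an archive.

```ruby
# Archive names from the download list; adjust if you fetched other versions.
REQUIRED_FILES = %w[
  linkparser-1.1.0.gem
  link-grammar-4.7.4.tar.gz
  wordnet-0.0.5.gem
  wordnet-0.0.5.tar.gz
  WordNet-3.0.tar.gz
  db-5.2.36.tar.gz
].freeze

# Hypothetical helper: return the required file names not present in dir.
def missing_downloads(dir)
  REQUIRED_FILES.reject { |name| File.exist?(File.join(dir, name)) }
end

missing = missing_downloads(Dir.pwd)
puts(missing.empty? ? 'All packages downloaded' : "Still missing: #{missing.join(', ')}")
```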

Step 2

I put all those downloaded packages into a single directory and did the following:

Build WordNet:
tar -xvf WordNet-3.0.tar.gz
cd WordNet-3.0
./configure && make && sudo make install
cd ..

Build Berkeley DB:
tar -xvf db-5.2.36.tar.gz
cd db-5.2.36/build_unix
../dist/configure LDFLAGS='-arch x86_64 -arch i386' CFLAGS='-arch x86_64 -arch i386'
make && sudo make install
cd ../..

Build ruby bdb:
git clone https://github.com/knu/ruby-bdb.git
cd ruby-bdb
ruby extconf.rb --with-db-dir=/usr/local/BerkeleyDB.5.2
make
sudo make install
cd ..

Build Link Grammar:
tar -xvf link-grammar-4.7.4.tar.gz
cd link-grammar-4.7.4
./configure && make && sudo make install
cd ..

Install gems:
sudo gem install linguistics
sudo gem install linkparser-1.1.0.gem
sudo gem install wordnet-0.0.5.gem

Convert the WordNet database:
tar -xvf wordnet-0.0.5.tar.gz
cd wordnet-0.0.5
ruby convertdb.rb

The convertdb script asks a couple of questions; I just hit enter to accept the defaults, and it then converts the WordNet data files. Finally, I needed to copy the ruby-wordnet directory from the wordnet-0.0.5 dir to the location the gem expects to find it:

sudo cp -R ruby-wordnet /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/share
sudo chmod -R 755 /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/share/ruby-wordnet
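That long framework path looks like the data directory of the system Ruby 1.8 install. Assuming the gem looks for its database under Ruby's configured data directory (an assumption, I have not confirmed it against the gem source), this sketch prints the equivalent location for whichever Ruby is running. `wordnet_data_dir` is a hypothetical helper name.

```ruby
require 'rbconfig'

# Hypothetical helper: the directory the converted database would be copied to,
# ASSUMING the gem looks under Ruby's configured data directory.
def wordnet_data_dir
  File.join(RbConfig::CONFIG['datadir'], 'ruby-wordnet')
end

puts wordnet_data_dir
```

If you use a Ruby other than the one that ships with OS X, the output may be a better copy target than the path above.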

Step 3

Now to test that it all works (refer to the docs here: http://deveiate.org/projects/Linguistics/wiki/English):

irb
>> require 'linguistics'
=> true
>> Linguistics::use( :en ) 
=> [String, Numeric, Array]
>> "box".en.plural
=> "boxes" 
>> Linguistics::EN.has_wordnet?
=> true
>> "balance".en.synset( :verb )
=> #<WordNet::Synset:0x10f441fa0/2673134 balance, equilibrate, equilibrize, equilibrise (verb): "bring into balance or equilibrium; "She has to balance work and her domestic duties"; "balance the two weights"" (verb_groups: 2, hypernyms: 1, hyponyms: 5, derivations: 7, antonyms: 1)>
>> Linguistics::EN.has_link_parser?
=> true
>> "he is a big dog".en.sentence.object.to_s
link-grammar: Info: Dictionary found at /usr/local/share/link-grammar/en/4.0.dict
=> "dog" 
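If one of the checks above comes back false, a quick way to narrow it down is to see which libraries Ruby can load at all. A minimal sketch: `loadable?` is a hypothetical helper, and the library names are my assumption of what each package from Step 2 is required as.

```ruby
# Hypothetical helper: true if the library can be required, false if it is missing.
def loadable?(name)
  require name
  true
rescue LoadError
  false
end

# Require names assumed from the packages installed in Step 2.
%w[linguistics linkparser wordnet bdb].each do |lib|
  puts "#{lib}: #{loadable?(lib) ? 'ok' : 'MISSING'}"
end
```

On Ruby 1.8 you may need to run it with `ruby -rubygems` so that installed gems are found.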

Posted by kingsleyh | Filed in Ruby |
