Natural Language Processing in Ruby

There are some resources at Ruby Linguistics, plus some links from there, though nothing yet seems anywhere close to what NLTK is for Python.

What is the best way to do Natural Language Processing in Rails app?

I had the same problem a few months ago. After a bit of research and testing, this is the solution I implemented:

Run several Python processes, as many as one machine can hold, and use as many machines as you need.

Use ZeroMQ to communicate between the web servers and the machines running the Python processes.

Do not use HTTP for this communication: the overhead is significant, and it will be very slow compared to ZeroMQ. The handler you need with ZeroMQ is also simpler than the one you would need with HTTP.

Take care to expose the ZeroMQ sockets to internal networks only; otherwise you will need to set up authentication on each Python server.

Another option is to use one of the many available NLP APIs, if you don't need any corpus-based algorithms (such as POS tagging, sentiment analysis, etc.).

Business Natural Language for Ruby beginners

If you really want to do natural language processing (you mention it in the text), I'd advise using OpenNLP with JRuby. I did that last year and it worked out pretty well.

For DSLs, there are a ton of Ruby-specific tutorials on the web; just use your favorite search engine to find them. Book-wise, I'd recommend Russ Olsen's "Eloquent Ruby" and Paolo Perrotta's "Metaprogramming Ruby". After those two books you should know everything you need to know about writing DSLs, and you will have learned a lot of very good Ruby as a side effect.
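To give a feel for what an internal Ruby DSL for business rules can look like, here is a minimal sketch built on `instance_eval`. The class `DiscountRules`, the `discount` keyword, and the order fields are all hypothetical names chosen for illustration; they do not come from either book.

```ruby
# A tiny internal DSL for discount rules (all names are hypothetical).
class DiscountRules
  Rule = Struct.new(:name, :condition, :percent)

  # Entry point: evaluates the block with a new rule set as `self`,
  # so `discount` can be called inside the block like a keyword.
  def self.define(&block)
    rules = new
    rules.instance_eval(&block)
    rules
  end

  def initialize
    @rules = []
  end

  # DSL keyword: registers a rule whose block decides if an order qualifies.
  def discount(percent, name, &condition)
    @rules << Rule.new(name, condition, percent)
  end

  # Returns the percentage of the first matching rule, or 0 if none match.
  def percent_for(order)
    matching = @rules.find { |rule| rule.condition.call(order) }
    matching ? matching.percent : 0
  end
end

rules = DiscountRules.define do
  discount(20, "large order")     { |order| order[:total] >= 1000 }
  discount(5,  "repeat customer") { |order| order[:repeat_customer] }
end

puts rules.percent_for({ total: 1200, repeat_customer: false })
puts rules.percent_for({ total: 50, repeat_customer: true })
```

Rules are checked in the order they were declared, so the most specific rules should come first.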

Detecting Elements of Sentence In Ruby

These are the only natural language processing options for Ruby that I know of.

  • Treat
  • Stanford Core NLP
  • Open NLP

Interestingly, they are all by the same person.

EDIT
Here is one more option that I found. It's a tutorial on n-gram analysis.

Natural Language Processing with Ruby: n-grams
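For reference, n-gram extraction needs nothing beyond the standard library. The sketch below is not taken from the tutorial; the method names `ngrams` and `ngram_counts` and the simple regex tokenizer are assumptions made for illustration.

```ruby
# Splits text into lowercase word tokens and slides a window of size n over them.
def ngrams(text, n)
  tokens = text.downcase.scan(/[a-z0-9']+/)
  tokens.each_cons(n).map { |gram| gram.join(" ") }
end

# Counts how often each n-gram occurs in the text.
def ngram_counts(text, n)
  ngrams(text, n).each_with_object(Hash.new(0)) { |gram, counts| counts[gram] += 1 }
end

puts ngrams("The quick brown fox jumps over the lazy dog", 2).inspect
puts ngram_counts("to be or not to be", 2).inspect
```

A real application would likely want a better tokenizer (punctuation, Unicode, sentence boundaries), but `each_cons` stays the core of the technique.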

Rubygem: ruby gem to process language

The key search phrase you need is Natural Language Processing, or NLP. Here are some older SO questions on the topic:

https://stackoverflow.com/questions/3776361/ruby-nlp-libraries

Natural Language Processing in Ruby

Ruby/Rails - Convert Date/Time back into Natural Language (2011-02-17 = February 17th, 2011)

Take a look at strftime, as implemented for the Time class.

With it you should be able to get the date to do anything you want :-)

>> Time.now.strftime("%A, %B %d, %Y at %-l%p")
=> "Thursday, February 17, 2011 at 5PM"
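The question asks for an ordinal day ("17th"), which strftime has no directive for. In a Rails app, ActiveSupport's `ordinalize` on integers covers this; outside Rails, a small helper does the job. The sketch below is plain Ruby, and the names `ordinal_suffix` and `natural_date` are hypothetical.

```ruby
# Returns the English ordinal suffix for a day of the month.
def ordinal_suffix(day)
  return "th" if (11..13).include?(day % 100) # 11th, 12th, 13th
  case day % 10
  when 1 then "st"
  when 2 then "nd"
  when 3 then "rd"
  else "th"
  end
end

# Formats a Time or Date as, for example, "February 17th, 2011".
def natural_date(time)
  time.strftime("%B #{time.day}#{ordinal_suffix(time.day)}, %Y")
end

puts natural_date(Time.new(2011, 2, 17))
```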

Techniques for categorising natural language strings?

As to Python, for the moment I can recommend looking into:

http://www.nltk.org/

It has good documentation and lots and lots of functionality in the field of natural language processing. There is also a package in the Ubuntu repository (python-nltk), so it's easy to install and experiment with.

For most situations you'll need access to a good quality corpus.

Is there a good natural language processing library

LingPipe is very nice and well documented. You can also take a look at:

  • OpenNLP
  • Stanford NLP
  • Apache UIMA
  • GATE
  • CogComp-NLP
  • FrameNet

The last one specifically might be of interest to you, although I don't know whether there are any readily available Java implementations (and maybe that's too big of a gun for your problem anyway :-)

Paul's idea of using a DSL is probably easier and faster to implement, and more reliable to use for your customers. I, too, would recommend looking into that first.

How to use Stanford CoreNLP java library with Ruby for sentiment analysis?

As suggested in the comments by @Qualtagh, I decided to use JRuby.

I first tried doing everything in Java, with MongoDB as the interface (read directly from MongoDB, analyze with Java / CoreNLP, and write back to MongoDB), but the MongoDB Java Driver was more complex to use than the Mongoid ORM I use with Ruby, which is why I felt JRuby was more appropriate.

Building a REST service around the Java code would have required me to first learn how to write a REST service in Java, which might or might not have been easy. I didn't want to spend time figuring that out.

So the code I needed to run the analysis was:

  def analyze_tweet_with_corenlp_jruby
    require 'java'
    # I made this Java JAR with IntelliJ IDEA; it includes both CoreNLP and my initialization class
    require 'vendor/CoreNLPTest2.jar'

    # The Java class I made for running the CoreNLP analysis; it initializes CoreNLP with the correct annotations etc.
    analyzer = com.me.Analyzer.new
    result = analyzer.analyzeTweet(self.text) # self.text is where the text to be analyzed resides

    self.corenlp_sentiment = result # adds the result into this field in the MongoDB model
    self.save!
    return "#{result}: #{self.text}" # for debugging purposes
  end

