Despite not signing up for a Rails Rumble team this year, I followed the results closely. One project that caught my interest early was Jeremy McAnally’s tldr.it. I have always been fascinated by machine parsing of human language, so the technology that powers it, Open Text Summarizer, was a real draw for me. Reading through the source code, I realized that this would be a perfect opportunity to combine my resurrected C knowledge with Ruby.
There’s a lot going on under the hood, but a quick peek shows that the library first loads a stemming dictionary for your language of choice. Parsing a document against the loaded stem rules creates an OtsArticle, a pre-defined struct that keeps track of the document’s statistics, such as term frequency and word scores. The parsed result is then fed into a highlighter, which returns only a portion of the text based on a passed-in ratio, an integer between zero and 100.
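To make that pipeline concrete, here is a toy sketch in plain Ruby. It is not Open Text Summarizer's algorithm: it skips stemming entirely, uses a tiny made-up stop word list, and scores sentences by summed term frequency. The names toy_summarize, split_sentences, terms, and STOP_WORDS are hypothetical and exist only for this illustration.

```ruby
# Toy extractive summarizer. NOT Open Text Summarizer's algorithm:
# no stemming dictionary, and the stop word list is a made-up sample.
STOP_WORDS = %w[a an and are as at be by for in is it of on or that the this to was with].freeze

# Split text into sentences at whitespace that follows ., ! or ?
def split_sentences(text)
  text.strip.split(/(?<=[.!?])\s+/)
end

# Lowercased words of a sentence with stop words removed.
def terms(sentence)
  sentence.downcase.scan(/[a-z']+/) - STOP_WORDS
end

# Keep roughly `ratio` percent of the sentences (an integer from 0 to 100),
# choosing the highest-scoring ones and returning them in original order.
def toy_summarize(text, ratio)
  raise ArgumentError, "ratio must be between 0 and 100" unless (0..100).cover?(ratio)

  sentences = split_sentences(text)
  return "" if sentences.empty?

  # Term frequency across the whole document.
  frequency = Hash.new(0)
  sentences.each { |s| terms(s).each { |t| frequency[t] += 1 } }

  # A sentence's score is the sum of the frequencies of its terms.
  scores = sentences.map { |s| terms(s).sum { |t| frequency[t] } }

  # Number of sentences to keep, rounded up.
  keep = (sentences.size * ratio / 100.0).ceil

  # Rank by score (ties broken by position), then restore document order.
  top = sentences.each_index.sort_by { |i| [-scores[i], i] }.first(keep)
  top.sort.map { |i| sentences[i] }.join(" ")
end

sample = "Ruby is fun. Ruby wraps C libraries and Ruby gems ship C extensions. Cats sleep."
puts toy_summarize(sample, 33)
```

With a ratio of 33 the sketch keeps a single sentence out of three, the one whose terms occur most often in the document. The real library does considerably more work, but the shape is the same: gather statistics, score, then keep a ratio-sized slice.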
The source is on GitHub, and installation is a breeze provided you are on a POSIX-compliant system with glib-2.0 and libxml-2.0 installed and properly configured.
gem install summarize
For the sake of convenience, I’ve made the summarize method available as a public instance method on both String and File.
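A minimal usage sketch, under two assumptions that are not stated above: that the ratio is passed as a :ratio option, and that a default is used when it is omitted. The condensed helper is hypothetical and only exists so the snippet still runs when the gem is not installed.

```ruby
# Load the gem if it is available; the sketch still runs without it.
begin
  require 'summarize'
rescue LoadError
  warn "summarize gem not installed; returning the original text"
end

# Hypothetical helper: summarize `text` if String#summarize is defined,
# otherwise hand the text back unchanged.
def condensed(text, ratio = 25)
  return text unless text.respond_to?(:summarize)
  text.summarize(:ratio => ratio) # :ratio option name is an assumption
end

speech = "Four score and seven years ago our fathers brought forth on this " \
         "continent a new nation. Now we are engaged in a great civil war. " \
         "We are met on a great battlefield of that war."
puts condensed(speech, 50)
```

Going by the description above, the same call should be available on a File instance, for example on the object returned by File.open.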
Soon this gem will replace some of the more complex inner workings of tldr.it. Feel free to contact me with any feedback you might have.