OpenGLized: March 2011

Since yesterday I've been going crazy trying to get Weka (a data mining tool) to work with a stemmer (Snowball). I finally found a solution thanks to another blogger. The process is as follows:

Download the Snowball stemmer .jar from the Weka website (link).
Put it into your Weka root directory.
Modify the RunWeka.ini file in that directory so that the cp line looks like this: cp=%CLASSPATH%;snowball.jar
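
If you would rather script that last edit than do it by hand, here is a minimal Python sketch. It assumes the file already contains a line starting with cp= and that entries are separated by ";" as in the example above. The function name add_jar_to_cp is my own, not something Weka provides, and it works on the file contents as a string, so reading and writing RunWeka.ini is left to you.

```python
def add_jar_to_cp(ini_text, jar="snowball.jar", sep=";"):
    """Return ini_text with `jar` appended to the cp= entry if it is missing.

    Hypothetical helper: assumes a single-line `cp=` entry whose values
    are separated by `sep`. Other lines are passed through unchanged.
    """
    out = []
    for line in ini_text.splitlines():
        if line.strip().startswith("cp="):
            key, value = line.split("=", 1)
            entries = [e for e in value.strip().split(sep) if e]
            if jar not in entries:
                entries.append(jar)
            line = key + "=" + sep.join(entries)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = "# sample ini fragment\ncp=%CLASSPATH%"
    print(add_jar_to_cp(sample))
```

Running it twice on the same text does not add the jar a second time, so it is safe to re-run.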

Now, when you use the StringToWordVector filter, choose the SnowballStemmer option under "stemmer". Then click on the text box in the filter UI. A popup should appear where you enter the language you need. In my case I used the Spanish stemmer, so I entered "spanish" without the quotes.

Note that you can also use a stopword list so the filter removes stopwords when it processes the text. To do so, click on the text field in the GUI. A file chooser window should open, letting you pick your stopword list. The file format is one word per line, and lines starting with "#" are treated as comments. Keep this in mind when creating your own stopword list. There are also some ready-made lists available on the net.
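
To check a stopword file of your own before handing it to Weka, a small Python sketch of that format can help. It is not Weka's code, only an illustration of "one word per line, # starts a comment". Skipping blank lines and matching words exactly (case-sensitive) are my assumptions, and the function names are made up for this example.

```python
def parse_stopwords(text):
    """Parse stopword-list text: one word per line, '#' lines are comments.

    Blank lines are skipped (an assumption, not stated for Weka itself).
    """
    words = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        words.add(line)
    return words


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set (exact match)."""
    return [t for t in tokens if t not in stopwords]


if __name__ == "__main__":
    sample = "# Spanish stopwords\nde\nla\n\nque\n"
    stopwords = parse_stopwords(sample)
    print(remove_stopwords(["la", "casa", "de", "papel"], stopwords))
```

If your list comes out empty or with odd entries here, the file probably has several words on one line or a comment that does not start with "#".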

Mar 5, 2011

Weka stemming