
Tuesday, December 14, 2010

A regular expression to find "word A near word B" in RapidMiner

You can use the Text Processing->Extract Information operator to match regular expressions.

If you put the Extract Information operator inside a Process Documents operator, it will add a column to your dataset with the results of the match. Turn on the "add meta information" option on the Process Documents operator.

Here's a simple regular expression to find a word near another word:

(word1\W+(?:\w+\W+){1,max}?word2)

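This will produce a match if "word1" is followed by "word2" with at least one and no more than "max" words between them (replace "max" with a number). To also match when the two words are directly adjacent, change {1,max} to {0,max}. Example: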

"The quick brown fox jumped over the lazy dog"

(quick\W+(?:\w+\W+){1,5}?lazy) will match, but

(quick\W+(?:\w+\W+){1,5}?dog) will not (it has 6 words in between).
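
If you want to try the pattern outside RapidMiner first, here is a minimal sketch in Python. It assumes Python's re module rather than RapidMiner's own regex engine, but this particular pattern uses only syntax that the two share. The near_pattern helper and its min_words parameter are made up for illustration; they are not part of RapidMiner.

```python
import re

def near_pattern(word1, word2, max_words, min_words=1):
    """Build the 'word1 near word2' pattern (hypothetical helper, not a RapidMiner API)."""
    return re.compile(
        r"(%s\W+(?:\w+\W+){%d,%d}?%s)"
        % (re.escape(word1), min_words, max_words, re.escape(word2))
    )

sentence = "The quick brown fox jumped over the lazy dog"

# 5 words sit between "quick" and "lazy", so {1,5} is enough.
hit = near_pattern("quick", "lazy", 5).search(sentence)

# 6 words sit between "quick" and "dog", so {1,5} is too small.
miss = near_pattern("quick", "dog", 5).search(sentence)

# Adjacent words only match when the lower bound is 0.
adjacent_strict = near_pattern("lazy", "dog", 5).search(sentence)
adjacent_loose = near_pattern("lazy", "dog", 5, min_words=0).search(sentence)

print(hit.group(1) if hit else None)
print(miss)
```

Note that the pattern has no word boundaries, so "quick" would also match at the end of a longer word. Adding \b before word1 and after word2 is one way to tighten it.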
