org.umber.core.text.splitters
Class RegexpExtractor

java.lang.Object
  extended by org.umber.core.text.splitters.RegexpExtractor
All Implemented Interfaces:
ITextSplitter

public class RegexpExtractor
extends java.lang.Object
implements ITextSplitter

Text extractor that returns the text matched by each parenthesized group defined in the driving regular expression.
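The minimal sketch below illustrates what a "parenthesized group" means in java.util.regex terms. It is not the RegexpExtractor source. It collects the text captured by each group of the first match. The class and method names are hypothetical, and only Pattern and Matcher are real JDK APIs. Group 0 is the whole match and is skipped. A group that does not take part in the match yields null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupExtractionDemo {
    // Hypothetical helper: returns the text captured by each group of the
    // first match, in group order (group 0, the whole match, is excluded).
    static List<String> groupsOfFirstMatch(String regexp, String text) {
        Matcher m = Pattern.compile(regexp).matcher(text);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i)); // null if this group did not participate
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two groups: the local part and the domain name of an address.
        System.out.println(groupsOfFirstMatch("(\\w+)@(\\w+)\\.com", "mail alice@example.com today"));
        // prints [alice, example]
    }
}
```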

Author:
jsheets

Constructor Summary
RegexpExtractor(java.lang.String regexp, boolean isPerLine)
          Creates a new instance of RegexpExtractor.
 
Method Summary
 java.lang.String[] splitText(java.lang.String text)
          Extracts fragments of text from the input text document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RegexpExtractor

public RegexpExtractor(java.lang.String regexp,
                       boolean isPerLine)
Creates a new instance of RegexpExtractor.

Parameters:
regexp - regular expression whose parenthesized groups select the text to extract
isPerLine - true if the regular expression should be applied to each line, or false if it should be applied once to the entire text
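The sketch below shows one plausible reading of isPerLine. It assumes that "applied to each line" means one match attempt per line and that "applied once" means a single match attempt against the whole text. The PerLineSketch class and its extract method are hypothetical. RegexpExtractor may iterate over matches differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerLineSketch {
    // Hypothetical sketch: applies the pattern once to each unit (each line when
    // isPerLine is true, otherwise the whole text) and collects every group of
    // that single match.
    static String[] extract(String regexp, String text, boolean isPerLine) {
        Pattern pattern = Pattern.compile(regexp);
        // "\\R" matches any line break sequence (Java 8+).
        String[] units = isPerLine ? text.split("\\R") : new String[] { text };
        List<String> fragments = new ArrayList<>();
        for (String unit : units) {
            Matcher m = pattern.matcher(unit);
            if (m.find()) {
                for (int i = 1; i <= m.groupCount(); i++) {
                    fragments.add(m.group(i));
                }
            }
        }
        return fragments.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "name=alpha\nname=beta";
        System.out.println(String.join(",", extract("name=(\\w+)", text, true)));  // alpha,beta
        System.out.println(String.join(",", extract("name=(\\w+)", text, false))); // alpha
    }
}
```

With isPerLine set to true, the example yields alpha and beta. With isPerLine set to false, it yields only alpha, because under this reading the pattern is matched once against the whole text.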

Method Detail

splitText

public java.lang.String[] splitText(java.lang.String text)
Extracts fragments of text from the input text document.

This implementation extracts one text fragment for each parenthesized group in the regular expression. If the expression contains no groups, this method returns an empty array.
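The source does not say how the implementation counts groups. If it uses Matcher.groupCount(), only capturing groups count. A non-capturing group such as (?:...) is parenthesized but contributes no fragment, so a pattern containing only non-capturing groups would also produce an empty array under that assumption. The hypothetical helper below reports the capturing-group count for a few patterns.

```java
import java.util.regex.Pattern;

public class GroupCountDemo {
    // Hypothetical helper: number of capturing groups in a pattern.
    // Matcher.groupCount() excludes group 0 (the whole match).
    static int capturingGroups(String regexp) {
        return Pattern.compile(regexp).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(capturingGroups("\\d+"));          // 0: no groups
        System.out.println(capturingGroups("(\\d+)-(\\d+)")); // 2: two capturing groups
        System.out.println(capturingGroups("(?:\\d+)-\\d+")); // 0: non-capturing only
    }
}
```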

Specified by:
splitText in interface ITextSplitter
Parameters:
text - input text document
Returns:
array of extracted text fragments; empty if the expression has no groups