org.umber.core.text.filters
Class Normalizer

java.lang.Object
  extended by org.umber.core.text.filters.Normalizer
All Implemented Interfaces:
ITextFilter

public class Normalizer
extends java.lang.Object
implements ITextFilter

Text filter to perform whitespace (or custom) normalization.

Author:
jsheets

Constructor Summary
Normalizer()
          Creates a new instance of whitespace Normalizer.
Normalizer(java.lang.String[] tokens, java.lang.String normalText, java.lang.String[][] exclusionTokens)
          Creates a new instance of Normalizer with custom normalization.
 
Method Summary
 java.lang.String filterText(java.lang.String text)
          Sends the input document through the filter and returns the modified text.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Normalizer

public Normalizer()
Creates a new instance of whitespace Normalizer. Normalizes all spans of whitespace into single space characters.
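
The default behavior described above can be approximated with a regular expression. The following is a minimal sketch, not the actual Normalizer source: the class name WhitespaceCollapseSketch and the method collapse are hypothetical, and it assumes that "whitespace" means the characters matched by Java's \s class and that leading and trailing runs are collapsed rather than trimmed.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the default (no-argument) Normalizer behavior.
final class WhitespaceCollapseSketch {
    // One or more whitespace characters in a row (assumption: Java's \s class).
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Replaces every span of whitespace with a single space character.
    static String collapse(String text) {
        return WHITESPACE_RUN.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // prints "alpha beta gamma"
        System.out.println(collapse("alpha \t\n  beta\r\ngamma"));
    }
}
```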


Normalizer

public Normalizer(java.lang.String[] tokens,
                  java.lang.String normalText,
                  java.lang.String[][] exclusionTokens)
Creates a new instance of Normalizer with custom normalization. The exclusionTokens parameter must be an array of two-element String[] arrays. Each pair holds the start and end delimiters of a region to skip during normalization.

If any constructor parameter is null, the default normalization is used for that parameter.

Parameters:
tokens - tokens to collapse and replace with normalText
normalText - text that replaces each span of the tokens being normalized
exclusionTokens - pairs of delimiters to exclude regions of text from normalization
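
To illustrate how the three parameters might interact, here is a minimal sketch of custom normalization with excluded regions. It is an assumption-laden illustration, not the library's implementation: the class CustomNormalizeSketch and its helper methods are hypothetical, excluded regions are assumed to be copied verbatim including their delimiters, and an unterminated region is assumed to extend to the end of the text. Null handling is omitted.

```java
// Hypothetical sketch of token collapsing with excluded regions.
final class CustomNormalizeSketch {

    static String normalize(String text, String[] tokens,
                            String normalText, String[][] exclusions) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            String[] region = matchExclusion(text, i, exclusions);
            if (region != null) {
                // Copy the excluded region unchanged, delimiters included.
                int end = text.indexOf(region[1], i + region[0].length());
                int stop = (end < 0) ? text.length() : end + region[1].length();
                out.append(text, i, stop);
                i = stop;
                continue;
            }
            int len = matchToken(text, i, tokens);
            if (len > 0) {
                // Consume the whole span of adjacent tokens, emit normalText once.
                while (len > 0) {
                    i += len;
                    len = matchToken(text, i, tokens);
                }
                out.append(normalText);
            } else {
                out.append(text.charAt(i++));
            }
        }
        return out.toString();
    }

    // Returns the {start, end} pair whose start delimiter begins at pos, or null.
    private static String[] matchExclusion(String text, int pos, String[][] exclusions) {
        for (String[] pair : exclusions) {
            if (!pair[0].isEmpty() && text.startsWith(pair[0], pos)) {
                return pair;
            }
        }
        return null;
    }

    // Returns the length of the first non-empty token found at pos, or 0.
    private static int matchToken(String text, int pos, String[] tokens) {
        for (String token : tokens) {
            if (!token.isEmpty() && text.startsWith(token, pos)) {
                return token.length();
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String[][] skip = { { "<pre>", "</pre>" } };
        // prints "a-b <pre>c--d</pre> e-f"
        System.out.println(
            normalize("a--b <pre>c--d</pre> e---f", new String[] { "-" }, "-", skip));
    }
}
```

In this sketch an exclusion start delimiter is only checked between token spans, which is sufficient when tokens and delimiters do not overlap.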
Method Detail

filterText

public java.lang.String filterText(java.lang.String text)
Sends the input document through the filter and returns the modified text.

Specified by:
filterText in interface ITextFilter
Parameters:
text - input text document
Returns:
modified version of input document
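
Because filterText takes a String and returns a String, filters of this shape can be applied one after another. The sketch below uses a local stand-in interface, TextFilterSketch, that mirrors only the filterText signature documented here; the real ITextFilter may declare additional members, and FilterUsageSketch and applyAll are hypothetical names.

```java
// Hypothetical stand-in mirroring the documented filterText signature.
interface TextFilterSketch {
    String filterText(String text);
}

final class FilterUsageSketch {
    // Passes the text through each filter in order and returns the final result.
    static String applyAll(String text, TextFilterSketch... filters) {
        String result = text;
        for (TextFilterSketch filter : filters) {
            result = filter.filterText(result);
        }
        return result;
    }

    public static void main(String[] args) {
        TextFilterSketch collapseWhitespace = t -> t.replaceAll("\\s+", " ");
        TextFilterSketch trim = String::trim;
        // prints "a b"
        System.out.println(applyAll("  a   b  ", collapseWhitespace, trim));
    }
}
```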