org.umber.bellows.loader
Class HtmlReader

java.lang.Object
  extended by java.io.Reader
      extended by java.io.FilterReader
          extended by org.umber.bellows.loader.HtmlReader

public class HtmlReader
extends java.io.FilterReader

Reader for converting an HTML stream into a Datum object tree. Instantiates one Datum object for every element in the HTML document, and sets Datum properties for each HTML attribute.
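The element-to-node mapping described above can be sketched with SAX events: each start tag creates a node, each attribute becomes a property, and a stack tracks nesting. This is only an illustration under stated assumptions. SimpleDatum and ElementTreeBuilder are hypothetical stand-ins, not the Bellows Datum API. The JDK's built-in SAX parser is swapped in for TagSoup, so unlike HtmlReader this sketch rejects malformed HTML.

```java
import java.io.Reader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for Datum: one node per element, one property per attribute.
class SimpleDatum {
    final String name;
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<SimpleDatum> children = new ArrayList<>();

    SimpleDatum(String name) {
        this.name = name;
    }
}

// Builds a SimpleDatum tree from SAX start/end element events.
class ElementTreeBuilder extends DefaultHandler {
    private final ArrayDeque<SimpleDatum> stack = new ArrayDeque<>();
    private SimpleDatum root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        SimpleDatum node = new SimpleDatum(qName);
        // Copy every attribute of the element into a property on the new node.
        for (int i = 0; i < attrs.getLength(); i++) {
            node.properties.put(attrs.getQName(i), attrs.getValue(i));
        }
        if (stack.isEmpty()) {
            root = node;                       // first element is the document root
        } else {
            stack.peek().children.add(node);   // attach to the currently open parent
        }
        stack.push(node);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        stack.pop();                           // element closed; return to its parent
    }

    // Parses well-formed markup from a Reader (the JDK parser, unlike TagSoup, rejects tag soup).
    static SimpleDatum build(Reader in) throws Exception {
        ElementTreeBuilder handler = new ElementTreeBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(in), handler);
        return handler.root;
    }
}
```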

Author:
jsheets

Field Summary
static java.lang.String TAGSOUP_READER
          The fully qualified class name of the TagSoup HTML reader.
 
Fields inherited from class java.io.FilterReader
in
 
Fields inherited from class java.io.Reader
lock
 
Constructor Summary
HtmlReader(java.io.Reader in)
          Creates a new instance of HtmlReader.
HtmlReader(java.io.Reader in, java.lang.String baseURI)
          Creates a new instance of HtmlReader that uses the TagSoup HTML parser and resolves relative URIs in the HTML document against the given base URI.
 
Method Summary
static Datum fromHtml(java.lang.String html)
          Convenience method to convert an HTML string directly into a Datum tree.
 Datum readHtml()
          Reads an HTML document from the input Reader and converts it into a Datum tree.
 
Methods inherited from class java.io.FilterReader
close, mark, markSupported, read, read, ready, reset, skip
 
Methods inherited from class java.io.Reader
read
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

TAGSOUP_READER

public static final java.lang.String TAGSOUP_READER
The fully qualified class name of the TagSoup HTML reader.

See Also:
Constant Field Values
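Because the constructors throw UmberClassException when the TagSoup library cannot be found, one plausible reading is that the parser class is loaded by name at run time. The sketch below shows that general pattern and is an assumption, not the documented implementation. MissingLibraryException and ParserLoader are hypothetical names, and the actual value of TAGSOUP_READER (listed under Constant Field Values) is not reproduced here.

```java
// Hypothetical stand-in for UmberClassException.
class MissingLibraryException extends Exception {
    MissingLibraryException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ParserLoader {
    // Instantiates a class from its fully qualified name via its no-arg constructor.
    static Object instantiate(String className) throws MissingLibraryException {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            // The library is not on the classpath.
            throw new MissingLibraryException("Class not on classpath: " + className, e);
        } catch (ReflectiveOperationException e) {
            // The class exists but could not be constructed (no public no-arg constructor, etc.).
            throw new MissingLibraryException("Could not instantiate: " + className, e);
        }
    }
}
```

Loading by name keeps TagSoup an optional run-time dependency: code that never constructs an HtmlReader does not need the library present.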
Constructor Detail

HtmlReader

public HtmlReader(java.io.Reader in)
           throws UmberClassException
Creates a new instance of HtmlReader.

Parameters:
in - Reader instance to wrap
Throws:
UmberClassException - if the TagSoup library cannot be found
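HtmlReader extends java.io.FilterReader, whose protected constructor stores the wrapped Reader in the inherited field in. The following minimal sketch shows that wrapping pattern with a hypothetical WholeDocumentReader that drains the wrapped stream into a String; it illustrates FilterReader subclassing in general, not HtmlReader's own code.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical FilterReader subclass: super(in) stores the wrapped Reader in the field "in".
class WholeDocumentReader extends FilterReader {
    WholeDocumentReader(Reader in) {
        super(in);
    }

    // Reads the wrapped Reader to end of stream and returns its contents.
    String readAll() throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```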

HtmlReader

public HtmlReader(java.io.Reader in,
                  java.lang.String baseURI)
           throws UmberClassException
Creates a new instance of HtmlReader that uses the TagSoup HTML parser and resolves relative URIs in the HTML document against the given base URI.

Parameters:
in - Reader instance to wrap
baseURI - the base path for resolving relative URIs
Throws:
UmberClassException - if the TagSoup library cannot be found
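Resolving a relative URI against a base URI follows the standard URI reference-resolution rules. The sketch below uses java.net.URI.resolve to show the expected results. Whether HtmlReader resolves URIs this way, or passes the base URI to the parser, is not stated here, so treat RelativeLinks as a hypothetical illustration.

```java
import java.net.URI;

class RelativeLinks {
    // Resolves an href from the document against the base URI; "." and ".." segments are normalized.
    static String resolve(String baseURI, String href) {
        return URI.create(baseURI).resolve(href).toString();
    }
}
```

For example, with base http://example.com/docs/index.html, the href img/logo.png resolves to http://example.com/docs/img/logo.png and ../about.html resolves to http://example.com/about.html.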
Method Detail

readHtml

public Datum readHtml()
               throws UmberClassException,
                      BellowsIOException,
                      BellowsParseException
Reads an HTML document from the input Reader and converts it into a Datum tree.

Returns:
a Datum tree
Throws:
UmberClassException - if no XML parser can be found
BellowsParseException - if XML parsing errors occur
BellowsIOException - if I/O errors occur
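The three checked exceptions suggest that parser-level errors are translated into library-level ones. The following is a sketch of that translation pattern under assumptions: ParseFailure and ReadFailure are hypothetical stand-ins for BellowsParseException and BellowsIOException, and the JDK SAX parser replaces TagSoup.

```java
import java.io.IOException;
import java.io.Reader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for BellowsParseException.
class ParseFailure extends Exception {
    ParseFailure(Throwable cause) {
        super(cause);
    }
}

// Hypothetical stand-in for BellowsIOException.
class ReadFailure extends Exception {
    ReadFailure(Throwable cause) {
        super(cause);
    }
}

class ParseRunner {
    // Runs a SAX parse and maps parser-specific exceptions onto library-level exceptions.
    static void parse(Reader in, DefaultHandler handler) throws ParseFailure, ReadFailure {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(in), handler);
        } catch (SAXException | ParserConfigurationException e) {
            throw new ParseFailure(e);   // malformed input or misconfigured parser
        } catch (IOException e) {
            throw new ReadFailure(e);    // the underlying Reader failed
        }
    }
}
```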

fromHtml

public static Datum fromHtml(java.lang.String html)
                      throws UmberClassException,
                             BellowsIOException,
                             BellowsParseException
Convenience method to convert an HTML string directly into a Datum tree.

Parameters:
html - the input HTML in String form
Returns:
the HTML Datum tree, or null if errors occurred
Throws:
UmberClassException - if the TagSoup library cannot be found
BellowsParseException - if XML parsing errors occur
BellowsIOException - if I/O errors occur
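A static convenience method like fromHtml typically wraps the String in a java.io.StringReader and delegates to the Reader-based method. The sketch below shows that shape with a hypothetical TagCounter whose deliberately naive countOpeningTags method stands in for readHtml(). Unlike the documented fromHtml, which may return null on errors, this sketch propagates its exception.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical Reader-based API standing in for HtmlReader.readHtml().
class TagCounter {
    private final Reader in;

    TagCounter(Reader in) {
        this.in = in;
    }

    // Counts '<' characters immediately followed by a letter (naive; for illustration only).
    int countOpeningTags() throws IOException {
        int count = 0;
        int prev = -1;
        int c;
        while ((c = in.read()) != -1) {
            if (prev == '<' && Character.isLetter(c)) {
                count++;
            }
            prev = c;
        }
        return count;
    }

    // Convenience wrapper in the style of fromHtml: wrap the String, delegate, then close the Reader.
    static int fromString(String text) throws IOException {
        try (Reader r = new StringReader(text)) {
            return new TagCounter(r).countOpeningTags();
        }
    }
}
```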