org.umber.bellows.loader
Class HtmlReader
java.lang.Object
java.io.Reader
java.io.FilterReader
org.umber.bellows.loader.HtmlReader
- public class HtmlReader
- extends java.io.FilterReader
Reader for converting an HTML stream into a Datum object tree.
Instantiates one Datum object for every element in the HTML
document, and sets Datum properties for each HTML attribute.
- Author:
- jsheets
Field Summary
static java.lang.String TAGSOUP_READER
The fully qualified class name of the TagSoup HTML reader.
Fields inherited from class java.io.FilterReader
in
Fields inherited from class java.io.Reader
lock
Constructor Summary
HtmlReader(java.io.Reader in)
Creates a new instance of HtmlReader.
HtmlReader(java.io.Reader in, java.lang.String baseURI)
Creates a new instance of HtmlReader which uses the TagSoup HTML
parser class and the base URI to resolve relative URIs in the
HTML document.
Method Summary
static Datum fromHtml(java.lang.String html)
Convenience method to convert an HTML string directly into a Datum
tree.
Datum readHtml()
Reads an HTML document from the input Reader and converts it into
a Datum tree.
Methods inherited from class java.io.FilterReader
close, mark, markSupported, read, read, ready, reset, skip
Methods inherited from class java.io.Reader
read
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
TAGSOUP_READER
public static final java.lang.String TAGSOUP_READER
- The fully qualified class name of the TagSoup HTML reader.
- See Also:
- Constant Field Values
HtmlReader
public HtmlReader(java.io.Reader in)
throws UmberClassException
- Creates a new instance of HtmlReader.
- Parameters:
in
- Reader instance to wrap
- Throws:
UmberClassException
- if unable to find TagSoup library
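The constructor fails when the TagSoup library cannot be found. How HtmlReader performs that check is not documented here; the following is only a minimal sketch of one common approach, loading a class by its fully qualified name with Class.forName and translating the failure. ClassLookupException and requireClass are hypothetical stand-ins, not part of this library, and the class names passed in are placeholders rather than the value of TAGSOUP_READER.

```java
public class ParserLookupSketch {

    /** Hypothetical stand-in for the library's UmberClassException. */
    static class ClassLookupException extends Exception {
        ClassLookupException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Loads a class by its fully qualified name, translating the JDK's
     * ClassNotFoundException into the stand-in exception above.
     */
    static Class<?> requireClass(String className) throws ClassLookupException {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new ClassLookupException("Unable to find class " + className, e);
        }
    }

    public static void main(String[] args) {
        try {
            // A class that is always available in the JDK.
            System.out.println(requireClass("java.lang.String").getName());
            // A deliberately nonexistent placeholder name, to show the failure path.
            requireClass("example.missing.Parser");
        } catch (ClassLookupException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Callers that cannot guarantee the parser library is on the classpath would typically catch the exception at construction time and report the missing dependency.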
HtmlReader
public HtmlReader(java.io.Reader in,
java.lang.String baseURI)
throws UmberClassException
- Creates a new instance of HtmlReader which uses the TagSoup HTML
parser class and the base URI to resolve relative URIs in the
HTML document.
- Parameters:
in
- Reader instance to wrap
baseURI
- the base path for resolving relative URIs
- Throws:
UmberClassException
- if unable to find TagSoup library
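To illustrate what resolving relative URIs against a base URI means, here is a small sketch using the JDK's java.net.URI. It does not show HtmlReader's own resolution logic, which is not documented on this page; the resolve helper and the example.com addresses are assumptions made for illustration.

```java
import java.net.URI;

public class BaseUriSketch {

    /** Resolves a possibly relative reference against a base URI using java.net.URI. */
    static String resolve(String baseURI, String reference) {
        return URI.create(baseURI).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html"; // hypothetical base URI

        // Relative path: resolved against the directory of the base document.
        System.out.println(resolve(base, "images/logo.png"));
        // Parent-directory reference.
        System.out.println(resolve(base, "../about.html"));
        // Absolute path: keeps only the scheme and host of the base.
        System.out.println(resolve(base, "/style.css"));
        // Already absolute: returned unchanged.
        System.out.println(resolve(base, "https://other.example/page.html"));
    }
}
```

With this base, the four references resolve to http://example.com/docs/images/logo.png, http://example.com/about.html, http://example.com/style.css, and https://other.example/page.html respectively.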
readHtml
public Datum readHtml()
throws UmberClassException,
BellowsIOException,
BellowsParseException
- Reads an HTML document from the input Reader and converts it into
a Datum tree.
- Returns:
- a Datum tree
- Throws:
UmberClassException
- if unable to find any XML parsers
BellowsParseException
- if XML parsing errors occur
BellowsIOException
- if I/O errors occur
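The class description says that one Datum is created per HTML element and one Datum property is set per attribute. The sketch below shows that mapping in isolation. It makes several assumptions: SketchDatum is a hypothetical stand-in because the Datum API is not shown on this page, the JDK's DOM parser is swapped in for TagSoup (so the input must be well-formed markup), and text nodes are simply skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ElementTreeSketch {

    /** Hypothetical stand-in for Datum: a name, properties, and child nodes. */
    static class SketchDatum {
        final String name;
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<SketchDatum> children = new ArrayList<>();

        SketchDatum(String name) {
            this.name = name;
        }
    }

    /** Creates one SketchDatum per element and one property per attribute. */
    static SketchDatum convert(Element element) {
        SketchDatum datum = new SketchDatum(element.getTagName());
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            datum.properties.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        // Recurse into child elements only; text and comment nodes are ignored here.
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                datum.children.add(convert((Element) child));
            }
        }
        return datum;
    }

    /** Parses well-formed markup with the JDK's DOM parser and converts the root element. */
    static SketchDatum parse(String markup) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)))
                .getDocumentElement();
        return convert(root);
    }

    public static void main(String[] args) throws Exception {
        SketchDatum html = parse(
                "<html><body><a href=\"index.html\" class=\"nav\">Home</a></body></html>");
        SketchDatum link = html.children.get(0).children.get(0);
        System.out.println(link.name + " href=" + link.properties.get("href"));
    }
}
```

A tolerant parser such as TagSoup matters for real-world HTML because, unlike the strict parser used in this sketch, it can accept markup that is not well-formed.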
fromHtml
public static Datum fromHtml(java.lang.String html)
throws UmberClassException,
BellowsIOException,
BellowsParseException
- Convenience method to convert an HTML string directly into a Datum
tree.
- Parameters:
html
- the input HTML in String form
- Returns:
- the HTML Datum tree, or null if errors occurred
- Throws:
UmberClassException
- if unable to find TagSoup library
BellowsParseException
- if XML parsing errors occur
BellowsIOException
- if I/O errors occur
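A static convenience method like fromHtml usually wraps the string in a StringReader, delegates to the instance method, and closes the reader afterwards. Whether HtmlReader does exactly this is an assumption; the sketch below only demonstrates the pattern with a hypothetical FilterReader subclass whose readAll method stands in for readHtml and returns plain text instead of a Datum tree.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ConvenienceSketch extends FilterReader {

    public ConvenienceSketch(Reader in) {
        super(in);
    }

    /** Stand-in for readHtml(): drains the wrapped reader into a String. */
    public String readAll() throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[1024];
        int count;
        while ((count = read(buffer, 0, buffer.length)) != -1) {
            text.append(buffer, 0, count);
        }
        return text.toString();
    }

    /** Stand-in for fromHtml(String): wrap the string, read it, then close the reader. */
    public static String fromString(String input) throws IOException {
        try (ConvenienceSketch reader = new ConvenienceSketch(new StringReader(input))) {
            return reader.readAll();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fromString("<p>Hello</p>"));
    }
}
```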