
Class websphinx.HTMLParser

java.lang.Object
   |
   +----websphinx.HTMLParser

public class HTMLParser
extends Object
HTML parser. Parses an input stream or String and converts it to a sequence of Tags and a tree of Elements. HTMLParser is used by Page to parse pages.
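The two outputs named above, a flat sequence of tags and a nested tree of elements, can be pictured with the following sketch. It is an illustration under assumptions, not HTMLParser's algorithm: TagSketch, ElementSketch, tokenize() and buildTree() are hypothetical names, the regex tokenizer ignores comments and quoted '>' characters, and the tree builder uses a simple stack with a small, assumed list of void elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: TagSketch and ElementSketch are illustrative stand-ins,
// not the websphinx Tag and Element classes, and this is not HTMLParser's algorithm.
class TagTreeSketch {

    // One start tag (<p>) or end tag (</p>) found in the text.
    static final class TagSketch {
        final String name;
        final boolean isEnd;
        TagSketch(String name, boolean isEnd) { this.name = name; this.isEnd = isEnd; }
    }

    // A tree node: an element name plus the elements nested inside it.
    static final class ElementSketch {
        final String name;
        final List<ElementSketch> children = new ArrayList<>();
        ElementSketch(String name) { this.name = name; }
    }

    // Matches <name ...> and </name>; comments, doctypes and quoted '>' are not handled.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>");

    // Elements assumed to have no end tag, so they are never left open.
    private static final Set<String> VOID = Set.of("br", "hr", "img", "input", "link", "meta");

    // Step 1: flatten the HTML into a sequence of tags.
    static List<TagSketch> tokenize(String html) {
        List<TagSketch> tags = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            tags.add(new TagSketch(m.group(2).toLowerCase(), !m.group(1).isEmpty()));
        }
        return tags;
    }

    // Step 2: nest the tags into a tree using a stack of currently open elements.
    static ElementSketch buildTree(List<TagSketch> tags) {
        ElementSketch root = new ElementSketch("#root");
        Deque<ElementSketch> open = new ArrayDeque<>();
        open.push(root);
        for (TagSketch t : tags) {
            if (!t.isEnd) {
                ElementSketch e = new ElementSketch(t.name);
                open.peek().children.add(e);
                if (!VOID.contains(t.name)) {
                    open.push(e);
                }
            } else if (open.stream().anyMatch(e -> e.name.equals(t.name))) {
                // Close the matching element and any unclosed elements inside it.
                while (!open.pop().name.equals(t.name)) { }
            }
            // An end tag with no matching open element is ignored.
        }
        return root;
    }

    public static void main(String[] args) {
        List<TagSketch> tags = tokenize("<html><body><p>Hi<br>there</p></body></html>");
        System.out.println(tags.size() + " tags"); // 7 tags
        ElementSketch html = buildTree(tags).children.get(0);
        System.out.println(html.name + " > " + html.children.get(0).name); // html > body
    }
}
```

Keeping the tag sequence and the tree as separate steps mirrors the two results the class description mentions; a production parser also has to repair badly nested markup, which this stack only handles by closing everything above a matching start tag.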


Constructor Index

 o HTMLParser()
Make an HTMLParser.
 o HTMLParser(DownloadParameters)
Make an HTMLParser which retrieves pages using the specified download parameters.

Method Index

 o dontParse(Page, InputStream)
Download an input stream without parsing it.
 o dontParse(Page, Reader)
Download a character stream without parsing it.
 o main(String[])
 o parse(Page, InputStream)
Parse an input stream.
 o parse(Page, Reader)
Parse a character stream.
 o parse(Page, String)
Parse a string.

Constructors

 o HTMLParser
 public HTMLParser()
Make an HTMLParser.

 o HTMLParser
 public HTMLParser(DownloadParameters dp)
Make an HTMLParser which retrieves pages using the specified download parameters. Pages larger than dp.getMaxPageSize() are rejected by parse() with an IOException.

Parameters:
dp - download parameters used during parsing
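The size check described above can be pictured as a bounded read that gives up with an IOException once a limit is passed. This is a sketch under assumptions: readLimited() and maxBytes are hypothetical names, and the limit is treated as a byte count, whereas this page does not state the unit returned by dp.getMaxPageSize().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: readLimited() and maxBytes are illustrative, not websphinx API.
class SizeLimitSketch {

    // Reads the whole stream, failing as soon as more than maxBytes would be stored.
    static byte[] readLimited(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            if ((long) out.size() + n > maxBytes) {
                throw new IOException("page exceeds " + maxBytes + " bytes");
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] small = readLimited(new ByteArrayInputStream(new byte[100]), 1024);
        System.out.println(small.length); // 100
        try {
            readLimited(new ByteArrayInputStream(new byte[2048]), 1024);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the limit before each write means an oversized page is rejected without buffering all of it in memory.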

Methods

 o parse
 public void parse(Page page,
                   InputStream stream) throws IOException
Parse an input stream.

Parameters:
page - Page to receive parsed HTML
stream - stream containing HTML
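Because parse() comes in both InputStream and Reader forms, a common way to share code between such overloads is to decode the bytes into characters and delegate. The sketch below assumes that design; it is not taken from websphinx. DelegatingParserSketch is a hypothetical class, the charset is an assumed constructor argument, and the Reader overload only collects the decoded text in place of real parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: one common way a byte-stream overload can delegate
// to a character-stream overload. Not websphinx's implementation.
class DelegatingParserSketch {

    // Assumed encoding; a real crawler might take it from the HTTP Content-Type header.
    private final Charset charset;

    DelegatingParserSketch(Charset charset) {
        this.charset = charset;
    }

    // Byte-stream overload: decode bytes into characters, then delegate.
    String parse(InputStream stream) throws IOException {
        return parse(new InputStreamReader(stream, charset));
    }

    // Character-stream overload: stand-in for real parsing that just collects
    // the decoded text. The caller still owns (and closes) the stream.
    String parse(Reader stream) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = stream.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        DelegatingParserSketch p = new DelegatingParserSketch(StandardCharsets.UTF_8);
        System.out.println(p.parse(new ByteArrayInputStream(utf8)));
    }
}
```

The charset matters: decoding the same UTF-8 bytes as ISO-8859-1 would turn the single character é into two characters.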
 o parse
 public void parse(Page page,
                   Reader stream) throws IOException
Parse a character stream.

Parameters:
page - Page to receive parsed HTML
stream - stream containing HTML
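A Reader-based parser consumes characters one at a time, so tag recognition is naturally a small state machine. The sketch below shows only that idea: StreamingTagScanner and scanTags() are hypothetical, it has just two states (outside or inside a tag), and it does not handle comments, quoted '>' in attribute values, or script bodies as a real HTML parser must.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a two-state scanner that pulls one character at a time
// from a Reader and collects the raw text inside each <...>. It does not handle
// comments, quoted '>' in attribute values, or <script> bodies.
class StreamingTagScanner {

    static List<String> scanTags(Reader in) throws IOException {
        List<String> tags = new ArrayList<>();
        StringBuilder current = null; // non-null while inside a tag
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (current == null) {
                if (ch == '<') {
                    current = new StringBuilder(); // enter "inside tag" state
                }
            } else if (ch == '>') {
                tags.add(current.toString()); // leave "inside tag" state
                current = null;
            } else {
                current.append(ch);
            }
        }
        return tags;
    }

    public static void main(String[] args) throws IOException {
        // BufferedReader avoids one underlying read per character on real streams.
        Reader in = new BufferedReader(new StringReader("<p class=x>Hi</p>"));
        System.out.println(scanTags(in)); // [p class=x, /p]
    }
}
```

Reading single characters is only efficient when the Reader is buffered, which is why the usage wraps the source in a BufferedReader.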
 o parse
 public void parse(Page page,
                   String content) throws IOException
Parse a string.

Parameters:
page - Page to receive parsed HTML
content - String containing HTML
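A String overload can reuse Reader-based code by wrapping the string in a StringReader. This is a sketch of that pattern only: StringOverloadSketch and countTags() are hypothetical, and counting '<' characters stands in for real parsing. It also suggests why parse(Page, String) still declares IOException: StringReader performs no real I/O, but its read() method declares the exception.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical sketch: a String overload that reuses a Reader-based method by
// wrapping the string in a StringReader. countTags() is illustrative only.
class StringOverloadSketch {

    // Reader-based worker: counts '<' characters as a stand-in for parsing.
    static int countTags(Reader in) throws IOException {
        int count = 0, c;
        while ((c = in.read()) != -1) {
            if (c == '<') {
                count++;
            }
        }
        return count;
    }

    // String overload: StringReader never touches the network or disk, but its
    // read() still declares IOException, so the signature keeps it.
    static int countTags(String content) throws IOException {
        return countTags(new StringReader(content));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countTags("<a><b></b></a>")); // 4
    }
}
```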
 o dontParse
 public void dontParse(Page page,
                       InputStream stream) throws IOException
Download an input stream without parsing it.

Parameters:
page - Page to receive the downloaded content
stream - stream containing content
 o dontParse
 public void dontParse(Page page,
                       Reader stream) throws IOException
Download a character stream without parsing it.

Parameters:
page - Page to receive the downloaded content
stream - stream containing content
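Downloading without parsing amounts to copying the content verbatim, with no tag scanning. The sketch below shows both the byte and character forms using standard JDK copy methods; RawDownloadSketch and readAll() are hypothetical names, and nothing here reflects how dontParse() actually stores content in a Page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical sketch of "download without parsing": the content is copied
// verbatim and no tag scanning happens. readAll() is an illustrative name.
class RawDownloadSketch {

    // Byte-stream variant (InputStream.transferTo requires Java 9+).
    static byte[] readAll(InputStream stream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        stream.transferTo(out);
        return out.toByteArray();
    }

    // Character-stream variant (Reader.transferTo requires Java 10+).
    static String readAll(Reader stream) throws IOException {
        StringWriter out = new StringWriter();
        stream.transferTo(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("<b>kept as-is</b>")));
        System.out.println(readAll(new ByteArrayInputStream(new byte[] {1, 2, 3})).length);
    }
}
```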
 o main
 public static void main(String args[]) throws Exception
