Class websphinx.Link

java.lang.Object
   |
   +----websphinx.Region
           |
           +----websphinx.Element
                   |
                   +----websphinx.Link

public class Link
extends Element
implements Prioritized
Link to a Web page.

See Also:
Page

Variable Index

 o GET
Use the HTTP GET method to download this link.
 o POST
Use the HTTP POST method to access this link.
 o url

Constructor Index

 o Link(File)
Make a Link from a File.
 o Link(String)
Make a Link from a string URL.
 o Link(Tag, Tag, URL)
Make a Link from a start tag, an end tag, and a base URL (for relative references).
 o Link(URL)
Make a Link from a URL.

Method Index

 o discardContent()
Eliminate all references to page content.
 o disconnect()
Disconnect this link from its downloaded page (throwing away the page).
 o FileToURL(File)
Convert a local filename to a URL.
 o getDepth()
Get the depth of the link in the crawl.
 o getDirectory()
Get the directory part of the link, like "/home/dir/".
 o getDirectoryURL()
Get the URL of a page's directory.
 o getDirectoryURL(URL)
Get the URL of a page's directory.
 o getDownloadParameters()
Get the download parameters used for this link.
 o getFile()
Get the information part of the link, like "/home/dir/index.html?query".
 o getFilename()
Get the filename part of the link, like "index.html".
 o getHost()
Get the hostname of the link, like "www.cs.cmu.edu".
 o getMethod()
Get the method used to access this link.
 o getPage()
Get the downloaded page to which the link points.
 o getPageURL()
Get the URL of a page, omitting any anchor reference (like #ref).
 o getPageURL(URL)
Get the URL of a page, omitting any anchor reference (like #ref).
 o getParentURL()
Get the URL of a page's parent directory.
 o getParentURL(URL)
Get the URL of a page's parent directory.
 o getPort()
Get the port number of the link.
 o getPriority()
Get the priority of the link in the crawl.
 o getProtocol()
Get the network protocol of the link, like "ftp" or "http".
 o getQuery()
Get the query part of the link, like "?query".
 o getRef()
Get the anchor reference of the link, like "#ref".
 o getServiceURL()
Get the URL of a Web service, omitting any query or anchor reference.
 o getServiceURL(URL)
Get the URL of a Web service, omitting any query or anchor reference.
 o getStatus()
Get the status of the link.
 o getURL()
Get the URL.
 o relativeTo(URL, String)
 o relativeTo(URL, URL)
 o replaceHref(String)
Copy the link's start tag, replacing the URL.
 o setDownloadParameters(DownloadParameters)
Set the download parameters used for this link.
 o setPage(Page)
Set the page corresponding to this link.
 o setPriority(float)
Set the priority of the link in the crawl.
 o setStatus(int)
Set the status of the link.
 o setText(String)
Set the tagless-text representation of this region.
 o toDescription()
Generate a human-readable description of the link.
 o toText()
Convert the region to tagless text.
 o toURL()
Convert the link's URL to a String.
 o toURLDelimiters(String)
 o urlFromHref(Tag, URL)
Construct the URL for a link element, from its start tag and a base URL (for relative references).
 o URLToFile(URL)
Convert a file: URL to a filename appropriate to the current system platform.

Variables

 o url
 protected URL url
 o GET
 public static final int GET
Use the HTTP GET method to download this link.

 o POST
 public static final int POST
Use the HTTP POST method to access this link.

Constructors

 o Link
 public Link(Tag startTag,
             Tag endTag,
             URL base) throws MalformedURLException
Make a Link from a start tag, an end tag, and a base URL (for relative references). The tags must be on the same page.

Parameters:
startTag - Start tag of element
endTag - End tag of element
base - Base URL used for relative references
 o Link
 public Link(URL url)
Make a Link from a URL.

 o Link
 public Link(File file) throws MalformedURLException
Make a Link from a File.

 o Link
 public Link(String href) throws MalformedURLException
Make a Link from a string URL.

Throws: MalformedURLException
if the URL is invalid

Methods

 o discardContent
 public void discardContent()
Eliminate all references to page content.

 o disconnect
 public void disconnect()
Disconnect this link from its downloaded page (throwing away the page).

 o getDepth
 public int getDepth()
Get the depth of the link in the crawl.

Returns:
depth of link from root (depth of roots is 0)
 o getURL
 public URL getURL()
Get the URL.

Returns:
the URL of the link
 o getProtocol
 public String getProtocol()
Get the network protocol of the link, like "ftp" or "http".

Returns:
the protocol portion of the link's URL
 o getHost
 public String getHost()
Get the hostname of the link, like "www.cs.cmu.edu".

Returns:
the hostname portion of the link's URL
 o getPort
 public int getPort()
Get the port number of the link.

Returns:
the port number of the link's URL, or -1 if no port number is explicitly specified in the URL
 o getFile
 public String getFile()
Get the information part of the link, like "/home/dir/index.html?query". Equivalent to getURL().getFile().

Returns:
the file portion of the link's URL (path plus any query), as returned by getURL().getFile()
 o getDirectory
 public String getDirectory()
Get the directory part of the link, like "/home/dir/". Always starts and ends with '/'.

Returns:
the directory portion of the link's URL
 o getFilename
 public String getFilename()
Get the filename part of the link, like "index.html". Never contains '/'; may be the empty string.

Returns:
the filename portion of the link's URL
 o getQuery
 public String getQuery()
Get the query part of the link, like "?query". Either starts with a '?', or is empty.

Returns:
the query portion of the link's URL
 o getRef
 public String getRef()
Get the anchor reference of the link, like "#ref". Either starts with '#', or is empty.

Returns:
the anchor reference portion of the link's URL
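
The accessors above split a URL into named parts. As a rough illustration of that decomposition, the sketch below uses only `java.net.URL` (it is not WebSPHINX's own code) to derive the directory, filename, query, and anchor parts in the documented style. The class `LinkPartsSketch` and its helper methods are hypothetical names introduced here. Note that `java.net.URL` itself returns the query and anchor without the leading '?' and '#', so the helpers add them back.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper class; mirrors the documented part names, not WebSPHINX's source.
public class LinkPartsSketch {
    // Directory part: the path up to and including its last '/', or "/" if the path is empty.
    public static String directory(URL url) {
        String path = url.getPath();
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "/" : path.substring(0, slash + 1);
    }

    // Filename part: whatever follows the last '/' in the path (may be empty).
    public static String filename(URL url) {
        String path = url.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Query part in the documented style: "?query", or "" when absent.
    public static String query(URL url) {
        String q = url.getQuery();
        return q == null ? "" : "?" + q;
    }

    // Anchor reference in the documented style: "#ref", or "" when absent.
    public static String ref(URL url) {
        String r = url.getRef();
        return r == null ? "" : "#" + r;
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = URI.create("http://www.cs.cmu.edu/home/dir/index.html?query#ref").toURL();
        System.out.println(url.getProtocol()); // http
        System.out.println(url.getHost());     // www.cs.cmu.edu
        System.out.println(url.getPort());     // -1 (no explicit port)
        System.out.println(url.getFile());     // /home/dir/index.html?query
        System.out.println(directory(url));    // /home/dir/
        System.out.println(filename(url));     // index.html
        System.out.println(query(url));        // ?query
        System.out.println(ref(url));          // #ref
    }
}
```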
 o getPageURL
 public URL getPageURL()
Get the URL of a page, omitting any anchor reference (like #ref).

Returns:
the URL sans anchor reference
 o getPageURL
 public static URL getPageURL(URL url)
Get the URL of a page, omitting any anchor reference (like #ref).

Returns:
the URL sans anchor reference
 o getServiceURL
 public URL getServiceURL()
Get the URL of a Web service, omitting any query or anchor reference.

Returns:
the URL sans query and anchor reference
 o getServiceURL
 public static URL getServiceURL(URL url)
Get the URL of a Web service, omitting any query or anchor reference.

Returns:
the URL sans query and anchor reference
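
To make the difference between a page URL and a service URL concrete, here is a minimal sketch built on `java.net.URL` alone. It is an assumption about how such trimming could be done, not the library's implementation; `PageUrlSketch`, `pageURL`, and `serviceURL` are hypothetical names, and the input is assumed to be a well-formed URL that `java.net.URI` can parse.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper class illustrating the two kinds of trimming described above.
public class PageUrlSketch {
    // Page URL: cut the external form at the first '#', dropping the anchor reference.
    public static URL pageURL(URL url) throws MalformedURLException {
        String s = url.toExternalForm();
        int hash = s.indexOf('#');
        return URI.create(hash < 0 ? s : s.substring(0, hash)).toURL();
    }

    // Service URL: drop the anchor first, then cut at the first '?' to drop the query.
    public static URL serviceURL(URL url) throws MalformedURLException {
        String s = pageURL(url).toExternalForm();
        int question = s.indexOf('?');
        return URI.create(question < 0 ? s : s.substring(0, question)).toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = URI.create("http://www.cs.cmu.edu/home/dir/index.html?query#ref").toURL();
        System.out.println(pageURL(url));    // http://www.cs.cmu.edu/home/dir/index.html?query
        System.out.println(serviceURL(url)); // http://www.cs.cmu.edu/home/dir/index.html
    }
}
```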
 o getDirectoryURL
 public URL getDirectoryURL()
Get the URL of a page's directory.

Returns:
the URL sans filename, query and anchor reference
 o getDirectoryURL
 public static URL getDirectoryURL(URL url)
Get the URL of a page's directory.

Returns:
the URL sans filename, query and anchor reference
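
A directory URL keeps the protocol, host, and port but ends the path at its last '/'. The following sketch shows one way to compute that with `java.net.URL`. It assumes a URL with an authority part (such as an http URL) and is not taken from WebSPHINX's source; `DirectoryUrlSketch` and `directoryURL` are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper class; assumes the URL has an authority (host and optional port).
public class DirectoryUrlSketch {
    public static URL directoryURL(URL url) throws MalformedURLException {
        String path = url.getPath();
        // Keep the path up to and including its last '/'; an empty path becomes "/".
        String dir = path.substring(0, path.lastIndexOf('/') + 1);
        if (dir.isEmpty()) {
            dir = "/";
        }
        // Rebuild the URL without filename, query, or anchor reference.
        return URI.create(url.getProtocol() + "://" + url.getAuthority() + dir).toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = URI.create("http://www.cs.cmu.edu/home/dir/index.html?query#ref").toURL();
        System.out.println(directoryURL(url)); // http://www.cs.cmu.edu/home/dir/
    }
}
```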
 o getParentURL
 public URL getParentURL()
Get the URL of a page's parent directory.

Returns:
the URL of the parent directory, sans filename, query and anchor reference
 o getParentURL
 public static URL getParentURL(URL url)
Get the URL of a page's parent directory.

Returns:
the URL of the parent directory, sans filename, query and anchor reference
 o relativeTo
 public static String relativeTo(URL here,
                                 URL there)
 o relativeTo
 public static String relativeTo(URL here,
                                 String there)
 o FileToURL
 public static URL FileToURL(File file) throws MalformedURLException
Convert a local filename to a URL. For example, if the filename is "C:\FOO\BAR\BAZ", the resulting URL is "file:/C:/FOO/BAR/BAZ".

Parameters:
file - File to convert
Returns:
URL corresponding to file
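
For comparison, the standard library offers a similar file-to-URL conversion. The sketch below uses `File.toURI().toURL()`, which may differ in detail from what `FileToURL` does; `FileToUrlSketch` and `fileToURL` are hypothetical names. The printed result depends on the platform and the current working directory, so no expected output is shown.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper class; a standard-library analogue, not WebSPHINX's implementation.
public class FileToUrlSketch {
    public static URL fileToURL(File file) throws MalformedURLException {
        // toURI() makes the path absolute and switches to '/' delimiters.
        return file.toURI().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        // A relative name is resolved against the current working directory.
        System.out.println(fileToURL(new File("FOO/BAR/BAZ")));
    }
}
```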
 o URLToFile
 public static File URLToFile(URL url) throws MalformedURLException
Convert a file: URL to a filename appropriate to the current system platform. For example, on MS Windows, if the URL is "file:/FOO/BAR/BAZ", the resulting filename is "\FOO\BAR\BAZ".

Parameters:
url - URL to convert
Returns:
File corresponding to url
Throws: MalformedURLException
if url is not a file: URL.
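
The reverse conversion can be sketched in the same spirit. The code below rejects any URL whose protocol is not "file" and swaps '/' for the platform's separator, which matches the behavior described above but is only an assumption about how it might be done. `UrlToFileSketch` and `urlToFile` are hypothetical names, and percent-escapes such as %20 are left undecoded in this sketch.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper class; not WebSPHINX's implementation.
public class UrlToFileSketch {
    public static File urlToFile(URL url) throws MalformedURLException {
        if (!"file".equals(url.getProtocol())) {
            throw new MalformedURLException("not a file: URL: " + url);
        }
        // Swap URL delimiters for the separator of the current platform.
        return new File(url.getPath().replace('/', File.separatorChar));
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = URI.create("file:/FOO/BAR/BAZ").toURL();
        // Prints the path with the platform's separator, e.g. \FOO\BAR\BAZ on MS Windows.
        System.out.println(urlToFile(url).getPath());
    }
}
```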
 o toURLDelimiters
 public static String toURLDelimiters(String path)
 o getPage
 public Page getPage()
Get the downloaded page to which the link points.

Returns:
the Page object, or null if the page hasn't been downloaded.
 o setPage
 public void setPage(Page page)
Set the page corresponding to this link.

Parameters:
page - Page to which this link points
 o getMethod
 public int getMethod()
Get the method used to access this link.

Returns:
GET or POST.
 o toURL
 public String toURL()
Convert the link's URL to a String.

Returns:
the URL represented as a string
 o toDescription
 public String toDescription()
Generate a human-readable description of the link.

Returns:
a description of the link, in the form "[url]".
 o toText
 public String toText()
Convert the region to tagless text.

Returns:
a string consisting of the text in the page contained by this region
Overrides:
toText in class Region
 o setText
 public void setText(String text)
Set the tagless-text representation of this region.

Parameters:
text - a string consisting of the text in the page contained by this region
 o urlFromHref
 protected URL urlFromHref(Tag tag,
                           URL base) throws MalformedURLException
Construct the URL for a link element, from its start tag and a base URL (for relative references).

Parameters:
tag - Start tag of link, such as <A HREF="/foo/index.html">.
base - Base URL used for relative references
Returns:
URL to which the link points
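
The role of the base URL is easiest to see with a small resolution example. This sketch resolves an href string against a base using `java.net.URI.resolve`; it assumes the href has already been extracted from the tag and does not reproduce how `urlFromHref` reads attributes. `HrefResolveSketch` and `resolve` are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper class; shows relative-reference resolution only.
public class HrefResolveSketch {
    public static URL resolve(URL base, String href)
            throws MalformedURLException, URISyntaxException {
        // Absolute hrefs are returned as-is; relative ones are resolved against the base.
        return base.toURI().resolve(href).toURL();
    }

    public static void main(String[] args) throws Exception {
        URL base = URI.create("http://www.cs.cmu.edu/home/dir/index.html").toURL();
        System.out.println(resolve(base, "/foo/index.html")); // http://www.cs.cmu.edu/foo/index.html
        System.out.println(resolve(base, "other.html"));      // http://www.cs.cmu.edu/home/dir/other.html
    }
}
```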
 o replaceHref
 public Tag replaceHref(String newHref)
Copy the link's start tag, replacing the URL. Note that the name of the attribute containing the URL varies from tag to tag: sometimes it is called HREF, sometimes SRC, sometimes CODE, etc. This method changes the appropriate attribute for this tag.

Parameters:
newHref - New URL or relative reference; e.g. "http://www.cs.cmu.edu/" or "/foo/index.html".
Returns:
copy of this link's start tag with its URL attribute replaced. The copy is a region of a fresh page containing only the tag.
 o getStatus
 public int getStatus()
Get the status of the link. Possible values are defined in LinkEvent.

Returns:
last event that happened to this link
 o setStatus
 public void setStatus(int event)
Set the status of the link. Possible values are defined in LinkEvent.

Parameters:
event - the event that just happened to this link
 o getPriority
 public float getPriority()
Get the priority of the link in the crawl.

 o setPriority
 public void setPriority(float priority)
Set the priority of the link in the crawl.

 o getDownloadParameters
 public DownloadParameters getDownloadParameters()
Get the download parameters used for this link. Default is null.

 o setDownloadParameters
 public void setDownloadParameters(DownloadParameters dp)
Set the download parameters used for this link.

