Class websphinx.Link
java.lang.Object
   |
   +----websphinx.Region
           |
           +----websphinx.Element
                   |
                   +----websphinx.Link
- public class Link
- extends Element
- implements Prioritized
Link to a Web page.
- See Also:
- Page
-
GET
- Use the HTTP GET method to download this link.
-
POST
- Use the HTTP POST method to access this link.
-
url
-
-
Link(File)
- Make a Link from a File.
-
Link(String)
- Make a Link from a string URL.
-
Link(Tag, Tag, URL)
- Make a Link from a start tag and end tag and a base URL (for relative references).
-
Link(URL)
- Make a Link from a URL.
-
discardContent()
- Eliminate all references to page content.
-
disconnect()
- Disconnect this link from its downloaded page (throwing away the page).
-
FileToURL(File)
- Convert a local filename to a URL.
-
getDepth()
- Get depth of link in crawl.
-
getDirectory()
- Get the directory part of the link, like "/home/dir/".
-
getDirectoryURL()
- Get the URL of a page's directory.
-
getDirectoryURL(URL)
- Get the URL of a page's directory.
-
getDownloadParameters()
- Get the download parameters used for this link.
-
getFile()
- Get the information part of the link, like "/home/dir/index.html?query".
-
getFilename()
- Get the filename part of the link, like "index.html".
-
getHost()
- Get the hostname of the link, like "www.cs.cmu.edu".
-
getMethod()
- Get the method used to access this link.
-
getPage()
- Get the downloaded page to which the link points.
-
getPageURL()
- Get the URL of a page, omitting any anchor reference (like #ref).
-
getPageURL(URL)
- Get the URL of a page, omitting any anchor reference (like #ref).
-
getParentURL()
- Get the URL of a page's parent directory.
-
getParentURL(URL)
- Get the URL of a page's parent directory.
-
getPort()
- Get the port number of the link.
-
getPriority()
- Get the priority of the link in the crawl.
-
getProtocol()
- Get the network protocol of the link, like "ftp" or "http".
-
getQuery()
- Get the query part of the link, like "?query".
-
getRef()
- Get the anchor reference of the link, like "#ref".
-
getServiceURL()
- Get the URL of a Web service, omitting any query or anchor reference.
-
getServiceURL(URL)
- Get the URL of a Web service, omitting any query or anchor reference.
-
getStatus()
- Get the status of the link.
-
getURL()
- Get the URL.
-
relativeTo(URL, String)
-
-
relativeTo(URL, URL)
-
-
replaceHref(String)
- Copy the link's start tag, replacing the URL.
-
setDownloadParameters(DownloadParameters)
- Set the download parameters used for this link.
-
setPage(Page)
- Set the page corresponding to this link.
-
setPriority(float)
- Set the priority of the link in the crawl.
-
setStatus(int)
- Set the status of the link.
-
setText(String)
- Set the tagless-text representation of this region.
-
toDescription()
- Generate a human-readable description of the link.
-
toText()
- Convert the region to tagless text.
-
toURL()
- Convert the link's URL to a String.
-
toURLDelimiters(String)
-
-
urlFromHref(Tag, URL)
- Construct the URL for a link element, from its start tag and a base URL (for relative references).
-
URLToFile(URL)
- Convert a file: URL to a filename appropriate to the current system platform.
url
protected URL url
GET
public static final int GET
- Use the HTTP GET method to download this link.
POST
public static final int POST
- Use the HTTP POST method to access this link.
Link
public Link(Tag startTag,
Tag endTag,
URL base) throws MalformedURLException
- Make a Link from a start tag and end tag and a base URL (for relative references).
The tags must be on the same page.
- Parameters:
- startTag - Start tag of element
- endTag - End tag of element
- base - Base URL used for relative references
Link
public Link(URL url)
- Make a Link from a URL.
Link
public Link(File file) throws MalformedURLException
- Make a Link from a File.
Link
public Link(String href) throws MalformedURLException
- Make a Link from a string URL.
- Throws: MalformedURLException
- if the URL is invalid
discardContent
public void discardContent()
- Eliminate all references to page content.
disconnect
public void disconnect()
- Disconnect this link from its downloaded page (throwing away the page).
getDepth
public int getDepth()
- Get depth of link in crawl.
- Returns:
- depth of link from root (depth of roots is 0)
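The depth convention above (roots at depth 0, each followed link one level deeper) can be illustrated with a minimal sketch in plain Java. It does not use the websphinx classes and assumes a breadth-first crawl; `CrawlDepthSketch`, `depths`, and the string-keyed link graph are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CrawlDepthSketch {
    // Breadth-first walk over a hypothetical link graph (page URL -> URLs it links to).
    // Roots get depth 0; a page first reached from a page at depth d gets depth d + 1.
    static Map<String, Integer> depths(List<String> roots, Map<String, List<String>> outgoing) {
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        for (String root : roots) {
            if (depth.putIfAbsent(root, 0) == null) {
                queue.add(root);
            }
        }
        while (!queue.isEmpty()) {
            String page = queue.remove();
            for (String target : outgoing.getOrDefault(page, List.of())) {
                // Only the first visit assigns a depth; later visits are ignored.
                if (depth.putIfAbsent(target, depth.get(page) + 1) == null) {
                    queue.add(target);
                }
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
            "http://a/", List.of("http://a/b.html", "http://a/c.html"),
            "http://a/b.html", List.of("http://a/d.html"));
        System.out.println(depths(List.of("http://a/"), graph));
    }
}
```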
getURL
public URL getURL()
- Get the URL.
- Returns:
- the URL of the link
getProtocol
public String getProtocol()
- Get the network protocol of the link, like "ftp" or "http".
- Returns:
- the protocol portion of the link's URL
getHost
public String getHost()
- Get the hostname of the link, like "www.cs.cmu.edu".
- Returns:
- the hostname portion of the link's URL
getPort
public int getPort()
- Get the port number of the link.
- Returns:
- the port number of the link's URL, or -1 if no port number
is explicitly specified in the URL
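The protocol, host, and port accessors above appear to mirror those of java.net.URL, which also reports -1 when no port is written in the URL. The following sketch uses java.net.URL directly rather than Link; `UrlPartsSketch`, `parse`, and `effectivePort` are hypothetical helpers, and the fallback to the protocol's default port is an illustration, not something the Link class is documented to do.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPartsSketch {
    // Parse a URL string, turning the checked exception into an unchecked one.
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Port that would actually be contacted: the explicit port if one is given,
    // otherwise the default port of the protocol (80 for http).
    static int effectivePort(URL url) {
        return url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
    }

    public static void main(String[] args) {
        URL url = parse("http://www.cs.cmu.edu/home/dir/index.html");
        System.out.println(url.getProtocol());
        System.out.println(url.getHost());
        System.out.println(url.getPort());
        System.out.println(effectivePort(url));
    }
}
```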
getFile
public String getFile()
- Get the information part of the link, like
"/home/dir/index.html?query". Equivalent to getURL().getFile().
- Returns:
- the filename portion of the link's URL
getDirectory
public String getDirectory()
- Get the directory part of the link, like "/home/dir/".
Always starts and ends with '/'.
- Returns:
- the directory portion of the link's URL
getFilename
public String getFilename()
- Get the filename part of the link, like "index.html".
Never contains '/'; may be the empty string.
- Returns:
- the filename portion of the link's URL
getQuery
public String getQuery()
- Get the query part of the link, like "?query".
Either starts with a '?', or is empty.
- Returns:
- the query portion of the link's URL
getRef
public String getRef()
- Get the anchor reference of the link, like "#ref".
Either starts with '#', or is empty.
- Returns:
- the anchor reference portion of the link's URL
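The conventions stated for getDirectory, getFilename, getQuery, and getRef (directory starts and ends with '/', filename never contains '/', query and anchor keep their leading '?' or '#' or are empty) can be reproduced on top of java.net.URL. This is a sketch under those stated conventions, not the library's implementation; `LinkPartsSketch` and its methods are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URL;

class LinkPartsSketch {
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Directory: the path up to and including its last '/'. An empty path yields "/".
    static String directory(URL url) {
        String path = url.getPath();
        int slash = path.lastIndexOf('/');
        return slash == -1 ? "/" : path.substring(0, slash + 1);
    }

    // Filename: whatever follows the last '/' of the path; may be empty.
    static String filename(URL url) {
        String path = url.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Query with its leading '?', or the empty string when the URL has no query.
    static String query(URL url) {
        return url.getQuery() == null ? "" : "?" + url.getQuery();
    }

    // Anchor reference with its leading '#', or the empty string when absent.
    static String ref(URL url) {
        return url.getRef() == null ? "" : "#" + url.getRef();
    }

    public static void main(String[] args) {
        URL url = parse("http://www.cs.cmu.edu/home/dir/index.html?query#ref");
        System.out.println(directory(url));
        System.out.println(filename(url));
        System.out.println(query(url));
        System.out.println(ref(url));
    }
}
```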
getPageURL
public URL getPageURL()
- Get the URL of a page, omitting any anchor reference (like #ref).
- Returns:
- the URL sans anchor reference
getPageURL
public static URL getPageURL(URL url)
- Get the URL of a page, omitting any anchor reference (like #ref).
- Returns:
- the URL sans anchor reference
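One way to obtain a URL without its anchor reference, as getPageURL is described to do, is to rebuild it from its parts. The sketch below uses only java.net.URL; `PageUrlSketch` and `withoutRef` are hypothetical names, and the actual websphinx implementation may differ.

```java
import java.net.MalformedURLException;
import java.net.URL;

class PageUrlSketch {
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Rebuild the URL from protocol, host, port, and file.
    // URL.getFile() includes the query but not the anchor, so the anchor is dropped.
    // Note: any user-info in the original URL is not preserved by this sketch.
    static URL withoutRef(URL url) {
        try {
            return new URL(url.getProtocol(), url.getHost(), url.getPort(), url.getFile());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL url = parse("http://www.cs.cmu.edu/home/dir/index.html?query#ref");
        System.out.println(withoutRef(url));
    }
}
```

Stripping the anchor this way is useful when deciding whether two links point to the same page, since links that differ only in their anchor refer to one document.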
getServiceURL
public URL getServiceURL()
- Get the URL of a Web service, omitting any query or anchor reference.
- Returns:
- the URL sans query and anchor reference
getServiceURL
public static URL getServiceURL(URL url)
- Get the URL of a Web service, omitting any query or anchor reference.
- Returns:
- the URL sans query and anchor reference
getDirectoryURL
public URL getDirectoryURL()
- Get the URL of a page's directory.
- Returns:
- the URL sans filename, query and anchor reference
getDirectoryURL
public static URL getDirectoryURL(URL url)
- Get the URL of a page's directory.
- Returns:
- the URL sans filename, query and anchor reference
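The service and directory forms described above differ only in how much of the URL's file part is kept. A minimal sketch with java.net.URL follows; `ServiceUrlSketch`, `serviceUrl`, `directoryUrl`, and `rebuild` are hypothetical names, and the code illustrates the described results rather than the library's own code.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ServiceUrlSketch {
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Same protocol, host, and port as url, but with the given file part.
    static URL rebuild(URL url, String file) {
        try {
            return new URL(url.getProtocol(), url.getHost(), url.getPort(), file);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Service URL: keep only the path, dropping query and anchor reference.
    static URL serviceUrl(URL url) {
        return rebuild(url, url.getPath());
    }

    // Directory URL: keep the path through its last '/', dropping the filename too.
    static URL directoryUrl(URL url) {
        String path = url.getPath();
        return rebuild(url, path.substring(0, path.lastIndexOf('/') + 1));
    }

    public static void main(String[] args) {
        URL url = parse("http://www.cs.cmu.edu/home/dir/index.html?query#ref");
        System.out.println(serviceUrl(url));
        System.out.println(directoryUrl(url));
    }
}
```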
getParentURL
public URL getParentURL()
- Get the URL of a page's parent directory.
- Returns:
- the URL sans filename, query and anchor reference
getParentURL
public static URL getParentURL(URL url)
- Get the URL of a page's parent directory.
- Returns:
- the URL sans filename, query and anchor reference
relativeTo
public static String relativeTo(URL here,
URL there)
relativeTo
public static String relativeTo(URL here,
String there)
FileToURL
public static URL FileToURL(File file) throws MalformedURLException
- Convert a local filename to a URL.
For example, if the filename is "C:\FOO\BAR\BAZ",
the resulting URL is "file:/C:/FOO/BAR/BAZ".
- Parameters:
- file - File to convert
- Returns:
- URL corresponding to file
URLToFile
public static File URLToFile(URL url) throws MalformedURLException
- Convert a file: URL to a filename appropriate to the
current system platform. For example, on MS Windows,
if the URL is "file:/FOO/BAR/BAZ", the resulting
filename is "\FOO\BAR\BAZ".
- Parameters:
- url - URL to convert
- Returns:
- File corresponding to url
- Throws: MalformedURLException
- if url is not a file: URL.
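The two conversions above can be approximated with the standard library. This is a sketch, not the websphinx code: `FileUrlSketch`, `fileToUrl`, and `urlToFile` are hypothetical names, the File-to-URL direction relies on File.toURI() (which makes the path absolute), and percent-escapes in the URL path are not decoded.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

class FileUrlSketch {
    // File -> file: URL, via the standard File.toURI() route.
    static URL fileToUrl(File file) throws MalformedURLException {
        return file.toURI().toURL();
    }

    // file: URL -> File, swapping '/' for the platform's separator character.
    // Any other protocol is rejected, matching the documented exception.
    static File urlToFile(URL url) throws MalformedURLException {
        if (!"file".equals(url.getProtocol())) {
            throw new MalformedURLException("not a file: URL: " + url);
        }
        return new File(url.getPath().replace('/', File.separatorChar));
    }

    public static void main(String[] args) throws MalformedURLException {
        URL url = fileToUrl(new File("notes.txt"));
        System.out.println(url);
        System.out.println(urlToFile(url));
    }
}
```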
toURLDelimiters
public static String toURLDelimiters(String path)
getPage
public Page getPage()
- Get the downloaded page to which the link points.
- Returns:
- the Page object, or null if the page hasn't been downloaded.
setPage
public void setPage(Page page)
- Set the page corresponding to this link.
- Parameters:
- page - Page to which this link points
getMethod
public int getMethod()
- Get the method used to access this link.
- Returns:
- GET or POST.
toURL
public String toURL()
- Convert the link's URL to a String.
- Returns:
- the URL represented as a string
toDescription
public String toDescription()
- Generate a human-readable description of the link.
- Returns:
- a description of the link, in the form "[url]".
toText
public String toText()
- Convert the region to tagless text.
- Returns:
- a string consisting of the text in the page contained by this region
- Overrides:
- toText in class Region
setText
public void setText(String text)
- Set the tagless-text representation of this region.
- Parameters:
- text - a string consisting of the text in the page contained by this region
urlFromHref
protected URL urlFromHref(Tag tag,
URL base) throws MalformedURLException
- Construct the URL for a link element, from its start tag and a base URL (for relative references).
- Parameters:
- tag - Start tag of link, such as <A HREF="/foo/index.html">.
- base - Base URL used for relative references
- Returns:
- URL to which the link points
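Resolving a link's reference against a base URL, as described above, can be illustrated with the two-argument java.net.URL constructor. The sketch works on plain strings instead of Tag objects; `HrefResolveSketch` and `resolve` are hypothetical names, and how urlFromHref extracts the attribute from the tag is not shown.

```java
import java.net.MalformedURLException;
import java.net.URL;

class HrefResolveSketch {
    // Resolve href against base. An absolute href replaces the base entirely;
    // a relative one is interpreted with respect to the base URL.
    static URL resolve(String base, String href) {
        try {
            return new URL(new URL(base), href);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://www.cs.cmu.edu/home/dir/index.html";
        System.out.println(resolve(base, "/foo/index.html"));
        System.out.println(resolve(base, "other.html"));
        System.out.println(resolve(base, "http://example.com/"));
    }
}
```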
replaceHref
public Tag replaceHref(String newHref)
- Copy the link's start tag, replacing the URL. Note that the name of the attribute containing the URL
varies from tag to tag: sometimes it is called HREF, sometimes SRC, sometimes CODE, etc.
This method changes the appropriate attribute for this tag.
- Parameters:
- newHref - New URL or relative reference; e.g. "http://www.cs.cmu.edu/" or "/foo/index.html".
- Returns:
- copy of this link's start tag with its URL attribute replaced. The copy is
a region of a fresh page containing only the tag.
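Because the URL-carrying attribute depends on the tag, a replacement routine first has to pick the attribute name. The sketch below shows the idea on plain strings with a regular expression. It is not the websphinx implementation (which returns a Tag): `UrlAttributeSketch`, `replaceUrl`, and the tag-to-attribute table are assumptions made for illustration, and a regex is only adequate for simple, well-formed start tags.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlAttributeSketch {
    // Hypothetical table: which attribute carries the URL for a few common tags.
    static final Map<String, String> URL_ATTRIBUTE = Map.of(
        "A", "HREF", "AREA", "HREF", "LINK", "HREF",
        "IMG", "SRC", "FRAME", "SRC", "APPLET", "CODE");

    // Return a copy of startTag whose URL attribute is set to newHref (double-quoted).
    // Tags that are not in the table are returned unchanged.
    static String replaceUrl(String tagName, String startTag, String newHref) {
        String attr = URL_ATTRIBUTE.get(tagName.toUpperCase(Locale.ROOT));
        if (attr == null) {
            return startTag;
        }
        // Group 1: attribute name and '='; group 2: quoted or unquoted value.
        Pattern p = Pattern.compile(
            "(\\b" + attr + "\\s*=\\s*)(\"[^\"]*\"|'[^']*'|[^\\s>]+)",
            Pattern.CASE_INSENSITIVE);
        String replacement = "$1" + Matcher.quoteReplacement("\"" + newHref + "\"");
        return p.matcher(startTag).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        System.out.println(replaceUrl("A", "<A HREF=\"/foo/index.html\">", "http://www.cs.cmu.edu/"));
        System.out.println(replaceUrl("img", "<img src=logo.gif alt=x>", "/new.gif"));
    }
}
```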
getStatus
public int getStatus()
- Get the status of the link. Possible values are defined in LinkEvent.
- Returns:
- last event that happened to this link
setStatus
public void setStatus(int event)
- Set the status of the link. Possible values are defined in LinkEvent.
- Parameters:
- event - the event that just happened to this link
getPriority
public float getPriority()
- Get the priority of the link in the crawl.
setPriority
public void setPriority(float priority)
- Set the priority of the link in the crawl.
getDownloadParameters
public DownloadParameters getDownloadParameters()
- Get the download parameters used for this link. Default is null.
setDownloadParameters
public void setDownloadParameters(DownloadParameters dp)
- Set the download parameters used for this link.