
Class websphinx.DownloadParameters

java.lang.Object
   |
   +----websphinx.DownloadParameters

public class DownloadParameters
extends Object
implements Cloneable, Serializable
Download parameters. These parameters limit how a Page downloads a Link. A Crawler has a default set of download parameters, but the defaults can be overridden on individual links by calling Link.setDownloadParameters().

DownloadParameters is an immutable class (like String). "Changing" a parameter actually returns a new instance of the class with only the specified parameter changed.
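
The copy-on-change behavior described above can be sketched with a small stand-in class. This is only an illustration under stated assumptions, not the WebSPHINX source: the class name ParamsSketch and its two fields are hypothetical, and the defaults (4 threads, 100 KB) are taken from the method descriptions on this page.

```java
// Hypothetical stand-in for an immutable parameter object; not the WebSPHINX source.
class ParamsSketch implements Cloneable {
    private int maxThreads = 4;    // default documented for getMaxThreads()
    private int maxPageSize = 100; // default documented for getMaxPageSize(), in KB

    int getMaxThreads() { return maxThreads; }
    int getMaxPageSize() { return maxPageSize; }

    @Override
    public ParamsSketch clone() {
        try {
            return (ParamsSketch) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }

    // Each "change" method copies the receiver, edits the copy, and returns it.
    ParamsSketch changeMaxThreads(int maxThreads) {
        ParamsSketch copy = clone();
        copy.maxThreads = maxThreads;
        return copy;
    }

    ParamsSketch changeMaxPageSize(int maxPageSize) {
        ParamsSketch copy = clone();
        copy.maxPageSize = maxPageSize;
        return copy;
    }

    public static void main(String[] args) {
        ParamsSketch defaults = new ParamsSketch();
        ParamsSketch tuned = defaults.changeMaxThreads(8).changeMaxPageSize(250);
        System.out.println(defaults.getMaxThreads() + " " + defaults.getMaxPageSize()); // 4 100
        System.out.println(tuned.getMaxThreads() + " " + tuned.getMaxPageSize());       // 8 250
    }
}
```

With the real class the call shape would be the same, for example dp = dp.changeMaxThreads(8). Discarding the return value leaves dp unchanged, because the original instance is never modified.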


Constructor Index

 o DownloadParameters()
Make a DownloadParameters object with default settings.

Method Index

 o changeAcceptedMIMETypes(String)
Change accepted MIME types.
 o changeCrawlTimeout(int)
Change timeout on entire crawl.
 o changeDownloadTimeout(int)
Change download timeout value.
 o changeInteractive(boolean)
Change interactive flag.
 o changeMaxPageSize(int)
Change maximum page size.
 o changeMaxThreads(int)
Change maximum threads.
 o changeObeyRobotExclusion(boolean)
Change obey-robot-exclusion flag.
 o changeUseCaches(boolean)
Change use-caches flag.
 o changeUserAgent(String)
Change User-agent field used in HTTP requests.
 o clone()
Clone a DownloadParameters object.
 o getAcceptedMIMETypes()
Get accepted MIME types.
 o getCrawlTimeout()
Get timeout on entire crawl.
 o getDownloadTimeout()
Get download timeout value.
 o getInteractive()
Get interactive flag.
 o getMaxPageSize()
Get maximum page size.
 o getMaxThreads()
Get maximum threads.
 o getObeyRobotExclusion()
Get obey-robot-exclusion flag.
 o getUseCaches()
Get use-caches flag.
 o getUserAgent()
Get User-agent header used in HTTP requests.

Constructors

 o DownloadParameters
 public DownloadParameters()
Make a DownloadParameters object with default settings.

Methods

 o clone
 public Object clone()
Clone a DownloadParameters object.

Overrides:
clone in class Object
 o getMaxThreads
 public int getMaxThreads()
Get maximum threads.

Returns:
maximum number of background threads used by crawler. Default is 4.
 o changeMaxThreads
 public DownloadParameters changeMaxThreads(int maxthreads)
Change maximum threads.

Parameters:
maxthreads - maximum number of background threads used by crawler
Returns:
new DownloadParameters object with the specified parameter changed.
 o getMaxPageSize
 public int getMaxPageSize()
Get maximum page size. Pages larger than this limit are neither downloaded nor parsed. Default value is 100 (KB).

Returns:
maximum page size in kilobytes
 o changeMaxPageSize
 public DownloadParameters changeMaxPageSize(int maxPageSize)
Change maximum page size. Pages larger than this limit are treated as leaves in the crawl graph -- neither downloaded nor parsed.

Parameters:
maxPageSize - maximum page size in kilobytes
Returns:
new DownloadParameters object with the specified parameter changed.
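
Because the limit is expressed in kilobytes, a size check against a byte count needs a unit conversion. The following is a minimal sketch of such a check. The helper name exceedsLimit is hypothetical, and it assumes 1 KB = 1024 bytes, which this page does not specify.

```java
// Hypothetical helper; assumes 1 KB = 1024 bytes (not stated in the documentation).
class PageSizeSketch {
    // True if a page of the given byte length is over a limit given in kilobytes.
    static boolean exceedsLimit(long contentLengthBytes, int maxPageSizeKB) {
        return contentLengthBytes > (long) maxPageSizeKB * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(exceedsLimit(102_400L, 100)); // false: exactly at the limit
        System.out.println(exceedsLimit(102_401L, 100)); // true: one byte over
    }
}
```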
 o getDownloadTimeout
 public int getDownloadTimeout()
Get download timeout value.

Returns:
length of time (in seconds) that crawler will wait for a page to download before aborting it. Default is 60 seconds.
 o changeDownloadTimeout
 public DownloadParameters changeDownloadTimeout(int timeout)
Change download timeout value.

Parameters:
timeout - length of time (in seconds) to wait for a page to download. Use a negative value to turn off timeout.
Returns:
new DownloadParameters object with the specified parameter changed.
 o getCrawlTimeout
 public int getCrawlTimeout()
Get timeout on entire crawl.

Returns:
maximum length of time (in seconds) that crawler will run before aborting. Default is -1 (no limit).
 o changeCrawlTimeout
 public DownloadParameters changeCrawlTimeout(int timeout)
Change timeout on entire crawl.

Parameters:
timeout - maximum length of time (in seconds) that crawler will run. Use a negative value to turn off timeout.
Returns:
new DownloadParameters object with the specified parameter changed.
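
Both timeouts are given in seconds, and a negative value turns the timeout off. One way to interpret that convention is sketched below. The helper name expired is hypothetical, and treating a timeout of 0 as expiring immediately is an assumption, since this page does not say how 0 is handled.

```java
// Hypothetical helper illustrating the "negative means no limit" convention.
class TimeoutSketch {
    // Returns true once elapsedMillis reaches the limit; a negative limit never expires.
    // Assumption: a limit of 0 seconds expires immediately.
    static boolean expired(int timeoutSeconds, long elapsedMillis) {
        if (timeoutSeconds < 0) {
            return false; // timeout turned off
        }
        return elapsedMillis >= timeoutSeconds * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(expired(60, 59_999L));        // false
        System.out.println(expired(60, 60_000L));        // true
        System.out.println(expired(-1, Long.MAX_VALUE)); // false
    }
}
```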
 o getObeyRobotExclusion
 public boolean getObeyRobotExclusion()
Get obey-robot-exclusion flag.

Returns:
true iff the crawler checks robots.txt on the remote Web site before downloading a page. Default is false.
 o changeObeyRobotExclusion
 public DownloadParameters changeObeyRobotExclusion(boolean f)
Change obey-robot-exclusion flag.

Parameters:
f - If true, then the crawler checks robots.txt on the remote Web site before downloading a page.
Returns:
new DownloadParameters object with the specified parameter changed.
 o getInteractive
 public boolean getInteractive()
Get interactive flag.

Returns:
true if a user is available to respond to dialog boxes (for instance, to enter passwords for authentication). Default is true.
 o changeInteractive
 public DownloadParameters changeInteractive(boolean f)
Change interactive flag.

Parameters:
f - true if a user is available to respond to dialog boxes
Returns:
new DownloadParameters object with the specified parameter changed.
 o getUseCaches
 public boolean getUseCaches()
Get use-caches flag.

Returns:
true if cached pages should be used whenever possible
 o changeUseCaches
 public DownloadParameters changeUseCaches(boolean f)
Change use-caches flag.

Parameters:
f - true if cached pages should be used whenever possible
Returns:
new DownloadParameters object with the specified parameter changed.
 o getAcceptedMIMETypes
 public String getAcceptedMIMETypes()
Get accepted MIME types.

Returns:
list of MIME types that can be handled by the crawler (which are passed as the Accept header in the HTTP request). Default is null.
 o changeAcceptedMIMETypes
 public DownloadParameters changeAcceptedMIMETypes(String types)
Change accepted MIME types.

Parameters:
types - list of MIME types that can be handled by the crawler. Use null if the crawler can handle anything.
Returns:
new DownloadParameters object with the specified parameter changed.
 o getUserAgent
 public String getUserAgent()
Get User-agent header used in HTTP requests.

Returns:
user-agent field used in HTTP requests, or null if the Java library's default user-agent is used. Default value is null (but for a Crawler, the default DownloadParameters has the Crawler's name as its default user-agent).
 o changeUserAgent
 public DownloadParameters changeUserAgent(String userAgent)
Change User-agent field used in HTTP requests.

Parameters:
userAgent - user-agent field used in HTTP requests. Pass null to use the Java library's default user-agent field.
Returns:
new DownloadParameters object with the specified parameter changed.
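
The user-agent, accepted MIME types, and use-caches settings all describe properties of the HTTP request. As a rough sketch of where such values could land on a standard java.net request, the hypothetical helper below applies them to a URLConnection. It is not taken from WebSPHINX, and it assumes the MIME type list is already formatted as an Accept header value. A null value leaves the corresponding header at the Java library's default.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper; shows standard java.net calls, not WebSPHINX internals.
class RequestHeaderSketch {
    static URLConnection prepare(String url, String userAgent,
                                 String acceptedMIMETypes, boolean useCaches)
            throws IOException {
        // openConnection() only creates the connection object; nothing is sent yet.
        URLConnection conn = URI.create(url).toURL().openConnection();
        conn.setUseCaches(useCaches);
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        if (acceptedMIMETypes != null) {
            conn.setRequestProperty("Accept", acceptedMIMETypes);
        }
        return conn;
    }

    public static void main(String[] args) throws IOException {
        URLConnection conn = prepare("http://example.com/", "MyCrawler", "text/html", false);
        System.out.println(conn.getRequestProperty("User-Agent"));
        System.out.println(conn.getRequestProperty("Accept"));
        System.out.println(conn.getUseCaches());
    }
}
```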
