Class websphinx.DownloadParameters
java.lang.Object
|
+----websphinx.DownloadParameters
- public class DownloadParameters
- extends Object
- implements Cloneable, Serializable
Download parameters. These parameters are limits on
how Page can download a Link. A Crawler has a
default set of download parameters, but the defaults
can be overridden on individual links by calling
Link.setDownloadParameters().
DownloadParameters is an immutable class (like String).
"Changing" a parameter actually returns a new instance
of the class with only the specified parameter changed.
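The copy-on-change behavior described above can be illustrated with a small stand-in class. This is only a sketch of the pattern, not the websphinx source: the nested Params class and the demo class name are hypothetical, and only two of the parameters are modeled, with the defaults documented on this page.

```java
// Illustrative sketch only. Params is a hypothetical stand-in, NOT websphinx.DownloadParameters.
class ImmutableParamsDemo {

    static final class Params implements Cloneable {
        private int maxThreads = 4;        // default documented for getMaxThreads()
        private int downloadTimeout = 60;  // seconds, default documented for getDownloadTimeout()

        int getMaxThreads() { return maxThreads; }
        int getDownloadTimeout() { return downloadTimeout; }

        @Override
        public Params clone() {
            try {
                return (Params) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: the class is Cloneable
            }
        }

        // "Changing" a parameter copies the object and modifies only the copy.
        Params changeMaxThreads(int maxThreads) {
            Params copy = clone();
            copy.maxThreads = maxThreads;
            return copy;
        }

        Params changeDownloadTimeout(int seconds) {
            Params copy = clone();
            copy.downloadTimeout = seconds;
            return copy;
        }
    }

    public static void main(String[] args) {
        Params defaults = new Params();
        // Calls can be chained because each one returns a new object.
        Params tuned = defaults.changeMaxThreads(8).changeDownloadTimeout(30);

        System.out.println(defaults.getMaxThreads());   // 4 (original is untouched)
        System.out.println(tuned.getMaxThreads());      // 8
        System.out.println(tuned.getDownloadTimeout()); // 30
    }
}
```

Because the receiver is never modified, the result of a change call has to be kept; calling changeMaxThreads(8) and discarding the return value has no effect.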
-
DownloadParameters()
- Make a DownloadParameters object with default settings.
-
changeAcceptedMIMETypes(String)
- Change accepted MIME types.
-
changeCrawlTimeout(int)
- Change timeout on entire crawl.
-
changeDownloadTimeout(int)
- Change download timeout value.
-
changeInteractive(boolean)
- Change interactive flag.
-
changeMaxPageSize(int)
- Change maximum page size.
-
changeMaxThreads(int)
- Change maximum threads.
-
changeObeyRobotExclusion(boolean)
- Change obey-robot-exclusion flag.
-
changeUseCaches(boolean)
- Change use-caches flag.
-
changeUserAgent(String)
- Change User-agent field used in HTTP requests.
-
clone()
- Clone a DownloadParameters object.
-
getAcceptedMIMETypes()
- Get accepted MIME types.
-
getCrawlTimeout()
- Get timeout on entire crawl.
-
getDownloadTimeout()
- Get download timeout value.
-
getInteractive()
- Get interactive flag.
-
getMaxPageSize()
- Get maximum page size.
-
getMaxThreads()
- Get maximum threads.
-
getObeyRobotExclusion()
- Get obey-robot-exclusion flag.
-
getUseCaches()
- Get use-caches flag.
-
getUserAgent()
- Get User-agent header used in HTTP requests.
DownloadParameters
public DownloadParameters()
- Make a DownloadParameters object with default settings.
clone
public Object clone()
- Clone a DownloadParameters object.
- Overrides:
- clone in class Object
getMaxThreads
public int getMaxThreads()
- Get maximum threads.
- Returns:
- maximum number of background threads used by crawler.
Default is 4.
changeMaxThreads
public DownloadParameters changeMaxThreads(int maxthreads)
- Change maximum threads.
- Parameters:
- maxthreads - maximum number of background threads used by crawler
- Returns:
- new DownloadParameters object with the specified parameter changed.
getMaxPageSize
public int getMaxPageSize()
- Get maximum page size. Pages larger than this limit are neither
downloaded nor parsed.
Default value is 100 (KB).
- Returns:
- maximum page size in kilobytes
changeMaxPageSize
public DownloadParameters changeMaxPageSize(int maxPageSize)
- Change maximum page size. Pages larger than this limit are treated as
leaves in the crawl graph -- neither downloaded nor parsed.
- Parameters:
- maxPageSize - maximum page size in kilobytes
- Returns:
- new DownloadParameters object with the specified parameter changed.
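A size limit expressed in kilobytes, as above, might be checked against a reported content length roughly as follows. This is a sketch under stated assumptions: the helper name exceedsLimit is hypothetical, 1 KB is taken to be 1024 bytes, and an unknown length (reported as -1) is treated as within the limit. The page does not say how websphinx itself performs this check.

```java
// Illustrative sketch only; exceedsLimit is a hypothetical helper, not part of websphinx.
class PageSizeLimitDemo {

    /**
     * Returns true if a page of the given length is larger than the limit.
     * Assumes 1 KB = 1024 bytes. A negative length (unknown) never exceeds the limit.
     */
    static boolean exceedsLimit(long contentLengthBytes, int maxPageSizeKB) {
        // 1024L forces long arithmetic so a large limit cannot overflow int.
        return contentLengthBytes > maxPageSizeKB * 1024L;
    }

    public static void main(String[] args) {
        int limitKB = 100; // documented default for getMaxPageSize()
        System.out.println(exceedsLimit(102_400L, limitKB)); // false: exactly 100 KB
        System.out.println(exceedsLimit(102_401L, limitKB)); // true: one byte over
        System.out.println(exceedsLimit(-1L, limitKB));      // false: length unknown
    }
}
```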
getDownloadTimeout
public int getDownloadTimeout()
- Get download timeout value.
- Returns:
- length of time (in seconds) that crawler will wait for a page to download
before aborting it. Default is 60 seconds.
changeDownloadTimeout
public DownloadParameters changeDownloadTimeout(int timeout)
- Change download timeout value.
- Parameters:
- timeout - length of time (in seconds) to wait for a page to download.
Use a negative value to turn off timeout.
- Returns:
- new DownloadParameters object with the specified parameter changed.
getCrawlTimeout
public int getCrawlTimeout()
- Get timeout on entire crawl.
- Returns:
- maximum length of time (in seconds) that crawler will run
before aborting. Default is -1 (no limit).
changeCrawlTimeout
public DownloadParameters changeCrawlTimeout(int timeout)
- Change timeout on entire crawl.
- Parameters:
- timeout - maximum length of time (in seconds) that crawler will run.
Use a negative value to turn off timeout.
- Returns:
- new DownloadParameters object with the specified parameter changed.
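Both timeouts above are given in seconds, with a negative value meaning "no limit". One way a caller could interpret such a value is to turn it into an absolute deadline. The sketch below assumes that approach; the names deadlineMillis and expired are hypothetical, and a timeout of 0 is treated as expiring immediately, which this page does not specify.

```java
// Illustrative sketch only; these helpers are hypothetical and not part of websphinx.
class TimeoutDemo {

    /**
     * Converts a timeout in seconds into an absolute deadline in milliseconds.
     * A negative timeout means "no limit" and maps to Long.MAX_VALUE.
     */
    static long deadlineMillis(long startMillis, int timeoutSeconds) {
        if (timeoutSeconds < 0) {
            return Long.MAX_VALUE; // timeout turned off
        }
        return startMillis + timeoutSeconds * 1000L;
    }

    /** Returns true once the current time has reached the deadline. */
    static boolean expired(long nowMillis, long startMillis, int timeoutSeconds) {
        return nowMillis >= deadlineMillis(startMillis, timeoutSeconds);
    }

    public static void main(String[] args) {
        long start = 1_000L;
        System.out.println(deadlineMillis(start, 60));     // 61000
        System.out.println(expired(60_999L, start, 60));   // false
        System.out.println(expired(61_000L, start, 60));   // true
        System.out.println(expired(61_000L, start, -1));   // false: no limit
    }
}
```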
getObeyRobotExclusion
public boolean getObeyRobotExclusion()
- Get obey-robot-exclusion flag.
- Returns:
- true iff the
crawler checks robots.txt on the remote Web site
before downloading a page. Default is false.
changeObeyRobotExclusion
public DownloadParameters changeObeyRobotExclusion(boolean f)
- Change obey-robot-exclusion flag.
- Parameters:
- f - If true, then the
crawler checks robots.txt on the remote Web site
before downloading a page.
- Returns:
- new DownloadParameters object with the specified parameter changed.
getInteractive
public boolean getInteractive()
- Get interactive flag.
- Returns:
- true if a user is available to respond to
dialog boxes (for instance, to enter passwords for
authentication). Default is true.
changeInteractive
public DownloadParameters changeInteractive(boolean f)
- Change interactive flag.
- Parameters:
- f - true if a user is available to respond
to dialog boxes
- Returns:
- new DownloadParameters object with the specified parameter changed.
getUseCaches
public boolean getUseCaches()
- Get use-caches flag.
- Returns:
- true if cached pages should be used whenever
possible
changeUseCaches
public DownloadParameters changeUseCaches(boolean f)
- Change use-caches flag.
- Parameters:
- f - true if cached pages should be used whenever possible
- Returns:
- new DownloadParameters object with the specified parameter changed.
getAcceptedMIMETypes
public String getAcceptedMIMETypes()
- Get accepted MIME types.
- Returns:
- list of MIME types that can be handled by
the crawler (which are passed as the Accept header
in the HTTP request).
Default is null.
changeAcceptedMIMETypes
public DownloadParameters changeAcceptedMIMETypes(String types)
- Change accepted MIME types.
- Parameters:
- types - list of MIME types that can be handled
by the crawler. Use null if the crawler can handle anything.
- Returns:
- new DownloadParameters object with the specified parameter changed.
getUserAgent
public String getUserAgent()
- Get User-agent header used in HTTP requests.
- Returns:
- user-agent field used in HTTP requests,
or null if the Java library's default user-agent
is used. Default value is null (but for a Crawler,
the default DownloadParameters has the Crawler's
name as its default user-agent).
changeUserAgent
public DownloadParameters changeUserAgent(String userAgent)
- Change User-agent field used in HTTP requests.
- Parameters:
- userAgent - user-agent field used in HTTP
requests. Pass null to use the Java library's default
user-agent field.
- Returns:
- new DownloadParameters object with the specified parameter changed.
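The user-agent, accepted-MIME-types, use-caches and interactive parameters correspond naturally to settings on the standard java.net.URLConnection class. The sketch below shows one plausible mapping using only JDK calls. The prepare method and its parameter names are hypothetical, and whether websphinx applies its parameters in exactly this way is an assumption, not something this page states. A null string leaves the corresponding header at the Java library's default, mirroring the null convention documented above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

// Illustrative sketch only; prepare() is a hypothetical helper, not part of websphinx.
class RequestSettingsDemo {

    /** Opens (but does not connect) a URLConnection configured from the given values. */
    static URLConnection prepare(String url, String userAgent, String acceptedTypes,
                                 boolean useCaches, boolean interactive) {
        try {
            // openConnection() only creates the object; no network traffic happens yet.
            URLConnection conn = URI.create(url).toURL().openConnection();
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            if (acceptedTypes != null) {
                conn.setRequestProperty("Accept", acceptedTypes);
            }
            conn.setUseCaches(useCaches);
            conn.setAllowUserInteraction(interactive);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection conn = prepare("http://example.com/", "MyCrawler", "text/html", false, false);
        System.out.println(conn.getRequestProperty("User-Agent")); // MyCrawler
        System.out.println(conn.getRequestProperty("Accept"));     // text/html
        System.out.println(conn.getUseCaches());                   // false
        System.out.println(conn.getAllowUserInteraction());        // false
    }
}
```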