LinkVerify: Class LinkSpider

de.rw7.htmltools
Class LinkSpider

java.lang.Object
  |
  +--de.rw7.htmltools.LinkSpider

public class LinkSpider
extends java.lang.Object

This class does the actual work. It walks an entire website, or a subdirectory of it, and reports all broken links found in HTML and CSS code. It also keeps track of which URLs have already been checked and which have not.

Checking is multithreaded. For each thread, an instance of the inner class T is created.

Inner Class Summary
`(package private) class`	`LinkSpider.T`

Field Summary
`private static boolean`	`bugEqualsPresent`
`private static boolean`	`bugHostPresent`
`protected LinkFrontend`	`fe` The associated frontend.
`protected HtmlTokenizer`	`ht`
`protected java.util.Hashtable`	`pool` Holds all urls checked up to now.
`protected java.net.URL`	`root` The root url to be checked.
`protected java.lang.String`	`rootDir` Holds the directory of the root url.
`protected java.lang.String`	`rootHost`
`protected int`	`runningThreads` The number of threads not waiting due to lack of work.
`private static int`	`TARGET`
`protected int`	`targetNumber`
`protected java.util.Hashtable`	`targets` Holds all defined target frame names.
`protected LinkSpider.T[]`	`threads` Holds all threads of the spider.
`protected java.util.Vector`	`vakant` Holds all urls not yet checked.
`protected boolean`	`verifyExternals` Tells the spider to also check links that are not in or under the directory of the root url.

Constructor Summary
`LinkSpider(LinkFrontend fe, java.net.URL root, boolean verifyExternals, int nthreads)` The constructor of the spider.

Method Summary
`(package private) static void`	`<clinit>()`
`private static java.lang.String`	`extractDir(java.lang.String filename)`
`java.util.Hashtable`	`getPool()` For debugging only.
`java.util.Hashtable`	`getTargets()` For debugging only.
`java.util.Enumeration`	`getVakant()` For debugging only.
`private static void`	`setTags(HtmlTokenizer ht)`
`void`	`startVerify()` Starts the spider.
`void`	`stop()` Stops the spider abnormally.
`protected void`	`target(LinkFile lf, java.lang.String target, boolean definition)`
`protected void`	`urlAbsent(LinkFile ut, java.lang.Exception e)`
`protected LinkFile`	`urlLookup(int count)`
`protected java.net.URL`	`urlRequiredGet(java.net.URL u, LinkFile referrer)`

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait

Field Detail

fe

protected LinkFrontend fe

The associated frontend.

root

protected java.net.URL root

The root url to be checked.

verifyExternals

protected boolean verifyExternals

Tells the spider to also check links that are not in or under the directory of the root url.

pool

protected java.util.Hashtable pool

Holds all urls checked up to now.

vakant

protected java.util.Vector vakant

Holds all urls not yet checked.
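The split between pool (already checked) and vakant (not yet checked) can be illustrated with a small work-queue sketch. This is only an illustration under stated assumptions: the class name `FrontierSketch`, the methods `require` and `next`, and the use of plain strings as keys are hypothetical. The actual key and value types stored by LinkSpider are not documented on this page.

```java
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical illustration; not the LinkSpider source.
class FrontierSketch {
    // URLs that have already been handed out for checking (assumed: keyed by string form).
    private final Hashtable<String, Boolean> pool = new Hashtable<>();
    // URLs waiting to be checked, in discovery order.
    private final Vector<String> vakant = new Vector<>();

    // Queues a URL unless it is already checked or already pending.
    // Returns true if the URL was newly queued.
    synchronized boolean require(String url) {
        if (pool.containsKey(url) || vakant.contains(url)) {
            return false;
        }
        vakant.addElement(url);
        return true;
    }

    // Moves the oldest pending URL into the pool and returns it,
    // or returns null if nothing is pending.
    synchronized String next() {
        if (vakant.isEmpty()) {
            return null;
        }
        String url = vakant.remove(0);
        pool.put(url, Boolean.TRUE);
        return url;
    }

    synchronized int pending() {
        return vakant.size();
    }

    synchronized int checked() {
        return pool.size();
    }
}
```

Keeping both collections behind one lock means a URL can never be queued twice, even when several threads discover the same link at the same time.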

targets

protected java.util.Hashtable targets

Holds all defined target frame names. Maps each target name to the corresponding LinkTarget object.

See Also: LinkTarget

rootHost

protected java.lang.String rootHost

rootDir

protected java.lang.String rootDir

Holds the directory of the root url.
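One plausible way that rootHost and rootDir could be used to decide whether a link lies "in or under the directory of the root url" is sketched below. The class `ScopeSketch` and the method `isInternal` are hypothetical, and the `extractDir` shown here is a guess at what a helper of that name might do; the real private method is not documented. The sketch ignores ports and protocols.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical illustration; not the LinkSpider source.
class ScopeSketch {
    // Returns the directory part of a path, including the trailing slash.
    // A path without any slash is treated as living in "/".
    static String extractDir(String filename) {
        int slash = filename.lastIndexOf('/');
        return slash < 0 ? "/" : filename.substring(0, slash + 1);
    }

    // True if candidate is on the root's host and in or under the root's directory.
    static boolean isInternal(URL root, URL candidate) {
        String rootHost = root.getHost();
        String rootDir = extractDir(root.getPath());
        String path = candidate.getPath().isEmpty() ? "/" : candidate.getPath();
        return rootHost.equalsIgnoreCase(candidate.getHost()) && path.startsWith(rootDir);
    }

    // Convenience overload that parses strings and wraps the checked exception.
    static boolean isInternal(String root, String candidate) {
        try {
            return isInternal(URI.create(root).toURL(), URI.create(candidate).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With a check like this, a spider constructed with verifyExternals set to false could skip every link for which `isInternal` returns false.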

ht

protected HtmlTokenizer ht

threads

protected LinkSpider.T[] threads

Holds all threads of the spider.

runningThreads

protected int runningThreads

The number of threads not waiting due to lack of work. If this becomes zero, the spider exits normally.
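The termination rule described for runningThreads can be sketched with a monitor-based work queue: a thread decrements the counter before it waits for work, and the thread that brings the counter to zero knows that no other thread can produce new work. This is a minimal sketch, not the LinkSpider implementation. The class `TerminationSketch`, its methods, and the integer "work items" (where item n stands for a page that links to item n - 1) are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration; not the LinkSpider source.
class TerminationSketch {
    private final Queue<Integer> work = new ArrayDeque<>();
    private int runningThreads;   // threads not waiting due to lack of work
    private boolean finished;
    private int processed;

    // Returns the next item, or null once all threads are idle and no work is left.
    private synchronized Integer take() throws InterruptedException {
        while (work.isEmpty()) {
            if (finished) {
                return null;
            }
            runningThreads--;              // this thread is about to idle
            if (runningThreads == 0) {     // nobody is left who could add work
                finished = true;
                notifyAll();               // release the other idle threads
                return null;
            }
            try {
                wait();
            } finally {
                runningThreads++;          // awake again, counts as running
            }
        }
        return work.remove();
    }

    // Stand-in for checking one page: item n "discovers" item n - 1.
    private synchronized void process(int item) {
        processed++;
        if (item > 0) {
            work.add(item - 1);
            notify();
        }
    }

    // Runs nthreads workers starting from one seed item; returns the number of items processed.
    static int run(int nthreads, int seed) {
        TerminationSketch s = new TerminationSketch();
        s.runningThreads = nthreads;       // all workers count as running before they start
        s.work.add(seed);
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nthreads; i++) {
            Thread t = new Thread(() -> {
                try {
                    Integer item;
                    while ((item = s.take()) != null) {
                        s.process(item);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        synchronized (s) {
            return s.processed;
        }
    }
}
```

Because a thread that holds an item still counts as running, the counter cannot reach zero while any thread might yet add new work to the queue.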

targetNumber

protected int targetNumber

bugEqualsPresent

private static boolean bugEqualsPresent

bugHostPresent

private static boolean bugHostPresent

TARGET

private static final int TARGET

Constructor Detail

LinkSpider

public LinkSpider(LinkFrontend fe,
                  java.net.URL root,
                  boolean verifyExternals,
                  int nthreads)

The constructor of the spider. Does not start the spider.

Parameters: fe - the associated frontend window; root - the root URL; verifyExternals - whether to check URLs that are not in or under the root URL's directory; nthreads - the number of threads to be created.

Method Detail

getPool

public final java.util.Hashtable getPool()

For debugging only.

getVakant

public final java.util.Enumeration getVakant()

For debugging only.

getTargets

public final java.util.Hashtable getTargets()

For debugging only.

startVerify

public void startVerify()

Starts the spider. From the moment this method is called, the associated frontend must be able to handle callbacks such as onAbsent.

stop

public final void stop()

Stops the spider abnormally. This is called by the frontend on user interaction.

urlRequiredGet

protected final java.net.URL urlRequiredGet(java.net.URL u,
                                            LinkFile referrer)

urlLookup

protected final LinkFile urlLookup(int count)
                            throws java.lang.InterruptedException

urlAbsent

protected final void urlAbsent(LinkFile ut,
                               java.lang.Exception e)

target

protected final void target(LinkFile lf,
                            java.lang.String target,
                            boolean definition)

extractDir

private static final java.lang.String extractDir(java.lang.String filename)

<clinit>

static void <clinit>()

setTags

private static final void setTags(HtmlTokenizer ht)
