de.rw7.htmltools
Class LinkSpider

java.lang.Object
  |
  +--de.rw7.htmltools.LinkSpider

public class LinkSpider
extends java.lang.Object

This class does the actual work. It crawls an entire website, or a subdirectory of one, and finds all broken links in HTML and CSS code. It also keeps track of which URLs have already been checked and which have not.

Checking is multithreaded: one instance of the inner class T is created per thread.
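The bookkeeping described above (which URLs have been checked and which have not) can be sketched with the collection types the fields use: a Hashtable for pool and a Vector for vakant. This is a minimal sketch under stated assumptions, not the real LinkSpider code: the class UrlBook and its methods are hypothetical, URLs are keyed as Strings, and here pool records queued URLs as well as checked ones so that no URL is queued twice. The methods are synchronized because all worker threads share the same collections.

```java
import java.util.Hashtable;
import java.util.Vector;

/** Hypothetical sketch of "checked" / "not yet checked" bookkeeping (not the real LinkSpider code). */
public class UrlBook {
    // Every URL seen so far, whether already checked or still queued.
    private final Hashtable<String, Boolean> pool = new Hashtable<>();
    // URLs waiting to be checked, in discovery order.
    private final Vector<String> vakant = new Vector<>();

    /** Queues url unless it was seen before; returns true if it was new. */
    public synchronized boolean offer(String url) {
        if (pool.containsKey(url)) {
            return false;                     // already checked or already queued
        }
        pool.put(url, Boolean.TRUE);
        vakant.addElement(url);
        return true;
    }

    /** Removes and returns the next unchecked URL, or null if none is queued. */
    public synchronized String next() {
        return vakant.isEmpty() ? null : vakant.remove(0);
    }

    /** Number of URLs still waiting to be checked. */
    public synchronized int pending() {
        return vakant.size();
    }

    public static void main(String[] args) {
        UrlBook book = new UrlBook();
        book.offer("http://example.com/");
        book.offer("http://example.com/a.html");
        book.offer("http://example.com/");      // duplicate, ignored
        System.out.println(book.pending());     // prints 2
        System.out.println(book.next());        // prints http://example.com/
    }
}
```

Because a URL stays in pool after it leaves vakant, a page that is linked from many places is still fetched only once.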


Inner Class Summary
(package private)  class LinkSpider.T
           
 
Field Summary
private static boolean bugEqualsPresent
           
private static boolean bugHostPresent
           
protected  LinkFrontend fe
          The associated frontend.
protected  HtmlTokenizer ht
           
protected  java.util.Hashtable pool
          Holds all URLs checked so far.
protected  java.net.URL root
          The root URL to be checked.
protected  java.lang.String rootDir
          Holds the directory of the root URL.
protected  java.lang.String rootHost
           
protected  int runningThreads
          The number of threads not waiting due to lack of work.
private static int TARGET
           
protected  int targetNumber
           
protected  java.util.Hashtable targets
          Holds all defined target frame names.
protected  LinkSpider.T[] threads
          Holds all threads of the spider.
protected  java.util.Vector vakant
          Holds all URLs not yet checked.
protected  boolean verifyExternals
          Tells the spider whether to check links that are not in or under the directory of the root URL.
 
Constructor Summary
LinkSpider(LinkFrontend fe, java.net.URL root, boolean verifyExternals, int nthreads)
          The constructor of the spider.
 
Method Summary
(package private) static void <clinit>()
           
private static java.lang.String extractDir(java.lang.String filename)
           
 java.util.Hashtable getPool()
          For debugging only.
 java.util.Hashtable getTargets()
          For debugging only.
 java.util.Enumeration getVakant()
          For debugging only.
private static void setTags(HtmlTokenizer ht)
           
 void startVerify()
          Starts the spider.
 void stop()
          Stops the spider abnormally.
protected  void target(LinkFile lf, java.lang.String target, boolean definition)
           
protected  void urlAbsent(LinkFile ut, java.lang.Exception e)
           
protected  LinkFile urlLookup(int count)
           
protected  java.net.URL urlRequiredGet(java.net.URL u, LinkFile referrer)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, registerNatives, toString, wait, wait, wait
 

Field Detail

fe

protected LinkFrontend fe
The associated frontend.

root

protected java.net.URL root
The root URL to be checked.

verifyExternals

protected boolean verifyExternals
Tells the spider whether to check links that are not in or under the directory of the root URL.
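Deciding whether a link is "in or under the directory of the root URL" can be sketched as a host comparison plus a path-prefix test. This is a minimal sketch, assuming that "directory" means the root path up to and including its last slash; ScopeCheck, pathOf, dirOf, portOf and isInternal are hypothetical names, and dirOf is only a guess at the role of the undocumented private helper extractDir. Hosts are compared as strings on purpose, because java.net.URL.equals may resolve host names over the network.

```java
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical sketch of the "in or under the root directory" test. */
public class ScopeCheck {
    /** Path of u, with an empty path treated as "/". */
    static String pathOf(URL u) {
        String p = u.getPath();
        return p.isEmpty() ? "/" : p;
    }

    /** Directory part of a path, including the trailing slash ("/docs/index.html" -> "/docs/"). */
    static String dirOf(String path) {
        return path.substring(0, path.lastIndexOf('/') + 1);
    }

    /** Port of u, falling back to the protocol's default port when none is given. */
    static int portOf(URL u) {
        return u.getPort() != -1 ? u.getPort() : u.getDefaultPort();
    }

    /** True if candidate has the root's protocol, host and port and lies in or under the root's directory. */
    static boolean isInternal(URL root, URL candidate) {
        return root.getProtocol().equalsIgnoreCase(candidate.getProtocol())
                && root.getHost().equalsIgnoreCase(candidate.getHost())
                && portOf(root) == portOf(candidate)
                && pathOf(candidate).startsWith(dirOf(pathOf(root)));
    }

    public static void main(String[] args) throws MalformedURLException {
        URL root = new URL("http://example.com/docs/index.html");
        System.out.println(isInternal(root, new URL("http://example.com/docs/sub/a.html"))); // true
        System.out.println(isInternal(root, new URL("http://example.com/other/b.html")));    // false
        System.out.println(isInternal(root, new URL("http://www.example.org/docs/a.html"))); // false
    }
}
```

With verifyExternals false, a spider following this sketch would simply skip every URL for which isInternal returns false. The prefix test does not normalize ".." segments; normalizing via URL.toURI().normalize() first would close that gap.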

pool

protected java.util.Hashtable pool
Holds all URLs checked so far.

vakant

protected java.util.Vector vakant
Holds all URLs not yet checked.

targets

protected java.util.Hashtable targets
Holds all defined target frame names. Maps each target name to its LinkTarget object.
See Also:
LinkTarget
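The source does not document how target() uses its definition flag. One plausible reading, assumed here and not confirmed by the source, is that definition is true when a page declares a frame name (for example a frame element with a name attribute) and false when a link refers to one through a target attribute, so that a name referred to but never defined counts as broken. The sketch below tracks that in a Hashtable like the targets field; TargetRegistry, undefined() and the nested Target class (a stand-in for LinkTarget) are hypothetical.

```java
import java.util.Hashtable;
import java.util.Map;
import java.util.Vector;

/** Hypothetical sketch of frame-target bookkeeping; Target stands in for the real LinkTarget. */
public class TargetRegistry {
    static class Target {
        boolean defined;                                  // some page declares this frame name
        final Vector<String> referrers = new Vector<>();  // pages that link into this frame
    }

    private final Hashtable<String, Target> targets = new Hashtable<>();

    /** Records that page defines (definition == true) or refers to (false) the frame name. */
    public synchronized void target(String page, String name, boolean definition) {
        // _blank, _self, _parent and _top are built into HTML and never need a definition.
        if (name.equals("_blank") || name.equals("_self")
                || name.equals("_parent") || name.equals("_top")) {
            return;
        }
        Target t = targets.get(name);
        if (t == null) {
            t = new Target();
            targets.put(name, t);
        }
        if (definition) {
            t.defined = true;
        } else {
            t.referrers.addElement(page);
        }
    }

    /** Frame names that are referred to but never defined. */
    public synchronized Vector<String> undefined() {
        Vector<String> result = new Vector<>();
        for (Map.Entry<String, Target> e : targets.entrySet()) {
            if (!e.getValue().defined && !e.getValue().referrers.isEmpty()) {
                result.addElement(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TargetRegistry reg = new TargetRegistry();
        reg.target("index.html", "main", true);   // page declares frame "main"
        reg.target("menu.html", "main", false);   // link into "main"
        reg.target("menu.html", "mian", false);   // typo: never defined
        reg.target("menu.html", "_top", false);   // built-in name, ignored
        System.out.println(reg.undefined());      // prints [mian]
    }
}
```

Since a definition may be found after the pages that refer to it, undefined targets can only be reported reliably once the whole crawl has finished.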

rootHost

protected java.lang.String rootHost

rootDir

protected java.lang.String rootDir
Holds the directory of the root URL.

ht

protected HtmlTokenizer ht

threads

protected LinkSpider.T[] threads
Holds all threads of the spider.

runningThreads

protected int runningThreads
The number of threads not waiting due to lack of work. If this becomes zero, the spider exits normally.
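The termination rule described here (the spider exits normally once no thread has work) can be sketched with a shared counter guarded by the same monitor as the queue. This is a minimal sketch under assumptions: take() plays roughly the role urlLookup presumably plays, and IdleCountDemo, add(), take(), crawl() and the simulated link structure are hypothetical. A worker that finds the queue empty decrements the counter and waits; the worker that brings it to zero knows that nobody is left who could queue more URLs, so it wakes the others and everyone exits.

```java
import java.util.Vector;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of termination by counting the workers that still have work. */
public class IdleCountDemo {
    private final Vector<String> vakant = new Vector<>();
    private int runningThreads;          // workers not waiting for lack of work
    private boolean finished;            // set once runningThreads reaches zero

    IdleCountDemo(int nthreads) {
        runningThreads = nthreads;       // every worker starts out running
    }

    /** Queues a URL and wakes any idle workers. */
    synchronized void add(String url) {
        vakant.addElement(url);
        notifyAll();
    }

    /** Returns the next URL, blocking while other workers may still queue more; null ends the crawl. */
    synchronized String take() throws InterruptedException {
        if (vakant.isEmpty()) {
            runningThreads--;                        // this worker is out of work
            if (runningThreads == 0) {               // nobody left who could queue more
                finished = true;
                notifyAll();
                return null;
            }
            while (vakant.isEmpty() && !finished) {
                wait();
            }
            if (finished) {
                return null;
            }
            runningThreads++;                        // got work again
        }
        return vakant.remove(0);
    }

    /** Simulated crawl: each page name shorter than 4 characters "links" to two child pages. */
    static int crawl(int nthreads) throws InterruptedException {
        IdleCountDemo spider = new IdleCountDemo(nthreads);
        AtomicInteger checked = new AtomicInteger();
        spider.add("r");
        Thread[] threads = new Thread[nthreads];
        for (int i = 0; i < nthreads; i++) {
            threads[i] = new Thread(() -> {
                try {
                    String url;
                    while ((url = spider.take()) != null) {
                        checked.incrementAndGet();
                        if (url.length() < 4) {      // queue this page's links
                            spider.add(url + "a");
                            spider.add(url + "b");
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return checked.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(crawl(4));   // prints 15 (1 + 2 + 4 + 8 pages)
    }
}
```

Updating the counter inside the same synchronized method that manages the queue is the important design choice: a running worker queues a page's links before it asks for more work, so the counter cannot reach zero while links are still about to be added.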

targetNumber

protected int targetNumber

bugEqualsPresent

private static boolean bugEqualsPresent

bugHostPresent

private static boolean bugHostPresent

TARGET

private static final int TARGET

Constructor Detail

LinkSpider

public LinkSpider(LinkFrontend fe,
                  java.net.URL root,
                  boolean verifyExternals,
                  int nthreads)
The constructor of the spider. It does not start the spider; startVerify() does that.
Parameters:
fe - the associated frontend window.
root - the root URL.
verifyExternals - whether to check URLs that are not in or under the root URL's directory.
nthreads - the number of threads to create.

Method Detail

getPool

public final java.util.Hashtable getPool()
For debugging only.

getVakant

public final java.util.Enumeration getVakant()
For debugging only.

getTargets

public final java.util.Hashtable getTargets()
For debugging only.

startVerify

public void startVerify()
Starts the spider. From the moment this method is called, the associated frontend must be able to handle calls to onAbsent and the other callbacks.

stop

public final void stop()
Stops the spider abnormally. This is called by the frontend on user interaction.
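How stop() halts the threads is not documented. A common Java pattern for an abnormal stop, assumed here rather than taken from the source, is cooperative cancellation: set a volatile flag, then interrupt every worker so that threads blocked in wait() (for example while waiting for a URL to check) wake with an InterruptedException and return. StopDemo and its methods are hypothetical.

```java
/** Hypothetical sketch of an abnormal stop via a volatile flag plus Thread.interrupt(). */
public class StopDemo {
    private volatile boolean stopped;
    private final Thread[] threads;

    public StopDemo(int nthreads) {
        threads = new Thread[nthreads];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(this::work);
        }
    }

    public void start() {
        for (Thread t : threads) {
            t.start();
        }
    }

    /** Worker loop: in this demo no work ever arrives, so every worker blocks in wait(). */
    private void work() {
        synchronized (this) {
            try {
                while (!stopped) {
                    wait();              // stands in for "waiting for a URL to check"
                }
            } catch (InterruptedException e) {
                // woken by stop(): fall through and let the thread end
            }
        }
    }

    /** Abnormal stop: set the flag first, then interrupt workers that may be blocked. */
    public void stop() {
        stopped = true;
        for (Thread t : threads) {
            t.interrupt();
        }
    }

    /** Waits up to millis per worker; returns true if every worker has terminated. */
    public boolean join(long millis) throws InterruptedException {
        for (Thread t : threads) {
            t.join(millis);
        }
        for (Thread t : threads) {
            if (t.isAlive()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        StopDemo demo = new StopDemo(3);
        demo.start();
        Thread.sleep(100);                   // let the workers block in wait()
        demo.stop();
        System.out.println(demo.join(1000)); // prints true
    }
}
```

This avoids the deprecated Thread.stop(), which kills a thread at an arbitrary point and can leave shared state such as the URL queue inconsistent.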

urlRequiredGet

protected final java.net.URL urlRequiredGet(java.net.URL u,
                                            LinkFile referrer)

urlLookup

protected final LinkFile urlLookup(int count)
                            throws java.lang.InterruptedException

urlAbsent

protected final void urlAbsent(LinkFile ut,
                               java.lang.Exception e)

target

protected final void target(LinkFile lf,
                            java.lang.String target,
                            boolean definition)

extractDir

private static final java.lang.String extractDir(java.lang.String filename)

<clinit>

static void <clinit>()

setTags

private static final void setTags(HtmlTokenizer ht)