This document describes the program in detail. There is also a quick usage description.
Normally you will call LinkVerify without any parameters. Parameters may be useful in shell scripts, or if you only have a text console.
java wiebicke.htmltools.LinkVerify [-x] [-c] [url]
Parameter | Meaning |
---|---|
url | The root URL to be checked. |
-x | Do not check external links. |
-c | Enables console mode. In console mode there is no window. The process starts immediately, and the results are written to standard output. In console mode you must specify a root URL; otherwise this option is ignored. |
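The rule that -c is ignored without a root URL can be illustrated with a small sketch. The class `Options` and its fields are hypothetical names chosen for this example; this is not LinkVerify's actual parsing code, only one way the described behavior could be expressed.

```java
// Hypothetical holder for the parsed options; not LinkVerify's real class.
class Options {
    boolean checkExternal = true; // cleared by -x
    boolean consoleMode = false;  // set by -c, but only if a root URL is given
    String rootUrl = null;        // the optional [url] argument

    static Options parse(String[] args) {
        Options o = new Options();
        boolean consoleRequested = false;
        for (String arg : args) {
            if (arg.equals("-x")) {
                o.checkExternal = false;
            } else if (arg.equals("-c")) {
                consoleRequested = true;
            } else {
                // Assumption: anything that is not a known flag is the root URL.
                o.rootUrl = arg;
            }
        }
        // -c without a root URL is ignored, as described above.
        o.consoleMode = consoleRequested && o.rootUrl != null;
        return o;
    }
}
```

With this sketch, `Options.parse(new String[] {"-c"})` leaves `consoleMode` false, because no root URL was given.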
The following HTML attributes get checked:
Tag | Attribute | Meaning |
---|---|---|
A | HREF | A simple hyperlink. |
BASE | HREF | The document-wide base for relative URLs. |
LINK | HREF | A standard link in a document structure, or a style sheet. |
AREA | HREF | A clickable map with hyperlinks. |
APPLET | ARCHIVE | The JAR files containing the Java classes of the applet. Multiple files must be separated by commas. |
APPLET | SRC | The support address or the source code of an applet, depending on the docs you read. |
FRAME | SRC | The default files of a frameset. |
IFRAME | SRC | The content of a floating frame. |
EMBED | SRC | The source of a Netscape-like embedded object. |
SOUND | SRC | A .wav or .au file ... |
BGSOUND | SRC | .. as background sound too. |
INPUT | SRC | The bitmap of an <INPUT> button. |
SCRIPT | SRC | The external source file of a script. |
IMG | SRC | A bitmap .. |
IMG | LOWSRC | .. and its smaller representation, |
IMG | USEMAP | .. with a client-side map, |
IMG | DYNSRC | .. or an AVI movie |
IMG | VRML | .. or a VRML 3D world. |
BODY | BACKGROUND | The background pattern of a document ... |
TABLE | BACKGROUND | .. or a table .. |
TH | BACKGROUND | .. or a table header .. |
TD | BACKGROUND | .. or data cell. |
LAYER | SRC + | The contents of a layer .. |
ILAYER | SRC + | .. or inline layer. (both Netscape only) |
INS | CITE | URL for inserted or .. |
DEL | CITE | .. deleted text, .. |
Q | CITE | .. or a cite .. |
ACRONYM | CITE | .. or an abbreviation. |
FORM | ACTION | The submit address of a form .. |
ISINDEX | ACTION | .. or a searchable index. |
LinkVerify checks URLs in cascading style sheets too.
An external style sheet file can be referenced with a <LINK> tag. An inline style sheet may be defined inside a <STYLE> tag. Both features are supported. Style sheets inside the STYLE attribute are not yet supported.
When checking style sheets, everything between "url(" and ")" is considered to be a URL. Usually these are background images or other style sheets (included with @import).
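The url( ... ) rule above can be sketched with a regular expression. This is only an illustration, not LinkVerify's own code: the class name `CssUrlExtractor` is hypothetical, and stripping optional quotes around the URL is an assumption of this sketch that the text above does not mention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; illustrates the "everything between url( and )" rule.
class CssUrlExtractor {
    // Matches url( ... ) and captures everything up to the first ")".
    private static final Pattern URL_PATTERN =
            Pattern.compile("url\\(([^)]*)\\)", Pattern.CASE_INSENSITIVE);

    static List<String> extractUrls(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(css);
        while (m.find()) {
            String raw = m.group(1).trim();
            // Assumption: a matching pair of surrounding quotes is not part of the URL.
            if (raw.length() >= 2) {
                char first = raw.charAt(0);
                char last = raw.charAt(raw.length() - 1);
                if ((first == '"' || first == '\'') && first == last) {
                    raw = raw.substring(1, raw.length() - 1);
                }
            }
            if (!raw.isEmpty()) {
                urls.add(raw);
            }
        }
        return urls;
    }
}
```

The same method works for an external style sheet file and for the text inside a <STYLE> tag, since both are plain CSS.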
Furthermore, the Frames window shows a comparison between defined and referenced named frames. A frame is defined with the <FRAME NAME="name"> tag. Frames are referenced with the TARGET attribute of the <A>, <BASE>, <AREA> and <FORM> tags.
This test is not an analysis of the frameset structure. The spider simply checks whether every frame name in use is defined somewhere in the analyzed files. Treat this as a rule of thumb; the feature will usually catch typing errors only.
When a link refers to an undefined frame name, the browser opens a new window, which gets the frame name as its title. Such behavior may even be intended by the author.
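The comparison described above amounts to a set difference: referenced names minus defined names. The following is a minimal sketch under stated assumptions. The class `FrameNameCheck` is a hypothetical name, and skipping the reserved HTML target names (_blank, _self, _parent, _top) is an assumption of this example, not something this document states about LinkVerify.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of LinkVerify.
class FrameNameCheck {
    // Reserved target names from HTML. Treating them as always valid
    // is an assumption of this sketch.
    private static final Set<String> RESERVED =
            Set.of("_blank", "_self", "_parent", "_top");

    // Returns every referenced TARGET name that has no matching <FRAME NAME="...">.
    static Set<String> undefinedTargets(Set<String> defined, Set<String> referenced) {
        Set<String> result = new TreeSet<>(referenced);
        result.removeAll(defined);
        result.removeAll(RESERVED);
        return result;
    }
}
```

A mistyped name such as "contnet" stays in the result, while "content" is removed if a frame with that name is defined anywhere in the analyzed files.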
When checking the internal structure of a website, you will usually want to try this locally first. Unfortunately, there is a difference between the file: and the http: protocol. When a URL refers to a directory via HTTP, the server looks for special files within this directory, usually index.html. Only if this file does not exist does the server list the directory to the client (if permitted). With the file: protocol, the client always gets the directory listing. This means that local checking may visit more files than checking via HTTP.
Sometimes you will encounter broken links with an HTTP response "Method not allowed" or "Method HEAD disabled on this server". The corresponding response code is 405, sometimes 403.
The reason is that several servers do not allow HEAD requests. With a HEAD request, an HTTP client can determine whether a certain URL exists on a server without actually loading it. HEAD requests are often used by web spiders, such as search engines and also LinkVerify. Apparently these servers do not want to support web spiders.
I really can't imagine a rational reason for this. However, there will be no workaround for this, such as sending another GET request after a refused HEAD request. If you would like this, implement it on your own.
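LinkVerify does not do this itself. For readers who want to implement the GET fallback on their own, a minimal sketch with the standard `java.net.HttpURLConnection` class could look as follows. The class `HeadCheck` and its method names are hypothetical, and treating 403 as a refused HEAD is a guess based on the codes mentioned above; a 403 may also mean the document is genuinely forbidden.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; LinkVerify itself contains no such fallback.
class HeadCheck {
    // True when the response code suggests the server refused the HEAD
    // method itself rather than reporting a missing document.
    static boolean headRefused(int code) {
        return code == HttpURLConnection.HTTP_BAD_METHOD  // 405
            || code == HttpURLConnection.HTTP_FORBIDDEN;  // 403
    }

    // Sends one request with the given method and returns the response code.
    static int request(String url, String method) throws IOException {
        HttpURLConnection con =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestMethod(method);
        try {
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }

    // Tries HEAD first and repeats the check with GET if HEAD was refused.
    static int check(String url) throws IOException {
        int code = request(url, "HEAD");
        if (headRefused(code)) {
            code = request(url, "GET");
        }
        return code;
    }
}
```

The price of the fallback is that the second request actually loads the document, which is exactly what a HEAD request is meant to avoid.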
Internal anchors allow a URL to refer to a position within an HTML file. An anchor is defined by the tag <A NAME="ref">. This anchor may be referenced with <A HREF="file#ref">.
For links to internal anchors, LinkVerify checks whether the anchor is actually defined in the referenced HTML file. This works only for non-external links, located in the directory of the root URL or one of its subdirectories.
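The anchor check can be sketched as follows: take the part of the link after "#" and look it up among the names defined by <A NAME="..."> in the target file. The class `AnchorCheck` is a hypothetical name, and the set of defined anchors is assumed to have been collected already while parsing that file.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical helper; not LinkVerify's actual implementation.
class AnchorCheck {
    // definedAnchors: the NAME values found in the referenced HTML file.
    static boolean anchorDefined(String link, Set<String> definedAnchors) {
        String fragment = URI.create(link).getFragment();
        // A link without "#ref" points at the file itself; nothing to verify.
        if (fragment == null || fragment.isEmpty()) {
            return true;
        }
        return definedAnchors.contains(fragment);
    }
}
```

For example, with the anchors "ref" and "top" defined, the link "file.html#ref" passes and "file.html#missing" is reported.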
The download packages contain all sources.
Created by Ralf Wiebicke, last modified on December 20, 1998