LinkVerify

Contents

This document describes the program in detail. There is also a quick usage description.

Command line parameters

Normally you will call LinkVerify without any parameters. Parameters may be useful in shell scripts, or if you only have a text console.

java wiebicke.htmltools.LinkVerify [-x] [-c] [url]
url   The root URL to be checked.
-x    Do not check external links.
-c    Enables console mode. In console mode there is no window. The process starts immediately, and the results are written to standard output. In console mode you must specify a root URL, otherwise this option will be ignored.
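The option rules above can be modelled in a few lines of Java. This is only a sketch of the documented behavior, not LinkVerify's actual source; the class `CliOptions` and its field names are hypothetical.

```java
/** Hypothetical model of the documented options; not LinkVerify's own code. */
final class CliOptions {
    final boolean skipExternal; // -x
    final boolean consoleMode;  // -c, only effective together with a root URL
    final String rootUrl;       // null if no URL was given

    private CliOptions(boolean skipExternal, boolean consoleMode, String rootUrl) {
        this.skipExternal = skipExternal;
        this.consoleMode = consoleMode;
        this.rootUrl = rootUrl;
    }

    static CliOptions parse(String[] args) {
        boolean x = false;
        boolean c = false;
        String url = null;
        for (String arg : args) {
            if (arg.equals("-x")) {
                x = true;
            } else if (arg.equals("-c")) {
                c = true;
            } else if (url == null) {
                url = arg; // assumption: the first non-option argument is the root URL
            }
        }
        // The documentation says -c is ignored when no root URL is given.
        return new CliOptions(x, c && url != null, url);
    }
}
```

With this model, `parse(new String[] {"-c"})` yields `consoleMode == false`, because console mode without a root URL is ignored.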

Links in Hypertext (.html)

The following HTML attributes get checked:

Tag       Attribute          Meaning
A         HREF               A simple hyperlink.
BASE      HREF               The document-wide base for relative URLs.
LINK      HREF               A standard link in a document structure, or a style sheet.
AREA      HREF               A clickable map with hyperlinks.
APPLET    ARCHIVE            The JAR files containing the Java classes of the applet. The files must be separated by commas.
APPLET    SRC                The support address/the source code of an applet, depending on the docs you read.
FRAME     SRC                The default files of a frameset.
IFRAME    SRC                The content of a floating frame.
EMBED     SRC                The source of a Netscape-like embedded object.
SOUND     SRC                A .wav or .au file ..
BGSOUND   SRC                .. as background sound too.
INPUT     SRC                The bitmap of an <INPUT> button.
SCRIPT    SRC                The external source file of a script.
IMG       SRC                A bitmap ..
IMG       LOWSRC             .. and its smaller representation,
IMG       USEMAP             .. with a client-side map,
IMG       DYNSRC             .. or an AVI movie
IMG       VRML               .. or a VRML 3D world.
BODY      BACKGROUND         The background pattern of a document ..
TABLE     BACKGROUND         .. or a table ..
TH        BACKGROUND         .. or a table header ..
TD        BACKGROUND         .. or data cell.
LAYER     SRC + BACKGROUND   The contents of a layer ..
ILAYER    SRC + BACKGROUND   .. or inline layer. (both Netscape only)
INS       CITE               URL for inserted or ..
DEL       CITE               .. deleted text, ..
Q         CITE               .. or a cite ..
ACRONYM   CITE               .. or an abbreviation.
FORM      ACTION             The submit address of a form ..
ISINDEX   ACTION             .. or a searchable index.
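
As an illustration of how such a tag/attribute table can drive link extraction, here is a minimal sketch in Java. It is not LinkVerify's actual parser: the class `LinkAttributes` is hypothetical, the map holds only a small excerpt of the table, and the regular expressions assume simple markup without `>` inside attribute values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: collects URL-carrying attribute values from HTML text. */
final class LinkAttributes {
    // Excerpt of the table above: tag name -> attributes that carry URLs.
    static final Map<String, List<String>> URL_ATTRIBUTES = Map.of(
            "A", List.of("HREF"),
            "IMG", List.of("SRC", "LOWSRC", "USEMAP", "DYNSRC", "VRML"),
            "BODY", List.of("BACKGROUND"),
            "FORM", List.of("ACTION"));

    // An opening tag: name in group 1, the raw attribute text in group 2.
    private static final Pattern TAG =
            Pattern.compile("<([A-Za-z][A-Za-z0-9]*)\\b([^>]*)>");
    // One attribute: name in group 1, value in group 2, 3 or 4 (double, single, no quotes).
    private static final Pattern ATTR = Pattern.compile(
            "([A-Za-z][A-Za-z0-9-]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))");

    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher tag = TAG.matcher(html);
        while (tag.find()) {
            List<String> wanted = URL_ATTRIBUTES.get(tag.group(1).toUpperCase(Locale.ROOT));
            if (wanted == null) {
                continue; // tag is not in the table excerpt
            }
            Matcher attr = ATTR.matcher(tag.group(2));
            while (attr.find()) {
                if (!wanted.contains(attr.group(1).toUpperCase(Locale.ROOT))) {
                    continue; // attribute does not carry a URL for this tag
                }
                String value = attr.group(2) != null ? attr.group(2)
                        : attr.group(3) != null ? attr.group(3) : attr.group(4);
                urls.add(value);
            }
        }
        return urls;
    }
}
```

Tag and attribute names are compared case-insensitively, since HTML does not distinguish `<IMG SRC=...>` from `<img src=...>`.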

Links in Style Sheets (.css)

LinkVerify checks URLs in cascading stylesheets too.

An external style sheet file can be referenced with a <LINK> tag. An inline style sheet may be defined inside the <STYLE> tag. Both features are supported. Style sheets inside the STYLE attribute are not (yet) supported.

When checking style sheets, everything between "url(" and ")" is considered to be a URL. Usually these are background images or other style sheets (included with @import).
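
The "url(" ... ")" rule can be sketched with a single regular expression. The class `CssUrls` is hypothetical and the quote stripping is an assumption added for convenience; the text above only states that everything between the parentheses is taken as the URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: finds url(...) references in a style sheet. */
final class CssUrls {
    // Everything between "url(" and the next ")" is captured in group 1.
    private static final Pattern URL =
            Pattern.compile("url\\(([^)]*)\\)", Pattern.CASE_INSENSITIVE);

    static List<String> extract(String css) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(css);
        while (m.find()) {
            String raw = m.group(1).trim();
            // Assumption: strip one pair of matching quotes, if present.
            if (raw.length() >= 2) {
                char first = raw.charAt(0);
                char last = raw.charAt(raw.length() - 1);
                if ((first == '"' || first == '\'') && first == last) {
                    raw = raw.substring(1, raw.length() - 1);
                }
            }
            urls.add(raw);
        }
        return urls;
    }
}
```

Note that an `@import "file.css";` written without `url(...)` would not be found by this sketch.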

Named Frames

Furthermore, the "Frames" window shows a comparison between defined and referenced named frames. A frame is defined with the <FRAME NAME="name"> tag. Frames are referenced with the TARGET attribute of the tags <A>, <BASE>, <AREA> and <FORM>.

This test is not an analysis of the frameset structure. The spider simply checks whether every used frame name is defined somewhere in the analyzed files. Treat this as a rule of thumb. This feature will usually catch typing errors only.

When a link refers to an undefined frame name, the browser will open a new window. This window will get the name of this frame as the title. Such behavior may even be intended by the author.
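
The comparison described in this section amounts to a set difference. The following sketch assumes the defined and referenced names have already been collected; the class `FrameNames` is hypothetical. Skipping the reserved HTML target names (`_blank`, `_self`, `_parent`, `_top`) is also an assumption, as the text does not say how LinkVerify treats them.

```java
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: reports frame names that are used but never defined. */
final class FrameNames {
    // Target names with a predefined meaning in HTML; they need no <FRAME NAME=...>.
    private static final Set<String> RESERVED = Set.of("_blank", "_self", "_parent", "_top");

    static Set<String> undefined(Set<String> defined, Set<String> referenced) {
        Set<String> result = new TreeSet<>(referenced); // sorted for stable output
        result.removeAll(defined);
        result.removeIf(name -> RESERVED.contains(name.toLowerCase(Locale.ROOT)));
        return result;
    }
}
```

For example, with the frames `menu` and `main` defined and the targets `main`, `mian` and `_top` in use, only the misspelled `mian` is reported.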

Checking without HTTP

When checking the internal structure of a website, you will usually want to try this locally first. Unfortunately there is a difference between the file: and the http: protocol. When a URL refers to a directory via HTTP, the server will look for special files within this directory, usually index.html. Only if this file does not exist will the server list the directory to the client (if permitted). When using the file: protocol, the client always gets the directory listing. This means that local checking may visit more files than checking via HTTP.

Error "Method not allowed"

Sometimes you will encounter broken links with an HTTP response "Method not allowed" or "Method HEAD disabled on this server". The corresponding response code is 405, sometimes 403.

The reason is that several servers do not allow HEAD requests. With a HEAD request, an HTTP client can determine whether a certain URL exists on a server without actually loading it. HEAD requests are often used by web spiders, such as search engines and also LinkVerify. Obviously these servers don't want to support web spiders.

I really can't imagine a rational reason for this. However, there will be no workaround for this, such as sending a GET request after a refused HEAD request. If you would like one, implement it on your own.
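
For readers who do want such a workaround, here is a minimal sketch using the standard `java.net.HttpURLConnection` class. It is not part of LinkVerify; the class `HeadCheck` and its method names are hypothetical, and it assumes the URL uses the http: or https: protocol.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical sketch: HEAD request with a GET fallback. */
final class HeadCheck {
    /** True for the status codes the text associates with refused HEAD requests. */
    static boolean headRefused(int status) {
        return status == HttpURLConnection.HTTP_BAD_METHOD // 405
                || status == HttpURLConnection.HTTP_FORBIDDEN; // 403
    }

    /** Sends one request with the given method and returns the status code. */
    static int status(URL url, String method) throws IOException {
        // Assumption: url is an http: or https: URL, otherwise this cast fails.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod(method);
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Tries HEAD first and repeats the check with GET if HEAD was refused. */
    static int check(URL url) throws IOException {
        int status = status(url, "HEAD");
        return headRefused(status) ? status(url, "GET") : status;
    }
}
```

The fallback costs a second request and a full download for every refusing server. Because 403 can also mean that the resource itself is forbidden, the GET may simply return 403 again.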

Internal Anchors

Internal anchors allow a URL to refer to a position within an HTML file. An anchor is defined by the tag <A NAME="ref">. This anchor may be referenced with <A HREF="file#ref">.

For links to internal anchors, LinkVerify checks whether the anchor is actually defined in the referenced HTML file. This works only for non-external links, located in the directory of the root URL or one of its subdirectories.
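
The anchor check can be sketched as follows: collect all `<A NAME=...>` values of the target file, then look up the part of the link after `#`. The class `Anchors` is hypothetical and the regular expression is a simplification, not LinkVerify's actual parser.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: checks a link's fragment against the anchors of its target. */
final class Anchors {
    // Matches <A ... NAME=value ...>; the value is in group 1, 2 or 3
    // (double quotes, single quotes, no quotes).
    private static final Pattern NAMED_ANCHOR = Pattern.compile(
            "<a\\b[^>]*?\\bname\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))",
            Pattern.CASE_INSENSITIVE);

    /** Returns the names of all anchors defined in the given HTML text. */
    static Set<String> definedIn(String html) {
        Set<String> names = new HashSet<>();
        Matcher m = NAMED_ANCHOR.matcher(html);
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1)
                    : m.group(2) != null ? m.group(2) : m.group(3);
            names.add(name);
        }
        return names;
    }

    /** True if the link has no fragment, or its fragment is defined in targetHtml. */
    static boolean resolves(String link, String targetHtml) {
        int hash = link.indexOf('#');
        if (hash < 0 || hash == link.length() - 1) {
            return true; // no fragment, nothing to check
        }
        return definedIn(targetHtml).contains(link.substring(hash + 1));
    }
}
```

Anchor names are compared exactly here; whether LinkVerify ignores case is not stated in this document.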

Sources

The download packages contain all sources.

Created by Ralf Wiebicke, last modified on December 20, 1998