Validation Tools

In the context of structured document markup, "validation" is the process of comparing a document to the formal rules governing its structure and determining whether the document abides by those rules. In SGML, and by extension in HTML, this means comparing a document to its DTD, or Document Type Definition.

There are currently two online services that will assist you in validating pages you write. The World Wide Web Consortium maintains one of these at http://validator.w3.org/ and the Web Design Group provides the other at http://www.htmlhelp.com/tools/validator/.

(Two other well-known validators are apparently no longer maintained. The "Kinder Gentler Validator" evolved into the W3C service and is no longer available under that name. The WebTechs validation service at http://www.webtechs.com/html-val-svc/ has not been updated in about a year. As of October 1998, it still uses draft versions of HTML 4.0.)

The two online services are roughly comparable. The WDG has put a lot of effort into making its service's error messages more user-friendly. Both add new features from time to time, so I won't try to describe them in detail here. In practice, you simply give them a URL, paste in some HTML markup, or upload a file, and they run it through a validation program and print out the error messages.

If you intend to validate a lot of pages, you may want to get a copy of the program that both online services use and run it from a command line. It is called nsgmls, and it is available as part of the SP suite of SGML tools from http://www.jclark.com/sp/howtoget.htm. Be warned: nsgmls is difficult to master.

To validate an HTML file with nsgmls you will need the following:

  1. The nsgmls program itself.
  2. A copy of the HTML DTD you are using. Both the HTML 3.2 and HTML 4.0 specifications include copies of their DTDs. You can carefully copy and paste these into text files on your computer.
  3. Copies of the entity files used by the DTD. These are the lists of named character entities allowed in HTML. They are also included in the HTML specifications.
  4. A file containing only a Document Type Declaration, unless you include this in your HTML files.
  5. A catalog file that nsgmls uses to associate Formal Public Identifiers with files on your system. The FPI is the name by which DTD and entity files are referenced.

Sample Document Type Declaration for HTML 4.0 Transitional (doctype.txt - all filenames are for illustration only):

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

Sample catalog file (catalog.txt):

--  Files needed for HTML 3.2 --
PUBLIC "-//W3C//DTD HTML 3.2//EN" 		HTML32.dtd
PUBLIC "ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML" ISOlat1.ent

-- Files needed for HTML 4.0 --
PUBLIC "-//W3C//DTD HTML 4.0//EN"		HTML40strict.dtd
PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"	HTML40loose.dtd
PUBLIC "-//W3C//DTD HTML 4.0 Frameset//EN"	HTML40frameset.dtd
PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML"	HTMLlat1.ent
PUBLIC "-//W3C//ENTITIES Special//EN//HTML"	HTMLspecial.ent
PUBLIC "-//W3C//ENTITIES Symbols//EN//HTML"	HTMLsymbol.ent
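
To make the role of the catalog concrete, the following Python sketch reads entries like the ones above and builds a lookup table from public identifier to file name. It is only an illustration of the kind of mapping the catalog describes, not how nsgmls itself is implemented. The function name parse_catalog is hypothetical, only PUBLIC entries are handled, and comments are assumed to occupy whole lines.

```python
import shlex

# A shortened copy of the sample catalog, used here for illustration.
SAMPLE_CATALOG = '''
-- Files needed for HTML 4.0 --
PUBLIC "-//W3C//DTD HTML 4.0//EN"               HTML40strict.dtd
PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"  HTML40loose.dtd
PUBLIC "-//W3C//ENTITIES Latin1//EN//HTML"      HTMLlat1.ent
'''

def parse_catalog(text):
    """Return a dict mapping public identifiers to file names.

    Simplifying assumptions: one entry per line, only PUBLIC entries,
    and any line starting with "--" is treated as a comment.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("--"):
            continue  # skip blank lines and whole-line comments
        # shlex.split keeps the quoted identifier together as one token
        tokens = shlex.split(line)
        if len(tokens) == 3 and tokens[0].upper() == "PUBLIC":
            mapping[tokens[1]] = tokens[2]
    return mapping

catalog = parse_catalog(SAMPLE_CATALOG)
# Look up the file for the identifier used in doctype.txt
loose_dtd = catalog["-//W3C//DTD HTML 4.0 Transitional//EN"]
```

Looking up the identifier from the Document Type Declaration in this table yields the local DTD file, which is the association the catalog exists to provide.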

Test HTML document (test.html):

<TITLE>Test</TITLE>
Yes, this is a complete, valid HTML document.

Once all the pieces are assembled, the command to validate this HTML document is:

nsgmls -s -m catalog.txt doctype.txt test.html

The -s command line switch suppresses nsgmls's normal output, so that only error messages are reported (by default, on STDERR). The -m switch names the catalog file to use. If nsgmls returns you to your command prompt without saying anything, the document is valid.
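
If you have many pages to check, the command above can be wrapped in a small script. The following Python sketch assumes that nsgmls is installed and on your PATH, and that catalog.txt and doctype.txt are in the current directory. The function names are illustrative only. It relies on the behaviour described above: a valid document produces no error output.

```python
import subprocess

def build_command(html_file, catalog="catalog.txt", doctype="doctype.txt"):
    """Return the nsgmls command line for one HTML file as an argument list."""
    return ["nsgmls", "-s", "-m", catalog, doctype, html_file]

def validate(html_file):
    """Run nsgmls on one file and return its error messages as a list of lines.

    An empty list means nsgmls printed nothing to STDERR, so the file is
    taken to be valid. Raises FileNotFoundError if nsgmls is not installed.
    """
    result = subprocess.run(build_command(html_file),
                            capture_output=True, text=True)
    return result.stderr.splitlines()

def report(html_files):
    """Validate each file in turn and print a summary followed by any errors."""
    for name in html_files:
        errors = validate(name)
        status = "valid" if not errors else f"{len(errors)} error line(s)"
        print(f"{name}: {status}")
        for line in errors:
            print("  " + line)
```

Calling report(["test.html"]) would run the same command as the one shown above and print the result for that one file. Passing a longer list of file names checks each page in turn.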