XHTML Standards Development
Creating XHTML 1.0 Documents
XHTML Document Declarations
Well-Formed XHTML
Try It Out
While the HTML 4.01 specification goes a long way in tidying up HTML, it still suffers from sloppy artifacts of HTML's fast and loose development. Over the years, little was done to make HTML perfectly SGML-compliant. As a result, we have a language with quirky features and browsers that easily forgive basic HTML coding errors.
With the creation of XML (see Chapter 30, "Introduction to XML"), the W3C finally had a standard set of rules for defining markup languages. It should come as no surprise that one of the first things they did with their shiny new set of rules was apply them to HTML. The resulting XML-ized HTML standard is known as XHTML.
XHTML 1.0 is virtually the same as the HTML 4.01 standard, but more strict. The W3C is aiming eventually to replace HTML with XHTML to keep it in line with the larger family of XML-based markup languages.
This chapter reviews the differences and similarities between HTML 4.01 and XHTML.
Things are exciting over at the W3C. Now that they have XML on their toolbelts, they seem to be on a roll in rethinking and reshaping document markup. Between January 2000 and June 2001, they turned out three XHTML Recommendations: XHTML 1.0, XHTML Basic, and XHTML 1.1 (XHTML 1.1 is still "Proposed" as of this writing, but since it's on the verge of approval, I'll count it anyway). This section looks at each one.
The XHTML 1.0 Recommendation (released in January 2000) is really just a reformulation of the HTML 4.01 specification according to the rules of XML. The XHTML 1.0 standard is the focus of this chapter.
Like HTML 4, XHTML 1.0 comes in three varieties -- Strict, Transitional, and Frameset -- each defined by a separate DTD. (For more information on DTDs, see Chapter 30, "Introduction to XML.") It is important to specify which version you are using in your document, as modern browsers (IE 5.5+ and Netscape 6) can use this information to turn on "strict" standards-compliant formatting, as opposed to the "quirky" behavior of older, nonstandard HTML. Of course, if you do specify the DTD, then you must stick to it exactly so that your document will be valid (i.e., it breaks none of the rules defined by the DTD).
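As a quick reference, the three XHTML 1.0 DTDs are selected with the DOCTYPE declarations below, as given in the W3C's XHTML 1.0 Recommendation. A document uses only one of them, placed before the <html> tag; the comments are added here simply to label the alternatives.

```html
<!-- Strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- Transitional -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- Frameset -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
```

The declaration is case-sensitive, so it is safest to copy it exactly rather than retype it.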
You must also make sure to specify the proper namespace declaration for XHTML. This is included in the <html> tag at the start of the document and is discussed later in Section 31.3, "XHTML Document Declarations".
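A minimal sketch of where the namespace declaration sits is shown here. It assumes the Transitional DTD and an English-language page; the title and paragraph text are placeholders, not part of any standard.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- The xmlns attribute identifies every element below as XHTML -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Placeholder title</title>
  </head>
  <body>
    <p>Placeholder content.</p>
  </body>
</html>
```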
The Strict DTD excludes all deprecated tags and attributes (like <font> and align) to reinforce the separation of document structure from presentation. All style information is delegated to Cascading Style Sheets, which work the same in XHTML as in HTML (see Chapter 17, "Cascading Style Sheets," for more information).
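To make the difference concrete, the fragments below show one way the same paragraph might be written with and without deprecated markup. This is only an illustrative sketch: the class name "notice", the color value, and the text are invented for the example.

```html
<!-- Deprecated presentational markup: permitted by the Transitional DTD,
     but invalid under the Strict DTD -->
<p align="center"><font color="#990000">Sale ends Friday</font></p>

<!-- Strict-friendly version: the markup carries structure only... -->
<p class="notice">Sale ends Friday</p>

<!-- ...and the presentation moves to a style sheet in the document head -->
<style type="text/css">
  p.notice { text-align: center; color: #990000; }
</style>
```

In practice the style rule would more likely live in an external style sheet, so that a single change updates every page that uses it.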
While it is certainly possible to begin constructing web pages and sites according to the Strict DTD, it poses a greater challenge. Because there are still millions of web users with older browsers that don't support style sheets and HTML 4.0, you run the risk of alienating some users (or providing them with only lowest common denominator content). Fortunately, there is evidence that things will get easier in the future. The latest round of major browsers (Internet Explorer 5.5 for Windows, Internet Explorer 5.0 for Macintosh, and Netscape 6 on all platforms) snap into perfect standards-compliance mode when you specify "strict" in the DOCTYPE declaration.
The Transitional DTD includes all the deprecated elements and attributes in order to cater to the legacy behavior of most browsers. They are permitted, but their use is discouraged. This DTD provides a way to ease web authors out of their current habits and toward abiding by standards. Most web authors today choose to use the Transitional DTD since it is what works best in most browsers.
The Frameset DTD is exactly the same as the Transitional DTD, except that it includes the elements for creating framed web pages (<frameset>, <frame>, and <noframes>). The Frameset DTD is kept separate because the structure of a framed document (where <frameset> replaces <body>) is fundamentally different from regular HTML documents.
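The structural difference is easiest to see in a skeleton document. The following is a rough sketch of a two-column frameset; the filenames nav.html and main.html, the column width, and the titles are hypothetical.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Framed page</title>
  </head>
  <!-- <frameset> takes the place of <body> -->
  <frameset cols="200,*">
    <!-- Empty elements such as <frame> are closed with a trailing slash -->
    <frame src="nav.html" title="Navigation" />
    <frame src="main.html" title="Main content" />
    <!-- Fallback for browsers that cannot display frames -->
    <noframes>
      <body>
        <p>This page uses frames. Try the <a href="main.html">main content</a> directly.</p>
      </body>
    </noframes>
  </frameset>
</html>
```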
The XHTML Basic Recommendation (released in December 2000) is a stripped-down version of XHTML 1.0 aimed at preparing documents for mobile applications such as cell phones or handheld devices. The specification is consistent with the XHTML modularization efforts (discussed next). XHTML Basic contains the minimum elements necessary to be considered an XHTML document, plus images, forms, basic tables, and object support. To read more about it, see the W3C's Recommendation at http://www.w3.org/TR/2000/REC-xhtml-basic-20001219/.
XHTML 1.1 (a proposed recommendation as of this writing) reflects a breakthrough in the way markup languages are constructed. Instead of one comprehensive set of elements, this specification is broken up into task-specific modules. A module is a set of elements that handle one aspect or type of object in a document. Some modules include the core module, text, forms, tables, images, imagemaps, objects, and frames.
In a world where HTML content is being used on devices as varied as cell phones, desktop computers, refrigerator panels, dashboard consoles, and more, a "one-size-fits-all" content markup language will no longer work. Modularization is the solution to this problem. The modular approach has a number of benefits:
Special devices and applications can "mix and match" modules based on their requirements and constraints. For instance, a simple refrigerator console probably doesn't need applet and multimedia support (although, who knows?). With XHTML 1.1, you can create a document that uses only the subset of XHTML that meets your needs.
It prevents spin-off, device-specific HTML versions. Authors can create their own XML modules, leaving the XHTML standard unscathed.
It allows "hybrid" documents in which several DTDs are used in combination. For instance, it allows web documents to have SVG (Scalable Vector Graphics) modules or MathML modules mixed in with the XHTML content.
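As a rough illustration of the hybrid idea, the sketch below embeds a MathML expression and an SVG drawing in an XHTML page, using each language's namespace to keep the vocabularies apart. No DOCTYPE is shown because validation would depend on a DTD that combines the relevant modules; this example aims only to be well-formed XML, and browser support for rendering it varies.

```html
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Hybrid document sketch</title>
  </head>
  <body>
    <p>An expression marked up with MathML:</p>
    <!-- Elements inside <math> belong to the MathML namespace -->
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mrow>
        <msup><mi>x</mi><mn>2</mn></msup>
        <mo>+</mo>
        <mn>1</mn>
      </mrow>
    </math>

    <p>A circle drawn with SVG:</p>
    <!-- Elements inside <svg> belong to the SVG namespace -->
    <svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
      <circle cx="50" cy="50" r="40" />
    </svg>
  </body>
</html>
```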
Modularization is the way of the future for markup standards. The SMIL 2.0 specification is also broken into modules (see Chapter 27, "Introduction to SMIL"), which can then be used with other languages like XHTML. You can read more about the XHTML 1.1 specification at http://www.w3.org/TR/xhtml11/.
Copyright © 2002 O'Reilly & Associates. All rights reserved.