Despite its name, you don't use Extensible Markup Language (XML) to directly create and mark up web documents. Instead, you use XML technology to define a new markup language, which you then use to mark up web documents. This should come as no surprise to anyone who has read the previous chapter in this book. Nor, then, should it surprise you that one of the first languages defined using XML is an XML-ized version of HTML, the most popular markup language ever. HTML is now being disciplined and cleaned up by XML to bring it back into line with the larger family of markup languages. This new standard is XHTML 1.0.[80]
[80]Throughout this chapter, we use "XHTML" to mean the XHTML 1.0 standard. There is a nascent XHTML 1.1 standard that diverges from both XHTML 1.0 and HTML 4.01. See http://www.w3.org/TR/xhtml11/ for the details and differences.
Because of HTML's legacy features and oddities, using XML to describe HTML was not an easy job for the W3C. In fact, certain HTML rules, as we'll discuss later, cannot be represented using XML. Nonetheless, if the W3C has its way, XHTML will ultimately replace the HTML we currently know and love. We agree that it should.
So much of XHTML is identical to HTML's current standard, Version 4.01, that almost everything presented elsewhere in this book may be applied to both HTML and XHTML. The differences, both good and bad, are detailed in this chapter. To become fluent in XHTML, you'll first need to absorb the rest of this book, and then adjust your thinking to embrace what we present in this chapter.
HTML, as everyone should know by now, began as a simple markup language similar in appearance and usage to other SGML-based markup languages. In its early years, little effort was put into making HTML perfectly SGML-compliant. As a result, odd features and a lax attitude towards enforcing the rules became a standard part of both HTML and the browsers that processed HTML documents.
As the Web grew from an experiment into an industry, the desire for a standard version of HTML led to the creation of several official versions, culminating most recently with Version 4.01. As HTML has stabilized into this latest version, browsers have become more alike in their support of various HTML features. In general, the world of HTML has settled into a familiar set of constructs and usage rules.
Unfortunately, HTML offers only a limited set of document-creation primitives, is incapable of handling nontraditional content such as chemical formulae, musical notation, or mathematical expressions, and fails to adequately support alternative display media such as handheld computers or intelligent cellular phones. We need new ways to deliver information that can be parsed, processed, displayed, sliced, and diced by the many different communication technologies that have emerged since the Web sparked the digital communication revolution a decade ago.
Rather than trying to rein in another herd of maverick, nonstandard markup languages, the W3C introduced XML as a standard way to create new markup languages. XML is the framework upon which organizations can develop their own markup languages to suit the needs of their users. XML is a simplified subset of SGML, streamlined for today's dynamic systems. And while the W3C originally intended it as a tool for creating document markup languages, XML is also becoming quite useful as a standard way to define small languages that serve as data exchange protocols between different applications.
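To give a feel for what such a small data exchange language might look like, here is a minimal sketch. Every element and attribute name in it (order, customer, item, sku, quantity) is invented for illustration and does not come from any real standard; an actual application would define its own vocabulary, typically with a DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical order message; all names below are invented for illustration -->
<order id="1001">
  <customer>Jane Doe</customer>
  <item sku="A-42" quantity="2">Widget</item>
</order>
```

Note that every tag is explicitly closed and every attribute value is quoted, as XML requires.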
Of course, we don't want to abandon the plethora of documents already marked up with HTML or the infrastructure of knowledge, tools, and technologies that currently support HTML and the Web. Yet, we do not want to miss the opportunities of XML, either. XHTML is the bridge. It uses the features of XML to define a markup language that is nearly identical to standard HTML 4.01 and gets us all started down the XML road.
HTML 4.01 comes in three variants, each defined by a separate SGML DTD. Similarly, XHTML comes in three variants, with XML DTDs corresponding to the three SGML DTDs that define HTML 4.01. To create an XHTML document, you must choose one of these DTDs and then create a document that uses its particular elements and rules.
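In practice, you make that choice with the document type declaration at the top of the document. The three declarations below use the public identifiers and DTD locations published in the W3C's XHTML 1.0 Recommendation. They are shown together only for comparison; a real document contains exactly one of them.

```xml
<!-- Strict: no deprecated elements or attributes -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- Transitional: includes the deprecated elements and attributes -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- Frameset: transitional, with frames in place of the document body -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
```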
The first XHTML DTD corresponds to the "strict" HTML DTD. The strict definition excludes all deprecated elements (tags and attributes) in HTML 4.01 and forces authors to use only those features that are fully supported in HTML. Many of the HTML elements and attributes dealing with presentation and appearance, such as the <font> tag and the align attribute, are missing from the strict XHTML DTD, replaced by the equivalent properties in the Cascading Style Sheet model.
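As a rough illustration of the difference, the fragment below contrasts presentational markup that is valid only under the transitional DTD with one possible strict-DTD equivalent that uses style properties. The paragraph text and the particular color are arbitrary choices for this sketch, and an inline style attribute is only one of several ways to attach the properties; an external style sheet would work just as well.

```html
<!-- Transitional only: deprecated align attribute and <font> tag -->
<p align="center"><font color="red">Sale ends Friday</font></p>

<!-- Strict: the same appearance expressed with style sheet properties -->
<p style="text-align: center; color: red">Sale ends Friday</p>
```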
Most HTML authors find the strict XHTML DTD too restrictive, since many of the deprecated elements and attributes are still in widespread use throughout the Web. More importantly, the popular browsers -- while fully supporting the deprecated elements -- have yet to fully implement the new standard ones. The only real advantage in using the strict XHTML DTD is that compliant documents are guaranteed to be fully supported in future versions of XHTML.[81]
[81]If the W3C has its way, HTML won't change beyond Version 4.01. No more HTML; all new developments will be in XHTML and many other XML-based languages.
Most authors will probably choose to use the "transitional" XHTML DTD. It's closest to the current HTML standard and includes all those wonderful, but deprecated, features that make life as an HTML author easier. With the transitional XHTML DTD, you can ease into the XML family while staying current with the browser industry.
The third DTD is for frames. It is identical to the transitional DTD in all other respects; the only difference is the replacement of the document body with appropriate frame elements. You might think that, for completeness' sake, there would be strict and transitional frame DTDs, but the W3C decided that if you use frames, you might as well use all the deprecated elements, too.
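A minimal sketch of a document written against the frameset DTD might look like the following. The filenames menu.html and content.html, the title, and the column widths are placeholders chosen for this example. Notice that a frameset element takes the place of the body, and that a body appears only inside noframes as fallback content.

```html
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Frameset sketch</title>
</head>
<!-- The frameset replaces the document body -->
<frameset cols="25%,75%">
  <!-- Placeholder filenames; substitute your own documents -->
  <frame src="menu.html" />
  <frame src="content.html" />
  <noframes>
    <body>
      <p>This document uses frames.</p>
    </body>
  </noframes>
</frameset>
</html>
```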
Copyright © 2002 O'Reilly & Associates. All rights reserved.