For the most part, tags -- the markup elements of HTML and XHTML -- are simple to understand and use, since they are made up of common words, abbreviations, and notations. For instance, the <i> and </i> tags tell the browser respectively to start and stop italicizing the text characters that come between them. Accordingly, the syllable "simp" in our barebones example above would appear italicized on a browser display.
The HTML and XHTML standards and their various extensions define how and where you place tags within a document. Let's take a closer look at the syntax that holds every document together.
Every tag consists of a tag name, optionally followed by a list of tag attributes, all placed between opening and closing angle brackets (< and >). The simplest tag is nothing more than a name enclosed in brackets, such as <head> or <i>. More complicated tags contain one or more attributes, which specify or modify the behavior of the tag.
According to the HTML standard, tag and attribute names are not case-sensitive. There's no difference in effect between <head>, <Head>, <HEAD>, or even <HeaD>; they are all equivalent. With XHTML, case is important: all current standard tag and attribute names are in lowercase.
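To see the rule in action, here is a small sketch that uses Python's standard html.parser module as a stand-in for a browser's HTML parser (an assumption for illustration only; browsers have their own parsers). html.parser lowercases tag names before reporting them, so all four spellings come out the same. For contrast, the sketch also hands markup to xml.etree.ElementTree, an XML parser that, like XHTML, treats case as significant. The class name TagNameCollector is hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagNameCollector(HTMLParser):
    """Records each start-tag name as html.parser reports it."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # html.parser has already lowercased the name by this point.
        self.names.append(tag)


collector = TagNameCollector()
collector.feed("<head><Head><HEAD><HeaD>")
collector.close()
print(collector.names)  # ['head', 'head', 'head', 'head']

# An XML parser keeps the case it sees...
print(ET.fromstring("<HeaD></HeaD>").tag)  # HeaD

# ...so a start tag and end tag that differ only in case do not match.
try:
    ET.fromstring("<head></HEAD>")
except ET.ParseError as err:
    print("not well-formed XML:", err)
```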
For both HTML and XHTML, the values that you assign to a particular attribute may be case-sensitive, depending on your browser and server. In particular, file location and name references, or uniform resource locators (URLs), are case-sensitive (see Section 6.2, "Referencing Documents: The URL").
Tag attributes, if any, belong after the tag name, each separated by one or more tab, space, or return characters. Their order of appearance is not important.
A tag attribute's value, if any, follows an equal sign (=) after the attribute name. You may include spaces around the equal sign, so that width=6, width = 6, width =6, and width= 6 all mean the same. For readability, however, we prefer not to include spaces. That way, it's easier to pick out an attribute/value pair from a crowd of pairs in a lengthy tag.
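The same html.parser module can illustrate the previous two paragraphs: attribute order does not matter, and spaces (or tabs and returns) around the equal sign or between attributes do not change the result. In this sketch, AttributeCollector and attributes_of are hypothetical names, and the <table> tag is just a convenient host for the width attribute. Collecting the pairs into a Python dict is our own choice; it compares the tags as sets of name/value pairs, which is what "order is not important" amounts to.

```python
from html.parser import HTMLParser


class AttributeCollector(HTMLParser):
    """Keeps the attributes of every start tag as a name -> value dict."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs in source order.
        self.tags.append(dict(attrs))


def attributes_of(markup):
    """Return the attributes of the first start tag in markup."""
    collector = AttributeCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags[0]


variants = [
    "<table width=6 border=1>",
    "<table width = 6 border=1>",
    "<table width =6 border=1>",
    "<table width= 6 border=1>",
    "<table\twidth=6\nborder=1>",   # tab and return as separators
    "<table border=1 width=6>",     # same pairs, different order
]
results = [attributes_of(v) for v in variants]
print(all(r == {"width": "6", "border": "1"} for r in results))  # True
```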
With HTML, if an attribute's value is a single word or number (no spaces), you may simply add it after the equal sign. All other values should be enclosed in single or double quotation marks, especially values that contain several words separated by spaces. With XHTML, every attribute value must be enclosed in quotation marks, even a single word or number. The length of the value is limited to 1024 characters.
Most browsers are tolerant of how tags are punctuated and broken across lines. Nonetheless, avoid breaking tags across lines in your source document whenever possible. This rule promotes readability and reduces potential errors in your HTML documents.
Here are some tags with attributes:
<a href="http://www.oreilly.com/catalog.html">
<ul compact>
<ul compact="compact">
<input type=text name=filename size=24 maxlength=80>
<link title="Table of Contents">
The first example is the <a> tag for a hyperlink to O'Reilly & Associates' World Wide Web-based catalog of products. It has a single attribute, href, followed by the catalog's address in cyberspace -- its URL.
The second example shows an HTML tag that formats text into an unordered list of items. Its single attribute -- compact, which limits the space between list items -- does not require a value.
The third example demonstrates how the second example must be written in XHTML. Notice the compact attribute now has a value, albeit redundant, and that its value is enclosed in double quotes.
The fourth example shows an HTML tag with multiple attributes, each with a value that does not require enclosing quotation marks. Of course, with XHTML, each attribute value must be enclosed in quotation marks.
The last example shows proper use of enclosing quotation marks when the attribute value is more than one word long.
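The conversion from the HTML forms to the XHTML forms in these examples is mechanical enough to sketch in a few lines of Python. The hypothetical helper xhtml_attributes below takes (name, value) pairs in the shape Python's html.parser reports them, where a minimized attribute like compact has the value None. It lowercases each name, gives a minimized attribute its own name as its value, and double-quotes every value. It handles only the attributes, not the rest of the tag.

```python
from html import escape


def xhtml_attributes(attrs):
    """Render (name, value) pairs as XHTML-style attributes.

    attrs has the shape html.parser passes to handle_starttag; a
    minimized attribute such as compact arrives with the value None.
    """
    rendered = []
    for name, value in attrs:
        name = name.lower()      # XHTML attribute names are lowercase
        if value is None:
            value = name         # compact becomes compact="compact"
        # quote=True also turns any " inside the value into &quot;
        rendered.append(f'{name}="{escape(value, quote=True)}"')
    return " ".join(rendered)


print("<ul " + xhtml_attributes([("compact", None)]) + ">")
# <ul compact="compact">
print(xhtml_attributes([("type", "text"), ("name", "filename"),
                        ("size", "24"), ("maxlength", "80")]))
# type="text" name="filename" size="24" maxlength="80"
```

Note that the names are lowercased but the values are left alone, which matters for the case-sensitivity point that follows.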
What is not immediately evident in these examples is that while HTML attribute names are not case-sensitive (href works the same as HREF and HreF in HTML), most attribute values are case-sensitive. The value filename for the name attribute in the <input> tag example is not the same as the value Filename, for instance.
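Python's html.parser, again used only as a stand-in for a browser, shows the split: it lowercases attribute names but passes values through exactly as written. Whatever consumes a value later, such as a program that processes a submitted form, therefore sees filename and Filename as two different strings. FirstTagAttributes and first_attrs are hypothetical names.

```python
from html.parser import HTMLParser


class FirstTagAttributes(HTMLParser):
    """Remembers the (name, value) pairs of the first start tag only."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = attrs


def first_attrs(markup):
    parser = FirstTagAttributes()
    parser.feed(markup)
    parser.close()
    return parser.attrs


lower = first_attrs("<input type=text name=filename>")
mixed = first_attrs("<INPUT TYPE=text NAME=Filename>")
print(lower)           # [('type', 'text'), ('name', 'filename')]
print(mixed)           # [('type', 'text'), ('name', 'Filename')]
print(lower == mixed)  # False: the names match, the values do not
```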
We alluded earlier to the fact that most tags have a beginning and an end and affect the portion of content between them. That enclosed segment may be large or small, from a single text character, syllable, or word, such as the italicized "simp" syllable in our barebones example, to the <html> tag that bounds the entire document. The starting component of any tag is the tag name and its attributes, if any. The corresponding ending tag is the tag name alone, preceded by a slash. Ending tags have no attributes.
You can place tags inside the segment affected by another tag (that is, nest them) to apply several tag effects to a single segment of the document. For example, a portion of the following text is both bold and part of an anchor defined by the <a> tag:
<body>
This is some text in the body, with a
<a href="another_doc.html">link, a portion of which is
<b>set in bold</b></a>
</body>
According to the HTML and XHTML standards, you must end nested tags starting with the most recent one and work your way back out. For instance, in the example we end the bold tag (</b>) before ending the link tag (</a>), since we started them in the reverse order: the <a> tag first, then the <b> tag. It's a good idea to follow that standard, even though most browsers don't absolutely insist you do so. You may get away with violating this nesting rule in one browser, sometimes even in all current browsers. But eventually a new browser version won't allow the violation, and you'll be hard pressed to straighten out your source HTML document. Also be aware that the XHTML standard explicitly forbids improper nesting.
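The "most recent first" rule is exactly what a stack enforces: each start tag is pushed, and each end tag must match the tag on top of the stack. The following sketch builds that check on Python's html.parser. NestingChecker and check_nesting are hypothetical names, and the sketch assumes every element is explicitly closed, as XHTML requires; a bare HTML <br>, which has no end tag, would therefore be reported as never closed, while an XHTML-style self-closing <br /> is accepted.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Flags end tags that do not close the most recently opened element."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of element names, innermost last
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <br /> opens and closes itself; nothing to track

    def handle_endtag(self, tag):
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()  # closes the innermost element: correct
        elif self.open_tags:
            self.errors.append(f"</{tag}> closes before </{self.open_tags[-1]}>")
        else:
            self.errors.append(f"</{tag}> has no matching start tag")


def check_nesting(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    unclosed = [f"<{tag}> never closed" for tag in checker.open_tags]
    return checker.errors + unclosed


good = ('<body>This is some text in the body, with a '
        '<a href="another_doc.html">link, a portion of which is '
        '<b>set in bold</b></a></body>')
bad = good.replace("</b></a>", "</a></b>")
print(check_nesting(good))     # []
print(check_nesting(bad)[0])   # </a> closes before </b>
```

Once a mismatch is found the sketch does not try to recover, so one mistake can produce several follow-on messages; a real validator would be more careful.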
According to the HTML standard, a few tags do not have an ending tag. In fact, the standard forbids use of an end tag for these special ones, although most browsers are lenient and ignore the errant end tag. For example, the <br> tag causes a line break; it has no effect otherwise on the subsequent portion of the document and, hence, does not need an ending tag.
The HTML tags that do not have a corresponding end tag are:
<area> <base> <basefont> <br> <col> <frame> <hr> <img> <input> <isindex> <link> <meta> <param>
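A checker for this rule needs only the list of empty elements. The sketch below, again built on Python's html.parser, reports every explicit end tag for one of those elements, such as </br>. The function name forbidden_end_tags is hypothetical, and XHTML-style self-closing tags like <br /> are deliberately not reported, since they are not separate end tags.

```python
from html.parser import HTMLParser

# HTML elements that have no end tag (the list above).
EMPTY_ELEMENTS = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}


class ForbiddenEndTagFinder(HTMLParser):
    """Collects explicit end tags for HTML's empty elements, e.g. </br>."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_startendtag(self, tag, attrs):
        pass  # XHTML-style <br /> is not a separate end tag

    def handle_endtag(self, tag):
        if tag in EMPTY_ELEMENTS:
            self.found.append(tag)


def forbidden_end_tags(markup):
    finder = ForbiddenEndTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.found


print(forbidden_end_tags("Line one<br>Line two<br></br><hr></hr><p>Text</p>"))
# ['br', 'hr']
```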
XHTML always requires end tags (see Section 16.3.3, "Handling Empty Elements").
You often see documents in which the author seemingly has forgotten to include an ending tag in apparent violation of the HTML standard. Sometimes you even see a missing <body> tag. But your browser doesn't complain, and the document displays just fine. What gives? The HTML standard lets you omit certain tags or their endings for clarity and ease of preparation. The HTML standard writers didn't intend the language to be tedious.
For example, the <p> tag that defines the start of a paragraph has a corresponding end tag, </p>, but the </p> end tag is rarely used. In fact, many HTML authors don't even know it exists! (See Section 4.1.2, "The <p> Tag".)
More generally, the HTML standard lets you omit a starting or ending tag whenever it can be unambiguously inferred from the surrounding context. Many browsers make good guesses when confronted with missing tags, leading the document author to assume that a valid omission was made.
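As an illustration of inferring an omitted tag from context, here is a deliberately simplified Python sketch for the <p> case only: it adds the missing </p> when another <p> begins or when the body ends. Real HTML parsing applies more rules than this (other block-level tags also end an open paragraph, for instance), so treat ParagraphCloser and close_paragraphs, both hypothetical, as a toy model rather than a description of any particular browser. Comments and other non-tag markup are dropped from its output.

```python
from html import escape
from html.parser import HTMLParser


class ParagraphCloser(HTMLParser):
    """Copies markup, adding </p> where a paragraph's end is implied.

    Simplified on purpose: only a new <p> or the closing </body>
    ends an open paragraph. Comments and declarations are dropped.
    """

    def __init__(self):
        super().__init__()
        self.out = []
        self.in_paragraph = False

    def _end_paragraph(self):
        if self.in_paragraph:
            self.out.append("</p>")  # the inferred end tag
            self.in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._end_paragraph()    # a new paragraph ends the old one
            self.in_paragraph = True
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br /> as written

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False  # the author closed it explicitly
        elif tag == "body":
            self._end_paragraph()      # the end of the body ends it too
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # html.parser has decoded entities such as &amp;; re-encode them.
        self.out.append(escape(data, quote=False))


def close_paragraphs(markup):
    closer = ParagraphCloser()
    closer.feed(markup)
    closer.close()
    return "".join(closer.out)


print(close_paragraphs("<body><p>First.<p>Second.</body>"))
# <body><p>First.</p><p>Second.</p></body>
```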
We recommend that you almost always add the ending tag. It will make life easier for you as you transition to XHTML, as well as for the browser and for anyone who might need to modify your document in the future.
HTML browsers sometimes ignore tags. This usually happens with redundant tags whose effects merely cancel or repeat one another. The best example is a series of <p> tags, one after the other with no intervening content. Unlike a similar series of return characters in a text-processing document, each of which adds another line, such a series makes most browsers skip to a new line only once. The extra <p> tags are redundant and usually ignored by the browser.
In addition, most HTML browsers ignore any tag that they don't understand or that was incorrectly specified by the document author. Browsers habitually forge ahead and make some sense of a document, no matter how badly formed and error-ridden it may be. This isn't just a tactic to overcome errors; it's also an important strategy for extensibility. Imagine how much harder it would be to add new features to the language if the existing base of browsers choked on them.
The thing to watch out for with nonstandard tags that aren't supported by most browsers is their enclosed contents, if any. Browsers that recognize the new tag may process those contents differently than browsers that don't support it. For example, Internet Explorer and Netscape Navigator now both support the <style> tag, whose contents set various display characteristics of your document. However, earlier versions of these popular browsers, many of which are still in use today, don't support styles. Hence, older browsers ignore the <style> tag and render its contents on the user's screen, which defeats the tag's purpose and ruins the document's appearance (see Section 8.1.2, "Document-Level Style Sheets").
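The following toy model, written in Python with html.parser, contrasts the two behaviors described above. It treats every tag as invisible and shows only text, except that a renderer that understands <style> hides the style sheet's contents; one that doesn't simply ignores the tag and prints what it encloses. TextRenderer and visible_text are hypothetical names, and the model is far cruder than a browser (it would, for instance, also print the contents of a <title> tag).

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Very rough model of the text a browser puts on screen.

    Tags themselves are never shown. Only a renderer that understands
    <style> suppresses the style sheet's contents.
    """

    def __init__(self, understands_style):
        super().__init__()
        self.understands_style = understands_style
        self.in_style = False
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "style" and self.understands_style:
            self.in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False

    def handle_data(self, data):
        if not self.in_style and data.strip():
            self.text.append(data.strip())


def visible_text(markup, understands_style):
    renderer = TextRenderer(understands_style)
    renderer.feed(markup)
    renderer.close()
    return " ".join(renderer.text)


page = "<head><style>h1 {color: red}</style></head><body><h1>Hello</h1></body>"
print(visible_text(page, understands_style=True))   # Hello
print(visible_text(page, understands_style=False))  # h1 {color: red} Hello
```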
Copyright © 2002 O'Reilly & Associates. All rights reserved.