Perl & XML

2.10. Declaring Elements and Attributes

When you need an extra level of quality control (beyond the healthful status implied by the "well-formed" label), define the grammar patterns of your markup language in the DTD. Defining the patterns will make your markup into a formal language, documented much like a standard published by an international organization. With a DTD, a program can tell in short order whether a document conforms to, or, as we say, is a valid example of, your document type.

Two kinds of declarations allow a DTD to model a language. The first is the element declaration. It adds a new name to the allowed set of elements and specifies, in a special pattern language, what can go inside the element. Here are some examples:

<!ELEMENT sandwich (((meat | cheese)+ | (peanut-butter, jelly)), condiment+, pickle?)>
<!ELEMENT pickle EMPTY>
<!ELEMENT condiment (#PCDATA | mustard | ketchup)*>

The first parameter declares the name of the element. The second parameter is a pattern: either a content model in parentheses or a keyword such as EMPTY. Content models resemble regular expression syntax, the main differences being that element names are complete tokens and that a comma indicates a required sequence of elements. Every element mentioned in a content model should be declared somewhere in the DTD.
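The resemblance to regular expressions can be made concrete. The following Python sketch is not part of the original text, and the function names content_model_to_regex and children_match are invented for illustration. It translates an element-only content model into a regular expression and tests a list of child element names against it. It assumes the sandwich model is written with balanced parentheses, and it makes no attempt to handle #PCDATA, EMPTY, or ANY.

```python
import re

def content_model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex.

    Each element name becomes one whole token followed by a space, so
    'meat' can never match the first part of 'meatball'.
    """
    parts = []
    for tok in re.findall(r"[A-Za-z_][\w.-]*|[()|,?*+]", model):
        if tok == ",":
            continue                  # a sequence is plain concatenation
        if tok == "(":
            parts.append("(?:")       # group without capturing
        elif tok in ")|?*+":
            parts.append(tok)         # same meaning in both notations
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def children_match(model, children):
    """Return True if the sequence of child element names fits the model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None

SANDWICH = "(((meat | cheese)+ | (peanut-butter, jelly)), condiment+, pickle?)"
print(children_match(SANDWICH, ["meat", "cheese", "condiment", "pickle"]))  # True
print(children_match(SANDWICH, ["meat", "pickle"]))                         # False
```

The second call fails because the model demands at least one condiment before the optional pickle. A real validating parser does much more than this (entities, mixed content, attribute checks), so treat the sketch only as a way to see how the comma, the vertical bar, and the occurrence operators map onto familiar regex notation.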

The other important kind of declaration is the attribute list declaration. With it, you can declare a set of optional or required attributes for a given element. The attribute values can be controlled to some extent, though the pattern restrictions are somewhat limited. Let's look at an example:

<!ATTLIST sandwich
  id        ID        #REQUIRED
  price     CDATA     #IMPLIED
  taste     CDATA     #FIXED     "yummy"
  name      (reuben | ham-n-cheese | BLT | PB-n-J )     'BLT'
>

The general pattern of an attribute declaration has three parts: a name, a data type, and a behavior. This example declares four attributes for the element <sandwich>. The first, named id, is of type ID: a string of characters that must be unique among all ID-type attributes in the document. It is required because of the #REQUIRED keyword. The second, named price, is of type CDATA and is optional, according to the #IMPLIED keyword. The third, named taste, is fixed with the value "yummy" and can't be changed (all <sandwich> elements will inherit this attribute automatically). Finally, the attribute named name must take one of an enumerated list of values, with 'BLT' as the default.
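To see how these four behaviors play out, here is a rough Python sketch that mimics what a validating parser does with the sandwich attribute list. Nothing in it comes from the original text: the SANDWICH_ATTRS table is a hand-written mirror of the declaration, check_sandwiches is a hypothetical helper, and the <menu> wrapper element in the usage lines is made up. The sketch checks ID values only for uniqueness, not for being legal XML names.

```python
import xml.etree.ElementTree as ET

# Hand-written mirror of the <!ATTLIST sandwich ...> declaration:
# attribute name -> (type or enumerated values, behavior, declared value)
SANDWICH_ATTRS = {
    "id":    ("ID",    "#REQUIRED", None),
    "price": ("CDATA", "#IMPLIED",  None),
    "taste": ("CDATA", "#FIXED",    "yummy"),
    "name":  (("reuben", "ham-n-cheese", "BLT", "PB-n-J"), "default", "BLT"),
}

def check_sandwiches(xml_text):
    """Return (problems, root), filling in fixed and default values."""
    root = ET.fromstring(xml_text)
    problems, seen_ids = [], set()
    for elem in root.iter("sandwich"):
        for attr in elem.attrib:
            if attr not in SANDWICH_ATTRS:
                problems.append(f"undeclared attribute {attr!r}")
        for attr, (kind, behavior, value) in SANDWICH_ATTRS.items():
            if attr not in elem.attrib:
                if behavior == "#REQUIRED":
                    problems.append(f"missing required attribute {attr!r}")
                elif value is not None:
                    elem.set(attr, value)   # supply #FIXED or default value
                continue
            given = elem.get(attr)
            if behavior == "#FIXED" and given != value:
                problems.append(f"{attr!r} must be {value!r}, got {given!r}")
            if isinstance(kind, tuple) and given not in kind:
                problems.append(f"{attr!r} value {given!r} is not in the list")
            if kind == "ID":
                if given in seen_ids:
                    problems.append(f"duplicate ID {given!r}")
                seen_ids.add(given)
    return problems, root

problems, root = check_sandwiches('<menu><sandwich id="s1" price="4"/></menu>')
print(problems)          # []
print(root[0].attrib)    # taste and name have been filled in
```

Note that price stays absent when it is omitted: #IMPLIED means the attribute is optional and has no default, whereas taste and name always end up with a value.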

Element and attribute declarations have been around for a long time and have been very successful, but they have some major flaws. First, content model syntax is relatively inflexible. For example, it's surprisingly hard to express the statement "this element must contain one each of the elements A, B, C, and D in any order" (try it and see!). Second, character data can't be constrained in any way. You can't ensure that a <date> contains a valid date and not a street address, for example. Third, and most troubling for the XML community, DTDs don't play well with namespaces. If you use element declarations, you have to declare all elements you would ever use in your document, not just some of them. If you want to leave open the possibility of importing some element types from another namespace, you can't also use a DTD to validate your document, at least not without playing the mix-and-match DTD-combination games we described earlier, and combining DTDs doesn't always work anyway.
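If you would rather not try the "any order" exercise by hand, the following Python sketch (an illustration added here, with a made-up function name, any_order_model) generates one possible answer. It writes the model in a left-factored form, so that every alternative at a given level starts with a different element, which is what the XML rule about deterministic content models expects.

```python
import re

def any_order_model(names):
    """Build a content model that accepts each name exactly once, in any order.

    For a single name the bare name is returned; it is meant to be nested
    inside the parenthesized groups built for longer lists.
    """
    if len(names) == 1:
        return names[0]
    branches = []
    for i, first in enumerate(names):
        rest = names[:i] + names[i + 1:]          # everything except `first`
        branches.append(f"({first}, {any_order_model(rest)})")
    return "(" + " | ".join(branches) + ")"

print(any_order_model(["A", "B"]))                # ((A, B) | (B, A))

model = any_order_model(["A", "B", "C", "D"])
print(len(re.findall(r"[A-D]", model)))           # 64
```

Four elements already require 64 element-name mentions in the content model, and the count grows factorially with each element added, which is why the text calls this surprisingly hard.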




Copyright © 2002 O'Reilly & Associates. All rights reserved.