Perl & XML

2.11. Schemas

Several alternative schema languages have been proposed to address the shortcomings of DTD declarations. The W3C's recommended language is called XML Schema. You should know, however, that it is only one of many competing schema languages, some of which may be better suited to your needs. If you prefer a competing schema language, check CPAN to see if a module has been written to handle your favorite flavor.

Unlike DTDs, XML Schemas are themselves XML documents, making it possible to use many XML tools to edit them. Their real power, however, is in the fine-grained control they give you over the form your data takes. This control makes them more attractive for documents in which checking the quality of data is at least as important as ensuring the proper structure. Example 2-4 shows a schema designed to model census forms, where data type checking is necessary.

Example 2-4. An XML schema

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:annotation>
    <xs:documentation>
      Census form for the Republic of Oz
      Department of Paperwork, Emerald City
    </xs:documentation>
  </xs:annotation>

  <xs:element name="census" type="CensusType"/>

  <xs:complexType name="CensusType">
    <!-- Child elements must appear in this order -->
    <xs:sequence>
      <xs:element name="censustaker" type="xs:decimal" minOccurs="0"/>
      <xs:element name="address" type="Address"/>
      <xs:element name="occupants" type="Occupants"/>
    </xs:sequence>
    <xs:attribute name="date" type="xs:date"/>
  </xs:complexType>

  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="number" type="xs:decimal"/>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city"   type="xs:string"/>
      <xs:element name="province"  type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="postalcode" type="PCode"/>
  </xs:complexType>

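  <!-- User-defined type: one uppercase letter, a dash, three digits -->
  <xs:simpleType name="PCode">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Z]-\d{3}"/>
    </xs:restriction>
  </xs:simpleType>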

  <xs:complexType name="Occupants">
    <xs:sequence>
      <xs:element name="occupant" minOccurs="1" maxOccurs="20">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="firstname" type="xs:string"/>
            <xs:element name="surname" type="xs:string"/>
            <xs:element name="age">
              <xs:simpleType>
                <xs:restriction base="xs:positiveInteger">
                  <xs:maxExclusive value="200"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

</xs:schema>

The first line identifies this document as a schema and associates it with the XML Schema namespace. The next structure, <xs:annotation>, is a place to document the schema's purpose and other details. After this documentation, we get into the fun stuff and start declaring element types.

We start by declaring the root of our document type, an element called <census>. The declaration is an element of type <xs:element>. Its attributes assign the name "census" and the type that describes the content of <census>, "CensusType". In schemas, unlike DTDs, the content descriptions are often kept separate from the declarations, making it easier to define generic element types and assign multiple elements to them. Further down in the schema is the actual content description, an <xs:complexType> element with name="CensusType". It specifies that a <census> contains an optional <censustaker>, followed by a required <address> and a required <occupants>. It can also carry an attribute called date (attributes are optional unless they are declared with use="required").

Both the attribute date and the element <censustaker> have specific data types assigned in the description of <census>: a date and a decimal number. If a <censustaker> element in your document contained anything but a numerical value, it would be an error according to this schema. You couldn't get this level of control with DTDs.
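
As a rough illustration, here is a small instance document of the kind this schema is meant to accept. The names and values are invented for the example; only the element names, attribute names, and data types come from the schema.

```xml
<census date="2001-10-31">
  <!-- Optional; must be a decimal number -->
  <censustaker>738</censustaker>
  <address postalcode="K-456">
    <number>510</number>
    <street>Yellowbrick Road</street>
    <city>Munchkinville</city>
    <province>Negbo</province>
  </address>
  <occupants>
    <!-- Between 1 and 20 occupant elements are allowed -->
    <occupant>
      <firstname>Floyd</firstname>
      <surname>Fleegle</surname>
      <age>61</age>
    </occupant>
  </occupants>
</census>
```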

Schemas can check for many types. These include numerical values such as bytes, floating-point numbers, and long integers; binary data; boolean values; dates, times, and durations; URIs; IDs, IDREFs, and other types borrowed from DTDs; and strings of character data.

An element type description uses properties called facets to set even more detailed limits on content. For example, the schema above gives the <age> element, whose data type is positiveInteger, an upper limit of 200 using the maxExclusive facet, so the largest value it accepts is 199. XML Schema defines many other facets, including totalDigits, fractionDigits, pattern, enumeration, and maxLength.
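
To give a feel for two of these other facets, the sketch below restricts one string type to a fixed list of values with enumeration and caps the length of another with maxLength. The type names Quadrant and ShortName are hypothetical and are not part of Example 2-4.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type: only these four values are allowed -->
  <xs:simpleType name="Quadrant">
    <xs:restriction base="xs:string">
      <xs:enumeration value="North"/>
      <xs:enumeration value="South"/>
      <xs:enumeration value="East"/>
      <xs:enumeration value="West"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical type: a string of at most 40 characters -->
  <xs:simpleType name="ShortName">
    <xs:restriction base="xs:string">
      <xs:maxLength value="40"/>
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```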

The Address description introduces a new concept: user-defined types. With this technique, we define the type of the postalcode attribute, PCode, with a pattern: [A-Z]-\d{3}. Using this pattern is like saying, "Accept any uppercase letter from A to Z followed by a dash and three digits." If no data type fits your needs, you can always make up a new one.
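
To see what these checks are meant to catch, consider the deliberately faulty document sketched below. The values are invented for illustration, and each commented line shows content that a validating parser should reject under Example 2-4.

```xml
<!-- A deliberately invalid census form; all values are made up -->
<census date="31 October 2001">       <!-- not an xs:date, which needs YYYY-MM-DD -->
  <censustaker>Bob</censustaker>      <!-- not an xs:decimal -->
  <address postalcode="k-4567">       <!-- lowercase letter and four digits -->
    <number>510</number>
    <street>Yellowbrick Road</street>
    <city>Munchkinville</city>
    <province>Negbo</province>
  </address>
  <occupants>
    <occupant>
      <firstname>Floyd</firstname>
      <surname>Fleegle</surname>
      <age>200</age>                  <!-- maxExclusive: must be below 200 -->
    </occupant>
  </occupants>
</census>
```

The document is still well-formed XML, so a non-validating parser would read it without complaint. Only validation against the schema exposes the bad data.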

Schemas are an exciting new technology that makes XML more useful, especially for data-specific applications such as data entry forms. We'll leave a full account of their uses and forms for another book.

2.11.1. Other Schema Strategies

While it has the blessing of the W3C, XML Schema is not the only schema option available for flexible document validation. Some programmers prefer the methods available through specifications like RELAX NG (available at http://www.oasis-open.org/committees/relax-ng/) or Schematron (http://www.ascc.net/xml/resource/schematron/schematron.html), which achieve the same goals through different philosophical means. Since the latter specification has Perl implementations that are currently available, we'll examine it further in Chapter 3, "XML Basics: Reading and Writing".




Copyright © 2002 O'Reilly & Associates. All rights reserved.