Book HomeXML in a Nutshell

16.6. Simple Content

Earlier, the xs:simpleContent element was used to declare an element that could only contain simple content:

<xs:element name="fullName">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="language" type="xs:language"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
 </xs:element>

The base type for the extension in this case was the built-in xs:string data type. But simple types are not limited to the predefined types. The xs:simpleType element can define new simple data types, which can be referenced by element and attribute declarations within the schema.

16.6.1. Defining New Simple Types

To show how new simple types can be defined, let's extend the phone element from the example application to support a new attribute called location. This attribute will be used to differentiate between work and home phone numbers. This attribute will have a new simple type called locationType, which will be referenced from the contactsType definition:

<xs:complexType name="contactsType">
  <xs:sequence>
    <xs:element name="phone" minOccurs="0">
      <xs:complexType>
        <xs:attribute name="number" type="xs:string"/>
         <xs:attribute name="location" type="addr:locationType"/>
      </xs:complexType>
    </xs:element>
  </xs:sequence>
</xs:complexType>

<xs:simpleType name="locationType">
  <xs:restriction base="xs:string"/>
</xs:simpleType>

Of course, a location type that just maps to the built-in xs:string type isn't particularly useful. Fortunately, schemas can strictly control the possible values of simple types through a mechanism called facets.

16.6.2. Facets

In schema-speak, a facet is an aspect of a possible value for a simple data type. Depending on the base type, some facets make more sense than others. For example, a numeric data type can be restricted by the minimum and maximum possible values it could contain. But these types of restrictions wouldn't make sense for a boolean value. The following list covers the different facet types that are supported by a schema processor:

Facets are applied to simple types using the xs:restriction element. Each facet is expressed as a distinct element within the restriction block, and multiple facets can be combined to further restrict potential values of the simple type.

16.6.2.1. Handling whitespace

The whiteSpace facet controls how the schema processor will deal with any whitespace within the target data. Whitespace normalization takes place before any of the other facets are processed. There are three possible values for the whiteSpace facet:

preserve
Keep all whitespace exactly as it was in the source document (basic XML 1.0 whitespace handling for content within elements).

replace
Replace occurrences of #x9 (tab), #xA (line feed), and #xD (carriage return) characters with #x20 (space) characters.

collapse
Perform the replace step first, then collapse multiple-space characters into a single space.

16.6.2.2. Restricting length

The length-restriction facets are fairly easy to understand. The length facet forces a value to be exactly the length given. The minLength and maxLength facets can be used to set a definite range for the lengths of values of the type given. For example, take the nameComponent type from the schema. What if a name component could not exceed 50 characters (because of a database limitation, for instance)? This rule can be enforced by using the maxLength facet. Incorporating this facet requires a new simple type to reference from within the nameComponent complex type definition:

<xs:complexType name="nameComponent">
  <xs:simpleContent>
    <xs:extension base="addr:nameString"/>
  </xs:simpleContent>
 </xs:complexType>
 
 <xs:simpleType name="nameString">
  <xs:restriction base="xs:string">
    <xs:maxLength value="50"/>
  </xs:restriction>
 </xs:simpleType>

The new nameString simple type is derived from the built-in xs:string type, but can contain no more than 50 characters (the default is unlimited). The same approach can be used with the length and minLength facets.

16.6.2.3. Enumerations

One of the more useful types of restriction is the simple enumeration. In many cases, it is sufficient to restrict possible values for an element or attribute to a member of a predefined list. For example, values of the new locationType simple type defined earlier could be restricted to a list of valid options like so:

<xs:simpleType name="locationType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="work"/>
    <xs:enumeration value="home"/>
    <xs:enumeration value="mobile"/>
  </xs:restriction>      
</xs:simpleType>

Then, if the location attribute in any instance document contained a value not found in the list of enumeration values, the schema processor would generate a validity error.

16.6.2.4. Numeric Facets

Almost half of the of built-in data types defined by the schema specification represent numeric data of one type or another. More might be called numeric since the date/time and duration types are considered to be scalar quantities as well. The following two sections cover all of the numeric facets available, but for a comprehensive list of which of these facets are applicable to which data types, see Chapter 21.

16.6.2.5. Enforcing format

The xs:pattern facet can place very sophisticated restrictions on the format of string values. The pattern facet compares the value in question against a regular expression, and if the value doesn't conform to the expression, it generates a validation error. For example, this xs:simpleType element declares a social security number simple type using the pattern facet:

<xs:simpleType name="ssn">
  <xs:restriction base="xs:string">
    <xs:pattern value="\d\d\d-\d\d-\d\d\d\d"/>
  </xs:restriction>
 </xs:simpleType>

This new simple type enforces the rule that a social security number consists of three digits, a dash followed by two digits, another dash, and finally four more digits. The actual regular-expression language is very similar to that of the Perl programming language, but it also supports a wide range of Unicode characters. See Chapter 21 for more information on the full pattern-matching language.

16.6.2.6. Lists

XML 1.0 provided a few very simple list types that could be declared as possible attribute values: IDREFS, ENTITIES, and NMTOKENS. Schemas have generalized the concept of lists and provide the ability to declare lists of arbitrary types.

These list types are themselves simple types and may be used in the same places other simple types are used. For example, if the fullName element were to be expanded to accommodate multiple middle names, one approach would be to declare the middle element to contain a list of nameString values:

 <xs:element name="middle" type="addr:nameList" minOccurs="0"/>
. . .
<xs:complexType name="nameList">
  <xs:simpleContent>
    <xs:extension base="addr:nameListType"/>
  </xs:simpleContent>
 </xs:complexType>
 
 <xs:simpleType name="nameListType">
  <xs:list itemType="addr:nameString"/>
 </xs:simpleType>

After this change has been made, the middle element of an instance document can contain an unlimited list of names, each of which can contain up to 50 characters separated by whitespace. The use of xs:complextype here will greatly simplify adding attributes later.

16.6.2.7. Unions

In some cases, it is useful to allow potential values for elements and attributes to have any of several types. The xs:union element allows a type to be declared that can draw from multiple type spaces. For example, it might be useful to allow users to enter their own one-word descriptions into the location attribute of the phone element, as well as to choose from a list. The location attribute declaration could be modified to include a union that incorporated the locationType type and the xs:NMTOKEN types:

<xs:attribute name="location">
  <xs:simpleType>
    <xs:union memberTypes="addr:locationType xs:NMTOKEN"/>
  </xs:simpleType>
</xs:attribute>

Now the location attribute can contain either addr:locationType or xs:NMTOKEN content.



Library Navigation Links

Copyright © 2002 O'Reilly & Associates. All rights reserved.