XML in a Nutshell

Chapter 16. XML Schemas

Contents:

Overview
Schema Basics
Working with Namespaces
Complex Types
Empty Elements
Simple Content
Mixed Content
Allowing Any Content
Controlling Type Derivation

Although Document Type Definitions can enforce basic structural rules on documents, many applications need a more powerful and expressive validation method. The W3C developed the XML Schema Recommendation, released on May 2, 2001 after a long incubation period, to address these needs. Schemas can describe complex restrictions on elements and attributes. Multiple schemas can be combined to validate documents that use multiple XML vocabularies. This chapter provides a rapid introduction to key W3C XML Schema concepts and usage.

This chapter progressively introduces the structures and concepts of XML Schemas, beginning with the fundamental structure that is common to all schemas. It starts with a very simple schema and adds functionality to it until every major feature of XML Schemas has been introduced.

16.1. Overview

A schema is a formal description of what comprises a valid document. An XML schema is an XML document containing a formal description of what comprises a valid XML document. A W3C XML Schema Language schema is an XML schema written in the particular syntax recommended by the W3C.

TIP: In this chapter, when we use the word schema without further qualification, we are referring specifically to a schema written in the W3C XML Schema language. However, there are numerous other XML schema languages, including RELAX NG and Schematron, each with its own strengths and weaknesses.

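An XML document described by a schema is called an instance document. If a document satisfies all the constraints specified by the schema, it is considered to be schema-valid. The schema document is associated with an instance document in one of two ways: through the schemaLocation or noNamespaceSchemaLocation attribute in the instance document itself, or by instructing the schema processor directly to validate the document against a particular schema.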

16.1.1. Schemas Versus DTDs

DTDs provide the capability to do basic validation of the following items in XML documents:

  • Element nesting

  • Element occurrence constraints

  • Permitted attributes

  • Attribute types and default values

However, DTDs do not provide fine control over the format and data types of element and attribute values. Other than the various special attribute types (ID, IDREF, ENTITY, NMTOKEN, and so forth), once an element or attribute has been declared to contain character data, no limits may be placed on the length, type, or format of that content. For narrative documents (such as web pages, book chapters, newsletters, etc.), this level of control is probably good enough.

But as XML makes inroads into more data-intensive applications (such as web services using SOAP), more precise control over the text content of elements and attributes becomes important. The W3C XML Schema standard includes the following features:

  • Simple and complex data types

  • Type derivation and inheritance

  • Element occurrence constraints

  • Namespace-aware element and attribute declarations

The most important of these features is the addition of simple data types for parsed character data and attribute values. Unlike DTDs, schemas can enforce specific rules about the contents of elements and attributes. In addition to a wide range of built-in simple types (such as string, integer, decimal, and dateTime), the schema language provides a framework for declaring new data types, deriving new types from old types, and reusing types from other schemas.
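As a minimal sketch of deriving a new simple type from a built-in one, the schema below restricts string to a bounded length and uses the built-in dateTime type directly. The type name shortName, the element name arrival, and the length limits are assumptions chosen for illustration; they do not come from any particular application.

```xml
<?xml version="1.0"?>
<!-- Hypothetical schema: all element and type names are illustrative. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- A new simple type derived by restriction from the built-in
       string type: between 1 and 40 characters long. -->
  <xs:simpleType name="shortName">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="40"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- This element's text content must satisfy the derived type. -->
  <xs:element name="fullName" type="shortName"/>

  <!-- Built-in simple types can also be referenced directly. -->
  <xs:element name="arrival" type="xs:dateTime"/>

</xs:schema>
```

Under this schema, <fullName>Zoe</fullName> would be schema-valid, while an empty <fullName/> would not, because its content is shorter than the minimum length. A DTD could only declare the content as #PCDATA.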

Besides simple data types, schemas add the ability to place more explicit restrictions on the number and sequence of child elements that can appear in a given location. This is even true when elements are mixed with character data, unlike the mixed content model (#PCDATA) supported by DTDs.
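The following sketch shows how occurrence and ordering constraints might be combined with mixed content. The element names letter, salutation, and item, and the limit of three items, are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Hypothetical schema: element names and limits are illustrative. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="letter">
    <!-- mixed="true" permits character data between the child elements. -->
    <xs:complexType mixed="true">
      <!-- The children must still appear in this order... -->
      <xs:sequence>
        <!-- ...with exactly one salutation (the default occurrence)... -->
        <xs:element name="salutation" type="xs:string"/>
        <!-- ...followed by between one and three item elements. -->
        <xs:element name="item" type="xs:string"
                    minOccurs="1" maxOccurs="3"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

The closest DTD declaration, <!ELEMENT letter (#PCDATA | salutation | item)*>, allows the same children to be interleaved with text but cannot constrain their order or how many times each one appears.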

WARNING: There are a few things that DTDs do that XML Schema can't do. Defining general entities for use in documents is one of these. XML Inclusions (XInclude) may be able to replace some uses of general entities, but DTDs remain extremely convenient for short entities.

16.1.2. Namespace Issues

As XML documents are exchanged between different people and organizations around the world, proper use of namespaces becomes critical to prevent misunderstandings. Depending on what type of document is being viewed, a simple element like <fullName>Zoe</fullName> could have widely different meanings. It could be a person's name, a pet's name, or the name of a ship that recently docked. By associating every element with a namespace URI, it is possible to distinguish between two elements with the same local name.

Because the Namespaces in XML recommendation was released after the XML 1.0 recommendation, DTDs do not provide explicit support for declaring namespace-aware XML applications. Unlike DTDs (where element and attribute declarations must include a namespace prefix), schemas validate against the combination of the namespace URI and local name rather than the prefixed name.
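A short sketch of a namespace-aware declaration follows. The namespace URI http://example.com/ns/person is a placeholder assumed for this example.

```xml
<?xml version="1.0"?>
<!-- Hypothetical schema: the target namespace URI is a placeholder. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/ns/person"
           elementFormDefault="qualified">

  <!-- Declares the element whose local name is fullName in the
       target namespace; no prefix is part of the declaration. -->
  <xs:element name="fullName" type="xs:string"/>

</xs:schema>
```

Because the declaration is matched on namespace URI and local name, an instance document could write the element as <p:fullName xmlns:p="http://example.com/ns/person">Zoe</p:fullName> or as <fullName xmlns="http://example.com/ns/person">Zoe</fullName>, and both would refer to the same declaration.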

Namespaces are also used within instance documents to include directives to the schema processor. For example, the special attributes that are used to associate an instance document with a schema (schemaLocation and noNamespaceSchemaLocation) must be in the official XML Schema instance namespace (http://www.w3.org/2001/XMLSchema-instance) in order for the schema processor to recognize them as instructions to itself.
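As an illustration, an instance document might carry these attributes as shown below. The xsi prefix is a common convention rather than a requirement; the namespace URI http://example.com/ns/person and the file name person.xsd are assumptions made for this sketch.

```xml
<?xml version="1.0"?>
<!-- schemaLocation holds pairs of values: a namespace URI followed by
     the location of a schema for that namespace. Both values here are
     placeholders. -->
<fullName xmlns="http://example.com/ns/person"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://example.com/ns/person person.xsd">Zoe</fullName>
```

For a document whose elements are in no namespace, the corresponding form would be xsi:noNamespaceSchemaLocation="person.xsd", which takes a single schema location instead of pairs. In either case the value is a hint that the processor is allowed to ignore.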




Copyright © 2002 O'Reilly & Associates. All rights reserved.