Programming PHP

Chapter 11. XML

Contents:

Lightning Guide to XML
Generating XML
Parsing XML
Transforming XML with XSLT
Web Services

XML, the Extensible Markup Language, is a standardized data format. It looks a little like HTML, with tags (<example>like this</example>) and entities (&amp;). Unlike HTML, however, XML is designed to be easy to parse, and there are rules for what you can and cannot do in an XML document. XML is now the standard data format in fields as diverse as publishing, engineering, and medicine. It's used for remote procedure calls, databases, purchase orders, and much more.

There are many scenarios where you might want to use XML. Because it is a common format for data transfer, other programs can emit XML files for you to either extract information from (parse) or display in HTML (transform). This chapter shows how to use the XML parser bundled with PHP, as well as how to use the optional XSLT extension to transform XML. We also briefly cover generating XML.

XML is also used for remote procedure calls. In XML-RPC, a client encodes a function name and parameter values in XML and sends them via HTTP to a server. The server decodes the function name and values, decides what to do, and returns a response value encoded in XML. XML-RPC has proved a useful way to integrate application components written in different languages. In this chapter, we'll show you how to write XML-RPC servers and clients.

11.1. Lightning Guide to XML

Most XML consists of elements (like HTML tags), entities, and regular data. For example:

<book isbn="1-56592-610-2">
  <title>Programming PHP</title>
  <authors>
    <author>Rasmus Lerdorf</author>
    <author>Kevin Tatroe</author>
  </authors>
</book>
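
As a rough illustration of how the parts of this document map onto data, the following sketch reads the attribute, the title, and the authors from the example above. It assumes a PHP version with the SimpleXML extension enabled (PHP 5 or later), which is not the parser this chapter focuses on; the variable names are made up for the example.

```php
<?php
// The sample document from above, held in a string for the example.
$xml = <<<XML
<book isbn="1-56592-610-2">
  <title>Programming PHP</title>
  <authors>
    <author>Rasmus Lerdorf</author>
    <author>Kevin Tatroe</author>
  </authors>
</book>
XML;

// Assumption: the SimpleXML extension is available.
$book = simplexml_load_string($xml);
if ($book === false) {
    die("Could not parse the document\n");
}

// Attributes are read with array syntax, child elements as properties.
$isbn  = (string) $book['isbn'];
$title = (string) $book->title;

// Repeated elements can be iterated over.
$authors = [];
foreach ($book->authors->author as $author) {
    $authors[] = (string) $author;
}

echo "$title ($isbn) by " . implode(' and ', $authors) . "\n";
```

The casts to string matter: SimpleXML hands back objects, and the cast extracts the text inside each element or attribute.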

In HTML, you often have an open tag without a close tag. The most common example of this is:

<br>

In XML, that is illegal. XML requires that every open tag be closed. For tags that don't enclose anything, such as the line break <br>, XML adds this syntax:

<br />

Tags can be nested but cannot overlap. For example, this is valid:

<book><title>Programming PHP</title></book>

but this is not valid, because the book and title tags overlap:

<book><title>Programming PHP</book></title>

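An XML document should also begin with an XML declaration, which has the same syntax as a processing instruction and identifies the version of XML being used (and possibly other things, such as the text encoding used). The declaration is optional in XML 1.0 but strongly recommended. For example: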

<?xml version="1.0" ?>

The final requirement of a well-formed XML document is that there be only one element at the top level of the file. For example, this is well formed:

<?xml version="1.0" ?>
<library>
  <title>Programming PHP</title>
  <title>Programming Perl</title>
  <title>Programming C#</title>
</library>

but this is not well formed, as there are three elements at the top level of the file:

<?xml version="1.0" ?>
<title>Programming PHP</title>
<title>Programming Perl</title>
<title>Programming C#</title>
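
The well-formedness rules above can be checked mechanically. The sketch below feeds a few small documents to the XML parser functions bundled with PHP and reports whether each one is well formed. The helper name wellFormednessError() is hypothetical, and the wording of the error messages depends on the underlying parser library, so treat the output text as illustrative.

```php
<?php
// Hypothetical helper: returns null when $xml is well formed,
// otherwise the parser's description of the first error.
function wellFormednessError(string $xml): ?string
{
    $parser = xml_parser_create();
    // The third argument tells the parser this is the final chunk of data.
    if (xml_parse($parser, $xml, true) === 1) {
        return null;
    }
    return xml_error_string(xml_get_error_code($parser)) ?? 'unknown error';
}

$samples = [
    'nested tags'                 => '<book><title>Programming PHP</title></book>',
    'overlapping tags'            => '<book><title>Programming PHP</book></title>',
    'unclosed br'                 => '<p>one<br>two</p>',
    'empty-element br'            => '<p>one<br />two</p>',
    'several top-level elements'  => '<title>Programming PHP</title><title>Programming Perl</title>',
];

foreach ($samples as $label => $xml) {
    $error = wellFormednessError($xml);
    echo $label, ': ', $error === null ? 'well formed' : "not well formed ($error)", "\n";
}
```

Only the nested-tags and empty-element samples should be reported as well formed; the other three each break one of the rules described in this section.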

XML documents generally are not completely ad hoc. The specific tags, attributes, and entities in an XML document, and the rules governing how they nest, make up the structure of the document. There are two common ways to write down this structure: the Document Type Definition (DTD) and the XML Schema. DTDs and Schemas are used to validate documents; that is, to ensure that they follow the rules for their type of document.

Most XML documents don't include a DTD inline. Instead, many identify the DTD as an external entity with a line that gives the name and location (file or URL) of the DTD:

<!DOCTYPE rss PUBLIC 'My DTD Identifier' 'http://www.example.com/my.dtd'>
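
To make the idea of validation concrete, here is a minimal sketch that checks two documents against a DTD. It is an illustration under several assumptions: it relies on PHP's DOM extension (PHP 5 or later), it declares a small hypothetical DTD inline so the example needs no external file, and the helper name validatesAgainstItsDtd() is made up.

```php
<?php
// Hypothetical helper: true if $xml parses and conforms to the DTD it declares.
function validatesAgainstItsDtd(string $xml): bool
{
    // Collect parser and validation errors quietly instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $valid = $doc->loadXML($xml) && $doc->validate();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $valid;
}

// Assumed DTD: a library holds one or more title elements containing text.
$dtd = <<<DTD
<!DOCTYPE library [
  <!ELEMENT library (title+)>
  <!ELEMENT title (#PCDATA)>
]>
DTD;

$good = $dtd . "\n<library><title>Programming PHP</title></library>";
// Well formed, but author is not declared in the DTD.
$bad  = $dtd . "\n<library><author>Rasmus Lerdorf</author></library>";

var_dump(validatesAgainstItsDtd($good));
var_dump(validatesAgainstItsDtd($bad));
```

Both documents are well formed, but only the first follows the rules in the DTD, so the second call should report false. This is the distinction between a well-formed document and a valid one.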

Sometimes it's convenient to encapsulate one XML document in another. For example, an XML document representing a mail message might have an attachment element that surrounds an attached file. If the attached file is XML, it's a nested XML document. What if the mail message document has a body element (the text of the message), and the attached file is an XML representation of a dissection that also has a body element, but this element has completely different DTD rules? How can you possibly validate or make sense of the document if the meaning of body changes partway through?

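This problem is solved with the use of namespaces. Namespaces let you qualify the XML tag with a prefix, for example, email:body and human:body. Each prefix is bound to a URI by an xmlns attribute, so the two body elements are treated as distinct names.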
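
The mail message scenario might look like the following sketch, which uses PHP's DOM extension (an assumption; PHP 5 or later) to pick out each kind of body element by its namespace. The namespace URIs and the element contents are invented for the example.

```php
<?php
// Hypothetical namespace URIs; any unique URIs would do.
$emailNs = 'http://www.example.com/ns/email';
$humanNs = 'http://www.example.com/ns/human';

$xml = <<<XML
<email:message xmlns:email="http://www.example.com/ns/email"
               xmlns:human="http://www.example.com/ns/human">
  <email:body>See the attached report.</email:body>
  <email:attachment>
    <human:dissection>
      <human:body>Adult specimen</human:body>
    </human:dissection>
  </email:attachment>
</email:message>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Look up elements by namespace URI plus local name, not by prefix.
$emailBodies = $doc->getElementsByTagNameNS($emailNs, 'body');
$humanBodies = $doc->getElementsByTagNameNS($humanNs, 'body');

echo 'email body: ', $emailBodies->item(0)->textContent, "\n";
echo 'human body: ', $humanBodies->item(0)->textContent, "\n";
```

The lookup uses the URI, not the prefix: the prefix is only a local shorthand, and a different document could bind the same URI to another prefix.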

There's a lot more to XML than we have time to go into here. For a gentle introduction to XML, read Learning XML, by Erik Ray (O'Reilly). For a complete reference to XML syntax and standards, see XML in a Nutshell, by Elliotte Rusty Harold and W. Scott Means (O'Reilly).




Copyright © 2003 O'Reilly & Associates. All rights reserved.