XPath is a recommendation of the World Wide Web Consortium (W3C) for locating nodes in an XML document tree. XPath is not designed to be used alone but in conjunction with other tools, such as XSLT or XPointer. These tools use XPath intensively and extend it for their own needs through new functions and new basic types.
XPath provides a syntax for locating a node in an XML document, inspired by the syntax used for paths in filesystems such as Unix. A path is evaluated relative to a starting node, often called the context node, which depends on where the XPath expression appears. For example, inside an <xsl:template match="para"> template, the context node is the <para> element that the template matched (recall that XSLT templates use XPath expressions). The context node can be compared to a Unix shell's current directory.
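To make the idea of a context node concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module, which understands only a small subset of XPath. The sample document is hypothetical, and <chapter> is chosen arbitrarily as the context node; each relative path is evaluated from it, much as a shell resolves relative paths from the current directory.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (the earlier examples are not reproduced here).
doc = """<book>
  <intro><title>Introduction</title></intro>
  <chapter>
    <title>Basics</title>
    <section><title>Nodes</title></section>
    <section><title>Paths</title></section>
  </chapter>
</book>"""

book = ET.fromstring(doc)
context = book.find("chapter")  # treat <chapter> as the context node

first_title = context.find("title").text                              # title
section_titles = [t.text for t in context.findall("section/title")]   # section/title
all_titles = [t.text for t in context.findall(".//title")]            # .//title
self_node = context.find(".")                                          # .

# ElementTree elements keep no parent pointer, so ".." only resolves inside
# the tree on which the search starts: from <book> it works, from <chapter> it does not.
parent_of_section = book.find(".//section/..")
no_parent = context.find("..")

print(first_title, section_titles, all_titles)
```

The last two lines point to a limitation of this sketch rather than of XPath: a full XPath engine can always navigate from the context node to its parent.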
Given our earlier XML examples, it is possible to write the following expressions:
In addition, XPath recognizes the at symbol (@) for selecting an attribute instead of an element. Thus the following expressions can be used to select an attribute:
Paths can be combined using the | operator. For example, intro | chapter selects all the <intro> and <chapter> children of the context node.
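ElementTree's path syntax has no | operator, so the sketch below emulates a union with a hypothetical helper, union( ). It reflects two properties of an XPath union: each node appears only once in the result, and the nodes come back in document order regardless of the order of the operands.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("<book><intro/><chapter n='1'/><appendix/><chapter n='2'/></book>")

def union(root, *node_lists):
    """Hypothetical helper emulating XPath's |: no duplicates, document order."""
    order = {e: i for i, e in enumerate(root.iter())}   # element -> position in document
    nodes = {e for node_list in node_lists for e in node_list}
    return sorted(nodes, key=order.__getitem__)

chapters = book.findall("chapter")
intros = book.findall("intro")

# Operands given as chapter | intro; the result is still in document order.
selected = union(book, chapters, intros)
print([(e.tag, e.get("n")) for e in selected])   # [('intro', None), ('chapter', '1'), ('chapter', '2')]

# Selecting the same nodes twice does not duplicate them.
print(len(union(book, chapters, chapters)))      # 2
```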
Certain functions can also be used as steps in the path; they must return a node or a set of nodes. (Strictly speaking, all of them except id( ) are node tests rather than functions.) The functions available are:
Function | Selection
---|---
node( ) | Any node (of any type)
text( ) | Text node
comment( ) | Comment node
processing-instruction( ) | Processing-instruction node
id(id) | Node whose unique identifier is id
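The node tests in this table can be approximated in Python, as a sketch under some assumptions. ElementTree discards comments and processing instructions by default, so the example builds the tree with a TreeBuilder configured to keep them (Python 3.8 or later), and it represents text nodes by the text and tail strings that ElementTree attaches to elements. The helper child_nodes( ) is hypothetical; it roughly corresponds to child::node( ) and labels each child with the node test that would match it.

```python
import xml.etree.ElementTree as ET

doc = "<para>Hello <!-- note --><?render bold?><b>world</b> again</para>"

# Keep comments and processing instructions, which the default parser drops.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True, insert_pis=True))
para = ET.fromstring(doc, parser=parser)

def child_nodes(elem):
    """Approximate child::node(): yield (node test, value) pairs in document order."""
    if elem.text:
        yield ("text", elem.text)                       # text( ) before the first child
    for child in elem:
        if child.tag is ET.Comment:
            yield ("comment", child.text)               # comment( )
        elif child.tag is ET.ProcessingInstruction:
            yield ("processing-instruction", child.text)  # target and data
        else:
            yield ("element", child.tag)
        if child.tail:
            yield ("text", child.tail)                  # text( ) following this child

nodes = list(child_nodes(para))
texts = [value for kind, value in nodes if kind == "text"]   # like child::text()
print(nodes)
print(texts)
```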
The id( ) function is especially helpful for locating a node by its unique identifier (recall that identifiers are attributes declared with type ID in the DTD). For example, the expression id("xml-ref")/title selects the <title> child of the element whose identifier is xml-ref.
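ElementTree does not read the DTD, so it cannot know which attributes are declared as IDs. The sketch below therefore assumes, hypothetically, that identifiers are stored in an attribute named id, and emulates id("xml-ref")/title both with a small helper (by_id( ), also hypothetical) and with an equivalent attribute predicate.

```python
import xml.etree.ElementTree as ET

doc = """<book>
  <chapter id="xml-intro"><title>Introducing XML</title></chapter>
  <chapter id="xml-ref"><title>XML Reference</title></chapter>
</book>"""
book = ET.fromstring(doc)

def by_id(root, ident, id_attr="id"):
    """Hypothetical stand-in for id(): assumes the ID attribute is named id_attr."""
    for elem in root.iter():
        if elem.get(id_attr) == ident:
            return elem
    return None

# id("xml-ref")/title via the helper...
title_text = by_id(book, "xml-ref").find("title").text
# ...and via ElementTree's attribute predicate on any descendant element.
path_title = book.find(".//*[@id='xml-ref']/title").text

print(title_text, "|", path_title)
```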
The preceding examples show that the analogy with file paths is rather limited. In fact, the syntax used so far is an abbreviated form of the complete XPath syntax, in which each step in the path is preceded by an axis.
Axes indicate the direction taken by the path. In the previous examples, the shortcuts .. for parent and // for descendant-or-self are abbreviations that indicate the axis along which nodes are searched, while a leading / starts the path at the document root. These correspond to some of the simpler axes.
XPath defines other axes, which are written as a prefix separated from the rest of the location step by a double colon (::). For example, to select the <para> elements that come before the context node in the document, we could write the expression preceding::para. XPath defines 13 axes:
Axis | Selection
---|---
self | The context node itself (abbreviated as .)
child | The children of the context node (the default axis when none is given)
descendant | The descendants of the context node; a descendant is a child, or a child of a child, and so on
descendant-or-self | The descendants of the context node plus the context node itself (// is short for /descendant-or-self::node()/)
parent | The parent of the context node (abbreviated as ..)
ancestor | The ancestors of the context node: its parent, its parent's parent, and so on up to the root
ancestor-or-self | The ancestors of the context node plus the context node itself
following-sibling | Siblings of the context node (nodes with the same parent) that come after it in the document
preceding-sibling | Siblings of the context node that come before it in the document
following | All nodes that come after the context node in document order, excluding its descendants and any attribute or namespace nodes
preceding | All nodes that come before the context node in document order, excluding its ancestors and any attribute or namespace nodes
attribute | The attributes of the context node (abbreviated as @)
namespace | The namespace nodes of the context node
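To show how the less familiar axes differ, the following sketch implements several of them for element nodes only, using a child-to-parent map built over an ElementTree document. The function names and the sample document are hypothetical; text, attribute, and namespace nodes are ignored, and results are returned in document order except for ancestor( ), which lists the nearest ancestor first. The last line evaluates the equivalent of preceding::para with a <para> element as the context node.

```python
import xml.etree.ElementTree as ET

doc = """<book>
  <intro><para>p1</para></intro>
  <chapter>
    <para>p2</para>
    <section><para>p3</para></section>
    <para>p4</para>
  </chapter>
  <conclusion><para>p5</para></conclusion>
</book>"""

root = ET.fromstring(doc)
parent = {child: p for p in root.iter() for child in p}   # child -> parent
order = {e: i for i, e in enumerate(root.iter())}          # element -> document order

def ancestor(n):
    result = []
    while n in parent:
        n = parent[n]
        result.append(n)
    return result                        # nearest ancestor first

def descendant(n):
    return [e for e in n.iter() if e is not n]

def following_sibling(n):
    if n not in parent:
        return []
    siblings = list(parent[n])
    return siblings[siblings.index(n) + 1:]

def preceding_sibling(n):
    if n not in parent:
        return []
    siblings = list(parent[n])
    return siblings[:siblings.index(n)]

def following(n):
    inside = set(descendant(n))          # descendants are not "following"
    return [e for e in root.iter() if order[e] > order[n] and e not in inside]

def preceding(n):
    above = set(ancestor(n))             # ancestors are not "preceding"
    return [e for e in root.iter() if order[e] < order[n] and e not in above]

ctx = next(p for p in root.iter("para") if p.text == "p4")   # the context node

print([e.tag for e in ancestor(ctx)])                      # ancestor::*
print([e.tag for e in preceding_sibling(ctx)])             # preceding-sibling::*
print([e.text for e in following(ctx) if e.tag == "para"]) # following::para
print([e.text for e in preceding(ctx) if e.tag == "para"]) # preceding::para
```

Note that p3 counts as preceding p4 even though it sits inside a <section>: the preceding axis follows document order, not the sibling structure.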
It is possible to write the following expressions:
The result of a location path is a node-set, and it is often useful to filter a node-set with predicates.
A predicate is an expression in square brackets that filters a node-set. For example, we could write the following expressions:
Note that a path in a predicate does not change the path preceding the predicate, but only filters it. Thus, the following expression:
/book/chapter[conclusion]
selects the <chapter> elements that are children of the <book> element at the root of the document and that have at least one <conclusion> child; it does not select the <conclusion> elements themselves.
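The difference between testing for a child and testing for a descendant can be checked with ElementTree, whose path syntax supports the [name] predicate. In this hypothetical document only the second chapter has a <conclusion> child; the third has one only as a grandchild, so it is matched only when the test is widened to descendants (done here with an ordinary Python filter rather than a path predicate).

```python
import xml.etree.ElementTree as ET

doc = """<book>
  <chapter n="1"><section/></chapter>
  <chapter n="2"><section/><conclusion/></chapter>
  <chapter n="3"><section><conclusion/></section></chapter>
</book>"""
book = ET.fromstring(doc)

# chapter[conclusion], relative to <book>: chapters with a <conclusion> child.
hits = book.findall("chapter[conclusion]")
print([c.get("n") for c in hits])            # ['2']

# Like chapter[.//conclusion]: chapters with a <conclusion> at any depth.
with_descendant = [c.get("n") for c in book.findall("chapter")
                   if c.find(".//conclusion") is not None]
print(with_descendant)                       # ['2', '3']
```

In both cases the selected nodes are <chapter> elements; the predicate only decides which of them survive.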
There may be more than one predicate in an expression. The following expression:
/book/chapter[1]/section[2]
selects the second section of the first chapter. In addition, the order of the predicates matters. Thus, the following expressions are not the same:
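The positional predicates in /book/chapter[1]/section[2] also work in ElementTree's subset, evaluated here relative to the root <book> element. To illustrate why predicate order matters, the sketch then models a hypothetical pair of expressions, chapter[@type='appendix'][1] and chapter[1][@type='appendix'], with plain Python lists: each predicate filters the node-set produced before it, so filtering first and then taking the first node is not the same as taking the first node and then filtering.

```python
import xml.etree.ElementTree as ET

doc = """<book>
  <chapter><section n="1.1"/><section n="1.2"/></chapter>
  <chapter type="appendix"><section n="A.1"/></chapter>
</book>"""
book = ET.fromstring(doc)

# chapter[1]/section[2] relative to <book>: second section of the first chapter.
second_section = book.find("chapter[1]/section[2]").get("n")
print(second_section)                                    # 1.2

chapters = book.findall("chapter")
# chapter[@type='appendix'][1]: filter, then keep the first survivor.
filter_then_first = [c for c in chapters if c.get("type") == "appendix"][:1]
# chapter[1][@type='appendix']: keep the first chapter, then filter it.
first_then_filter = [c for c in chapters[:1] if c.get("type") == "appendix"]
print(len(filter_then_first), len(first_then_filter))   # 1 0
```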
An expression can include logical or comparison operators. The following operators are available:
Operator | Meaning
---|---
or | Logical or
and | Logical and
not( ) | Negation (strictly a function rather than an operator)
= != | Equal to and not equal to
< <= | Less than and less than or equal to
> >= | Greater than and greater than or equal to
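Because XPath expressions usually appear inside an XML document (in the attributes of an XSLT style sheet, for example), the character < must be entered as &lt; in expressions. Parentheses may be used for grouping. For example: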
XPath also defines operators that act on numbers. The numeric operators are +, -, *, div (division of real numbers), and mod (modulo).
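XPath numbers are IEEE 754 double-precision values, and the XPath 1.0 specification defines mod as the remainder of a truncating division, so the sign of the result follows the dividend (for example, -5 mod 2 is -1). The sketch below mimics div and mod in Python with two hypothetical helpers. Python's own % operator floors instead of truncating, and Python raises an exception on division by zero where XPath returns an infinity or NaN, so both cases are handled explicitly.

```python
import math

def xpath_div(a, b):
    """Sketch of XPath 'div': IEEE 754 floating-point division that never raises."""
    a, b = float(a), float(b)
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        # The sign of the infinity depends on both operands (b may be -0.0).
        return math.copysign(math.inf, a) * math.copysign(1.0, b)
    return a / b

def xpath_mod(a, b):
    """Sketch of XPath 'mod': remainder of a truncating division."""
    a, b = float(a), float(b)
    if b == 0.0 or math.isinf(a) or math.isnan(a) or math.isnan(b):
        return math.nan
    return math.fmod(a, b)   # fmod truncates, so the sign follows the dividend

print(xpath_div(7, 2), xpath_div(1, 0))    # 3.5 inf
print(xpath_mod(5, 2), xpath_mod(-5, 2))   # 1.0 -1.0
print(-5 % 2)                              # 1 (Python floors, unlike XPath)
```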
In the previous examples we saw XPath functions such as position( ) and not( ). XPath defines four basic datatypes, and its functions return values of these types: booleans (true or false), numbers (floating-point numbers), strings (sequences of characters), and node-sets. The functions are grouped according to the datatypes they act upon.
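The four datatypes interact through conversion rules. For instance, XPath converts any value to a boolean (as the boolean( ) function does) as follows: a number is true unless it is positive or negative zero or NaN, a string is true unless it is empty, and a node-set is true unless it is empty. The sketch below models the four datatypes with Python types, which is an assumption made for illustration, and applies those rules in a hypothetical helper, xpath_boolean( ).

```python
import math
from typing import List, Union
import xml.etree.ElementTree as ET

# Loose Python model of the XPath datatypes (an assumption, not a real API):
# boolean -> bool, number -> float, string -> str, node-set -> list of elements.
XPathValue = Union[bool, float, str, List[ET.Element]]

def xpath_boolean(value: XPathValue) -> bool:
    """Sketch of XPath's conversion of any value to a boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # Zero (either sign) and NaN are false; every other number is true.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        return value != ""            # any non-empty string is true
    return len(value) > 0             # a node-set is true if it has a node

print(xpath_boolean("false"))         # True: the string is not empty
print(xpath_boolean(math.nan))        # False
```

The first call shows a common surprise: the string "false" converts to true, because only the empty string is false.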
The following functions deal with node-sets (optional arguments are followed by a question mark):
The following functions deal with strings:
The following functions deal with boolean operations:
The following functions deal with numbers:
These functions can be used not only in XPath expressions on their own but also in XSLT elements. For example, to count the number of sections in a document, we could add the following to a style sheet:
<xsl:text>The number of sections is </xsl:text> <xsl:value-of select="count(//section)"/>
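Python's ElementTree cannot run an XSLT style sheet, but the value that count(//section) produces is easy to reproduce as a sketch. In this hypothetical document, nested sections are counted too, because // reaches descendants at every depth.

```python
import xml.etree.ElementTree as ET

doc = """<book>
  <chapter><section/><section><section/></section></chapter>
  <chapter><section/></chapter>
</book>"""
root = ET.fromstring(doc)

# count(//section): every <section> in the document, nested ones included.
section_count = sum(1 for _ in root.iter("section"))
message = "The number of sections is " + str(section_count)
print(message)   # The number of sections is 4
```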
XSLT defines additional functionality for its own needs. One feature is a new datatype (in addition to the four datatypes defined by XPath): the result tree fragment. This datatype is comparable to a node-set, except that its nodes form a tree rather than an unorganized collection. However, only the operations permitted on strings are permitted on result tree fragments; in particular, you cannot use the /, //, or [ ] operators on them.
XSLT also defines additional functions:
Copyright © 2003 O'Reilly & Associates. All rights reserved.