
10.9. XPath

XPath is a recommendation of the World Wide Web Consortium (W3C) for locating nodes in an XML document tree. XPath is not designed to be used alone but in conjunction with other tools, such as XSLT or XPointer. These tools use XPath intensively and extend it for their own needs through new functions and new basic types.

XPath provides a syntax for locating a node in an XML document. It takes its inspiration from the syntax used to denote paths in filesystems such as Unix. A relative path is evaluated starting from a node called the context node, which depends on the context of the XPath expression. For example, the context of an XPath expression found in an <xsl:template match="para"> template will be the selected <para> element (recall that XSLT templates use XPath expressions). The context node can be compared to a Unix shell's current directory.

Given our earlier XML examples, it is possible to write the following expressions:

chapter
Selects the <chapter> element children of the context node

chapter/para
Selects the <para> element children of the <chapter> element children of the context node

../chapter
Selects the <chapter> element children of the parent of the context node

./chapter
Selects the <chapter> element children of the context node

*
Selects all element children of the context node

*/para
Selects the <para> grandchildren of the context node

.//para
Selects the <para> element descendants (children, children of children, etc.) of the context node

/para
Selects the <para> element that is a child of the document root node (that is, the document element, if it is named para)
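
As a rough illustration, several of the abbreviated paths above can be tried with Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. The sample document and the ids( ) helper are assumptions made for this sketch, and the <book> element plays the role of the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; <book> acts as the context node.
DOC = """
<book>
  <intro><para id="p0">Welcome</para></intro>
  <chapter>
    <para id="p1">First</para>
    <section><para id="p2">Nested</para></section>
  </chapter>
  <chapter><para id="p3">Second</para></chapter>
</book>
"""

context = ET.fromstring(DOC)


def ids(path):
    """Return the id attributes of the elements selected by path."""
    return [element.get("id") for element in context.findall(path)]


print(len(context.findall("chapter")))  # <chapter> children of the context node
print(ids("chapter/para"))              # <para> children of those chapters
print(ids("*/para"))                    # <para> grandchildren of the context node
print(ids(".//para"))                   # <para> descendants at any depth
```

Note that chapter/para does not reach the <para> nested inside <section>, while .//para does.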

In addition, XPath recognizes the at symbol (@) for selecting an attribute instead of an element. Thus the following expressions can be used to select an attribute:

para/@id
Selects the id attribute of the <para> element children of the context node

@*
Selects all the attributes of the context node

Paths can be combined using the | operator. For example, intro | chapter selects the <intro> and <chapter> element children of the context node.
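
ElementTree's XPath subset has neither an attribute step nor the | operator, so the following sketch only emulates para/@id, @*, and intro | chapter in plain Python. The sample element and its attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical context node with two attributes and mixed children.
context = ET.fromstring(
    '<book lang="en" year="2003">'
    '<intro/><para id="p1"/><chapter/><para id="p2"/><chapter/>'
    '</book>'
)

# para/@id: find the <para> children, then read the attribute from each one.
para_ids = [para.get("id") for para in context.findall("para")]

# @*: all attributes of the context node.
all_attributes = dict(context.attrib)

# intro | chapter: keep the children named intro or chapter, in document order.
union = [child.tag for child in context if child.tag in ("intro", "chapter")]

print(para_ids, all_attributes, union)
```

A full XPath processor would return attribute nodes rather than strings for the first two expressions; the strings here are their values.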

Certain functions can also be included in the path. The functions must return a node or set of nodes. The functions available are:

Function                     Selection
node( )                      Any node (of any type)
text( )                      Text node
comment( )                   Comment node
processing-instruction( )    Processing-instruction node
id(id)                       Node whose unique identifier is id

The id( ) function is especially helpful for locating a node by its unique identifier (recall that identifiers are attributes declared with the type ID in the DTD). For example, we can write the expression id("xml-ref")/title to select the <title> element whose parent has the xml-ref identifier.
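
ElementTree does not provide id( ), but the effect can be approximated with an attribute predicate. This sketch assumes that the identifier is stored in an attribute literally named id; a real XPath processor would instead rely on the DTD to know which attributes are identifiers. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<book>'
    '<chapter id="xml-ref"><title>XML Reference</title></chapter>'
    '<chapter id="xsl-ref"><title>XSLT Reference</title></chapter>'
    '</book>'
)

# Approximates id("xml-ref")/title, assuming the ID attribute is named "id".
titles = doc.findall(".//*[@id='xml-ref']/title")
print([title.text for title in titles])
```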

The preceding examples show that the analogy with file paths is rather limited. However, this syntax for writing an XPath expression is a simplification of the more complete XPath syntax where an axis precedes each step in the path.

10.9.1. Axes

Axes indicate the direction taken by the path. In the previous examples, the syntactic qualifiers such as / for root, .. for parent, and // for descendant, are abbreviations that indicate the axis of the node search. These are some of the simple axes on which to search for a node.

XPath defines other search axes that are indicated by a prefix separated from the rest of the step (called a location step) by a double colon. For example, to select the para elements that come before the context node in the document, we could write the expression preceding::para. XPath defines 13 axes:

Axis                  Selection
self                  The context node itself (abbreviated as .)
child                 The children of the context node (the default axis)
descendant            The descendants of the context node; a descendant is a child, or a child of a child, and so on
descendant-or-self    Same as descendant, but also contains the context node (// is an abbreviation for /descendant-or-self::node( )/)
parent                The parent of the context node (abbreviated as ..)
ancestor              The ancestors of the context node
ancestor-or-self      The same nodes as ancestor, plus the context node
following-sibling     Siblings (having the same parent as the context node) in the same document that are after the context node
preceding-sibling     Siblings in the same document that are before the context node
following             All nodes in the same document that are after the context node, excluding its descendants
preceding             All nodes in the same document that are before the context node, excluding its ancestors
attribute             The attributes of the context node (abbreviated as @)
namespace             The namespace nodes of the context node

It is possible to write the following expressions:

ancestor::chapter
Selects the <chapter> elements that are ancestors of the context node

following-sibling::para/@title
Selects the title attributes of the <para> elements that are siblings of the context node and follow it in document order

id("xpath")/following::chapter/node( )
Selects all the child nodes of the <chapter> elements that follow the element with the xpath identifier in document order
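
ElementTree does not support named axes, so the sketch below emulates the ancestor and following-sibling axes by hand. The parent_of map and both helper functions are hypothetical names introduced for this illustration, as is the sample document.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<book>'
    '<chapter title="Intro">'
    '<para title="a"/><para title="b"/><note/><para title="c"/>'
    '</chapter>'
    '</book>'
)

# ElementTree elements do not know their parent, so build a child -> parent map.
parent_of = {child: parent for parent in doc.iter() for child in parent}


def ancestors(node):
    """Yield the ancestors of node, nearest first (the ancestor axis)."""
    while node in parent_of:
        node = parent_of[node]
        yield node


def following_siblings(node):
    """Return the siblings that come after node (the following-sibling axis)."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


context = doc.find("chapter/para")  # the first <para>, used as the context node

# ancestor::chapter
chapters = [a for a in ancestors(context) if a.tag == "chapter"]

# following-sibling::para/@title
titles = [s.get("title") for s in following_siblings(context) if s.tag == "para"]

print([c.get("title") for c in chapters], titles)
```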

The result of a location path is a node-set. It may be helpful to filter a node-set with predicates.

10.9.2. Predicates

A predicate is an expression in square brackets that filters a node-set. For example, we could write the following expressions:

//chapter[1]
Selects every <chapter> element in the document that is the first <chapter> child of its parent

//chapter[@title="XPath"]
Selects the <chapter> element in the document where the value of the title attribute is the string XPath

//chapter[section]
Selects the <chapter> elements in the document with a <section> child

para[last( )]
Selects the last <para> element child of the context node
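
These predicate forms fall within the XPath subset of Python's xml.etree.ElementTree, so they can be tried directly. In this sketch the sample document and the titles( ) helper are assumptions; paths start with . because ElementTree evaluates them relative to an element rather than the document root.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book>'
    '<part><chapter title="A"/><chapter title="B"/></part>'
    '<part><chapter title="XPath"><section/></chapter></part>'
    '</book>'
)


def titles(path):
    """Return the title attributes of the chapters selected by path."""
    return [chapter.get("title") for chapter in book.findall(path)]


first = titles(".//chapter[1]")               # first <chapter> child of each parent
named = titles(".//chapter[@title='XPath']")  # title attribute equals "XPath"
with_section = titles(".//chapter[section]")  # chapters with a <section> child
last = titles("part/chapter[last()]")         # last <chapter> child of each <part>

print(first, named, with_section, last)
```

Because a position predicate counts among the children of each parent, .//chapter[1] matches one chapter in each <part> here, not a single chapter for the whole document.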

Note that a path in a predicate does not change the path preceding the predicate, but only filters it. Thus, the following expression:

/book/chapter[conclusion]

selects each <chapter> element that is a child of the <book> element at the root of the document and that has a <conclusion> child, but not the <conclusion> element itself.

There may be more than one predicate in an expression. The following expression:

/book/chapter[1]/section[2]

selects the second section of the first chapter. In addition, the order of the predicates matters. Thus, the following expressions are not the same:

chapter[example][2]
Selects the second <chapter> that includes <example> elements

chapter[2][example]
Selects the second <chapter> element if it includes at least one <example> element
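
The difference can be sketched in plain Python by treating each predicate as a filter applied to the list produced so far. The chapter names and the has-example flags are hypothetical data invented for this illustration.

```python
# Hypothetical <chapter> children of the context node, in document order:
# (name, whether the chapter has an <example> child).
chapters = [("ch1", False), ("ch2", True), ("ch3", True)]


def nth(items, n):
    """Keep only the item at 1-based position n, like the predicate [n]."""
    return items[n - 1:n]


def has_example(items):
    """Keep the items that have an <example> child, like the predicate [example]."""
    return [item for item in items if item[1]]


# chapter[example][2]: filter first, then take the second survivor.
filter_then_index = nth(has_example(chapters), 2)

# chapter[2][example]: take the second chapter, then test it.
index_then_filter = has_example(nth(chapters, 2))

print(filter_then_index, index_then_filter)
```

With this data the first expression picks ch3 and the second picks ch2; had ch2 lacked an <example> child, the second expression would select nothing.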

An expression can include logical or comparison operators. The following operators are available:

Operator    Meaning
or          Logical or
and         Logical and
not( )      Negation
= !=        Equal to and different from
< <=        Less than and less than or equal to
> >=        More than and more than or equal to

When an expression appears inside an XML document, such as in an attribute of an XSLT style sheet, the character < must be entered as &lt;. Parentheses may be used for grouping. For example:

chapter[@title = "XPath"]
Selects <chapter> elements where the title attribute has the value XPath

chapter[position( ) &lt; 3]
Selects the first two <chapter> elements

chapter[position( ) != last( )]
Selects <chapter> elements that are not in the last position

chapter[section/@title="examples" or subsection/@title="examples"]
Selects <chapter> elements that include <section> or <subsection> elements with the title attribute set to examples
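
The &lt; escape exists because an expression in a style sheet is stored in an XML attribute value; the XML parser restores the < character before any XPath processing takes place. The following sketch uses Python's standard XML parser on a one-element style sheet fragment (the fragment itself is an assumption made for the example) to show the value that results.

```python
import xml.etree.ElementTree as ET

# A style sheet fragment with the XPath expression stored in an attribute.
fragment = (
    '<xsl:for-each xmlns:xsl="http://www.w3.org/1999/XSL/Transform" '
    'select="chapter[position() &lt; 3]"/>'
)

element = ET.fromstring(fragment)
expression = element.get("select")  # the parser has already unescaped &lt;
print(expression)
```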

XPath also defines operators that act on numbers. The numeric operators are +, -, *, div (division of real numbers), and mod (modulo).
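
The div and mod operators do not behave exactly like the operators of every programming language. As a sketch, here are Python equivalents under the rules of the XPath 1.0 recommendation, where mod returns the remainder of a truncating division and so takes the sign of its left operand. The function names are hypothetical.

```python
import math


def xpath_div(a, b):
    """Floating-point division, like the XPath div operator.

    Python raises ZeroDivisionError when b is 0, whereas XPath yields
    Infinity or NaN; this sketch does not emulate that case.
    """
    return a / b


def xpath_mod(a, b):
    """Remainder of a truncating division, like the XPath mod operator."""
    return math.fmod(a, b)


print(xpath_div(7, 2))
print(xpath_mod(5, -2), xpath_mod(-5, 2))
```

Python's own % operator takes the sign of the right operand instead, which is why math.fmod is used here.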

10.9.3. Functions

In the previous examples we saw such XPath functions as position( ) and not( ). XPath functions work with four basic datatypes: booleans (true or false), numbers (floating-point numbers), strings (strings of characters), and node-sets. The functions below are grouped by the datatype they deal with.

The following functions deal with node-sets (optional arguments are shown in square brackets):

last( )
Returns the number of nodes in the node-set currently being evaluated, of which the context node is a part

position( )
Returns a number that is the position of the context node (in document order or after sorting)

count(node-set)
Returns the number of nodes contained in the specified node-set

id(name)
Returns the node with the identifier name

local-name([node-set])
Returns a string that is the name (without the namespace prefix) of the first node in document order of the node-set, or of the context node if the argument is omitted

namespace-uri([node-set])
Returns a string that is the URI for the namespace of the first node in document order of the node-set, or of the context node if the argument is omitted

name([node-set])
Returns a string that is the full name (with the namespace prefix) of the first node in document order of the node-set, or of the context node if the argument is omitted
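
As an informal illustration, count( ), namespace-uri( ), and local-name( ) can be imitated with Python's xml.etree.ElementTree, which stores a namespaced element name as {uri}local. The split_name( ) helper and the http://example.org/ns namespace are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<doc xmlns:d="http://example.org/ns">'
    '<d:para/><d:para/><note/>'
    '</doc>'
)


def split_name(element):
    """Return (namespace URI, local name) from ElementTree's '{uri}local' form."""
    tag = element.tag
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


paras = doc.findall("{http://example.org/ns}para")
count = len(paras)                 # like count(d:para)
uri, local = split_name(paras[0])  # like namespace-uri(d:para) and local-name(d:para)

print(count, uri, local)
```

As in XPath, only the first node of the set is examined when a name is requested.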

The following functions deal with strings:

string(object)
Converts its argument object, which can be of any type, to a string.

concat(str1, str2, ...)
Returns the concatenation of its arguments.

starts-with(str1, str2)
Returns true if the first argument string (str1) starts with the second argument string (str2).

contains(str1, str2)
Returns true if the first argument string (str1) contains the second argument string (str2).

substring-before(str1, str2)
Returns the substring of the first argument string (str1) that precedes the first occurrence of the second argument string (str2).

substring-after(str1, str2)
Returns the substring of the first argument string (str1) that follows the first occurrence of the second argument string (str2).

substring(str, num[, length])
Returns the substring of the first argument (str) starting at the position specified by the second argument (num) with the length specified in the third. Positions are counted from 1. If the third argument is not specified, the substring continues to the end of the string.

string-length(str)
Returns the number of characters in the string.

normalize-space(str)
Returns the argument string with whitespace normalized by stripping any leading and trailing whitespace and replacing sequences of whitespace characters by a single space.

translate(str1, str2, str3)
Returns the first argument string (str1) with occurrences of characters in the second argument string (str2) replaced by the character at the corresponding position in the third argument string (str3).
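
The less obvious string functions can be sketched as Python equivalents. These are approximations written for this illustration, not code from any XPath processor: the function names are hypothetical, substring( ) handles integer arguments only, and the edge cases follow the XPath 1.0 recommendation.

```python
import re

XML_SPACE = " \t\r\n"  # the whitespace characters recognized by XML


def normalize_space(s):
    """Strip leading/trailing whitespace and collapse inner runs to one space."""
    return re.sub(r"[ \t\r\n]+", " ", s.strip(XML_SPACE))


def substring_before(s, sep):
    """Text before the first occurrence of sep, or '' if sep does not occur."""
    if not sep:
        return ""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """Text after the first occurrence of sep, or '' if sep does not occur."""
    if not sep:
        return s
    return s.partition(sep)[2]


def substring(s, start, length=None):
    """1-based substring for integer arguments; no length means 'to the end'."""
    end = None if length is None else start + length
    return "".join(
        ch for pos, ch in enumerate(s, 1)
        if pos >= start and (end is None or pos < end)
    )


def translate(s, src, dst):
    """Replace each character of src by its counterpart in dst.

    The first occurrence of a character in src wins, and a character with
    no counterpart in dst is deleted from the result.
    """
    table = {}
    for i, ch in enumerate(src):
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return s.translate(table)


print(normalize_space("  a \n b\t"))
print(substring_before("1999/04/01", "/"), substring_after("1999/04/01", "/"))
print(substring("12345", 2, 3), translate("bar", "abc", "ABC"))
```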

The following functions deal with boolean operations:

boolean(object)
Converts its argument (object), which can be of any type, to a boolean

not(boolean)
Returns true if its argument evaluates as false

true( )
Returns true

false( )
Returns false

lang(str)
Returns true if the language of the context node (as given by the xml:lang attribute on the node or on its closest ancestor that has one) is the language passed in the argument (str) or a sublanguage of it

The following functions deal with numbers:

number([obj])
Converts its argument (obj), which can be of any type, to a number (using the context node if the argument is omitted).

sum(node-set)
Returns the sum of the result of converting every node in the node-set to a number. If any node is not a number, the function returns NaN (not a number).

floor(num)
Returns the largest integer that is not greater than the argument (num).

ceiling(num)
Returns the smallest integer that is not less than the argument (num).

round(num)
Returns the integer that is closest to the argument (num).
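
Two of these functions differ from their closest Python counterparts, as the following sketch shows. It assumes the rules of the XPath 1.0 recommendation: a string that is not a plain decimal number converts to NaN, and round( ) resolves ties toward positive infinity. The function names are hypothetical.

```python
import math
import re

# Optional whitespace, optional minus sign, digits with an optional fraction.
_NUMBER = re.compile(r"[ \t\r\n]*-?(\d+(\.\d*)?|\.\d+)[ \t\r\n]*")


def xpath_number(text):
    """Convert a string to a float, yielding NaN when it is not numeric."""
    return float(text) if _NUMBER.fullmatch(text) else math.nan


def xpath_sum(texts):
    """Sum the numeric values of the given strings, like sum(node-set)."""
    return sum(xpath_number(text) for text in texts)


def xpath_round(x):
    """Round a finite number to the nearest integer, ties toward +infinity.

    NaN and infinite arguments are not handled in this sketch.
    """
    return math.floor(x + 0.5)


print(xpath_sum(["1", "2.5"]), xpath_sum(["1", "abc"]))
print(xpath_round(2.5), xpath_round(-2.5))
```

Python's built-in round( ) rounds ties to the nearest even integer, so it returns 2 for 2.5 where XPath returns 3.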

These functions can be used not only in XPath expressions, but in XSLT elements as well. For example, to count the number of sections in a text, we could add the following to a style sheet:

<xsl:text>The number of sections is </xsl:text>
<xsl:value-of select="count(//section)"/>
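
For comparison, the same count can be sketched outside XSLT with Python's xml.etree.ElementTree. The sample document is hypothetical; iter( ) walks the whole tree, much as //section does from the document root.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<book>'
    '<chapter><section/><section><section/></section></chapter>'
    '<chapter><section/></chapter>'
    '</book>'
)

# Count every <section> element at any depth, like count(//section).
section_count = sum(1 for _ in doc.iter("section"))
print("The number of sections is", section_count)
```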

10.9.4. Additional XSLT Functions and Types

XSLT defines additional functionality for its own needs. One feature is a new datatype (in addition to the four datatypes defined by XPath): the result tree fragment. This datatype is comparable to a node-set that contains a single root node. However, only the operations that would be permitted on a string are permitted on a result tree fragment. In particular, you cannot use the /, //, or [ ] operators on result tree fragments.

XSLT also defines additional functions:

document(obj[, node-set])
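Returns a node-set that comprises the document whose URI was passed as the first argument obj. A relative URI is resolved against the base URI of the first node in the optional second argument. If the second argument is omitted and obj is not a node-set, a relative URI is resolved against the base URI of the style sheet node that contains the expression.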

key(str, obj)
Returns the node-set of the nodes keyed by obj in the key named str.

format-number(num, str1[, str2])
Returns a string containing the formatted value of num, according to the format-pattern string in str1 and the decimal-format string in str2 (or the default decimal-format if there is no third argument).

current( )
Returns the current node.

unparsed-entity-uri(str)
Returns the URI of the unparsed entity given by str.

generate-id([node-set])
Generates a unique ID for the first node in the given node-set, or for the context node if the argument is omitted.

system-property(str)
Returns the value of the system property passed as a string str. The system properties are: xsl:version (the version of XSLT implemented by the processor), xsl:vendor (a string identifying the vendor of the XSL processor), and xsl:vendor-url (the vendor's URL).



Copyright © 2003 O'Reilly & Associates. All rights reserved.