
2.6. Namespaces and SAX2

However you use XML namespaces with SAX, you need to understand the core concepts discussed in this section. Namespaces can be confusing; they're more complex than perhaps they ought to be. In part this is because of how they interact (or don't interact) with other parts of Greater XML; in part it's because everyone has different ways to determine what words mean, and XML names are kinds of words. We'll look at some of those complexities first, and then at the mechanisms SAX2 has to help you deal with them.

But first, just what are namespaces supposed to do? Usually, they identify some particular technical vocabulary. People often reuse words rather than create new ones, and they acquire context-specific meanings and nuances that can be extremely important. A namespace can distinguish whether a word like "bill" refers to part of a bird, a now-archaic weapon, part of a hat, legislative acts, or a number of other things. So a <bill length='45cm'/> element might be associated with a namespace, which provides context that should help applications interpret the element. A processor for "Birder's Markup Language" could know to reject (or ignore) markup intended for legislative or financial uses, even if they all use "bill" elements.

XML defines a way to declare namespaces as needed, using attributes. Namespaces are usually indicated by a prefix, which can serve as a qualifying adjective: "the bird's bill" might be bird:bill while "the consultant's bill" might be consultant:bill. You can also set up a default element namespace so that an unadorned bill element might indicate, for example, a weapon.

2.6.1. What Namespaces Do to XML

XML namespaces are a convention for using attributes to associate URIs with some element and attribute names. Since not all legal XML documents follow this convention, the XML Namespaces specification effectively specifies a dialect of XML. SAX2 supports both dialects: strict XML and XML plus namespaces. By default, SAX2 parsers expect the namespaces dialect. In most cases you'll be able to ignore the difference between those two XML dialects, since documents that use XML in namespace-incompatible ways aren't common.

Even apart from the two-dialects issue, the use of namespaces with XML complicates XML programming. There are two models for using element and attribute names in XML:

If you're working with or designing XML structures with context-dependent names, then namespaces add new kinds of context and hence new ways to cause confusion. SAX2 gives you the tools to track all the context, but you'll have to record it yourself (probably with some kind of stack) since startElement() parameters will no longer give all the context you need.
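The kind of stack mentioned above might be sketched as follows. This is only an illustration, not code from SAX2 itself: the class name ContextTrackingHandler, the {uri}local notation, and the trace() helper are all made up for the example, and the parser is obtained through JAXP's SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers the chain of open elements so that
// each startElement() call can see its enclosing context.
class ContextTrackingHandler extends DefaultHandler {
    // The innermost open element is at the head of the deque.
    private final Deque<String> open = new ArrayDeque<>();
    // One entry per element: "parent -> child".
    final List<String> seen = new ArrayList<>();

    private static String name(String uri, String localName, String qName) {
        // Fall back to the qName when no namespace applies.
        return uri.isEmpty() ? qName : "{" + uri + "}" + localName;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        String current = name(uri, localName, qName);
        String parent = open.isEmpty() ? "(root)" : open.peek();
        seen.add(parent + " -> " + current);
        open.push(current);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    // Convenience entry point for this sketch: parse a string, return the log.
    static List<String> trace(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            ContextTrackingHandler handler = new ContextTrackingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A real application would probably push a richer object than a string (for example, one that also records attributes it cares about), but the push-on-start, pop-on-end discipline stays the same.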

There are also some conflicts between the element-naming approach of the XML Namespaces specifications and DTD validity as defined in the XML specification. They may not affect your SAX2 programs, but can affect the systems you're implementing with XML and SAX2. The issue is basically that DTDs expect everything to be declared once up front (like import statements in Java), while the namespace mechanism provides a lexical scoping mechanism (like declaring variables that live on the execution stack) that's flexible about what a given prefix indicates. You can make namespace-correct documents that are DTD-valid, but then you can't change the prefixes bound to namespaces.[11] Namespace-aware DTDs will often define default element namespaces for element names.

[11]If you want any flexibility in those prefixes, and have a deep understanding of how to use parameter entities, look at the approach to DTD modularization found in the XHTML 1.1 specification.

If you are designing a namespace and want to use the URI to publish information describing the namespace, rather than just use it as a unique identifier, then RDDL (http://www.rddl.org) is probably a good resource. RDDL defines an XHTML-based document syntax that can be viewed or mechanically processed. It lets you find some of the resources that might be important when working with the namespaces -- for example, different stylesheets and schemas and documentation in various languages. The RDDL web site includes SAX support for accessing this data.

2.6.2. Element and Attribute Naming with Namespaces

The direct impact of XML namespaces on your SAX2 application code is to give you a second way to identify elements and attributes. Documents will normally use only one identification style for a given element or attribute. These identification styles are distinct from the two models for using such names, described earlier:

Qualified names

These are exactly as found in the XML text. Examples include para and, with a prefix, xhtml:p. (XML documents that don't use namespaces, and some namespace-style documents won't use colons.)

Universal names

These consist of two separate strings: a "local name" from the XML text (removing any namespace prefix) and a "namespace name" (always a URI) from namespace declarations. For the qualified name xhtml:p, the local name is p, and the namespace name is the URI that the namespace declaration in scope associates with the prefix xhtml. Such names are in a sense "universalized" by addition of a suitable URI.

Note that the XML Namespaces specification only standardizes the "qualified name" (qName) terminology; it doesn't standardize terminology for universal names. Because of this, you will also see other terms, such as "expanded names" (the term used by XPath) or "namespace-style names" (used to talk about that style of naming).
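One way to see that a universal name is just the (namespace name, local name) pair is the JDK's javax.xml.namespace.QName class. That class is not part of SAX2 and is used here only as an illustration; the XHTML namespace URI and the second prefix h are assumptions chosen for the example.

```java
import javax.xml.namespace.QName;

class UniversalNameDemo {
    // Assumed for illustration: the XHTML namespace name.
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // QName equality looks only at the namespace URI and the local part,
    // so the prefix used in the document text plays no role.
    static boolean sameName(QName a, QName b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        QName viaXhtmlPrefix = new QName(XHTML, "p", "xhtml");
        QName viaOtherPrefix = new QName(XHTML, "p", "h");
        QName noNamespace = new QName("p");

        // Same universal name, different prefixes.
        System.out.println(sameName(viaXhtmlPrefix, viaOtherPrefix));
        // Same local name, but one of them has no namespace name.
        System.out.println(sameName(viaXhtmlPrefix, noNamespace));
    }
}
```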

Since ContentHandler.startElement() callbacks now have to deal with three different kinds of name strings, the code can get rather complicated. Plus, even if you're expecting only universal names, you'll need to notice when elements or attributes don't have universal names and use qualified names to work with them. Element names are identified in method parameters (the same as in ContentHandler.endElement()), while attribute names show up in accessor methods for Attributes objects. We'll use the following XML text to illustrate these different types of names:

<big:animals  xmlns="http://www.example.com/dog"
              xmlns:big="http://www.example.com/big">
    <wolfhound cat='no' big:dog='yes' />
    <greyhound big:dog='yes' xmlns=""/>
</big:animals>

SAX2 calls names in XML text "Qualified Names." These are the same thing as "XML 1.0 names" except that XML 1.0 names have no restrictions on the use of colons. When you disable namespace processing in a SAX2 parser, it will deliver "qualified names" that are really XML 1.0 names, without those restrictions. With namespace processing enabled, many qualified names (including every name with a prefix) will correspond to a namespace-style name.

Element names without a prefix might not have a corresponding universal name. Unprefixed attribute names will never have a universal name. In those cases, applications must use the qualified name along with non-namespace context, such as the enclosing element, to figure out what the name is supposed to mean. There are no universally accepted policies for such cases. Yes, all that confuses other people as well.

2.6.2.1. Element naming

The identifiers for the element names are the first three parameters of void startElement(String namespaceURI, String localName, String qName, Attributes atts). Table 2-1 shows the values of the element names for the previous example, as reported by a SAX2 parser in its default mode. Notice particularly that the namespace URI is empty except when a namespace declaration applies to that element name, and that if there's a nonempty namespace URI, there might not be a value for qName. That's not just for element names using namespace prefixes; for element names, a default element namespace declaration will apply if it's within scope. (Remember that empty strings aren't the same as nulls.)

Table 2-1. ContentHandler.startElement() parameters for element names

namespaceURI                  localName    qName
http://www.example.com/big    animals      empty or big:animals
http://www.example.com/dog    wolfhound    empty or wolfhound
empty                         empty        greyhound

You could end up with lots of code like this in your SAX event handlers. Or, you may prefer to factor it as a table lookup (maybe using application-specific types of handler objects) rather than as a tree of comparisons. Notice that for elements without a namespace URI, the qName is checked, but if there's a namespace URI, then localName is used. Also all unrecognized elements are reported as a kind of validity error. You may well need to have more context-dependent logic too, if elements may only show up in appropriate contexts. Such contexts often need different decision trees. See Example 2-8 for a decision tree for startElement().

Example 2-8. Decision tree in startElement( )

// Assumes the handler has "errorHandler" and "locator" fields set up elsewhere.
public void
startElement (String uri, String localName, String qName, Attributes atts)
throws SAXException
{
    // elements outside of any namespace?
    if ("".equals (uri)) {
        if ("greyhound".equals (qName)) {
            // ... handle
            return;
        }
        // ... else handle N other elements; return on success

        // no recognized element: a validity error
        errorHandler.error (new SAXParseException (
                "Unrecognized element: " + qName,
                locator
                ));
        // if that doesn't abort the parse:
        return;

    // in the "big" namespace?
    } else if ("http://www.example.com/big".equals (uri)) {
        if ("animals".equals (localName)) {
            // ... handle
            return;
        }
        // ... handle "islands" and N other big things; return on success
        // FALLTHROUGH for unrecognized elements

    // in the "dog" namespace?
    } else if ("http://www.example.com/dog".equals (uri)) {
        if ("wolfhound".equals (localName)) {
            // ... handle
            return;
        }
        // ... handle "terrier", "collie" and so on; return on success
        // FALLTHROUGH for unrecognized elements
    }

    // ... and so on for other namespaces

    // element not in a namespace we recognize: a validity error
    errorHandler.error (new SAXParseException (
            "Unrecognized element: " + uri + " (" + localName + ")",
            locator
            ));
    // returns if that doesn't abort the parse
}
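The table-lookup alternative to such a decision tree could look roughly like the sketch below. It is only one possible factoring: the ElementAction interface, the TableDrivenHandler class, and the use of javax.xml.namespace.QName as the map key are choices made for this example, and the actions merely record which element was seen.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class TableDrivenHandler extends DefaultHandler {
    // Hypothetical per-element callback type.
    interface ElementAction {
        void start(Attributes atts) throws SAXException;
    }

    private final Map<QName, ElementAction> actions = new HashMap<>();
    private Locator locator;
    final List<String> handled = new ArrayList<>();
    final List<String> rejected = new ArrayList<>();

    TableDrivenHandler() {
        actions.put(new QName("http://www.example.com/big", "animals"),
                atts -> handled.add("animals"));
        actions.put(new QName("http://www.example.com/dog", "wolfhound"),
                atts -> handled.add("wolfhound"));
        // No namespace: the key is built from the qName instead.
        actions.put(new QName("", "greyhound"),
                atts -> handled.add("greyhound"));
    }

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        // Same rule as the decision tree: qName without a URI, else localName.
        String name = uri.isEmpty() ? qName : localName;
        ElementAction action = actions.get(new QName(uri, name));
        if (action == null) {
            error(new SAXParseException(
                    "Unrecognized element: " + uri + " (" + name + ")", locator));
            return;
        }
        action.start(atts);
    }

    // This sketch just records the report; a real handler would delegate
    // to whatever ErrorHandler the application uses.
    @Override
    public void error(SAXParseException e) {
        rejected.add(e.getMessage());
    }

    static TableDrivenHandler run(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            TableDrivenHandler handler = new TableDrivenHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Context-dependent vocabularies would need more than one such table, perhaps one per enclosing element, selected from a stack of open elements.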

Most SAX2 parsers provide qualified names in all cases, but you shouldn't rely on their availability unless the parser is configured to provide namespace prefix information (which also causes namespace-declaration attributes to be "un-hidden"). You should probably avoid using the qName, even for diagnostics, when there's a nonempty namespaceURI.
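If you want to see what your own parser reports for the earlier example document, a small recording handler is enough. The sketch below is an assumption-laden illustration: the ElementNameReport class is made up, the parser comes from JAXP's SAXParserFactory, and the document is written with single-quoted attributes only so it fits in a Java string. The qName column, and the localName reported for greyhound, may differ from parser to parser, so no expected output is shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ElementNameReport extends DefaultHandler {
    // The example document from this section.
    static final String SAMPLE =
        "<big:animals xmlns='http://www.example.com/dog'"
        + " xmlns:big='http://www.example.com/big'>"
        + "<wolfhound cat='no' big:dog='yes'/>"
        + "<greyhound big:dog='yes' xmlns=''/>"
        + "</big:animals>";

    // One row per element: { namespaceURI, localName, qName }.
    final List<String[]> rows = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        rows.add(new String[] { uri, localName, qName });
    }

    static List<String[]> report(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // Ask explicitly for the SAX2 default: no prefix information.
            reader.setFeature(
                "http://xml.org/sax/features/namespace-prefixes", false);
            ElementNameReport handler = new ElementNameReport();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String[] row : report(SAMPLE)) {
            System.out.println("uri='" + row[0] + "' localName='" + row[1]
                    + "' qName='" + row[2] + "'");
        }
    }
}
```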

2.6.2.2. Attribute naming

The identifiers for the attribute names are accessed using Attributes methods such as getQName(), getLocalName(), and getURI() when you iterate over an element's attributes with a "for" loop. You can access attribute values directly if you use either XML 1.0-style names (qName) or XML Namespace-style names (namespaceURI and localName).
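As a rough illustration of both access styles, the sketch below iterates over an Attributes object and also looks values up directly. The AttributeNames class and its describe() and sample() helpers are hypothetical; the attributes are built by hand with the SAX2 helper class AttributesImpl rather than produced by a parser, using the values the example document would give for the wolfhound element.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

class AttributeNames {
    // Describe each attribute by its namespace-style name when it has one,
    // and by its qName otherwise.
    static List<String> describe(Attributes atts) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < atts.getLength(); i++) {
            String uri = atts.getURI(i);
            String name = uri.isEmpty()
                    ? atts.getQName(i)
                    : "{" + uri + "}" + atts.getLocalName(i);
            out.add(name + "=" + atts.getValue(i));
        }
        return out;
    }

    // Hand-built stand-in for what a parser would pass to startElement().
    static AttributesImpl sample() {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "", "cat", "CDATA", "no");
        atts.addAttribute("http://www.example.com/big", "dog", "big:dog",
                "CDATA", "yes");
        return atts;
    }

    public static void main(String[] args) {
        Attributes atts = sample();
        System.out.println(describe(atts));
        // Direct lookup by XML 1.0-style name (qName).
        System.out.println(atts.getValue("cat"));
        // Direct lookup by namespace-style name (URI plus local name).
        System.out.println(atts.getValue("http://www.example.com/big", "dog"));
    }
}
```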

SAX2 parsers handle attribute names from the example text as shown in Table 2-2. This table shows the "mixed mode" behavior, described later; in the default SAX2 parser mode, the xmlns and xmlns:big attributes won't appear. You'd have to set the namespace-prefixes feature flag (as described later in this chapter, in Section 2.6.3, "Namespace Feature Flags") to see these attributes. Note that according to the namespaces specification there is no such thing as a default namespace for attribute names, so that namespace declaration attributes don't go into any namespace.

Table 2-2. Attributes methods to access attribute names

getURI()                      getLocalName()    getQName()
empty                         empty             xmlns
empty                         empty             xmlns:big
empty                         empty             cat
http://www.example.com/big    dog               empty or big:dog

So if you wanted to write some code that ignored elements without a big:dog attribute (that is, the URI is http://www.example.com/big and the local name is dog) with value "yes", it might look like this:

public void startElement (String uri, String local, String qName, 
	Attributes atts)
throws SAXException
{
    String    value;

    value = atts.getValue ("http://www.example.com/big", "dog");
    if (!"yes".equals (value)) {
	// arrange to ignore text and elements until this finishes
	return;
    }
    
    // ... process the element
}

2.6.2.3. Things to keep in mind

To avoid confusing things, the previous code didn't illustrate two somewhat perverse cases. First, if the big prefix were redefined for some element, the same qualified name could correspond to a different universal name, with the same local name but different namespace URIs. That's one reason the previous code doesn't check for a qName of big:dog. Using a qName of big:dog might make sense if you were working with XML 1.0 without using XML namespaces. Second, if the URI used with the big prefix were associated with a second prefix, different qualified names could correspond to the same universal names. That's another reason the previous code doesn't check for a qName of big:dog. If you are writing namespace-aware code, use only namespace-style name testing in your code to avoid such problems. That makes your code work correctly even when it deals with documents that use namespace declarations in ways you didn't expect.

By default, SAX2 XML parsers provide universal names for elements and attributes that have namespaces (they'll have nonempty localName and namespaceURI strings) or qualified names for elements and attributes that don't, and will remove the namespace declaration attributes from the Attributes object provided in the ContentHandler.startElement() event. Unless a default element namespace declaration is in scope, an element whose XML 1.0-style name has no prefix won't have a namespace-style identifier. Attributes with unprefixed names work differently, since default element namespace declarations never apply to attribute names.

If you work with both SAX2 and DOM Level 2, you need to be aware of the differences in how these APIs expose namespaces. The terminology is similar but not identical; SAX2 talks about "URI" while DOM Level 2 talks about "NamespaceURI," and SAX2 uses "QName" not "Name"; but both APIs talk about the "LocalName." When using element or attribute construction methods in the org.w3c.dom.Document class, you will notice that DOM uses two different APIs in places in which SAX2 provides just one callback (in three different modes, as discussed in the next section). You are most likely to trip over different ways to tell whether an element or attribute has no namespace URI: SAX2 uses an empty string (length zero), while DOM Level 2 uses a null string. You may also notice that while SAX2 follows the XML Namespaces specification with regard to the attributes that define namespaces, DOM does not. In SAX2, those attributes have no URIs, but DOM assigns http://www.w3.org/2000/xmlns/ as their namespace URI.
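The DOM side of those two differences can be checked with a few lines of code. This is a sketch under assumptions: it uses the JAXP DocumentBuilderFactory in namespace-aware mode, and the DomNamespaceCheck class and its toSaxUri() helper are invented here to show one way of bridging the two conventions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomNamespaceCheck {
    // Convert DOM's null "no namespace" marker to the empty string SAX2 uses.
    static String toSaxUri(String domNamespaceUri) {
        return domNamespaceUri == null ? "" : domNamespaceUri;
    }

    // Parse a document with a namespace-aware DOM builder; return its root.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root =
            parseRoot("<animals xmlns:big='http://www.example.com/big'/>");
        // The element is in no namespace: DOM reports null, not "".
        System.out.println(root.getNamespaceURI());
        // The declaration attribute itself gets a namespace URI in DOM.
        Attr declaration = root.getAttributeNode("xmlns:big");
        System.out.println(declaration.getNamespaceURI());
    }
}
```

Code that moves names between the two APIs can funnel every DOM namespace URI through a helper like toSaxUri() so that comparisons against SAX-style empty strings don't fail on null.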

2.6.3. Namespace Feature Flags

SAX2 controls its namespace-processing support through two feature flags, which can be tested and changed using the setFeature() and getFeature() methods described earlier in this chapter in Section 2.4.1, "SAX2 Feature Flags". The two flags are http://xml.org/sax/features/namespaces (namespaces), which controls whether parsers handle namespace declarations, and http://xml.org/sax/features/namespace-prefixes (namespace-prefixes), which controls whether applications can see the underlying XML syntax. All SAX2 parsers support both flags, although their values might be read-only.

Given two flags, there are four possible combinations. Only three are legal. It's easiest to understand what the flags do by considering them as each controlling a small processing task layered over a core that just parses XML text. The SAX2 defaults are set so both tasks are performed.

XML 1.0 mode

Only XML 1.0-style names are reported for elements and attributes, using the qName. The namespaces flag is false, and the namespace-prefixes flag is true; those values are exactly the opposite of the SAX2 defaults.

This mode passes xmlns and xmlns:* attributes without looking at them. Namespace-style names (with URIs) might be provided with element or attribute names, but you must not rely on this; few parsers will do the extra work of processing the namespace declarations. If you enable this mode, your SAX2 parser will be doing what a SAX1 parser did, but the information will flow through APIs with slots for holding namespace-style names.

Mixed mode

Both XML 1.0- and XML plus Namespaces-style names are reported for elements and attributes. The namespaces flag is true (like the default SAX2 mode), and the namespace-prefixes flag is true (like XML 1.0 mode).

This mode is much like XML 1.0 mode, but setting the namespaces flag causes startPrefixMapping() and endPrefixMapping() events (discussed in the next section) to match xmlns and xmlns:* attributes, and processes those declarations so the parser always provides namespace URIs for element and attribute names when they're defined. The qName is always provided, even when a namespace URI is defined.

Parsers running in this mode should generate some kind of error report for legal XML 1.0 documents that don't meet all the rules of the "XML plus namespaces" dialect. (Most parsers use ErrorHandler.error() although the namespace specification doesn't say what class of error to report.) One example is to use colons in names for things that aren't elements or attributes, and not declare namespace prefixes. Similarly, you might get warnings about using relative URIs in namespace declarations. There is a performance impact to this additional processing, often five percent of the usually negligible overhead for XML parsing.

XML plus namespaces mode (SAX2 default)

The difference between this and mixed mode is that some information is discarded. The namespaces flag is true, and the namespace-prefixes flag is false.

Clearing the namespace-prefixes flag tells parsers they must filter out xmlns and xmlns:* attributes, and they may report empty strings instead of providing the qName (as found in the document) whenever a namespace URI is reported. In practice, most current SAX2 parsers always report qualified names, since there's little benefit to filtering them out.

The fourth combination of flags, disabling both namespace support and namespace prefix reporting, would be meaningless, and so it is an illegal parser state. Don't set this mode; parsers might not detect that you've put them into an illegal mode and may react unintelligently (such as by entering "XML 1.0 mode"). Unfortunately it's easy to set this mode if you just set the namespaces flag to false without first setting the namespace-prefixes flag to true (entering mixed mode).

I tend to prefer the mixed mode over the SAX2 default mode. Enabling it is simple: just set the namespace-prefixes flag to true, after setting up a parser for the SAX2 defaults. This mode provides better support for the XML Infoset, since it doesn't discard information about the prefixes. You won't see implementation-dependent behaviors in exposing either type of name. Certain kinds of XML processing will work better. In particular, algorithms working near the XML syntax level -- such as writing out XML text or performing consumer-side DTD validation -- will then work without needing to guard against discarded prefixes and without re-creating namespace declaration attributes. Discarding or changing prefixes, in particular, can cause confusion when people need to look at the XML output. The only real impact on applications is having to ignore xmlns and xmlns:* attributes, which isn't hard.
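Switching between the default mode and mixed mode might be sketched like this. The MixedModeDemo class and its attributeQNames() helper are invented for the example, and the parser is obtained through JAXP. Note that JAXP factories are not namespace-aware unless asked, which is the opposite of the SAX2 default, so the sketch requests namespace awareness before touching the namespace-prefixes flag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class MixedModeDemo extends DefaultHandler {
    static final String NAMESPACE_PREFIXES =
        "http://xml.org/sax/features/namespace-prefixes";

    // qNames of every attribute reported to startElement().
    final List<String> qNames = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        for (int i = 0; i < atts.getLength(); i++) {
            qNames.add(atts.getQName(i));
        }
    }

    // Parse in mixed mode (true) or in the SAX2 default mode (false).
    static List<String> attributeQNames(String xml, boolean mixedMode) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);          // namespaces = true
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature(NAMESPACE_PREFIXES, mixedMode);
            MixedModeDemo handler = new MixedModeDemo();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.qNames;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a xmlns:p='urn:example:p' p:x='1'/>";
        // Mixed mode: the xmlns:p declaration shows up as an attribute.
        System.out.println(attributeQNames(xml, true));
        // Default mode: the declaration is filtered out.
        System.out.println(attributeQNames(xml, false));
    }
}
```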

Few, if any, applications really need to work with documents that use colons in ways other than the XML namespaces specification, leaving a small performance impact as the primary reason to care about the pure XML 1.0 mode. Even applications that don't use namespaces usually won't see colons used in interesting ways (like nested:contexts:for:names). While most SAX2 XML parsers support all three of these modes, they are only required to support the SAX2 default mode.

2.6.4. ContentHandler and Prefix Mappings

Sometimes XML needs to handle "meta-level" processing, in which XML talks about XML. In such processing, namespace URIs are sometimes implicitly called by prefixes found in places no XML parser will look: CDATA attributes (which can contain anything) and character content found within elements. For example, XPath expressions include prefixes, and they are found in XSLT template attributes. The W3C XML Schema Datatypes (XSD) defines a QName datatype that formalizes such usage.

When you need to work with those types of XML text, you'll find two particular ContentHandler event callbacks helpful. They provide the same information found in xmlns and xmlns:* attributes, relieving your application code of the responsibility of correctly applying the XML Namespaces specification. For example, your code won't need to know how a default element namespace declaration can be explicitly undone by xmlns="" attributes or by ending the lexical scope of that attribute.

void startPrefixMapping(String prefix, String uri)

Each namespace declaration causes one of these calls. Each call corresponds to an attribute in the next startElement() callback to be made; you probably won't see other callbacks intervening. (This method has to appear before the element; the mapping will be used to interpret names of the element or its attributes.) If the prefix is the empty string, then the declaration is for the default element namespace. This is the only time the URI may be specified as the empty string (indicating that there is no longer a default element namespace in effect).

void endPrefixMapping(String prefix)

Each call to startPrefixMapping() is paired with a matching event to declare that the mapping has gone out of scope. These calls correspond to the most recent endElement() callback. However, the mapping "start" calls and the mapping "end" calls won't necessarily be perfectly nested. For example, two prefix mappings found in one element might be started in the order xlink then MyApp, but either mapping could end first.

You'd normally ignore these two calls, unless you use them to maintain some data structure that tracks active namespace prefixes. It would have to be a stacklike data structure, since one mapping for a prefix only temporarily hides a previous mapping for the same prefix. This is the notion of lexical scope, which you are familiar with from most programming languages. SAX2 includes a helper class to handle this for you: NamespaceSupport, discussed in Section 5.1.3, "The NamespaceSupport Class" in Chapter 5, "Other SAX Classes". Then when you parse the meta-level content, you can use those data structures to interpret prefix references and handle other namespace-related work.
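As a minimal sketch of that approach, the handler below feeds the prefix-mapping events into a NamespaceSupport object and uses it to interpret a prefixed name found in an attribute value. The QNameValueResolver class, the attribute name type, and the urn:example URIs are all hypothetical. Mixed mode is enabled so the attribute can be looked up by qName.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.NamespaceSupport;

class QNameValueResolver extends DefaultHandler {
    private final NamespaceSupport scopes = new NamespaceSupport();
    // True until the first mapping (or the element itself) opens a context.
    private boolean needNewContext = true;
    final List<String> resolved = new ArrayList<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) {
        // Mappings arrive before their startElement(); open the scope once.
        if (needNewContext) {
            scopes.pushContext();
            needNewContext = false;
        }
        scopes.declarePrefix(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        if (needNewContext) {
            scopes.pushContext();   // element with no declarations
        }
        needNewContext = true;      // the next element needs its own scope

        String value = atts.getValue("type");   // hypothetical attribute
        if (value != null) {
            // parts = { namespace URI, local name, qName }, or null if the
            // prefix has no declaration in scope
            String[] parts = scopes.processName(value, new String[3], false);
            resolved.add(parts == null
                    ? "unbound: " + value
                    : "{" + parts[0] + "}" + parts[1]);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        scopes.popContext();
    }

    static List<String> resolve(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setFeature(
                "http://xml.org/sax/features/namespace-prefixes", true);
            QNameValueResolver handler = new QNameValueResolver();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.resolved;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because every element gets exactly one pushContext() and one popContext(), the handler never needs endPrefixMapping(): leaving the element discards its declarations and uncovers whatever the enclosing scope had bound.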




Copyright © 2002 O'Reilly & Associates. All rights reserved.