SAX2

3.4. The EntityResolver Interface

As mentioned earlier, this interface is used when a parser needs to access and parse external entities in the DTD or document content. It is not used to access the document entity itself. Cases where an EntityResolver should be used include:

Applications that handle documents with DTDs should plan to use an EntityResolver so they work robustly in the face of partial network failures, and so they avoid placing excessive loads on remote servers. That is, they should try to access local copies of DTD data even when the document specifies a remote one. There are many examples of sloppily written applications that broke when a remote system administrator moved a DTD file. Examples range from purely informative services like most RSS feeds to fee-based services like some news syndication protocols.

You can implement a useful resolver with a data structure as simple as a hash table that maps identifiers to URIs. There is normally no reason to have different parsers use different entity resolvers; documents shouldn't use the same public or (absolute) system identifiers to denote different entities. You'll normally just have one resolver, and it could adaptively cache entities if you like.
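Here is a sketch of such a table-driven resolver. It assumes local copies of the DTDs already exist at the URIs you register. The class name TableResolver and its put() method are hypothetical, not part of SAX. Public identifiers are checked first because they name an entity independently of where it lives. Absolute system identifiers are the fallback.

```java
import java.util.HashMap;
import java.util.Map;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical table-driven resolver: keys are public IDs or absolute
// system IDs; values are preferred (typically local) URIs.
class TableResolver implements EntityResolver {
    private final Map<String, String> table = new HashMap<>();

    // Register a public or absolute system identifier and its preferred URI.
    void put(String identifier, String uri) {
        table.put(identifier, uri);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Try the location-independent public identifier first.
        String uri = (publicId != null) ? table.get(publicId) : null;
        if (uri == null) {
            // Fall back to the system identifier, which SAX passes as an absolute URI.
            uri = table.get(systemId);
        }
        if (uri == null) {
            return null; // no mapping: let the parser fetch systemId itself
        }
        InputSource source = new InputSource(uri);
        source.setPublicId(publicId); // keep the original public ID for diagnostics
        return source;
    }
}
```

A single instance can be shared by every parser in the application, as the paragraph above suggests.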

More complex catalog facilities may be used by applications that follow the SGML convention that public identifiers are Formal Public Identifiers (FPIs). FPIs serve the role that Uniform Resource Names (URNs) serve for Internet-oriented systems. Such mappings can also be used with URIs, if the entity text associated with URIs is as stable as an FPI. (Such stability is one of the goals of URNs.)

Applications pass objects that implement the EntityResolver interface to the XMLReader.setEntityResolver() method. The parser will then use the resolver with all external parsed entities. The EntityResolver interface has only one method, which can throw a java.io.IOException as well as the org.xml.sax.SAXException most other callbacks throw.
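The following sketch shows a resolver being registered and then consulted for an external DTD subset. It assumes a JAXP-provided parser obtained through SAXParserFactory. The DTD URL is hypothetical, and its "local copy" is an inline string. Because EntityResolver declares a single method, the resolver can be written as a lambda.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ResolverDemo {
    // Hypothetical local copy of a DTD that a document names by a remote URL.
    static final String LOCAL_DTD =
        "<!ELEMENT memo (#PCDATA)>\n<!ENTITY who 'World'>\n";

    // Parses xml and returns the concatenated character data of the document.
    static String parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        reader.setEntityResolver((publicId, systemId) -> {
            if ("http://www.example.com/dtds/memo.dtd".equals(systemId)) {
                // Keep the system ID (useful as a base URI and in error
                // messages) but supply the text locally instead of fetching it.
                InputSource local = new InputSource(systemId);
                local.setCharacterStream(new StringReader(LOCAL_DTD));
                return local;
            }
            return null; // any other entity: default resolution
        });

        StringBuilder text = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }
}
```

Because the entity `who` is declared only in the external DTD, its expansion in the content confirms that the parser read the DTD through the resolver rather than over the network.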

InputSource resolveEntity(String publicId, String systemId)

Parsers invoke this method to map entity identifiers either to other identifiers or to data that they will parse. See the discussion in Section 3.1.2, "The InputSource Class", earlier in this chapter, for information about how the InputSource class is used. If null is returned, the parser resolves the systemId without additional assistance. To avoid parsing an entity, return an InputSource that encapsulates a zero-length text entity.
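Here is a sketch of the zero-length technique. It assumes a hypothetical policy of skipping every external DTD, recognized here simply by a ".dtd" suffix on the system ID. The class name SkipDtdResolver is likewise hypothetical.

```java
import java.io.StringReader;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver that replaces external DTDs with empty text.
class SkipDtdResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null && systemId.endsWith(".dtd")) {
            InputSource empty = new InputSource(systemId); // keep the ID for error messages
            empty.setCharacterStream(new StringReader("")); // zero-length entity text
            return empty;
        }
        return null; // everything else: let the parser resolve systemId normally
    }
}
```

This approach has a trade-off. Declarations in a skipped DTD, such as default attribute values and general entities, are never seen. It therefore suits only documents that don't depend on them.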

The systemId will always be present and will be a fully resolved URI. The publicId may be null. If it's not null, it will have been normalized by mapping sequences of consecutive whitespace characters to a single space character.
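Because the parser hands resolveEntity() an already-normalized public ID, keys in a lookup table need the same normalization, or lookups will silently miss. The following helper is a sketch for priming such a table. It assumes the keys come from a hand-edited source where line breaks or doubled spaces can creep in. The class name PublicIds is hypothetical. The helper also strips leading and trailing whitespace, which XML 1.0 requires before public identifiers are matched.

```java
// Hypothetical helper for normalizing public IDs used as table keys.
final class PublicIds {
    private PublicIds() {}

    // Collapse each run of whitespace (space, tab, CR, LF) to one space,
    // then drop leading and trailing whitespace.
    static String normalize(String publicId) {
        if (publicId == null) {
            return null; // public IDs are optional
        }
        return publicId.replaceAll("[ \\t\\r\\n]+", " ").trim();
    }
}
```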

Example 3-3 shows a simple resolver that handles three cases. It substitutes for a web-based time service running on the local machine, interprets a private URI scheme, and maps public identifiers to alternative URIs using a dictionary that is maintained externally. (For example, you might prime a hashtable with the public IDs for the XHTML 1.0, XHTML 1.1, and DocBook 4.0 XML DTDs so that they point to local files.) For all other cases, it delegates to another resolver.

Example 3-3. Entity resolver, with chaining

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.util.Date;
import java.util.Map;

import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MyResolver implements EntityResolver
{
    private final EntityResolver       next;
    private final Map<String, String>  map;

    // n -- optional resolver to consult on failure (may be null)
    // m -- maps public ids to preferred URLs (may be null)
    public MyResolver (EntityResolver n, Map<String, String> m)
        { next = n; map = m; }

    public InputSource resolveEntity (String publicId, String systemId)
    throws SAXException, IOException
    {
        // magic URL? serve the current date as the entity's text
        if ("http://localhost/xml/date".equals (systemId)) {
            InputSource retval = new InputSource (systemId);

            retval.setCharacterStream (new StringReader (new Date ().toString ()));
            return retval;
        }

        // nonstandard URI scheme? look the bytes up by key.
        // Storage is an application-specific blob store, not part of SAX.
        if (systemId.startsWith ("blob:")) {
            InputSource retval = new InputSource (systemId);
            String      key = systemId.substring (5);
            byte []     data = Storage.keyToBlob (key);

            retval.setByteStream (new ByteArrayInputStream (data));
            return retval;
        }

        // use table to map public id to local URL?
        if (map != null && publicId != null) {
            String url = map.get (publicId);
            if (url != null)
                return new InputSource (url);
        }

        // chain to next resolver?
        if (next != null)
            return next.resolveEntity (publicId, systemId);
        return null;
    }
}

Traditionally, public identifiers are mainly used as keys to find local copies of entities. In SGML, system identifiers were optional and system-specific, so public identifiers were sometimes the only ones available. (XML changed this: system identifiers are mandatory and are URIs.) In essence, public identifiers were used in SGML to serve the role that URNs serve in web-oriented architectures. An ISO standard for FPIs exists, and now RFC 3151 (available at http://www.ietf.org/rfc/rfc3151.txt) defines a mapping from FPIs to URNs. (The FPI is normalized and transformed, then gets a urn:publicid: prefix.) When public identifiers are used with XML systems, it's largely by adopting FPI policies to interoperate with such SGML systems; however, XML public identifiers don't need to be FPIs. You may prefer to use URN schemes in newer systems. If so, be aware that some XML processing engines support only URLs as system identifiers. By letting applications interpret public IDs as URNs, SAX offers more power than some other XML APIs do.
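Here is a sketch of the RFC 3151 transformation, based on our reading of its transcription rules. Whitespace is normalized, "//" becomes ":", "::" becomes ";", a space becomes "+", and a handful of other characters are percent-encoded. The class name PublicIdUrn is hypothetical, and the RFC itself is the authority on edge cases.

```java
// Hypothetical helper: transcribe a public identifier into a
// urn:publicid: URN following the RFC 3151 rules.
final class PublicIdUrn {
    private PublicIdUrn() {}

    static String toUrn(String publicId) {
        // Normalize whitespace the way XML does for public identifiers.
        String id = publicId.replaceAll("[ \\t\\r\\n]+", " ").trim();

        StringBuilder urn = new StringBuilder("urn:publicid:");
        for (int i = 0; i < id.length(); i++) {
            char c = id.charAt(i);
            char next = (i + 1 < id.length()) ? id.charAt(i + 1) : '\0';
            // The two-character sequences are transcribed before single characters.
            if (c == '/' && next == '/') { urn.append(':'); i++; continue; }
            if (c == ':' && next == ':') { urn.append(';'); i++; continue; }
            switch (c) {
                case ' ':  urn.append('+');   break;
                case '+':  urn.append("%2B"); break;
                case ':':  urn.append("%3A"); break;
                case '/':  urn.append("%2F"); break;
                case ';':  urn.append("%3B"); break;
                case '\'': urn.append("%27"); break;
                case '?':  urn.append("%3F"); break;
                case '#':  urn.append("%23"); break;
                case '%':  urn.append("%25"); break;
                default:   urn.append(c);
            }
        }
        return urn.toString();
    }
}
```

For example, "-//OASIS//DTD DocBook XML V4.1.2//EN" becomes "urn:publicid:-:OASIS:DTD+DocBook+XML+V4.1.2:EN". A resolver that keys its table by URNs could apply this transformation to the publicId it receives before the lookup.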

If you want richer catalog-style functionality than the table mapping shown earlier, look for open source implementations of the XML version of the OASIS SGML/Open Catalog (SOCAT). At this time, the specification for such a catalog is still a draft, though a fairly stable one; see http://www.oasis.org/committees/entity/ for more information. This specification defines an XML text representation of mappings, and those mappings can be significantly more complex than the tabular one shown earlier.




Copyright © 2002 O'Reilly & Associates. All rights reserved.