How to write an XML catalog file

An XML catalog is made up of entries from one or more catalog entry files. A catalog entry file is an XML file whose document element is catalog and whose content follows the XML catalog DTD defined by OASIS at http://www.oasis-open.org/committees/entity/spec.html. Most of the elements are catalog entries, each of which serves to map an identifier or URL to another location. Following are some useful examples.

Resolve the DTD location

The DOCTYPE declaration at the top of an XML document gives the processor information to identify the DTD. Here is a declaration suggested by the DTD itself:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
         "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

The first quoted string after PUBLIC is the DTD's PUBLIC identifier, and the second quoted string is the SYSTEM identifier. In this case, the SYSTEM identifier is a full URL to the OASIS website.

You can use a public catalog entry to resolve a DTD's PUBLIC identifier, or you can use a system catalog entry to resolve a DTD's SYSTEM identifier. These two kinds of catalog entries are used only to resolve DTD identifiers and system entity identifiers (external files), not stylesheet references. Here is a simple XML catalog file that shows how to resolve a DTD identifier:

Example 4.1. Catalog entry to resolve DTD location

<?xml version="1.0"?>
<!DOCTYPE catalog
   PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd"> <!-- 1 -->
<catalog  xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">  <!-- 2 -->
  <group  prefer="public"  xml:base="file:///usr/share/xml/" >  <!-- 3 -->

    <!-- 4 -->
    <public 
       publicId="-//OASIS//DTD DocBook XML V4.4//EN"
       uri="docbook44/docbookx.dtd"/>

    <!-- 5 -->
    <system
       systemId="http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
       uri="docbook44/docbookx.dtd"/>

    <!-- 6 -->
    <system
       systemId="docbook4.4.dtd"
       uri="docbook44/docbookx.dtd"/>
  </group>
</catalog>

Note these features of this catalog:

1. The catalog file's DOCTYPE identifies the file as an OASIS XML catalog file. If you don't have an Internet connection, you should remove or comment out the entire DOCTYPE declaration. Otherwise, the catalog processor will try to load the catalog.dtd file over the network and fail. You can't use the catalog to resolve its own DTD location.

2. The catalog element contains the catalog content, and it includes a catalog namespace identifier.

3. The group element is a wrapper element that sets attributes that apply to all the catalog entries contained in the group. The prefer="public" attribute means the catalog resolver should try to use the PUBLIC identifier before resorting to the SYSTEM identifier. The xml:base attribute is the location that all URIs are resolved relative to.

4. The public element maps the given publicId string to the given uri location (with the xml:base value prepended).

5. The system element maps the given systemId string to the same location.

6. The second system element maps an abbreviated system identifier to the same full path location.

Why have multiple entries? So that documents that specify their DOCTYPE differently can resolve to the same location. Consider what happens when a DocBook document with this DOCTYPE declaration is processed with this catalog and a catalog resolver:

<?xml version="1.0"?>
<!DOCTYPE  book  PUBLIC  "-//OASIS//DTD DocBook XML V4.4//EN"  
     "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

The catalog resolver loads the catalog, and as it reads the files to be processed, it looks for items to resolve. In this case we have a DOCTYPE with both a PUBLIC identifier (-//OASIS//DTD DocBook XML V4.4//EN) and a SYSTEM identifier (http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd). It finds a match on the public identifier in the catalog, and since that entry's group wrapper element prefers using the public identifier, it uses that entry. It uses the uri attribute value for that entry, and then prepends the xml:base value from its group wrapper. The result is a full pathname /usr/share/xml/docbook44/docbookx.dtd.

If it turns out that such a file is not at that location, then the catalog resolver looks for other catalog entries to resolve the item. It then tries the first system entry, which in this case maps the www.oasis-open.org URL to the same local file. If no catalog entry works, then the resolver gives up, and the XML processor falls back to using the DOCTYPE's literal SYSTEM identifier http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd without catalog resolution and tries to retrieve the DTD over the web.

Note

The XML catalog file that ships with version 4.3 of the DocBook XML DTD is missing an entry for the htmltblx.mod file. If your resolver reports it as missing, then add an entry like this to your catalog file:

<public publicId="-//OASIS//ELEMENTS DocBook XML HTML Tables V4.3//EN"
uri="htmltblx.mod"/>

This problem was fixed in version 4.4.

Windows pathnames

When you are specifying an xml:base or uri attribute for use on a Microsoft Windows system, you must include the drive letter in the full URI syntax if you want it to work across processors. A Windows URI has this form:

file:///c:/xml/docbook/

Note the use of forward slashes, which is standard URI syntax.
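
As a sketch, a Windows version of the group from Example 4.1 might look like the following. The c:/xml/docbook/ directory and its docbook44 subdirectory are assumptions made for illustration; substitute the location where the DTD is actually installed.

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Assumed install location: c:\xml\docbook\docbook44\docbookx.dtd -->
  <group prefer="public" xml:base="file:///c:/xml/docbook/">
    <public
       publicId="-//OASIS//DTD DocBook XML V4.4//EN"
       uri="docbook44/docbookx.dtd"/>
    <system
       systemId="http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
       uri="docbook44/docbookx.dtd"/>
  </group>
</catalog>
```

Because the drive letter appears only in xml:base, moving the DTD to another drive or directory should require changing just that one attribute.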

Relative SYSTEM identifiers may not work

Another document might have a much simpler DOCTYPE declaration:

<!DOCTYPE  book  SYSTEM  "docbook4.4.dtd"> 

If this document is processed with the same catalog, there is no PUBLIC identifier to match on. So despite the prefer="public" attribute, the resolver is forced to try to match the DOCTYPE's SYSTEM identifier with a system catalog entry. It finds a match in the systemId attribute and the uri value maps it to the same location.

Unfortunately, XML catalog entries that try to use relative system identifiers like systemId="docbook4.4.dtd" don't work with the Java resolver software currently available. The problem is that when a document with the example DOCTYPE is processed, the SAX interface in the XML parser resolves such references relative to the current document's location before the resolver gets to see it. So the resolver never has a chance to match on the original string. If you are going to use catalog files, you should probably stick with the recommended value of http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd for the SYSTEM identifier.

Locate an XSL stylesheet

You use the uri element in an XML catalog to locate stylesheets and other files. It can be used for everything that is not a declared PUBLIC or SYSTEM identifier for a DTD or system entity file. Here is an example of mapping a relative stylesheet reference to an absolute path:

Example 4.2. Catalog entry to locate a stylesheet

<?xml version="1.0"?>
<!DOCTYPE catalog
   PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <uri
        name="docbook.xsl"
        uri="file:///usr/share/xml/docbook-xsl-1.68.1/html/docbook.xsl"/>
</catalog>

With a catalog entry like this, your scripts and Makefiles can refer to the stylesheet file simply as docbook.xsl and let the catalog find its location on the system. By using a different catalog, you can map the name to a different stylesheet file without changing the script or Makefile command line.
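
For instance, a second catalog could map the same name to the FO stylesheet instead. This is only a sketch: it assumes the same docbook-xsl-1.68.1 installation directory as Example 4.2, and that you select this catalog in place of the first one when you want FO output.

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <!-- Same name as in Example 4.2, different target stylesheet -->
    <uri
        name="docbook.xsl"
        uri="file:///usr/share/xml/docbook-xsl-1.68.1/fo/docbook.xsl"/>
</catalog>
```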

Map a web address to a local file

As mentioned above, you can specify a web URL for the DTD or stylesheet to fetch it over the Internet. For efficiency, though, it's better to map the URLs to local files if they are available. The following catalog will do that.

Example 4.3. Catalog entry to map web address to local file

<?xml version="1.0"?>
<!DOCTYPE catalog
   PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">

<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">

<system
  systemId="http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"
  uri="file:///usr/share/xml/docbook44/docbookx.dtd" />
<uri
  name="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"
  uri="file:///usr/share/xml/docbook-xsl-1.68.1/html/docbook.xsl" />
<uri
  name="http://docbook.sourceforge.net/release/xsl/current/html/chunk.xsl"
  uri="file:///usr/share/xml/docbook-xsl-1.68.1/html/chunk.xsl" />
</catalog>

There are two uri entries here, to handle both the regular and the chunking stylesheets.
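
These uri entries matter most when a stylesheet is referenced by its web address from inside another file. The following customization layer is a minimal sketch (the parameter setting is only an illustrative placeholder). Its xsl:import uses the web address, and a catalog-aware processor using the catalog in Example 4.3 would be expected to load the local copy instead.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- Matches the first uri entry in the catalog, so no network access should be needed -->
  <xsl:import
    href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>

  <!-- Illustrative customization only -->
  <xsl:param name="section.autolabel" select="1"/>
</xsl:stylesheet>
```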

Map many references with rewrite entries

To reduce the number of catalog entries, you can map a prefix instead of a bunch of similar names. Two catalog entry elements named rewriteSystem and rewriteURI let you map the first part of a reference to a different prefix. That lets you map many files in the same location with a single catalog entry. Use rewriteSystem to remap a DOCTYPE system identifier, and use rewriteURI to remap other URLs like stylesheet references.

Here is the previous example done with rewrite entries:

<?xml version="1.0"?>
<!DOCTYPE catalog
   PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN"
   "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <rewriteSystem
        systemIdStartString="http://www.oasis-open.org/docbook/xml/4.4/"
        rewritePrefix="file:///usr/share/xml/docbook44/" />
    <rewriteURI
        uriStartString="http://docbook.sourceforge.net/release/xsl/current/"
        rewritePrefix="file:///usr/share/xml/docbook-xsl-1.68.1/" />
</catalog>

The two stylesheet uri entries are replaced with a single rewriteURI entry. Any directory structure below that point that matches on both ends is mapped as well. For example:

This URL:
http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl
is mapped to:
file:///usr/share/xml/docbook-xsl-1.68.1/html/docbook.xsl

This URL:
http://docbook.sourceforge.net/release/xsl/current/fo/custom.xsl
is mapped to:
file:///usr/share/xml/docbook-xsl-1.68.1/fo/custom.xsl
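
Rewrite entries can also overlap. According to the OASIS XML Catalogs specification, when more than one rewrite entry matches, the entry with the longest start string is used, so a more specific prefix can override a general one. The sketch below assumes a hypothetical directory /usr/share/xml/my-fo-stylesheets/ that holds locally modified FO stylesheets:

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <!-- General rule: everything under current/ maps to the local distribution -->
    <rewriteURI
        uriStartString="http://docbook.sourceforge.net/release/xsl/current/"
        rewritePrefix="file:///usr/share/xml/docbook-xsl-1.68.1/" />
    <!-- More specific rule (hypothetical directory): the longer start string
         is preferred for references under current/fo/ -->
    <rewriteURI
        uriStartString="http://docbook.sourceforge.net/release/xsl/current/fo/"
        rewritePrefix="file:///usr/share/xml/my-fo-stylesheets/" />
</catalog>
```

With these two entries, a reference under current/fo/ would be expected to resolve into the hypothetical my-fo-stylesheets directory, while references under current/html/ would still resolve into the docbook-xsl-1.68.1 distribution.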

Using multiple catalog files

You can use the nextCatalog element to include other catalog entry files in the process. If a reference can't be resolved in the current catalog entry file, then the processor moves on to the next catalog specified by such an element. You can put nextCatalog elements anywhere in a catalog entry file, since they aren't looked at until all catalog entries in the current file have been tried. Each new catalog file can also contain nextCatalog entries.

Using this feature lets you organize your catalog entries into modular files which can be combined in various ways. For example, you could separate your DTD lookups from your stylesheet lookups. Since the DocBook DTD comes with a catalog file, you can just point to that catalog to resolve DTD PUBLIC identifiers.

For DocBook 4.4:
<nextCatalog  catalog="/usr/share/xml/docbook44/catalog.xml" />

For DocBook 4.1.2:
<nextCatalog  catalog="/usr/share/xml/docbook412/docbook.cat" />

The latter example points to the SGML catalog that was included with an older version of the DTD. The references in either of those catalog files are all relative to the catalog file location, so the resolver should be able to find any of the DTD files by its PUBLIC identifier. Don't try to move the DocBook catalog file out of the directory that contains the DTD files, or the relative references won't work.
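
Putting these pieces together, a top-level catalog might keep the stylesheet mappings itself and pass DTD lookups on to the catalog shipped with the DTD. This is a sketch that reuses the paths from the earlier examples; adjust them to your installation.

```xml
<?xml version="1.0"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <!-- Stylesheet lookups are handled in this file -->
    <rewriteURI
        uriStartString="http://docbook.sourceforge.net/release/xsl/current/"
        rewritePrefix="file:///usr/share/xml/docbook-xsl-1.68.1/" />

    <!-- DTD lookups are passed on to the catalog that ships with DocBook 4.4;
         it is consulted only if no entry in this file matches -->
    <nextCatalog catalog="/usr/share/xml/docbook44/catalog.xml" />
</catalog>
```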