Perl & XML

3.8. XML::Writer

Compared to everything else we've dealt with in this chapter, writing XML is a breeze. It's easier because the shoe is on the other foot: your program works from a data structure that it controls completely and knows everything about, so it doesn't have to prepare for every contingency that might turn up when processing input.

There's nothing particularly difficult about generating XML. You know about elements with start and end tags, their attributes, and so on. It's just tedious to write an XML output method that remembers to cross all the t's and dot all the i's. Does it put a space between every attribute? Does it close open elements? Does it put that slash at the end of empty elements? You don't want to have to think about these things when you're writing more important code. Others have written modules to take care of these serialization details for you.

David Megginson's XML::Writer is a fine example of an abstract XML generation interface. It comes with a handful of very simple methods for building any XML document. Just create a writer object and call its methods to crank out a stream of XML. Table 3-1 lists some of these methods.

Table 3-1. XML::Writer methods

| Name | Function |
|---|---|
| end( ) | Close the document and perform simple well-formedness checking (e.g., make sure that there is one root element and that every start tag has an associated end tag). If the option UNSAFE is set, however, most well-formedness checking is skipped. |
| xmlDecl([$encoding, $standalone]) | Add an XML declaration at the top of the document. The version is hard-wired as "1.0". |
| doctype($name, [$publicId, $systemId]) | Add a document type declaration at the top of the document. |
| comment($text) | Write an XML comment. |
| pi($target [, $data]) | Output a processing instruction. |
| startTag($name [, $aname1 => $value1, ...]) | Write an element start tag. The first argument is the element name, which is followed by attribute name-value pairs. |
| emptyTag($name [, $aname1 => $value1, ...]) | Write an empty element tag. The arguments are the same as for the startTag( ) method. |
| endTag([$name]) | Write an element end tag. Leave out the argument to have it close the currently open element automatically. |
| dataElement($name, $data [, $aname1 => $value1, ...]) | Write an element that contains only character data. The output includes the start tag, the data, and the end tag. |
| characters($data) | Output a chunk of character data. |
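The well-formedness checking mentioned for end( ) can be seen in a small experiment. The following is only a sketch: try_write is a hypothetical helper invented for this illustration, and it assumes a version of XML::Writer that accepts a reference to a string as its OUTPUT. Each case gets a fresh writer, and any exception the writer throws is trapped with eval.

```perl
use strict;
use warnings;
use XML::Writer;

# Run a piece of writing code against a fresh writer and report the outcome.
# try_write is a hypothetical helper for this sketch, not part of XML::Writer.
sub try_write {
    my $code = shift;
    my $xml  = '';
    my $writer = XML::Writer->new( OUTPUT => \$xml );
    return 'ok' if eval { $code->( $writer ); 1 };
    chomp( my $error = $@ );
    return "error: $error";
}

my $balanced = try_write( sub {
    my $w = shift;
    $w->startTag( 'report' );
    $w->endTag( 'report' );
    $w->end;
} );

my $mismatched = try_write( sub {
    my $w = shift;
    $w->startTag( 'report' );
    $w->startTag( 'section' );
    $w->endTag( 'report' );    # <section> is still open
} );

my $unclosed = try_write( sub {
    my $w = shift;
    $w->startTag( 'report' );
    $w->end;                   # <report> was never closed
} );

print "balanced:   $balanced\n";
print "mismatched: $mismatched\n";
print "unclosed:   $unclosed\n";
```

Only the first case should report ok. The other two should fail with a message from the writer, although the exact wording may vary between versions of the module.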

Using these routines, we can build a complete XML document. The program in Example 3-10 creates a basic HTML file.

Example 3-10. HTML generator

use IO::File;
my $output = IO::File->new( ">output.xml" )
    or die "Cannot open output.xml: $!";

use XML::Writer;
my $writer = XML::Writer->new( OUTPUT => $output );

$writer->xmlDecl( 'UTF-8' );
$writer->doctype( 'html' );
$writer->comment( 'My happy little HTML page' );
$writer->pi( 'foo', 'bar' );
$writer->startTag( 'html' );
$writer->startTag( 'body' );
$writer->startTag( 'h1' );
$writer->startTag( 'font', 'color' => 'green' );
$writer->characters( "<Hello World!>" );
$writer->endTag( );
$writer->endTag( );
$writer->dataElement( "p", "Nice to see you." );
$writer->endTag( );
$writer->endTag( );
$writer->end( );
$output->close( );

This example outputs the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<!-- My happy little HTML page -->
<?foo bar?>
<html><body><h1><font color="green">&lt;Hello World!&gt;</font></h1><p>Nice to see you.</p></body></html>

Some nice conveniences are built into this module. For example, it automatically takes care of characters that are illegal in content, such as the ampersand (&) and the left angle bracket (<), by turning them into the appropriate entity references. Attribute values are quoted and escaped automatically, too. At any time during the document-building process, you can check the context you're in with predicate methods like within_element('foo'), which tells you whether an element named 'foo' is currently open.
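Here is a short sketch of those conveniences at work. It assumes a version of XML::Writer that accepts a reference to a string as its OUTPUT; the element names and text are made up for the illustration. Besides within_element( ), it uses the related in_element( ) and current_element( ) methods, which report on the innermost open element only.

```perl
use strict;
use warnings;
use XML::Writer;

# Capture the generated XML in a string instead of a filehandle.
my $xml = '';
my $writer = XML::Writer->new( OUTPUT => \$xml );

$writer->startTag( 'html' );
$writer->startTag( 'body' );

# Attribute values and character data are escaped for us.
$writer->startTag( 'p', title => 'Q&A "live"' );
$writer->characters( 'Fish & chips < 5 pounds' );

# Context checks, made while <p> is still open.
my $current  = $writer->current_element;             # name of the innermost open element
my $in_p     = $writer->in_element( 'p' );           # is <p> the innermost element?
my $in_body  = $writer->within_element( 'body' );    # is <body> open at any depth?
my $in_table = $writer->within_element( 'table' );   # never opened, so false

$writer->endTag( 'p' );
$writer->endTag( 'body' );
$writer->endTag( 'html' );
$writer->end;

print $xml;
print "current element: $current\n";
print "inside body\n"      if $in_body;
print "not inside table\n" unless $in_table;
```

The ampersand, angle bracket, and double quotes in the input should all come out as entity references, so the result stays well-formed no matter what the data contains.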

By default, the module outputs a document with all the tags run together. You might prefer to insert whitespace in some places to make the XML more readable. If you set the option NEWLINES to true, the module inserts a newline just before the closing angle bracket of each tag, which breaks up long lines without adding any whitespace to the document's content. If you set DATA_MODE, newlines are inserted around elements instead, which suits data-oriented documents (the module complains if an element mixes character data with child elements, unless UNSAFE is also set). You can combine DATA_MODE with DATA_INDENT to indent each line in proportion to its depth in the document.
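To compare these options side by side, the sketch below writes the same tiny document three times. The build routine and the list and item element names are hypothetical, and the code assumes a version of XML::Writer that accepts a reference to a string as its OUTPUT.

```perl
use strict;
use warnings;
use XML::Writer;

# Build the same small document with a given set of writer options.
# build is a hypothetical helper for this sketch.
sub build {
    my %options = @_;
    my $xml = '';
    my $writer = XML::Writer->new( OUTPUT => \$xml, %options );
    $writer->startTag( 'list' );
    $writer->dataElement( 'item', 'one' );
    $writer->dataElement( 'item', 'two' );
    $writer->endTag( 'list' );
    $writer->end;
    return $xml;
}

my $plain    = build( );
my $newlines = build( NEWLINES => 1 );
my $indented = build( DATA_MODE => 1, DATA_INDENT => 2 );

print "Default:\n$plain\n";
print "NEWLINES:\n$newlines\n";
print "DATA_MODE with DATA_INDENT:\n$indented\n";
```

The NEWLINES version looks odd to human eyes because the line breaks fall inside the tags, but a parser sees exactly the same content as in the default version. The DATA_MODE version is the one to reach for when people will read the output.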

The nice thing about XML is that it can be used to organize just about any kind of textual data. With XML::Writer, you can quickly turn a pile of information into a tightly regimented document. For example, you can turn a directory listing into a hierarchical database, as the program in Example 3-11 does.

Example 3-11. Directory mapper

use XML::Writer;
my $wr = XML::Writer->new( DATA_MODE => 1, DATA_INDENT => 2 );
as_xml( shift @ARGV );
$wr->end;

# recursively map directory information into XML
#
sub as_xml {
    my $path = shift;
    return unless( -e $path );

    # if this is a directory, create an element and
    # stuff it full of items
    if( -d $path ) {
        $wr->startTag( 'directory', name => $path );

        # Load the names of all things in this
        # directory into an array
        my @contents = ( );
        opendir( DIR, $path )
            or die "Cannot open directory $path: $!";
        while( my $item = readdir( DIR )) {
            next if( $item eq '.' or $item eq '..' );
            push( @contents, $item );
        }
        closedir( DIR );

        # recurse on items in the directory
        foreach my $item ( @contents ) {
            &as_xml( "$path/$item" );
        }

        $wr->endTag( 'directory' );

    # We'll lazily call anything that's not a directory a file.
    } else {
        $wr->emptyTag( 'file', name => $path );
    }
}

Here's how the example looks when run on a directory (note the use of DATA_MODE and DATA_INDENT to improve readability):

$ ~/bin/dir /home/eray/xtools/XML-DOM-1.25

<directory name="/home/eray/xtools/XML-DOM-1.25">
  <directory name="/home/eray/xtools/XML-DOM-1.25/t">
    <file name="/home/eray/xtools/XML-DOM-1.25/t/attr.t" />
    <file name="/home/eray/xtools/XML-DOM-1.25/t/minus.t" />
    <file name="/home/eray/xtools/XML-DOM-1.25/t/example.t" />
    <file name="/home/eray/xtools/XML-DOM-1.25/t/print.t" />
    <file name="/home/eray/xtools/XML-DOM-1.25/t/cdata.t" />
    <file name="/home/eray/xtools/XML-DOM-1.25/t/astress.t" />
    <file name="/home/eray/xtools/XML-DOM-1.25/t/modify.t" />
  </directory>
  <file name="/home/eray/xtools/XML-DOM-1.25/DOM.gif" />
  <directory name="/home/eray/xtools/XML-DOM-1.25/samples">
    <file name="/home/eray/xtools/XML-DOM-1.25/samples/REC-xml-19980210.xml" />
  </directory>
  <file name="/home/eray/xtools/XML-DOM-1.25/MANIFEST" />
  <file name="/home/eray/xtools/XML-DOM-1.25/Makefile.PL" />
  <file name="/home/eray/xtools/XML-DOM-1.25/Changes" />
  <file name="/home/eray/xtools/XML-DOM-1.25/CheckAncestors.pm" />
  <file name="/home/eray/xtools/XML-DOM-1.25/CmpDOM.pm" />

We've seen XML::Writer used step by step and in a recursive context. You could also use it conveniently inside an object tree structure, where each XML object type has its own "to-string" method making the appropriate calls to the writer object. XML::Writer is extremely flexible and useful.
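As a rough illustration of the object tree idea, the sketch below defines two hypothetical classes, My::File and My::Folder, each with its own to_xml method that takes a writer object. The class names, the method name, and the sample tree are all invented for this example, and the code assumes a version of XML::Writer that accepts a reference to a string as its OUTPUT.

```perl
use strict;
use warnings;
use XML::Writer;

# Hypothetical leaf node: a file with a name.
package My::File;
sub new { my ( $class, %args ) = @_; return bless { %args }, $class }
sub to_xml {
    my ( $self, $writer ) = @_;
    $writer->emptyTag( 'file', name => $self->{name} );
}

# Hypothetical container node: a folder holding files and other folders.
package My::Folder;
sub new { my ( $class, %args ) = @_; return bless { children => [], %args }, $class }
sub to_xml {
    my ( $self, $writer ) = @_;
    $writer->startTag( 'directory', name => $self->{name} );
    $_->to_xml( $writer ) for @{ $self->{children} };    # each child serializes itself
    $writer->endTag( 'directory' );
}

package main;

my $tree = My::Folder->new(
    name     => 'project',
    children => [
        My::File->new( name => 'README' ),
        My::Folder->new(
            name     => 'lib',
            children => [ My::File->new( name => 'Tool.pm' ) ],
        ),
    ],
);

my $xml = '';
my $writer = XML::Writer->new( OUTPUT => \$xml, DATA_MODE => 1, DATA_INDENT => 2 );
$tree->to_xml( $writer );
$writer->end;
print $xml;
```

Because every object receives the same writer, the writer keeps track of nesting and escaping while each class worries only about its own element.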

3.8.1. Other Methods of Output

Remember that many parser modules have their own ways to turn their current content into simple, pretty strings of XML. XML::LibXML, for example, lets you call a toString( ) method on the document or any element object within it. Consequently, more specific processor classes that subclass this module or otherwise use it internally often offer the same method in their own APIs and simply pass calls to it along to the underlying parser object. Consult the documentation of your favorite processor to see if it supports this or a similar feature.
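A minimal sketch of the XML::LibXML approach follows, assuming the module is installed. The inventory document is made up for the example. Passing a true value to the document's toString( ) asks for indented output.

```perl
use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new;
my $doc = $parser->parse_string(
    '<inventory><item id="1">widget</item><item id="2">gadget</item></inventory>'
);

# Serialize the whole document; a true argument asks for indented output.
print $doc->toString( 1 );

# Serialize a single element and everything beneath it.
my ( $first ) = $doc->findnodes( '/inventory/item[1]' );
my $fragment  = $first->toString;
print $fragment, "\n";
```

Calling toString( ) on an element gives you just that subtree, which is handy when you want to pull one piece out of a larger document.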

Finally, sometimes all you really need is Perl's print function. While it lives at a lower level than tools like XML::Writer, ignorant of XML-specific rules and regulations, it gives you a finer degree of control over the process of turning memory structures into text worthy of throwing at filehandles. If you're doing especially tricky work, falling back to print may be a relief, and indeed some of the stunts we pull in Chapter 10, "Coding Strategies", use print. Just don't forget to escape those naughty < and & characters with their respective entity references, as shown in Table 2-1, or be generous with CDATA sections.
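If you do go the print route, a tiny escaping routine covers most of the danger. The sketch below is one possible approach, not a complete serializer: escape_xml is a hypothetical helper, and the note element and its data are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are unsafe in character
# data and in double-quoted attribute values.
sub escape_xml {
    my $text = shift;
    $text =~ s/&/&amp;/g;     # must come first, or we would re-escape our own output
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

my %note = (
    author => 'R&D "team"',
    body   => 'if a < b && b < c, then a < c',
);

my $xml = sprintf qq{<note author="%s">%s</note>\n},
    escape_xml( $note{author} ), escape_xml( $note{body} );
print $xml;
```

The order of the substitutions matters: the ampersand has to be handled before the others, because every replacement introduces new ampersands of its own.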




Copyright © 2002 O'Reilly & Associates. All rights reserved.