Appendix C. Utilities for Processing OpenDocument Files

As we were writing this book, we developed some utilities to make it easier to manipulate OpenDocument files. We hope they are equally useful to you.

OpenDocument uses the JAR format. Rather than having to unjar each document before running an XSLT transformation on it, we wrote a program that lets you perform a transformation on a member of a JAR file without having to expand it. It also lets you create a JAR file (without a JAR manifest) as output, if your output is intended to be used as an OpenDocument file.
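To make the idea of reading a member without expanding the archive concrete, here is a minimal sketch. It is not part of ODTransform.java: the class name MemberReadSketch and its methods are hypothetical, and it uses java.util.zip.ZipInputStream (the superclass of the JarInputStream used later in this appendix). It builds a tiny archive in memory so that it needs no files on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class MemberReadSketch {

    /* Build a two-member archive in memory (stand-in for a real document). */
    static byte[] makeArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("meta.xml"));
            zip.write("<meta/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write("<content/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /* Return the text of the named member, or null if it is not present. */
    static String readMember(InputStream archive, String memberName)
            throws IOException {
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (memberName.equals(entry.getName())) {
                    /* The stream is now positioned at this member's data;
                       read() reports end-of-stream at the member's end. */
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return new String(out.toByteArray(),
                        StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = makeArchive();
        System.out.println(
            readMember(new ByteArrayInputStream(archive), "content.xml"));
    }
}
```

Because the stream stops at the end of the current member, it can be handed directly to an XML parser, which is the approach the transformation program takes.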

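Now that we have overcome the problem of the phantom DTD, we can write the main transformation program, ODTransform.java. It takes the following command line arguments, whose names are case-insensitive:

-in filename
The file to transform. If -inOD is also given, this is the name of a member within that OpenDocument file.
-xsl filename
The XSLT stylesheet, which is always a regular file.
-out filename
The file to receive the result. If -outOD is also given, this is the name of the member to create within that OpenDocument file.
-inOD filename
(Optional) The OpenDocument file that contains the input member.
-outOD filename
(Optional) The OpenDocument file to create as output.
-param name value
(Optional) A parameter to pass to the transformation; may be repeated.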

Thus, if you are transforming a plain file to another plain file, you might have a command line like this:

To transform the content.xml file inside a document named myfile.odt, producing a non-compressed output file, you might have a command line like this:

And, to transform content.xml inside a document named myfile.odt to produce a new content.xml inside a result document named newfile.odt, your command line would be:

When creating an OpenDocument file as output, the program must also create a META-INF/manifest.xml file. The extension given in the -outOD parameter determines the media-type in the manifest.
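The lookup from file extension to media type can be sketched on its own. This is an illustration only: the class MediaTypeSketch and its method mediaTypeFor are hypothetical names, the table holds just four of the extensions, and it returns null for an unknown extension where the full program substitutes a placeholder string.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class MediaTypeSketch {
    private static final String PREFIX =
        "application/vnd.oasis.opendocument.";

    /* A deliberately partial table; the full program lists more types. */
    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("odt", PREFIX + "text");
        TYPES.put("ods", PREFIX + "spreadsheet");
        TYPES.put("odp", PREFIX + "presentation");
        TYPES.put("odg", PREFIX + "graphics");
    }

    /* Return the media type for a file name, or null if the extension
       is missing or not in the table. The match ignores case. */
    static String mediaTypeFor(String fileName) {
        int dotPos = fileName.lastIndexOf('.');
        if (dotPos < 0) {
            return null;
        }
        String extension =
            fileName.substring(dotPos + 1).toLowerCase(Locale.ROOT);
        return TYPES.get(extension);
    }

    public static void main(String[] args) {
        System.out.println(mediaTypeFor("newfile.odt"));
    }
}
```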

Example C.2, “XSLT Transformation for OpenDocument files”, shows the code, which you will find in file ODTransform.java in directory appc in the downloadable example files.

Example C.2. XSLT Transformation for OpenDocument files

/*
 * ODTransform.java
 * (c) 2003-2005 J. David Eisenberg
 * Licensed under LGPL
 *
 * Program purpose: to perform an XSLT transformation
 * on a member of an OpenDocument file, either
 * after unzipping or while still in its zipped state.
 * Output may go to a normal file or a zipped file.
 */

import javax.xml.transform.TransformerFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerConfigurationException;

import org.xml.sax.XMLReader;
import org.xml.sax.InputSource;
import org.xml.sax.ContentHandler;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLReaderFactory;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import java.util.Hashtable;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.JarEntry;
import java.util.Vector;
import java.util.zip.ZipException;

public class ODTransform
{
    String  inputFileName = null;   // input file name, or member name...
    String  inputODName = null;     // ...if given an OpenDocument input file
    String  outputFileName = null;  // output file name, or member name...
    String  outputODName = null;    // ...if given an OpenDocument output file
    String  xsltFileName = null;    // XSLT file is always a regular file

    Vector  params = new Vector();  // parameters to be passed to transform
    
    public void doTransform( )
    throws TransformerException, TransformerConfigurationException, 
         SAXException, ZipException, IOException       
    {
        /* Set up the XSLT transformation based on the XSLT file */
        File xsltFile = new File( xsltFileName );
        StreamSource streamSource = new StreamSource( xsltFile );
        TransformerFactory tFactory = TransformerFactory.newInstance(); 
        Transformer transformer = tFactory.newTransformer( streamSource );

        /* Set up parameters for transform */
        for (int i=0; i < params.size(); i += 2)
        {
            transformer.setParameter((String) params.elementAt(i),
                (String) params.elementAt(i + 1));
        }

        /* Create an XML reader which will ignore any DTDs */
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setEntityResolver( new ResolveDTD() );
        
        InputSource inputSource;

        if (inputODName == null)
        {
            /* This is an unpacked file. */
            inputSource =
                new InputSource( new FileInputStream( inputFileName ) );
        }
        else
        {
            /* The input file should be a member of an OD file.
               Check to see if the input file name really exists
               within the JAR file */
            JarInputStream jarStream =
                new JarInputStream( new FileInputStream( inputODName ),
                    false );
            JarEntry jarEntry;
            while ( (jarEntry = jarStream.getNextJarEntry() ) != null &&
                !(inputFileName.equals(jarEntry.getName()) ) )
                // do nothing
                ;
            if (jarEntry == null)
            {
                throw new IOException( inputFileName + " not found in " +
                    inputODName );
            }
            inputSource = new InputSource( jarStream );
        }
        
        SAXSource saxSource = new SAXSource( reader, inputSource );
        saxSource.setSystemId( inputFileName );

        if (outputODName == null)
        {
            /* We want a regular file as output */
            FileOutputStream outputStream =
                new FileOutputStream( outputFileName );
            transformer.transform( saxSource, 
                new StreamResult( outputStream ) );
            outputStream.close();
        }
        else
        {
            /* The output file name is the name of a member of
               a JAR file (which we will build without a manifest) */
            JarOutputStream jarStream =
                new JarOutputStream( new FileOutputStream( outputODName ) );
            JarEntry jarEntry = new JarEntry( outputFileName );
            jarStream.putNextEntry( jarEntry );
            transformer.transform( saxSource, 
                new StreamResult( jarStream ) );
            
            /* Close the member file; the JAR file stays open
               so that the manifest can be added */
            jarStream.closeEntry();
            
            createManifestFile( jarStream );
            
            /* Close the JAR file to complete the file */
            jarStream.close();
        }
    }

    /* Check to see if the command line arguments make sense */
    private void checkArgs( String[] args )
    {
        int     i;
        
        if (args.length == 0)
        {
            showUsage( );
            System.exit( 1 );
        }
        i = 0;
        while ( i < args.length )
        {
            if (args[i].equalsIgnoreCase("-in"))
            {
                if ( i+1 >= args.length)
                {
                    badParam("-in");
                }
                inputFileName = args[i+1];
                i += 2;
            }
            else if (args[i].equalsIgnoreCase("-out"))
            {
                if ( i+1 >= args.length)
                {
                    badParam("-out");
                }
                outputFileName = args[i+1];
                i += 2;
            }
            else if (args[i].equalsIgnoreCase("-xsl"))
            {
                if ( i+1 >= args.length)
                {
                    badParam("-xsl");
                }
                xsltFileName = args[i+1];
                i += 2;
            }
            else if (args[i].equalsIgnoreCase("-inod"))
            {
                if ( i+1 >= args.length)
                {
                    badParam("-inOD");
                }
                inputODName = args[i+1];
                i += 2;
            }
            else if (args[i].equalsIgnoreCase("-outod"))
            {
                if ( i+1 >= args.length)
                {
                    badParam("-outOD");
                }
                outputODName = args[i+1];
                i += 2;
            }
            else if (args[i].equalsIgnoreCase("-param"))
            {
                if ( i+2 >= args.length)
                {
                    badParam("-param");
                }
                params.addElement( args[i+1] );
                params.addElement( args[i+2] );
                i += 3;
            }
            else
            {
                System.out.println( "Unknown argument " + args[i] );
                System.exit( 1 );
            }
        }
        
        if (inputFileName == null)
        {
            System.out.println("No input file name specified.");
            System.exit( 1 );
        }
        if (outputFileName == null)
        {
            System.out.println("No output file name specified.");
            System.exit( 1 );
        }
        if (xsltFileName == null)
        {
            System.out.println("No XSLT file name specified.");
            System.exit( 1 );
        }
    }

    /* If not enough arguments for a parameter, show error and exit */
    private void badParam( String paramName )
    {
        System.out.println("Not enough parameters to " + paramName);
        System.exit(1);
    }
    
    /*
        Creates the manifest file for a compressed OpenDocument
        file.  The mType array contains pairs of filename
        extensions and corresponding mimetypes.  The comparison
        to find the extension is done in a case-insensitive manner.
    */
    private void createManifestFile( JarOutputStream jarStream )
    {
        String [] mType = {
        "odt", "application/vnd.oasis.opendocument.text",
        "ott", "application/vnd.oasis.opendocument.text-template",
        "odg", "application/vnd.oasis.opendocument.graphics",
        "otg",
            "application/vnd.oasis.opendocument.graphics-template",
        "odp", "application/vnd.oasis.opendocument.presentation",
        "otp",
            "application/vnd.oasis.opendocument.presentation-template",
        "ods", "application/vnd.oasis.opendocument.spreadsheet",
        "ots",
            "application/vnd.oasis.opendocument.spreadsheet-template",
        "odc", "application/vnd.oasis.opendocument.chart",
        "otc", "application/vnd.oasis.opendocument.chart-template",
        "odi", "application/vnd.oasis.opendocument.image",
        "oti", "application/vnd.oasis.opendocument.image-template",
        "odf", "application/vnd.oasis.opendocument.formula",
        "otf", "application/vnd.oasis.opendocument.formula-template",
        "odm", "application/vnd.oasis.opendocument.text-master",
        "oth", "application/vnd.oasis.opendocument.text-web",
        };
        
        JarEntry jarEntry;
        
        int dotPos;
        String extension;
        String mimeType = null;
        String outputStr;

        dotPos = outputODName.lastIndexOf(".");
        extension = outputODName.substring( dotPos + 1 );
        for (int i=0; i < mType.length && mimeType == null; i+=2)
        {
            if (extension.equalsIgnoreCase( mType[i] ))
            {
                mimeType = mType[i+1];
            }
        }

        if (mimeType == null)
        {
            System.err.println("Cannot find mime type for extension "
                + extension );
            mimeType = "UNKNOWN";
        }

        try
        {
            jarEntry = new JarEntry( "META-INF/manifest.xml");
            jarStream.putNextEntry( jarEntry );
            jarStream.write( "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                .getBytes() );
            jarStream.write( "<!DOCTYPE manifest:manifest PUBLIC \"-//OpenOffice.org//DTD Manifest 1.0//EN\" \"Manifest.dtd\">"
                .getBytes() );
            jarStream.write("<manifest:manifest xmlns:manifest=\"urn:oasis:names:tc:opendocument:xmlns:manifest:1.0\">"
                .getBytes() );
            
            outputStr = "<manifest:file-entry manifest:media-type=\"" + 
                mimeType + "\" manifest:full-path=\"/\"/>";
            jarStream.write( outputStr.getBytes() );
        
            outputStr = "<manifest:file-entry manifest:media-type=\"text/xml\" manifest:full-path=\"" + outputFileName + "\"/>";
            jarStream.write( outputStr.getBytes() );
            jarStream.write("</manifest:manifest>".getBytes() );
            jarStream.closeEntry();
        }
        catch (IOException e)
        {
            System.err.println("Cannot write file:");
            System.err.println( e.getMessage() );
        }
    }

    /* If no arguments are provided, show this brief help section */
    private void showUsage( )
    {
        System.out.println("Usage: ODTransform options");
        System.out.println("Options:");
        System.out.println("   -in inputFilename");
        System.out.println("   -xsl transformFilename");
        System.out.println("   -out outputFilename");
        System.out.println("If the input filename is within an OpenDocument file, then:");
        System.out.println("   -inOD inputOpenDocFileName");
        System.out.println("If you wish to output an OpenDocument file, then:");
        System.out.println("   -outOD outputOpenDocumentFileName");
        System.out.println( );
        System.out.println("Argument names are case-insensitive.");
    }

    public static void main(String[] args)
    {
        ODTransform transformApp = new ODTransform( );
        transformApp.checkArgs( args );
        try {
            transformApp.doTransform( );
        }
        catch (Exception e)
        {
            System.out.println("Unable to transform");
            System.out.println(e.getMessage());
        }
    }
}

As an application of the preceding script, we present an alternate method of indenting the unpacked files via a simple XSLT transformation. Example C.4, “XSLT Transformation for Indenting” shows this transformation, which simply copies the entire document tree while setting indent to yes in the <xsl:output> element.
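As a rough illustration of how such an indenting transformation behaves, the following sketch runs a copy-everything stylesheet with indent set to yes over a string. The embedded stylesheet is our own minimal stand-in for this purpose, not the text of Example C.4, and the class name IndentSketch is hypothetical. The exact layout of the output depends on the XSLT processor that your Java installation supplies.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IndentSketch {
    /* Copies the whole document tree; indentation is requested
       in the <xsl:output> element. */
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" indent=\"yes\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:copy-of select=\".\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /* Apply the stylesheet to an XML string and return the result. */
    static String indent(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(xml)),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(indent("<a><b>x</b><c/></a>"));
    }
}
```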

We now present a Perl program to invoke this transformation on all the XML files in an unpacked OpenDocument file. We will need to set two paths: one to the transformation script, and one to the location of the preceding XSLT transformation. Make sure you use absolute paths for setting variables $script_location and $transform_location, because find() changes directories as it traverses the directory tree. This is file od_indent.pl in the appc directory in the downloadable example files.

Example C.5. Program to Indent OpenDocument Files via XSLT

#!/usr/bin/perl

use File::Find;

#
#   This program indents XML files within a directory.
#   a simple XSLT transform is used to indent the XML.
#

#
#   Path where you have installed the OpenDocument transform script.
#
$script_location = "/your/path/to/odtransform.sh";

#
#   Path where you have installed the XSLT transformation.
#
$transform_location = "/your/path/to/od_indent.xsl";

if (scalar @ARGV != 1)
{
    print "Usage: $0 directory\n";
    exit;
}

if (!-e $script_location)
{
    print "Cannot find the transform script at $script_location\n";
    exit;
}

if (!-e $transform_location)
{
    print "Cannot find the XSLT transformation file at " ,
        "$transform_location\n";
    exit;
}   

$dir_name = $ARGV[0];

if (!-d $dir_name)
{
    print "The argument to $0 must be the name of a directory\n";
    print "containing XML files to be indented.\n";
    exit;
}

#
#   Indent all XML files.
#
find(\&indent, $dir_name);

#   Warning:
#   This subroutine creates a temporary file with the format
#   __tempnnnn.xml, where nnnn is the current time( ). This
#   will avoid name conflicts when used with OpenOffice.org documents,
#   even though the technique is not sufficiently robust for general use.
#
sub indent
{
    my $xmlfile = $_;
    my $command;
    my $result;
    if ($xmlfile =~ m/\.xml$/)
    {
        $time = time();
        print "Indenting $xmlfile\n";
        $command = "$script_location " .
            "-in $xmlfile -xsl $transform_location -out __temp$time.xml";
        $result = system( $command );
        if ($result == 0 && -e "__temp$time.xml")
        {
            unlink $xmlfile;
            rename "__temp$time.xml", $xmlfile;
        }   
        else
        {
            print "Error occurred while indenting $xmlfile\n";
        }   
    }
}

This process may insert newlines in text as well as between elements. Where elements contain only other elements, this is not a problem, as OpenDocument ignores whitespace between elements. Inside elements that contain text, though, the extra newlines could show up as extra spaces when you repack the document. Thus, you should use this method to indent the XML document only when you do not want to repack the resulting files.

When using XSLT with OpenDocument files, you will want to make sure you have declared all the appropriate namespaces. Rather than selecting exactly the namespaces that your document uses, we provide all of the namespaces for OpenDocument in Example C.6, “XSLT Framework for Transforming OpenDocument”, which you may use as a framework for your transformations. This is file framework.xsl in directory appc in the downloadable example files.

Example C.6. XSLT Framework for Transforming OpenDocument

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:config="urn:oasis:names:tc:opendocument:xmlns:config:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
    xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
    xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
    xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
    xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
    xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
    xmlns:anim="urn:oasis:names:tc:opendocument:xmlns:animation:1.0"

    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:math="http://www.w3.org/1998/Math/MathML"
    xmlns:xforms="http://www.w3.org/2002/xforms"

    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
    xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
    xmlns:smil="urn:oasis:names:tc:opendocument:xmlns:smil-compatible:1.0"
    
    xmlns:ooo="http://openoffice.org/2004/office"
    xmlns:ooow="http://openoffice.org/2004/writer"
    xmlns:oooc="http://openoffice.org/2004/calc" 
>

<xsl:template match="/office:document-content">
    <xsl:apply-templates/>
</xsl:template>

</xsl:stylesheet>

If you are creating an OpenDocument file from a file where white space has been preserved, you will have to convert runs of spaces into <text:s> elements, and convert tabs and line feeds into <text:tab-stop> and <text:line-break> elements. This task is not easily done in native XSLT. Example C.7, “Transforming Whitespace to OpenDocument XML” is a Java extension for Xalan which will do what you need. You will note that we create elements and attributes complete with namespace prefix. This is certainly not a recommended practice, but createElementNS() and setAttributeNS() create xmlns attributes rather than a prefixed name. You will find this Java code in file ODWhiteSpace.java in directory appc in the downloadable example files.
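Before reading the extension itself, it may help to see the conversion rule in isolation. The following sketch applies the rule to a string and returns markup as text rather than DOM nodes, so it is not a Xalan extension. The class name WhiteSpaceSketch is hypothetical, the element names are the ones used in this appendix, and we assume that a carriage return followed by a line feed counts as a single line break.

```java
class WhiteSpaceSketch {

    /* Convert whitespace in str to OpenDocument markup, as a string. */
    static String compress(String str) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < str.length()) {
            char ch = str.charAt(pos);
            if (ch == ' ') {
                /* Count the whole run of spaces. */
                int startPos = pos;
                while (pos < str.length() && str.charAt(pos) == ' ') {
                    pos++;
                }
                int nSpaces = pos - startPos;
                if (startPos != 0) {
                    /* Not leading blanks: keep one literal space. */
                    out.append(' ');
                    nSpaces--;
                }
                if (nSpaces > 0) {
                    out.append("<text:s text:c=\"")
                       .append(nSpaces).append("\"/>");
                }
            } else if (ch == '\t') {
                out.append("<text:tab-stop/>");
                pos++;
            } else if (ch == '\r' || ch == '\n') {
                out.append("<text:line-break/>");
                pos++;
                /* Treat CR LF as one line break. */
                if (ch == '\r' && pos < str.length()
                        && str.charAt(pos) == '\n') {
                    pos++;
                }
            } else {
                /* Ordinary character; escape XML markup characters. */
                if (ch == '&') {
                    out.append("&amp;");
                } else if (ch == '<') {
                    out.append("&lt;");
                } else if (ch == '>') {
                    out.append("&gt;");
                } else {
                    out.append(ch);
                }
                pos++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(compress("a  b\tc"));
    }
}
```

A single space between words is left alone; only the second and later spaces of a run, or any leading spaces, become a <text:s> element.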

Example C.7. Transforming Whitespace to OpenDocument XML

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.apache.xpath.NodeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

public class ODWhiteSpace {

    public ODWhiteSpace () 
    {}

    public static NodeList compressString( String str )
    {
        ODWhiteSpace whiteSpace = new ODWhiteSpace();
        return whiteSpace.doCompress( str );
    }

    private Document tempDoc;       // necessary for creating elements
    private StringBuffer strBuf;    // where non-whitespace accumulates
    private NodeSet resultSet;      // the value to be returned
    private int pos;                // current position in string
    private int startPos;           // where blanks begin accumulating
    private int nSpaces;            // number of consecutive spaces
    private boolean inSpaces;       // handling spaces?
    private char ch;                // current character in buffer
    private char prevChar;          // previous character in buffer
    private Element element;        // element to be added to node list

    /**
     * Create OpenDocument elements for a string.
     * @param str the string to compress.
     * @return a NodeList for insertion into an OpenDocument file
    */
    public NodeList doCompress( String str )
    {  
        if (str.length() == 0)
        {
            return null;
        }
        tempDoc = null;
        strBuf = new StringBuffer( str.length() );

    
        try
        {
            tempDoc = DocumentBuilderFactory.newInstance().
                newDocumentBuilder().newDocument();
        }
        catch(ParserConfigurationException pce)
        {
            return null;
        }
 
        resultSet = new NodeSet();
        resultSet.setShouldCacheNodes(true);
        
        pos = 0;
        startPos = 0;
        nSpaces = 0;
        inSpaces = false;
        ch = '\u0000';

        while (pos < str.length())
        {
            prevChar = ch;
            ch = str.charAt( pos );
            if (ch == ' ')
            {
                if (inSpaces)
                {
                    nSpaces++;
                }
                else
                {
                    emitText( );
                    nSpaces = 1;
                    inSpaces = true;
                    startPos = pos;
                }
            }
            else if (ch == 0x000a || ch == 0x000d)
            {
                if (prevChar != 0x000d) // ignore LF or CR after CR.
                {
                    emitPending( );
                    element = tempDoc.createElement("text:line-break");
                    resultSet.addNode(element);
                }      
            }
            else if (ch == 0x09)
            {
                emitPending( );
                element = tempDoc.createElement("text:tab-stop");
                resultSet.addNode(element);
            }
            else
            {
                if (inSpaces)
                {
                    emitSpaces( );
                }
                strBuf.append( ch );
            }
            pos++;
        }
        
        emitPending( );     // empty out anything that's accumulated
        return resultSet;
    }
    
    /**
     * Emit accumulated spaces or text
     */
    private void emitPending( )
    {
        if (inSpaces)
        {
            emitSpaces( );
        }
        else
        {
            emitText( );
        }
    }

    /**
     * Emit accumulated text.
     * Creates a text node with currently accumulated text.
     * Side effect: empties accumulated text buffer
     */
    private void emitText( )
    {
        if (strBuf.length() != 0)
        {
            Text textNode = tempDoc.createTextNode( strBuf.toString( ) );
            resultSet.addNode( textNode );
            strBuf = new StringBuffer( );
        }
    }
    
    /**
     * Emit accumulated spaces.
     * If these are leading blanks, emit only a
     * &lt;text:s&gt; element; otherwise a blank plus
     * a &lt;text:s&gt; element (if necessary)
     * Side effect: sets accumulated number of spaces to zero.
     * Side effect: sets "inSpaces" flag to false
     */
    private void emitSpaces( )
    {
        if (nSpaces != 0)
        {
            if (startPos != 0)
            {
                Text textNode = tempDoc.createTextNode( " " );
                resultSet.addNode( textNode );
                nSpaces--;
            }

            if (nSpaces >= 1 || startPos == 0)
            {
                element = tempDoc.createElement( "text:s" );
                element.setAttribute( "text:c", 
                    Integer.toString( nSpaces ) );
                resultSet.addNode( element );
            }

            inSpaces = false;
            nSpaces = 0;
        }
    }
}

This is the same program as Example 2.3, “Program show_meta.pl”, except that it uses the XML::SAX module instead of XML::Simple. XML::SAX is a Perl module for the Simple API for XML, which interfaces to an event-driven parser. The parser issues many kinds of events as it parses a document; the ones we are interested in are the events that occur when an element starts, when it ends, and when we encounter the element’s text content. To use XML::SAX, you must specify a handler object, which is a Perl package that contains subroutines that are called when the parser detects events. The handler subroutines receive two parameters: a reference to the handler object itself, and a data hash with information about the event. Here are the subroutines that we will implement, the keys from the data hash that we are interested in, and how we will use their values.

start_element

This subroutine is called whenever the parser detects an opening tag for an element. The relevant keys are

Name
The name of the element (with namespace prefix)
Attributes
The value of this key is another hash, whose keys are the attribute names, preceded by their namespace URIs. The value for each of these keys is yet another hash, with keys Name and Value, whose values are the attribute name and value.

The program will store the element name in a scalar $element and the attributes in a global array @attributes. It sets a global scalar $text to the null string; this variable will be used to collect all the element’s text content.

characters

This subroutine is called whenever the parser detects a series of characters within an element. The relevant key is

Data
The characters that have been parsed.

The text is concatenated to the end of the $text variable. This is necessary because a single sequence of text may generate multiple calls to the character handler.

end_element

This subroutine is called whenever the parser detects a closing tag for an element. The relevant key is

Name
The name of the element (with namespace prefix).

Upon encountering the end of an element, the program will add the element name as a key in a hash named %element_info. The hash value will be an anonymous array consisting of the $text content followed by the @attributes array.
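The same three-event pattern can be sketched in Java with the SAX classes that ship with the JDK. This is an analogy rather than a translation of the Perl program: the class name MetaHandlerSketch, the SAMPLE document, and the parse method are hypothetical, and the parser is left in its default non-namespace-aware mode so that element and attribute names keep their prefixes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class MetaHandlerSketch extends DefaultHandler {
    /* A made-up fragment standing in for a real meta.xml file. */
    static final String SAMPLE =
        "<office:meta"
        + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
        + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
        + " xmlns:meta=\"urn:oasis:names:tc:opendocument:xmlns:meta:1.0\">"
        + "<dc:title>Sample</dc:title>"
        + "<meta:document-statistic meta:page-count=\"3\"/>"
        + "</office:meta>";

    /* element name -> [text, attribute name, attribute value, ...] */
    private final Map<String, List<String>> info = new LinkedHashMap<>();
    private final List<String> attributes = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes atts) {
        /* Remember this element's attributes; no text content yet. */
        attributes.clear();
        for (int i = 0; i < atts.getLength(); i++) {
            attributes.add(atts.getQName(i));
            attributes.add(atts.getValue(i));
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        /* One run of text may arrive in several calls, so append. */
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        List<String> entry = new ArrayList<>();
        entry.add(text.toString());
        entry.addAll(attributes);
        info.put(qName, entry);
    }

    Map<String, List<String>> getInfo() {
        return info;
    }

    /* Parse an XML string and return the collected information. */
    static Map<String, List<String>> parse(String xml) throws Exception {
        MetaHandlerSketch handler = new MetaHandlerSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getInfo();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse(SAMPLE).get("dc:title"));
    }
}
```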

Here is the rewritten program, which you will find in file sax_show_meta.pl in the appc directory in the downloadable example files.

Example C.8. Program sax_show_meta.pl

#!/usr/bin/perl

#
#   Show meta-information in an OpenDocument file.
#
use XML::SAX;
use IO::File;
use Text::Wrap;
use Carp;
use strict 'vars';

my $suffix;     # file suffix

my $parser;     # instance of XML::SAX parser
my $handler;    # module that handles elements, etc.
my $filehandle; # file handle for piped input

my $info;       # the hash returned from the parser
my @attributes; # attributes from a returned element
my %attr_hash;  # hash of attribute names and values
#
#   Check for one argument: the name of the OpenDocument file
#
if (scalar @ARGV != 1)
{
    croak("Usage: $0 document");
}

#
#   Get file suffix for later reference
#
($suffix) = $ARGV[0] =~ m/\.(\w\w\w)$/;

#
#   Create an object containing handlers for relevant events.
#
$handler = MetaElementHandler->new();


#
#   Create a parser and tell it where to find the handlers.
#
$parser =
    XML::SAX::ParserFactory->parser( Handler => $handler);

#
#   Input to the parser comes from the output of member_read.pl
# 
$ARGV[0] =~ s/[;|'"]//g;  #eliminate dangerous shell metacharacters     
$filehandle = IO::File->new( "perl member_read.pl $ARGV[0] meta.xml |" ); # (1)

#
#   Parse and collect information.
#
$parser->parse_file( $filehandle );

#
#   Retrieve the information collected by the parser
#
$info = $handler->get_info();  # (2)

#
#   Output phase
#
print "Title:       $info->{'dc:title'}[0]\n"
    if ($info->{'dc:title'}[0]);
print "Subject:     $info->{'dc:subject'}[0]\n"
    if ($info->{'dc:subject'}[0]);

if ($info->{'dc:description'}[0])
{
    print "Description:\n";
    $Text::Wrap::columns = 60;
    print wrap("\t", "\t", $info->{'dc:description'}[0]), "\n";
}

print "Created:     ";
print format_date($info->{'meta:creation-date'}[0]);
print " by $info->{'meta:initial-creator'}[0]"
    if ($info->{'meta:initial-creator'}[0]);
print "\n";

print "Last edit:   ";
print format_date($info->{"dc:date"}[0]);
print " by $info->{'dc:creator'}[0]"
    if ($info->{'dc:creator'}[0]);
print "\n";

#
#   Take attributes from the meta:document-statistic element
#   (if any) and put them into %attr_hash
#
@attributes = @{$info->{'meta:document-statistic'}};

if (scalar(@attributes) > 1)
{
    shift @attributes;
    %attr_hash = @attributes;

    if ($suffix eq "odt")
    {
        print "Pages:       $attr_hash{'meta:page-count'}\n";
        print "Words:       $attr_hash{'meta:word-count'}\n";
        print "Tables:      $attr_hash{'meta:table-count'}\n";
        print "Images:      $attr_hash{'meta:image-count'}\n";
    }
    elsif ($suffix eq "ods")
    {
        print "Sheets:      $attr_hash{'meta:table-count'}\n";
        print "Cells:       $attr_hash{'meta:cell-count'}\n"
            if ($attr_hash{'meta:cell-count'});
    }
}

#
#   A convenience subroutine to make dates look
#   prettier than ISO-8601 format.
#
sub format_date
{
    my $date = shift;
    my ($year, $month, $day, $hr, $min, $sec);
    my @monthlist = qw (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    
    ($year, $month, $day, $hr, $min, $sec) =
        $date =~ m/(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})/;
    return "$hr:$min on $day $monthlist[$month-1] $year";
}


package MetaElementHandler; # (3)

my %element_info;   # the data structure that we are creating
my $element;        # name of element being processed
my @attributes;     # attributes for this element
my $text;           # text content of the element


sub new { # (4)
    my $class = shift;
    my %opts = @_;
    bless \%opts, $class;
}

sub reset {
    my $self = shift;
    %$self = ();
}

#
#   Store current element and its attributes.
#
sub start_element
{
    my ($self, $parser_data) = @_;
    
    my $hashref; # (5)
    my $item;       # loop control variable

    $element = $parser_data->{"Name"};

    @attributes = ();   # discard attributes left from earlier elements
    foreach $item (keys %{$parser_data->{"Attributes"}})
    {
        $hashref =  $parser_data->{"Attributes"}{$item};
        push @attributes, $hashref->{"Name"},  $hashref->{"Value"};
    }
    
    $text = ""; # no text content yet.
}

#
#   Create an entry into a hash for the element that is ending
#
sub end_element
{
    my ($self, $parser_data) = @_;

    $element = $parser_data->{"Name"};
    $element_info{$element} = [$text, @attributes];
}

#
#   Accumulate element's text content.
#
sub characters
{
    my ($self, $parser_data) = @_;
    $text .= $parser_data->{"Data"}; # (6)
}

#
#   Return a reference to the %element_info hash
#
sub get_info # (7)
{
    my $self = shift;
    return \%element_info;
}

(1) XML::SAX doesn’t read from file handles opened with the standard Perl open() function; you have to use IO::File to create the file handle.

(2) The handler object has accumulated all the information from the meta.xml file into a hash. We ask the handler to return a reference to that hash.

(3) XML::SAX wants its handler subroutines to be in a Perl object. The package statement serves to “encapsulate” the variables and subroutines. As good citizens, we don’t directly access any of the variables from the main program.

(4) The new subroutine completes the work of making this package into a Perl object. The reset subroutine is for XML::SAX’s internal use.

(5) The $hashref variable is here for convenience; if we didn’t use it, then the push statement would be even less readable than it already is.

(6) Note the .= operation; since the text inside an element can come from many calls to characters, we have to concatenate them all.

(7) This is not an XML::SAX routine; we are providing it so that we can hand a reference to our accumulated data back to the main program.