Chapter 9. Filters in OpenOffice.org

To this point, we have been building stand-alone applications to transform external files, in XML format or just plain text, to OpenDocument format. OpenOffice.org allows you to integrate an XSLT transformation into the application as a filter.

XSLT-based filters work by associating three things: an XML file type (which we will call the “foreign” file), XSLT transformation files for import and/or export, and an OpenOffice.org template file. XML elements in the foreign file are associated with styles in the template file. The import transformation will take the foreign file’s content and insert it into the template, assigning styles as appropriate. The export transformation will read the OpenOffice.org document and, using the style information, create a foreign file.

The remainder of this chapter will be a case study that shows how to construct and install XSLT-based filters.

The XML that we will import is a database of amateur wrestling clubs in California (yes, this is an actual database; the phone numbers and emails have been changed). The state is divided into several areas or associations; for example, SCVWA is the Santa Clara Valley Wrestling Association. Each association consists of a series of clubs. Example 9.1, “Sample Club Database” shows an abbreviated file. A club can have multiple email addresses, and the <info> element is optional. The only element that isn’t self-explanatory is the <age-groups> element. Its type attribute tells which age groups the club serves: Kids, Cadets, Juniors, Open (competitors out of high school), and Women. The <info> element may contain a hypertext link to a club’s website, represented by the HTML <a> element, which has been borrowed into this custom language without a namespace.

Figure 9.1, “Imported Club Database” shows the OpenOffice.org Writer file that we want as a result.

We will now create the template file in OpenOffice.org. This is just a skeleton document with styles that will be associated with XML elements. Figure 9.2, “Styles in Writer Template” shows the names of the paragraph and character styles in the template. [This is file clublist_template.ott in directory ch09 in the downloadable example files.]

That having been done, we create the stylesheet, shown in Example 9.2, “Stylesheet for Transforming Club List to Writer Document”. The stylesheet doesn’t have to include any <style:style> elements; those have been taken care of in the template. [This is file club_to_writer.xsl in directory ch09 in the downloadable example files.]

Example 9.2. Stylesheet for Transforming Club List to Writer Document

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
  xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
  xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
  xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
  xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
  xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0" 
  xmlns:math="http://www.w3.org/1998/Math/MathML"
  xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
  xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
  office:version="1.0">
  
<xsl:template match="/">
    <office:document>
        <office:body>
            <!-- OpenDocument text content lives inside office:text -->
            <office:text>
                <xsl:apply-templates select="club-database/association"/>
            </office:text>
        </office:body>
    </office:document>
</xsl:template>

<xsl:template match="association">
    <text:h text:outline-level="1" text:style-name="Association"> 1
        <xsl:value-of select="@id"/>
    </text:h>
    <xsl:apply-templates select="club"/>
</xsl:template>


<xsl:template match="club">
    <text:h text:outline-level="2" text:style-name="Club Name">
        <xsl:value-of select="name" />
        <xsl:text> </xsl:text>
        <text:span text:style-name="Club Code"><xsl:value-of 2
            select="@id" /></text:span>
    </text:h>
    <text:p text:style-name="Default">
        <xsl:text>Chartered: </xsl:text>
        <text:span text:style-name="Charter">
            <xsl:value-of select="@charter"/>
        </text:span>
    </text:p>
    <text:p text:style-name="Default">
        <xsl:text>Contact: </xsl:text>
        <text:span text:style-name="Contact">
            <xsl:value-of select="contact"/>
        </text:span>
    </text:p>
    <text:p text:style-name="Default">
        <xsl:text>Location: </xsl:text>
        <text:span text:style-name="Location">
            <xsl:value-of select="location"/>
        </text:span>
    </text:p>
    <text:p text:style-name="Default">
        <xsl:text>Phone: </xsl:text>
        <text:span text:style-name="Phone">
            <xsl:value-of select="phone"/>
        </text:span>
    </text:p>

    <xsl:choose>
        <xsl:when test="count(email) = 1"> 3
            <text:p text:style-name="Default">
                <xsl:text>Email: </xsl:text>
                <text:span text:style-name="Email">
                    <xsl:value-of select="email"/>
                </text:span>
            </text:p>
        </xsl:when>
        <xsl:when test="count(email) &gt; 1">
            <text:p text:style-name="Default">
                <xsl:text>Email:</xsl:text>
            </text:p>
            <text:list text:style-name="UnorderedList">
                <xsl:for-each select="email">
                    <text:list-item>
                        <text:p text:style-name="Default">
                            <text:span text:style-name="Email">
                                <xsl:value-of select="."/>
                            </text:span>
                        </text:p>
                    </text:list-item>
                </xsl:for-each>
            </text:list>
        </xsl:when>
    </xsl:choose>

    <xsl:apply-templates select="age-groups"/>
    
    <xsl:apply-templates select="info"/>
</xsl:template>

<xsl:template match="age-groups">
    <text:p text:style-name="Default">
        <xsl:text>Age Groups: </xsl:text>
        <text:span text:style-name="Age Groups">
            <xsl:if test="contains(@type,'K')"> 4
                <xsl:text>Kids </xsl:text>
            </xsl:if>
            <xsl:if test="contains(@type,'C')">
                <xsl:text>Cadets </xsl:text>
            </xsl:if>
            <xsl:if test="contains(@type,'J')">
                <xsl:text>Juniors </xsl:text>
            </xsl:if>
            <xsl:if test="contains(@type,'O')">
                <xsl:text>Open </xsl:text>
            </xsl:if>
            <xsl:if test="contains(@type,'W')">
                <xsl:text>Women </xsl:text>
            </xsl:if>
        </text:span>
    </text:p>
</xsl:template>

<xsl:template match="info">
    <text:p text:style-name="Club Info"> 5
        <xsl:if test="normalize-space(.) != ''">
            <xsl:apply-templates/>
        </xsl:if>
    </text:p>
</xsl:template>

<xsl:template match="a"> 6
    <text:a xlink:type="simple" xlink:href="{@href}"><xsl:value-of select="."/></text:a>
</xsl:template>

</xsl:stylesheet>
1 This is the first occurrence of connecting the foreign file’s content with a custom style in the template.
2 Notice that we attach the style only to the actual content, not to the entire paragraph. This means we don’t have to parse the paragraph content upon export.
3 If there’s only one email address, it is placed on the same line as the label; otherwise, the transformation creates an unordered list of all the email addresses.
4 Go through the age group symbols one at a time. Note that we will have to parse this in the export transformation.
5 Even if there’s nothing in the <info> element, we want an empty paragraph for the spacing.
6 This is how you add a hypertext link to an OpenOffice.org Writer document; it also borrows the <a> element from HTML, but does it the right way—with a namespace.
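
To see the shape of what the import transformation produces, it can help to mimic it outside XSLT. The following Python sketch is only an illustration, not part of the filter: it reads a small, invented club database (the association ID, club names, and addresses are hypothetical) and emits one row per heading, paragraph, or list, in document order. It covers only the association heading, the club name, and the one-versus-many email rule described in callout 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, invented for this illustration.
SAMPLE = """
<club-database>
  <association id="XYZWA">
    <club id="A01">
      <name>Example Club</name>
      <email>one@example.com</email>
    </club>
    <club id="A02">
      <name>Another Club</name>
      <email>two@example.com</email>
      <email>three@example.com</email>
    </club>
  </association>
</club-database>
"""

def flatten(xml_text):
    """Return (kind, style, content) rows in document order."""
    rows = []
    root = ET.fromstring(xml_text)
    for assoc in root.findall("association"):
        rows.append(("h1", "Association", assoc.get("id")))
        for club in assoc.findall("club"):
            rows.append(("h2", "Club Name", club.findtext("name")))
            emails = [e.text for e in club.findall("email")]
            if len(emails) == 1:
                # A single address shares the line with its label.
                rows.append(("p", "Email", emails[0]))
            elif len(emails) > 1:
                # Several addresses: a label paragraph, then a list.
                rows.append(("p", None, "Email:"))
                rows.append(("list", "Email", emails))
    return rows

rows = flatten(SAMPLE)
for row in rows:
    print(row)
```

Notice that the nesting of the input is gone from the rows: a club is simply whatever follows a level-2 heading.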

Creating the export filter is a much more difficult task. When we imported a file, a hierarchical structure like this …

… was “flattened” into a structure like this:

The export filter will have to take this flattened structure and re-create the nesting. The algorithm for this is not particularly difficult:

For each <text:h> element with a text:style-name of Association:

To construct a <club> element:

  1. Create an opening <club> element.
  2. While the next sibling of this element is a <text:p> element:
    1. If there is a child <text:span> element, create an appropriate child element based on the span’s text:style-name.
    2. Otherwise, if there is a neighboring <text:list>, then you have a list of emails.[15] Extract the email addresses and create the appropriate <email> elements in the target document.
    3. Otherwise, if this is a club info paragraph, insert an <info> element.
  3. You have encountered a <text:h> element or the end of the file. Close the <club> element.
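
In a language with ordinary loops, the steps above take only a few lines. The Python sketch below is an illustration under simplifying assumptions, not the export filter: the flat document is represented as a list of (kind, style, text) rows, the row contents are invented, and email lists and info paragraphs are left out.

```python
# Hypothetical flat document: headings ("h") and paragraphs ("p") in order.
FLAT = [
    ("h", "Association", "XYZWA"),
    ("h", "Club Name", "Example Club"),
    ("p", "Contact", "A. Person"),
    ("p", "Phone", "555-0100"),
    ("h", "Club Name", "Another Club"),
    ("p", "Location", "Somewhere"),
    ("h", "Association", "ABCWA"),
    ("h", "Club Name", "Third Club"),
]

def regroup(rows):
    """Rebuild association -> club -> fields nesting from flat rows.

    Assumes every club heading is preceded by an association heading.
    """
    associations = []
    club = None
    for kind, style, text in rows:
        if kind == "h" and style == "Association":
            associations.append({"id": text, "clubs": []})
            club = None  # a new association closes the open club
        elif kind == "h" and style == "Club Name":
            club = {"name": text, "fields": []}
            associations[-1]["clubs"].append(club)
        elif kind == "p" and club is not None:
            # Paragraphs belong to the most recently opened club.
            club["fields"].append((style, text))
    return associations

nested = regroup(FLAT)
for assoc in nested:
    print(assoc["id"], [c["name"] for c in assoc["clubs"]])
```

The loop variable `club` plays the role of "the currently open <club> element"; that piece of mutable state is what a loop-free language has to pass along as a parameter instead.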

This is not exactly rocket surgery, but the job is complicated by the fact that XSLT almost exclusively uses recursion, not iteration.[16] This makes the transformation ugly, so we will present it in parts. [This is file writer_to_club.xsl in directory ch09 in the downloadable example files.]

The first part shows the opening <xsl:stylesheet> element, with the namespaces that could be used in the OpenOffice.org document. The transformation won’t work without these declarations, but we do not want to see the namespaces in the resulting output file. Thus, we use the exclude-result-prefixes attribute to eliminate namespace declarations from our output.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
  xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
  xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
  xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
  xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
  xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
  exclude-result-prefixes="text xsl fo office style table draw xlink form script number svg">
<xsl:output method="xml" indent="yes"/>

Almost the only place we can use XSLT’s natural processing style is to grab all the <text:h> elements for the associations. Processing an association creates the <association> element with its ID, and then starts the process of making entries for the constituent clubs. Implicit in this code is the presumption that there is at least one club in an association.

Note

When you are exporting a document, its XML representation is a “unified document,” with the contents of all the files (meta.xml, styles.xml, content.xml, etc.) enclosed in a single <office:document> element rather than the <office:document-content> element that we have been using in previous chapters. If you want to see what such a file looks like, install file unified_document.xsl in directory ch09 from the downloadable example files.

<xsl:template match="/">
    <xsl:apply-templates select="office:document/office:body/
        office:text/text:h[@text:style-name='Association']"/>
</xsl:template>

<xsl:template match="text:h[@text:style-name='Association']">
    <association id="{.}">
        <xsl:call-template name="make-club">
            <xsl:with-param name="clubNode"
             select="following-sibling::text:h[1]"/>
        </xsl:call-template>
    </association>
</xsl:template>

We can now make the club(s) in the association.

<xsl:template name="make-club">
    <xsl:param name="clubNode"/>
    <xsl:if test="$clubNode/@text:style-name = 'Club_20_Name'"> 1
        <club>
            <xsl:attribute name="id">
                <xsl:value-of
                    select="$clubNode/text:span[@text:style-name='Club_20_Code']"/>
            </xsl:attribute>      
            <name><xsl:value-of select="normalize-space($clubNode/text())"/></name>
            <xsl:call-template name="make-content">
                <xsl:with-param name="contentNode"
                    select="$clubNode/following-sibling::*[1]"/> 2
            </xsl:call-template>
            
        </club>
        <xsl:if test="$clubNode/following-sibling::text:h[1]"> 3
            <xsl:call-template name="make-club">
                <xsl:with-param name="clubNode"
                    select="$clubNode/following-sibling::text:h[1]"/>
            </xsl:call-template>
        </xsl:if>
    </xsl:if>
</xsl:template>
1 The node that was passed on to the make-club template could be either a <text:h> for a club name or the next association if this was the last club. Hence, the <xsl:if> to make sure we have a club name.
2 When we proceed to gather the club’s content, we have to blindly pass on the first following sibling element—it could be a <text:p> that is part of the club, a <text:h> that starts a new club, or a <text:h> that starts a new association.
3 After completing this club, check to see if this node has a following <text:h> node. If so, recursively call this template with that new node, which could be another club or the next association.

Assembling the content for a club works very much along the same lines.

<xsl:template name="make-content">
    <xsl:param name="contentNode"/>
    <xsl:if test="name($contentNode) = 'text:p'"> 1
        <xsl:choose>
            <xsl:when test="$contentNode/text:span"> 2
                <xsl:call-template name="add-item">
                    <xsl:with-param name="spanNode"
                        select="$contentNode/text:span"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:when test="name($contentNode/following-sibling::*[1]) = 
                'text:list'"> 3
                <xsl:call-template name="email-list">
                    <xsl:with-param name="emailList"
                        select="$contentNode/following-sibling::text:list[1]"/>
                </xsl:call-template>
            </xsl:when>
            <xsl:when test="$contentNode/@text:style-name = 'Club_20_Info'"> 4
                <info>
                    <xsl:apply-templates select="$contentNode"/>
                </info>
            </xsl:when>
        </xsl:choose>
        <xsl:call-template name="make-content"> 5
            <xsl:with-param name="contentNode"
                select="$contentNode/following-sibling::*[1]"/>
        </xsl:call-template>
    </xsl:if>
</xsl:template>
1 If this isn’t a paragraph, then it’s not part of the club content. (This stops recursion when we hit the end of the file or the next club/association.)
2 If this paragraph has a <text:span> child, then it’s a charter, location, contact, phone, single email, or age group specification. Hand it off to another template.
3 If there’s an unordered list following this paragraph, then it must be a club with multiple emails. Again, hand the list off to another template.
4 Club information is just straight text with embedded links, so use <xsl:apply-templates> to handle the text (with the default template) and the links (with a soon-to-be-described template).
5 In any case, keep gathering content by recursively calling this template with the next node in the document.
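
The recursion in make-content is a while loop in disguise: the <xsl:if> is the loop test, and the recursive call with the next sibling is the step to the next iteration. As a rough analogy only, the same shape in Python looks like this; the sibling list and its contents are hypothetical.

```python
def gather_content(nodes, index=0):
    """Collect paragraph texts from nodes[index] onward.

    Stops at the first non-paragraph or at the end of the list.
    """
    if index >= len(nodes) or nodes[index][0] != "p":
        return []  # base case: same role as the failing xsl:if
    # Handle this node, then recurse on the next sibling.
    return [nodes[index][1]] + gather_content(nodes, index + 1)

# Hypothetical siblings following a club heading.
SIBLINGS = [
    ("p", "Chartered: 2004"),
    ("p", "Contact: A. Person"),
    ("h", "Next Club"),
    ("p", "Phone: 555-0100"),
]

collected = gather_content(SIBLINGS)
print(collected)
```

The paragraph after the heading is not collected; it belongs to the next club, which is why make-club starts a fresh pass from each heading.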

Here’s the template that adds individual elements as children of a club. The styleAttr variable is for convenience, to make the source easier to read. All the elements except <age-groups> are handled by adding the span’s contents. Age groups are special, and, rather than trying to split up a list of keywords and recursively handle them, we cheat. The call to the translate function eliminates all lowercase letters and blanks, leaving the uppercase abbreviations for the age groups. For example, Kids Cadets Open is instantly reduced to KCO.

<xsl:template name="add-item">
    <xsl:param name="spanNode"/>
    <xsl:variable name="styleAttr" select="$spanNode/@text:style-name"/>
    
    <xsl:choose>
        <xsl:when test="$styleAttr = 'Charter'">
            <charter><xsl:value-of select="$spanNode"/></charter>
        </xsl:when>
        <xsl:when test="$styleAttr = 'Contact'">
            <contact><xsl:value-of select="$spanNode"/></contact>
        </xsl:when>
        <xsl:when test="$styleAttr = 'Phone'">
            <phone><xsl:value-of select="$spanNode"/></phone>
        </xsl:when>
        <xsl:when test="$styleAttr = 'Location'">
            <location><xsl:value-of select="$spanNode"/></location>
        </xsl:when>
        <xsl:when test="$styleAttr = 'Email'">
            <email><xsl:value-of select="$spanNode"/></email>
        </xsl:when>
        <xsl:when test="$styleAttr = 'Age_20_Groups'">
            <age-groups>
                <xsl:attribute name="type">
                    <xsl:value-of select="translate($spanNode,
                    ' abcdefghijklmnopqrstuvwxyz', '')"/>
                </xsl:attribute>
            </age-groups>
        </xsl:when>
    </xsl:choose>
</xsl:template>
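
The age-group round trip is easy to check outside XSLT. This Python sketch is an illustration, not part of the filter: expand mirrors the chain of contains() tests in the import stylesheet, and collapse mirrors the translate() call above. The function names are invented for the example.

```python
import string

# Letter codes and display words, in the order the import stylesheet tests them.
GROUPS = [("K", "Kids"), ("C", "Cadets"), ("J", "Juniors"),
          ("O", "Open"), ("W", "Women")]

def expand(codes):
    """Import direction: one word (plus a trailing space) per code present."""
    return "".join(word + " " for letter, word in GROUPS if letter in codes)

def collapse(words):
    """Export direction: delete blanks and lowercase letters."""
    drop = str.maketrans("", "", " " + string.ascii_lowercase)
    return words.translate(drop)

print(collapse("Kids Cadets Open"))
print(repr(expand("KCO")))
```

The trick works only because each age group name has exactly one uppercase letter and that letter is its code. The round trip also normalizes order: whatever order the codes had in the foreign file, they come back in the order Kids, Cadets, Juniors, Open, Women.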

Rounding out the XSLT stylesheet are the templates that handle a list of email addresses within a <text:list> and the <text:a> element inside the club information.

<xsl:template name="email-list">
    <xsl:param name="emailList"/>
    <xsl:for-each select="$emailList/descendant::text:span[@text:style-name='Email']">
        <email><xsl:value-of select="."/></email>
    </xsl:for-each>
</xsl:template>

<xsl:template match="text:a">
<a href="{@xlink:href}"><xsl:apply-templates/></a>
</xsl:template>

</xsl:stylesheet>


[15] This is where our cleverness of representing multiple emails as a list comes back to haunt us.

[16] When your only tool is a hammer, everything looks like a nail.


Copyright (c) 2005 O’Reilly & Associates, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".