23.4 Changing and Generating XML
Just
like for HTML and other kinds of structured text, the simplest way to
output an XML document is often to prepare and write it using
Python's normal string and file operations, covered
in Chapter 9 and Chapter 10. Templating, covered in Chapter 22, is also often the best approach. Subclassing
class XMLGenerator, covered earlier in this
chapter, is a good way to generate an XML document that is like an
input XML document, except for a few changes.
The xml.dom.minidom module offers yet another
possibility, because its classes support methods to generate, insert,
remove, and alter nodes in a DOM tree representing the document. You
can create a DOM tree by parsing and then alter it, or you can create
an empty DOM tree and populate it, and then output the resulting XML
document with methods toxml,
toprettyxml, or writexml of the
Document instance. You can also output a subtree
of the DOM tree by calling these methods on the
Node that is the subtree's root.
23.4.1 Factory Methods of a Document Object
The
Document class supplies factory methods to create
new instances of subclasses of Node. The most
frequently used factory methods of a Document
instance d are as follows.
Builds and returns an instance
c of class Comment for
a comment with text data.
Builds and returns an instance e of class
Element for an element with the given tag.
Builds and returns an instance t of class
TextNode for a text node with text
data.
23.4.2 Mutating Methods of an Element Object
An
instance e of class
Element supplies the following methods to remove
and add attributes.
Removes e's attribute
with the given name.
e.setAttribute(name,value)
|
|
Changes e's attribute
with the given name to have the given
value, or adds to
e a new attribute with the given
name and value
if e had no attribute named
name.
23.4.3 Mutating Methods of a Node Object
An instance n
of class Node supplies the following methods to
remove, add, and replace children.
Makes child the last child of
n, whatever
child's parent was
(including n or None).
n.insertBefore(child,nextChild)
|
|
Makes child
the child of n immediately before
nextChild, whatever
child's parent was
(including n or None).
nextChild must be a child of
n.
Makes child parentless and returns
child. child
must be a child of n.
n.replaceChild(child,oldChild)
|
|
Makes child the child of
n in
oldChild's place,
whatever child's parent
was (including n or
None). oldChild must be
a child of n. Returns
oldChild.
23.4.4 Output Methods of a Node Object
An instance n of class
Node supplies the following methods to output the
subtree rooted at n.
n.toprettyxml(indent='\t',newl='\n')
|
|
Returns a string, plain or Unicode, with the XML source for the
subtree rooted at n, using
indent to indent nested tags and
newl to end lines.
Like
n.toprettyxml('',''),
i.e., inserts no extraneous whitespace.
Writes the XML source for the subtree rooted at
n to file-like object
file, open for writing. Note that
file.write must accept
Unicode strings (as covered in Section 9.6.1), unless
all text in the XML source produced can be converted implicitly to
plain strings using the current default encoding (normally
'ascii').
23.4.5 Changing and Outputting XHTML with xml.dom.minidom
The following example uses
xml.dom.minidom to analyze an XHTML page and
output it to standard output with each hyperlink's
destination URL shown, in three sets of parentheses, just before the
hyperlink:
import xml.dom.minidom, urllib, sys
f = urllib.urlopen('http://www.w3.org/MarkUp/')
doc = xml.dom.minidom.parse(f)
as = doc.getElementsByTagName('a')
for a in as:
value = a.getAttribute('href')
if value:
newtext = doc.createTextNode(' (((%s)))'%value)
a.parentNode.insertBefore(newtext,a)
class UnicodeStdoutWriter:
def write(self, data):
sys.stdout.write(data.encode('utf-8'))
doc.writexml(UnicodeStdoutWriter( ))
This example wraps sys.stdout in a little
UnicodeStdoutWriter class in order to encode
Unicode output. Further, it uses encoding 'utf-8'
because that is the encoding that the XML standard specifies as the
default, and up to Python 2.2.2 we have no way of asking object
doc to explicitly request a different
encoding. In Python 2.3, method writexml accepts
an optional keyword argument named encoding that
lets us control the encoding attribute in the XML
declaration.
|