Perl & XML

2.9. Free-Form XML and Well-Formed Documents

XML's grandfather, SGML, required that every element and attribute be documented thoroughly with a long list of declarations in the DTD. We'll describe what we mean by that thorough documentation in the next section, but for now, imagine it as a blueprint for a document. This blueprint added considerable overhead to the processing of a document and was a serious obstacle to SGML's adoption as a popular markup language for the Internet. HTML, which was originally developed as an SGML application, was hobbled by this enforced structure, since any "valid" HTML document had to conform to the HTML DTD. Hence, extending the language was impossible without approval from a web standards committee.

XML does away with that requirement by allowing a special condition called free-form XML. In this mode, a document has to follow only minimal syntax rules to be acceptable. If it follows those rules, the document is well-formed. Having only these rules to follow is wonderfully liberating for a developer because it means that you don't have to read a DTD every time you want to process a piece of XML. All a processor has to do is make sure that the minimal syntax rules are followed.

In free-form XML, you can choose any name you like for an element. It doesn't have to belong to a sanctioned vocabulary, as is the case with HTML. Accepting arbitrary markup in your program is a risk, but as long as you know what you're doing, it's okay. If you don't trust the markup to fit the pattern you're looking for, then you need to use element and attribute declarations, as we describe in the next section.
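To make the trade-off concrete, here is a minimal sketch of a program that accepts free-form markup and then checks for itself whether the element names fit the pattern it expects. It is written in Python with the standard library's ElementTree module rather than Perl, and the sample document and the helper names element_names and unexpected_names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A free-form document: none of these element names come from a
# sanctioned vocabulary, and no DTD declares them. (Hypothetical sample.)
DOC = """<shopping-list owner="pat">
  <perishable><milk liters="2"/></perishable>
  <whatever>duct tape</whatever>
</shopping-list>"""

def element_names(xml_text):
    """Return every element name in document order."""
    # ET.fromstring raises ET.ParseError if the text is not well-formed.
    root = ET.fromstring(xml_text)
    return [elem.tag for elem in root.iter()]

def unexpected_names(xml_text, expected):
    """Return the element names that fall outside the set we planned for."""
    return [name for name in element_names(xml_text) if name not in expected]

print(element_names(DOC))
# ['shopping-list', 'perishable', 'milk', 'whatever']
print(unexpected_names(DOC, {"shopping-list", "perishable", "milk"}))
# ['whatever']
```

The parser accepts <whatever> without complaint because the document is well-formed. Deciding whether that element belongs there is left to the program, which is exactly the job that declarations would otherwise take over.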

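What are these rules? Here's a short list as seen through a coarse-grained spyglass:

A document has exactly one top-level element, the document element, which contains all the other elements and data.

Every element with content has both a start tag and an end tag. An empty element may use the shorthand form with a trailing slash, such as <br/>.

Elements nest properly and never overlap: an element that starts inside another element must also end inside it.

Element and attribute names are case-sensitive, so <Title> is not closed by </title>.

Every attribute has a value, and that value is enclosed in single or double quotes.

The characters < and & are reserved for markup. In character data they are written as the entities &lt; and &amp;.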

You will encounter more rules, so for a more complete understanding of well-formedness, you should either read an introductory book on XML or look at the W3C's official recommendation at http://www.w3.org/XML.
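Several common well-formedness violations can be demonstrated mechanically. The following sketch, in Python with the standard library's ElementTree module, feeds a handful of deliberately broken snippets to a parser and confirms that each is rejected. The snippets and the helper name is_well_formed are made up for this illustration, and the set of violations is far from exhaustive.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Each snippet breaks exactly one basic syntax rule. (Hypothetical samples.)
BROKEN = {
    "two top-level elements": "<a/><b/>",
    "missing end tag": "<a><b></a>",
    "overlapping elements": "<a><b></a></b>",
    "unquoted attribute value": "<a id=1/>",
    "attribute without a value": "<a checked/>",
    "case mismatch in tags": "<a></A>",
    "unescaped ampersand": "<a>fish & chips</a>",
}

# The same ideas, written correctly.
GOOD = '<a id="1"><b>fish &amp; chips</b><c/></a>'

for rule, snippet in BROKEN.items():
    verdict = "accepted" if is_well_formed(snippet) else "rejected"
    print(f"{rule}: {verdict}")
print("corrected document:", "accepted" if is_well_formed(GOOD) else "rejected")
```

Note that no DTD is involved anywhere: the parser needs nothing but the text itself to reach each verdict.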

If you want to be able to process your document with XML-aware programs, make sure it is always well-formed. (After all, there's no such thing as non-well-formed XML.) A tool often used to check this status is called a well-formedness checker, which is a type of XML parser that reports errors to the user. Often, such a tool can be detailed in its analysis and give you the exact line number in a file where the problem occurs. We'll discuss checkers and parsers in Chapter 3, "XML Basics: Reading and Writing".
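As a rough idea of what such a checker does, here is a small sketch in Python built on the standard library's bindings to the expat parser. The function name check_well_formed and the sample memo are assumptions made for this example, and a real checker would typically read from a file and offer friendlier output.

```python
from xml.parsers import expat

def check_well_formed(xml_text):
    """Return None if the text is well-formed, else (line, column, message)."""
    parser = expat.ParserCreate()
    try:
        # The second argument tells expat that this is the final chunk of input.
        parser.Parse(xml_text, True)
    except expat.ExpatError as err:
        # expat records where it gave up and a numeric error code.
        return (err.lineno, err.offset, expat.ErrorString(err.code))
    return None

# Hypothetical sample: the <body> element on line 3 is never closed.
SAMPLE = "<memo>\n  <to>Staff</to>\n  <body>Lunch is at noon.</memo>\n"

problem = check_well_formed(SAMPLE)
if problem is None:
    print("well-formed")
else:
    line, column, message = problem
    print(f"error at line {line}, column {column}: {message}")
```

The checker stops at the first error it finds, which is typical of XML parsers: once a document is known not to be well-formed, there is no reliable way to interpret the rest of it.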




Copyright © 2002 O'Reilly & Associates. All rights reserved.