IBM developerWorks: Charming Python: Tinkering with XML and Python
Jun 25, 2000, 14:51 UTC (Other stories by David Mertz)
"A major element of getting started on working with XML in Python is sorting out the comparative capabilities
of all the available modules. In this first installment of his new Python column, "Charming Python," David Mertz
briefly describes the most popular and useful XML-related Python modules, and points you to resources for
downloading individual modules and reading more about them. This article will help you determine which modules
are most appropriate for your specific task."
"Python is in many ways an ideal language for working with XML documents. Like Perl, REBOL, REXX, and TCL, it is a
flexible scripting language with powerful text manipulation capabilities. Moreover, more than most types of text files (or
streams), XML documents typically encode rich and complex data structures. The familiar "read some lines and compare
them to some regular expressions" style of text processing is generally not well suited to adequately parsing and
processing XML. Python, fortunately (and more so than most other languages), has both straightforward ways of dealing
with complex data structures (usually with classes and attributes), and a range of XML-related modules to aid in parsing,
processing, and generating XML."
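As a rough illustration of the point about nested data, the sketch below parses a small document into plain Python structures instead of matching lines against regular expressions. It uses xml.etree.ElementTree, which entered the standard library after this article was written; the sample document and the books_by_id helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: a nested structure that line-by-line
# regular expressions would handle poorly (repeated child elements).
DOC = """<library>
  <book id="b1">
    <title>Text Processing</title>
    <author>A. Writer</author>
  </book>
  <book id="b2">
    <title>Markup Basics</title>
    <author>B. Editor</author>
    <author>C. Reviewer</author>
  </book>
</library>"""


def books_by_id(xml_text):
    """Map each book's id attribute to its title and list of authors."""
    root = ET.fromstring(xml_text)
    result = {}
    for book in root.findall("book"):
        result[book.get("id")] = {
            "title": book.findtext("title"),
            "authors": [author.text for author in book.findall("author")],
        }
    return result


catalog = books_by_id(DOC)
print(catalog["b2"]["authors"])
```

The parser does the structural work (nesting, repeated elements, attributes), so the calling code only navigates a tree.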
"One general concept to keep in mind about XML is that XML documents can be processed in either a validating or
non-validating fashion. In the former type of processing, it is necessary to read a "Document Type Definition" (DTD) prior
to reading an XML document it applies to. The processing in this case will evaluate not just the simple syntactic rules for
XML documents in general, but also the specific grammatical constraints of the DTD. In many cases, non-validating
processing is adequate (and generally both faster to run, and easier to program) -- we trust the document creator to follow the rules of the
document domain. Most modules discussed below are non-validating; descriptions will indicate where validation options exist."
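The difference between the two modes can be sketched with a non-validating parser. The example below again assumes xml.etree.ElementTree from the current standard library, which checks well-formedness only. The sample documents and the is_well_formed helper are hypothetical. The first document breaks the content model declared in its own DTD (the required "to" element is missing) yet still parses; the second has a mismatched tag and is rejected.

```python
import xml.etree.ElementTree as ET

# Well-formed, but invalid against its own DTD: <note> must contain
# <to> followed by <body>, and <to> is missing.
WELL_FORMED_BUT_INVALID = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><body>hello</body></note>"""

# Not well-formed: <body> is never closed before </note>.
NOT_WELL_FORMED = "<note><body>hello</note>"


def is_well_formed(xml_text):
    """Return True if the text parses; DTD constraints are not enforced."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(WELL_FORMED_BUT_INVALID))
print(is_well_formed(NOT_WELL_FORMED))
```

A validating parser would be expected to reject the first document as well, since it would compare the element structure against the DTD's declarations.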