SAX « XML « Java Articles

As you can see from Listing 2, parsing even a simple XML file can produce a significant amount of source code. SAX's event-driven (as opposed to document-driven) nature also makes the source code difficult to maintain and debug because you must be constantly aware of the parser's state when writing SAX code. Writing a SAX parser for complex document definitions can prove even more demanding; see Resources for challenging real-life examples.

First, a word on style. For instructional purposes, I have kept the code as simple as possible. In order to focus on the basic usage of SAX and DOM, I completely omitted error handling and handling of XML namespaces, among other things. Furthermore, the code has not been tuned for flexibility or elegance; it may be dull, but hopefully it is also obvious.

For more complex XML documents, we will need to map lists of objects into Java. Mapping object lists is like bartending: when a bartender pours several beers in a row, he usually leaves the tap running while he quickly swaps glasses under the tap. This is exactly what we need to do to capture a list of objects. We have no control over incoming SAX events; they flow in like beer from a tap that we can't shut off. To solve the problem, we need to provide empty containers, allow them to fill up, and continually replace them.
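The container-swapping idea can be sketched as follows. This is a minimal illustration, not the article's own listing: the `BeerListHandler` class and the `<order>`/`<beer>` document format are hypothetical names chosen to match the analogy. A fresh buffer is created at each start tag, filled by `characters()` events, and set aside at the matching end tag.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <beer> element into a list.
class BeerListHandler extends DefaultHandler {
    private final List<String> beers = new ArrayList<>();
    private StringBuilder glass; // the empty container currently under the tap

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("beer".equals(qName)) {
            glass = new StringBuilder(); // put a fresh glass under the tap
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (glass != null) {
            glass.append(ch, start, length); // may arrive in several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("beer".equals(qName)) {
            beers.add(glass.toString().trim()); // set the full glass aside
            glass = null;
        }
    }

    // Parses the given XML text and returns the collected list.
    static List<String> parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        BeerListHandler handler = new BeerListHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.beers;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<order><beer>Stout</beer><beer>Pilsner</beer></order>"));
    }
}
```

The handler never asks the parser to pause. It only decides which container is currently receiving the stream, which is why the state it keeps (`glass`) must be reset at every end tag.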

Just this month the ringleaders from the two leading pull-parser implementations announced XMLPull. Stefan Haustein from the kXML project and Aleksander Slominski from XPP3 (XML Pull Parser), both feeling that the lack of a common API hindered wider pull parsing adoption, began work on XMLPull in December 2001. The resulting API reflects their substantial experience, drawing from their respective projects to produce an approach that works well for a wide range of applications.

Is event-driven programming for SAX2 (Simple API for XML) endangering your sanity? After Part 1 of this three-part series introduced SAX2 parsing, you should feel more in touch with reality! In that article, I supplied basic handler techniques, which we'll build on in this article, to keep your code manageable.

One of the oldest approaches to processing XML documents in Java also proves one of the fastest: parse-event streams. That approach became standardized in Java with the SAX (Simple API for XML) interface specification, later revised as SAX2 to include support for XML Namespaces.

Most parsers fall into two broad categories: tree based (e.g., DOM) or event based (e.g., SAX). Although StAX is more closely aligned with the latter, it bridges the gap between the two. In SAX, data is pushed via events to application code handlers. In StAX, the application "pulls" the data from the XML data stream at its convenience. Application code can filter, skip tags, or stop parsing at any time. The application--not the parser--is in control, which enables a more intuitive way to process data.
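To make the push/pull contrast concrete, here is a small sketch using the standard `javax.xml.stream` API. The `FirstTitleFinder` class and the `<title>` element are assumptions made for illustration. The application drives the loop itself and stops pulling as soon as it has what it needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: finds the first <title> element in a document.
class FirstTitleFinder {
    // Returns the text of the first <title> element, or null if there is none.
    static String firstTitle(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // The application asks for the next event; nothing is pushed at it.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    return reader.getElementText(); // stop here, skip the rest
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstTitle("<books><book><title>SAX</title></book></books>"));
    }
}
```

An equivalent SAX handler would keep receiving events after the first match unless it threw an exception to abort the parse. Here, returning from the loop is enough.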

HC is a compiler; it accepts a Java source code file adorned with special comments as input, and produces whatever is required to turn the class into a working SAX ContentHandler. There are two aspects to analyze: What does HC need to produce, and how will it do so? The next section answers the first question: code. The subsequent sections answer the second question: the compiler itself.

I begin with SAX -- the Simple API for XML. While this API is probably the hardest of the Java and XML APIs to master, it's also arguably the most powerful. Additionally, most other API implementations (like DOM parsers, JDOM, dom4j, and so forth) are based in part on a SAX parser. Understanding SAX gives you a head start on everything else you do in XML and the Java language. In this tip specifically, I'll cover getting an instance of a SAX parser and setting some basic features and properties of that parser.
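As a rough sketch of what obtaining a parser and setting features can look like, the following uses the JAXP `SAXParserFactory` to get an `XMLReader` and then toggles two of the standard SAX2 feature URIs. The `ReaderSetup` class name is hypothetical, and the tip itself may obtain the parser differently.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper showing one way to obtain and configure a SAX parser.
class ReaderSetup {
    // Standard SAX2 feature identifiers.
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    static XMLReader newReader() throws ParserConfigurationException, SAXException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Features are on/off switches identified by URI. An unknown or
        // unsupported one raises a SAXException subclass.
        reader.setFeature(NAMESPACES, true);  // report namespace URIs and local names
        reader.setFeature(VALIDATION, false); // check well-formedness only
        return reader;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        System.out.println("namespaces: " + reader.getFeature(NAMESPACES));
        System.out.println("validation: " + reader.getFeature(VALIDATION));
    }
}
```

Properties work the same way through `setProperty(String, Object)`, except that the value is an arbitrary object rather than a boolean.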

The most common techniques for manipulating XML documents are DOM, SAX, and XSLT. These techniques have a distressing lack of unifying principles among them. Everything you might want to do with XML is available in one of the major approaches, but when what you want to do crosses the boundaries of what each technique does best, it is far from clear how to approach a problem. You are likely to wind up with a hodge-podge application in which various smaller transformations are chained together with heterogeneous techniques and tools.

This article, adapted from a chapter in the forthcoming second edition of XML by Example, serves as an introduction to SAX, the event-based API for processing XML that complements the Document Object Model, or DOM, the object-based API for XML parsers published by the W3C.

Converting from SAX to JDOM is walking the same sort of blurry line as converting from SAX to DOM. It doesn't really make sense to say, "I converted my document from SAX to DOM." Like DOM, however, JDOM can use SAX to build a JDOM Document, and that turns out to be the fastest means of document creation (at least currently). To perform this creation using SAX, you would want to use the JDOM SAXBuilder class. An example of this is shown in Listing 1.

In my last tip, you learned how to set some basic features and properties of parsing on your SAX parser (in the form of an instance of the XMLReader class). Those features and properties all related to the basic handling of all XML documents that the parser interacted with, and included such things as validation, namespace handling, and entity expansion. While these are certainly important aspects of parsing, they are not tailored to a specific document format (such as XML that handles orders from an online store, or XML that represents the inventory of a machine shop). When it comes to writing logic that interacts with the parsing process itself, you want to write a SAX ContentHandler.
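For a sense of what document-specific logic in a ContentHandler looks like, here is a minimal sketch for the online-store case. The order format (`<item quantity="...">`) and the `OrderHandler` class are invented for illustration and do not come from the tip. The handler extends `DefaultHandler`, which supplies empty implementations of every `ContentHandler` callback.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for an assumed order format: <order><item quantity="2"/>...</order>
class OrderHandler extends DefaultHandler {
    int itemCount;
    int totalQuantity;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("item".equals(qName)) {
            itemCount++;
            String quantity = atts.getValue("quantity");
            // Assumption for this sketch: a missing quantity counts as 1.
            totalQuantity += (quantity == null) ? 1 : Integer.parseInt(quantity);
        }
    }

    // Registers a handler on a reader and parses the given XML text.
    static OrderHandler read(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        OrderHandler handler = new OrderHandler();
        reader.setContentHandler(handler); // the parser now reports events to us
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        OrderHandler h = read("<order><item quantity=\"2\"/><item quantity=\"3\"/></order>");
        System.out.println(h.itemCount + " items, total quantity " + h.totalQuantity);
    }
}
```

The parser configuration stays generic. Everything that knows about orders lives in the handler, which is the separation the paragraph above describes.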

This tutorial examines the use of the Simple API for XML version 2.0.x, or SAX 2.0.x. It is aimed at developers who have an understanding of XML and wish to learn this lightweight, event-based API for working with XML data. It assumes that you are familiar with concepts such as well-formedness and the tag-like nature of an XML document. In this tutorial, you will learn how to use SAX to retrieve, manipulate, and output XML data.

While JDOM has a well-known, standard way of handling various parsers, and DOM has no facility for this at all prior to the DOM Level 3 version, SAX remains a bit of a mystery to many developers. Many programmers write SAX code that is neither portable nor vendor independent. The result is that they lock their applications into a specific parser -- sometimes a specific version of a parser. In this tip, I explain how you can make your life easier by using a SAX helper class to free your code from this dependence on a specific vendor class.
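The abstract does not name the helper class. A likely candidate, assumed here, is `org.xml.sax.helpers.XMLReaderFactory`, which ships with SAX2. The sketch below shows the vendor-neutral pattern; the `PortableReader` class is hypothetical. Note that `XMLReaderFactory` has been deprecated since Java 9 in favor of `SAXParserFactory`, so treat this as an illustration of the technique rather than a recommendation for new code.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical wrapper around the SAX2 helper class.
class PortableReader {
    @SuppressWarnings("deprecation") // XMLReaderFactory is deprecated since Java 9
    static XMLReader create() throws SAXException {
        // Vendor-locked alternative, for contrast (requires Xerces on the classpath):
        //   XMLReader reader = new org.apache.xerces.parsers.SAXParser();
        // The factory instead picks an implementation at run time. It consults the
        // org.xml.sax.driver system property first, then falls back to a default.
        return XMLReaderFactory.createXMLReader();
    }

    public static void main(String[] args) throws SAXException {
        System.out.println("Using parser: " + create().getClass().getName());
    }
}
```

Because no vendor class appears in the source, the parser can be swapped by setting `-Dorg.xml.sax.driver=...` on the command line without recompiling.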

Copyright 2009 - 12 Demo Source and Support. All rights reserved.
All other trademarks are property of their respective owners.