SAX - Java working with XML overview (tutorial)

Complete Source Code for all examples below

Points to remember:

  • Simple API for XML - the most basic approach
  • Event driven - the parser calls your handler as it encounters each start tag, end tag and run of text
  • The API stores nothing in memory; you have to do that yourself
  • Use it when you have memory constraints, when working with huge files, or when you only need a few pieces of information from the XML
  • We only have one go at the document (while it is being parsed). There is no way to go back and retrieve a portion of the XML; we only have what comes back from the events.
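
To make the "event driven" point concrete, here is a minimal sketch that only records the events in the order the parser reports them. The class name SaxEventTrace and the inline XML string are made up for illustration; the parser and handler classes are the standard JAXP/SAX ones.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the tutorial's source code.
public class SaxEventTrace {

    /** Parses the XML and returns the events in the order they were reported. */
    static List<String> trace(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        // The default factory is not namespace aware, so names arrive in qName.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Skip whitespace-only text so the trace stays readable.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text " + text);
                }
            }
        });
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : trace("<stock><symbol>PIPE</symbol><quantity>10</quantity></stock>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept after the parse except what the handler chose to store, which here is just the list of event names.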

 

Simple example:

We have a stock.xml file and a Stock class that holds only a few of the data elements from the XML. We need to extract the data as a List of Stock Java objects.

 

The XML:

 



  
    PIPE
    Smoking pipe
    10.90
    10
  
  
    VIO
    Violin
    99.99
    5
  
  
    Hat
    9.49
  


 

The class:

 

package dp.test.xml.jaxp.sax.entity;

import java.math.BigDecimal;

/**
 * Example entity.
 * 
 * @author DPavlov
 */
public class Stock
{

	private String symbol;
	private BigDecimal quantity;
	
	public String getSymbol() {
		return symbol;
	}
	
	public void setSymbol(String symbol) {
		this.symbol = symbol;
	}
	
	public BigDecimal getQuantity() {
		return quantity;
	}
	
	public void setQuantity(BigDecimal quantity) {
		this.quantity = quantity;
	}

	public String toString() {
	    final String TAB = "    ";
	    
	    String retValue = "";
	    
	    retValue = "Stock ( "
	        + super.toString() + TAB
	        + "symbol = " + this.symbol + TAB
	        + "quantity = " + this.quantity + TAB
	        + " )";
	
	    return retValue;
	}
	
}

 

The code to do the task:

package dp.test.xml.jaxp.sax;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.junit.Test;
import org.xml.sax.SAXException;

import dp.test.xml.jaxp.sax.entity.Stock;

/**
 * JAXP SAX (Simple API for XML) is the lowest-level parsing API in JAXP, and many
 * higher-level Java XML libraries can build on it. Using SAX directly pays off
 * in situations where you know exactly what data you need to extract from the XML.
 * SAX is event driven: it calls back into your handler whenever it encounters a
 * node while parsing the document. No data is stored in memory unless
 * you write the code to do so. The advantage is lower memory consumption and possibly a
 * small gain in speed, since your code can quickly skip over the parts
 * you do not need. The cost is that you only get one pass over the document (i.e. you see
 * each event once, as it happens), and you cannot modify the XML this way.
 * 
 * @author DPavlov
 */
public class JAXPSAXExample
{
	
	@Test
	public void testSAXNoNamespace() throws ParserConfigurationException, SAXException, IOException {
		
		SAXParserFactory factory = SAXParserFactory.newInstance();
		factory.setValidating(true);
		factory.setNamespaceAware(false); // Important: element names are then read from the qualified name (third) handler argument
		SAXParser parser = factory.newSAXParser();
		
		final List<Stock> result = new ArrayList<Stock>();
		parser.parse(JAXPSAXExample.class.getResourceAsStream("stock.xml"), new SAXHandlerNoNamespace(result));
		
		System.out.println(result);
		
	}
	
}

package dp.test.xml.jaxp.sax;

import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import dp.test.xml.jaxp.sax.entity.Stock;

/**
 * Extending DefaultHandler lets us override only the callback methods we need;
 * the parser invokes them during parsing.
 * 
 * @author DPavlov
 */
public class SAXHandlerNoNamespace extends DefaultHandler
{

	private final List<Stock> stocks;

	private Stock currentStock;
	
	private boolean handleSymbol = false;
	private boolean handleQuantity = false;
	
	private boolean hasData = false;
	
	public SAXHandlerNoNamespace(List<Stock> stocks) {
		super();
		this.stocks = stocks;
	}

	/**
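	 * @param uri namespace URI; empty when namespace processing is off
	 * @param localName name without prefix; may be empty when namespace processing is off
	 * @param name qualified name as written in the XML, including any prefix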
	 */
	@Override
	public void endElement(String uri, String localName, String name) throws SAXException {
		if ("stock".equals(name)) {
			if (this.hasData) {
				this.stocks.add(this.currentStock);
			}
			this.currentStock = null;
			this.hasData = false;
		} else if ("symbol".equals(name)) {
			this.handleSymbol = false;
		} else if ("quantity".equals(name)) {
			this.handleQuantity = false;
		}
	}

	/**
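	 * @param uri namespace URI; empty when namespace processing is off
	 * @param localName name without prefix; may be empty when namespace processing is off
	 * @param name qualified name as written in the XML, including any prefix
	 * @param attributes attributes of the node, if any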
	 */
	@Override
	public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
		if ("stock".equals(name)) {
			this.currentStock = new Stock(); 
		} else if ("symbol".equals(name)) {
			this.handleSymbol = true;
		} else if ("quantity".equals(name)) {
			this.handleQuantity = true;
		}
	}

	/**
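	 * Text inside the current node. Note that a parser is allowed to deliver
	 * the text of a single node in several calls to this method.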
	 */
	@Override
	public void characters(char[] ch, int start, int length) throws SAXException {
		if (handleSymbol) {
			this.currentStock.setSymbol(String.valueOf(Arrays.copyOfRange(ch, start, start + length)));
			this.hasData = true;
		} else if (handleQuantity) {
			this.currentStock.setQuantity(new BigDecimal(String.valueOf(Arrays.copyOfRange(ch, start, start + length))));
			this.hasData = true;
		}
	}
	
	
	
}

 

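The output (the third stock entry has neither a symbol nor a quantity, so hasData stays false and it is skipped):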

[
  Stock ( dp.test.xml.jaxp.sax.entity.Stock@ee22f7    symbol = PIPE    quantity = 10     ), 
  Stock ( dp.test.xml.jaxp.sax.entity.Stock@39ab89    symbol = VIO    quantity = 5     )
]
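
One caveat with the handler above: SAX does not promise that the text of a node arrives in a single characters() call. A common way to cope is to collect the text in a buffer and use it in endElement(). The following is a minimal sketch of that idea, not the tutorial's original code. The class name BufferedStockHandler, the element names stocks and name, and the "symbol=quantity" string format are assumptions made to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variant of the handler that buffers character data.
public class BufferedStockHandler extends DefaultHandler {

    /** Collected "symbol=quantity" entries; plain strings stand in for the Stock class. */
    final List<String> stocks = new ArrayList<>();

    private final StringBuilder text = new StringBuilder();
    private String symbol;
    private BigDecimal quantity;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting text for the new element
        if ("stock".equals(qName)) {
            symbol = null;
            quantity = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if ("symbol".equals(qName)) {
            symbol = value;
        } else if ("quantity".equals(qName)) {
            quantity = new BigDecimal(value);
        } else if ("stock".equals(qName) && (symbol != null || quantity != null)) {
            // Same rule as hasData above: keep the entry only if something was read.
            stocks.add(symbol + "=" + quantity);
        }
        text.setLength(0);
    }

    static List<String> parse(String xml) throws Exception {
        BufferedStockHandler handler = new BufferedStockHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.stocks;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<stocks><stock><symbol>PIPE</symbol><quantity>10</quantity></stock>"
                + "<stock><name>Hat</name></stock></stocks>"));
    }
}
```

The boolean flags are no longer needed, because the decision about what the text means is made once, when the element closes.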

Namespaces are also supported. With our modified example it looks something like this:

The XML:

 



  
    PIPE
    Smoking pipe
    10.90
    10
  
  
    VIO
    Violin
    99.99
    5
  
  
    Hat
    9.49
  



The code to do the task:

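package dp.test.xml.jaxp.sax;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.junit.Test;
import org.xml.sax.SAXException;

public class JAXPSAXExample
{
	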

	@Test
	public void testSAXWithNamespace() throws ParserConfigurationException, SAXException, IOException {
		
		SAXParserFactory factory = SAXParserFactory.newInstance();
		factory.setValidating(true);
		factory.setNamespaceAware(true); // Important: element names are then read from the localName (second) handler argument
		SAXParser parser = factory.newSAXParser();
		
		final List result = new ArrayList();
		parser.parse(JAXPSAXExample.class.getResourceAsStream("stock-ns.xml"), new SAXHandlerWithNamespace(result));
		
		System.out.println(result);
		
	}
	
}

package dp.test.xml.jaxp.sax;

import java.math.BigDecimal;
import java.util.Arrays;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import dp.test.xml.jaxp.sax.entity.Stock;

/**
 * Extending DefaultHandler lets us override only the callback methods we need;
 * the parser invokes them during parsing.
 * 
 * @author DPavlov
 */
public class SAXHandlerWithNamespace extends DefaultHandler
{

	private final List<Stock> stocks;

	private Stock currentStock;
	
	private boolean handleSymbol = false;
	private boolean handleQuantity = false;
	
	private boolean hasData = false;
	
	public SAXHandlerWithNamespace(List<Stock> stocks) {
		super();
		this.stocks = stocks;
	}

	/**
	 * @param uri namespace URI of the element; empty if it has none
	 * @param localName name without prefix
	 * @param name qualified name as written in the XML, including any prefix
	 */
	@Override
	public void endElement(String uri, String localName, String name) throws SAXException {
		if ("stock".equals(localName)) {
			if (this.hasData) {
				this.stocks.add(this.currentStock);
			}
			this.currentStock = null;
			this.hasData = false;
		} else if ("symbol".equals(localName)) {
			this.handleSymbol = false;
		} else if ("quantity".equals(localName)) {
			this.handleQuantity = false;
		}
	}

	/**
	 * @param uri namespace URI of the element; empty if it has none
	 * @param localName name without prefix
	 * @param name qualified name as written in the XML, including any prefix
	 * @param attributes attributes of the node, if any
	 */
	@Override
	public void startElement(String uri, String localName, String name, Attributes attributes) throws SAXException {
		if ("stock".equals(localName)) {
			this.currentStock = new Stock();
		} else if ("symbol".equals(localName)) {
			this.handleSymbol = true;
		} else if ("quantity".equals(localName)) {
			this.handleQuantity = true;
		}
	}

	/**
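	 * Text inside the current node. Note that a parser is allowed to deliver
	 * the text of a single node in several calls to this method.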
	 */
	@Override
	public void characters(char[] ch, int start, int length) throws SAXException {
		if (handleSymbol) {
			this.currentStock.setSymbol(String.valueOf(Arrays.copyOfRange(ch, start, start + length)));
			this.hasData = true;
		} else if (handleQuantity) {
			this.currentStock.setQuantity(new BigDecimal(String.valueOf(Arrays.copyOfRange(ch, start, start + length))));
			this.hasData = true;
		}
	}
	
	
	
}

 

Summary:

  • The SAXParser instance is obtained through SAXParserFactory.
  • Namespace awareness is set on the factory; it determines how node names are passed to the handler (read the qualified name when it is off, the local name and namespace URI when it is on).
  • We process the data in a handler class that extends DefaultHandler, overriding the necessary methods to add the logic for extracting the XML data.
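
To see the effect of the namespace setting directly, the sketch below parses the same snippet twice and records the three name arguments of startElement() for each run. The class name NamespaceNamesDemo, the prefix s and the URI urn:example:stock are invented for this illustration and are not taken from stock-ns.xml.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; prefix and namespace URI are made up.
public class NamespaceNamesDemo {

    /** Returns one line per start tag showing uri, localName and qName as received. */
    static List<String> startNames(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        final List<String> names = new ArrayList<>();
        factory.newSAXParser().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName, String qName, Attributes attributes) {
                        names.add("uri=" + uri + " localName=" + localName + " qName=" + qName);
                    }
                });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<s:stock xmlns:s=\"urn:example:stock\"><s:symbol>PIPE</s:symbol></s:stock>";
        System.out.println("aware:     " + startNames(xml, true));
        System.out.println("not aware: " + startNames(xml, false));
    }
}
```

In the namespace-aware run the uri and localName arguments are filled in, which is why SAXHandlerWithNamespace compares against localName. In the other run only the qualified name (prefix included) is reliable, which is why SAXHandlerNoNamespace compares against name.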

 

 

This page was last updated on: 13/04/2012 11:08