W3C SAX APIs

SAX is a standard interface for event-based XML parsing, developed collaboratively by the members of the XML-DEV mailing list.

There are two major types of XML (or SGML) APIs:

  1. tree-based APIs, and
  2. event-based APIs.

A tree-based API compiles an XML document into an internal tree structure, then allows an application to navigate that tree using the Document Object Model (DOM), a standard tree-based API for XML and HTML documents.

An event-based API, on the other hand, reports parsing events (such as the start and end of elements) directly to the application through callbacks, and does not usually build an internal tree. The application implements handlers to deal with the different events, much like handling events in a graphical user interface.

Tree-based APIs are useful for a wide range of applications, but they often put a great strain on system resources, especially when the document is large. (Under very controlled circumstances, the tree can be constructed lazily to avoid some of this problem.) Furthermore, some applications need to build their own, different data structures, and it is very inefficient to build a tree of parse nodes only to map it onto a new tree.

In both of these cases, an event-based API provides simpler, lower-level access to an XML document: you can parse documents much larger than your available system memory, and you can construct your own data structures from within your callback event handlers.
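As a minimal sketch of building your own data from callbacks, the handlers below keep only three counters in a user-defined context, so memory use does not grow with document size. The typedefs are stand-ins copied from this page so the sketch compiles without the parser's header; doc_stats, the handler names, and demo_replay() are hypothetical, and the return value 0 is only assumed to mean success, since this page does not document return values. demo_replay() stands in for the parser by replaying the events for `<a><b/><c/></a>`.

```c
#include <stddef.h>

/* Stand-in typedefs copied from this page; a real program would
   include the parser's own header instead. */
typedef unsigned char oratext;
typedef signed int sword;
struct xmlattrs;                    /* opaque: contents are private */

/* Hypothetical application state, passed to every callback as ctx. */
typedef struct {
    unsigned elements;              /* total elements seen */
    unsigned depth;                 /* current nesting depth */
    unsigned max_depth;             /* deepest nesting seen so far */
} doc_stats;

static sword stats_start_element(void *ctx, const oratext *name,
                                 const struct xmlattrs *attrs)
{
    doc_stats *st = (doc_stats *)ctx;
    (void)name;
    (void)attrs;
    st->elements++;
    st->depth++;
    if (st->depth > st->max_depth)
        st->max_depth = st->depth;
    return 0;                       /* assumption: 0 signals success */
}

static sword stats_end_element(void *ctx, const oratext *name)
{
    doc_stats *st = (doc_stats *)ctx;
    (void)name;
    if (st->depth > 0)
        st->depth--;
    return 0;
}

/* Stand-in for the parser: replays the events a SAX parser would be
   expected to report for <a><b/><c/></a>. */
doc_stats demo_replay(void)
{
    doc_stats st = {0, 0, 0};
    stats_start_element(&st, (const oratext *)"a", NULL);
    stats_start_element(&st, (const oratext *)"b", NULL);
    stats_end_element(&st, (const oratext *)"b");
    stats_start_element(&st, (const oratext *)"c", NULL);
    stats_end_element(&st, (const oratext *)"c");
    stats_end_element(&st, (const oratext *)"a");
    return st;
}
```

No tree of parse nodes is ever built; the only state is the doc_stats structure owned by the application.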

To use SAX, an xmlsaxcb structure is initialized with function pointers and passed to the xmlinit() call. A pointer to a user-defined context structure may also be included; that context pointer will be passed to each SAX function.

The SAX callback structure:

typedef struct
{
   sword (*startDocument)(void *ctx);
   sword (*endDocument)(void *ctx);
   sword (*startElement)(void *ctx, const oratext *name, const struct xmlattrs *attrs);
   sword (*endElement)(void *ctx, const oratext *name);
   sword (*characters)(void *ctx, const oratext *ch, size_t len);
   sword (*ignorableWhitespace)(void *ctx, const oratext *ch, size_t len);
   sword (*processingInstruction)(void *ctx, const oratext *target, const oratext *data);
   sword (*notationDecl)(void *ctx, const oratext *name,
                         const oratext *publicId, const oratext *systemId);
   sword (*unparsedEntityDecl)(void *ctx, const oratext *name, const oratext *publicId,
                               const oratext *systemId, const oratext *notationName);
   sword (*nsStartElement)(void *ctx, const oratext *qname,
                           const oratext *local, const oratext *nsp,
                           const struct xmlattrs *attrs);
} xmlsaxcb;
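
The initialization described above might look like the following sketch. The structure is re-declared locally (with the attribute type written as struct xmlattrs, as in the prototypes on this page) only so the example compiles on its own; a real program would take it from the parser's header. app_ctx, the handler names, make_callbacks(), and demo_run() are hypothetical. Leaving unused members as null pointers is an assumption about how the parser treats missing callbacks, and the call to xmlinit() is omitted because its signature is not given on this page.

```c
#include <stddef.h>

/* Stand-in declarations copied from this page. */
typedef unsigned char oratext;
typedef signed int sword;
struct xmlattrs;

typedef struct {
    sword (*startDocument)(void *ctx);
    sword (*endDocument)(void *ctx);
    sword (*startElement)(void *ctx, const oratext *name,
                          const struct xmlattrs *attrs);
    sword (*endElement)(void *ctx, const oratext *name);
    sword (*characters)(void *ctx, const oratext *ch, size_t len);
    sword (*ignorableWhitespace)(void *ctx, const oratext *ch, size_t len);
    sword (*processingInstruction)(void *ctx, const oratext *target,
                                   const oratext *data);
    sword (*notationDecl)(void *ctx, const oratext *name,
                          const oratext *publicId, const oratext *systemId);
    sword (*unparsedEntityDecl)(void *ctx, const oratext *name,
                                const oratext *publicId,
                                const oratext *systemId,
                                const oratext *notationName);
    sword (*nsStartElement)(void *ctx, const oratext *qname,
                            const oratext *local, const oratext *nsp,
                            const struct xmlattrs *attrs);
} xmlsaxcb;

/* Hypothetical user-defined context. */
typedef struct {
    int started;
    int ended;
} app_ctx;

static sword on_start_document(void *ctx)
{
    ((app_ctx *)ctx)->started = 1;
    return 0;                       /* assumption: 0 signals success */
}

static sword on_end_document(void *ctx)
{
    ((app_ctx *)ctx)->ended = 1;
    return 0;
}

/* Fill in only the callbacks the application cares about. */
xmlsaxcb make_callbacks(void)
{
    xmlsaxcb cb = {0};              /* every member starts as a null pointer */
    cb.startDocument = on_start_document;
    cb.endDocument = on_end_document;
    return cb;
}

/* Stand-in for the parser: calls whichever callbacks are set,
   passing the context pointer through unchanged. */
app_ctx demo_run(void)
{
    app_ctx ctx = {0, 0};
    xmlsaxcb cb = make_callbacks();
    if (cb.startDocument != NULL)
        cb.startDocument(&ctx);
    if (cb.endDocument != NULL)
        cb.endDocument(&ctx);
    return ctx;
}
```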


Data Structures and Types

 o ORATEXT
 o SWORD
 o XMLATTRS

Callback Functions conforming to the SAX standard

 o characters(void *ctx, const oratext *ch, size_t len)
Receive notification of character data inside an element.
 o endDocument(void *ctx)
Receive notification of the end of the document.
 o endElement(void *ctx, const oratext *name)
Receive notification of the end of an element.
 o ignorableWhitespace(void *ctx, const oratext *ch, size_t len)
Receive notification of ignorable whitespace in element content.
 o notationDecl(void *ctx, const oratext *name, const oratext *publicId, const oratext *systemId)
Receive notification of a notation declaration.
 o processingInstruction(void *ctx, const oratext *target, const oratext *data)
Receive notification of a processing instruction.
 o startDocument(void *ctx)
Receive notification of the beginning of the document.
 o startElement(void *ctx, const oratext *name, const struct xmlattrs *attrs)
Receive notification of the start of an element.
 o unparsedEntityDecl(void *ctx, const oratext *name, const oratext *publicId, const oratext *systemId,
const oratext *notationName)
Receive notification of an unparsed entity declaration.

Non-SAX Callback Functions

 o nsStartElement(void *ctx, const oratext *qname, const oratext *local, const oratext *namespace, const struct xmlattrs *attrs)
Receive notification of the start of an element, including its namespace information.

Data Structure and Type Description

 o ORATEXT
typedef unsigned char oratext;
 o SWORD
typedef signed int sword;
 o XMLATTRS
typedef struct xmlattrs xmlattrs;

Note: The contents of xmlattrs are private and must not be accessed by users.


Function Prototypes

 o characters

PURPOSE

This callback function receives notification of character data inside an element.

SYNTAX

 sword (*characters)(void *ctx, const oratext *ch, size_t len);

PARAMETERS

ctx (IN) - client context pointer

ch (IN) - the characters

len (IN) - number of characters to use from the character pointer

COMMENTS
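
Because the length is passed separately, a handler should copy exactly len units rather than rely on a terminator. The sketch below appends each chunk to a fixed-size buffer. It assumes that ch is not necessarily NUL-terminated, that text may arrive in more than one call, and that len can be treated as a byte count (true for single-byte encodings); none of this is stated on this page. text_acc, acc_characters(), and demo_characters() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in typedefs copied from this page. */
typedef unsigned char oratext;
typedef signed int sword;

/* Hypothetical accumulator passed as the client context. */
typedef struct {
    char   buf[64];
    size_t used;                    /* bytes stored, excluding the NUL */
} text_acc;

static sword acc_characters(void *ctx, const oratext *ch, size_t len)
{
    text_acc *acc = (text_acc *)ctx;
    size_t room = sizeof acc->buf - 1 - acc->used;

    if (len > room)
        len = room;                 /* silently truncate when full */
    memcpy(acc->buf + acc->used, ch, len);
    acc->used += len;
    acc->buf[acc->used] = '\0';     /* keep the buffer printable */
    return 0;                       /* assumption: 0 signals success */
}

/* Stand-in for the parser: delivers "Hello, world" in three chunks
   taken from a longer array, so no chunk ends in a NUL. */
text_acc demo_characters(void)
{
    text_acc acc = { "", 0 };
    const oratext data[] = "Hello, world!";

    acc_characters(&acc, data, 5);
    acc_characters(&acc, data + 5, 2);
    acc_characters(&acc, data + 7, 5);
    return acc;
}
```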


 o endDocument

PURPOSE

This callback function receives notification of the end of the document.

SYNTAX

 sword (*endDocument)(void *ctx); 

PARAMETERS

ctx (IN) - client context

COMMENTS


 o endElement

PURPOSE

This callback function receives notification of the end of an element.

SYNTAX

 sword (*endElement)(void *ctx, const oratext *name); 

PARAMETERS

ctx (IN) - client context

name (IN) - element type name

COMMENTS


 o ignorableWhitespace

PURPOSE

This callback function receives notification of ignorable whitespace in element content.

SYNTAX

 sword (*ignorableWhitespace)(void *ctx, const oratext *ch, size_t len); 

PARAMETERS

ctx (IN) - client context

ch (IN) - whitespace characters

len (IN) - number of characters to use from the character pointer

COMMENTS


 o notationDecl

PURPOSE

This callback function receives notification of a notation declaration.

SYNTAX

 sword (*notationDecl)(void *ctx, const oratext *name, const oratext *publicId, const oratext *systemId); 

PARAMETERS

ctx (IN) - client context

name (IN) - notation name

publicId (IN) - notation public identifier, or null if not available

systemId (IN) - notation system identifier

COMMENTS


 o processingInstruction

PURPOSE

This callback function receives notification of a processing instruction.

SYNTAX

 sword (*processingInstruction)(void *ctx, const oratext *target, const oratext *data); 

PARAMETERS

ctx (IN) - client context

target (IN) - processing instruction target

data (IN) - processing instruction data, or null if none is supplied

COMMENTS
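
Since data may be null, a handler should test it before use. The following sketch formats the instruction into a context buffer and covers both cases. pi_log, log_pi(), and demo_pi() are hypothetical names, the typedefs are stand-ins copied from this page, and the strings are assumed to be NUL-terminated single-byte text.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in typedefs copied from this page. */
typedef unsigned char oratext;
typedef signed int sword;

/* Hypothetical context: holds the most recent instruction as text. */
typedef struct {
    char line[96];
} pi_log;

static sword log_pi(void *ctx, const oratext *target, const oratext *data)
{
    pi_log *log = (pi_log *)ctx;

    /* data is null when the instruction carries no data */
    snprintf(log->line, sizeof log->line, "<?%s%s%s?>",
             (const char *)target,
             data != NULL ? " " : "",
             data != NULL ? (const char *)data : "");
    return 0;                       /* assumption: 0 signals success */
}

/* Stand-in for the parser: reports one instruction, with or without data. */
pi_log demo_pi(int with_data)
{
    pi_log log = { "" };

    log_pi(&log, (const oratext *)"page",
           with_data ? (const oratext *)"break" : NULL);
    return log;
}
```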


 o startDocument

PURPOSE

This callback function receives notification of the beginning of the document.

SYNTAX

 sword (*startDocument)(void *ctx); 

PARAMETERS

ctx (IN) - client context

COMMENTS


 o startElement

PURPOSE

This callback function receives notification of the beginning of an element.

SYNTAX

 sword (*startElement)(void *ctx, const oratext *name, const struct xmlattrs *attrs); 

PARAMETERS

ctx (IN) - client context

name (IN) - element type name

attrs (IN) - specified or defaulted attributes

COMMENTS


 o unparsedEntityDecl

PURPOSE

This callback function receives notification of an unparsed entity declaration.

SYNTAX

 sword (*unparsedEntityDecl)(void *ctx, const oratext *name, const oratext *publicId, const oratext *systemId, 
         const oratext *notationName); 

PARAMETERS

ctx (IN) - client context

name (IN) - entity name

publicId (IN) - entity public identifier, or null if not available

systemId (IN) - entity system identifier

notationName (IN) - name of the associated notation

COMMENTS


 o nsStartElement

PURPOSE

This callback function receives notification of the start of an element and, unlike startElement, also reports the element's qualified name, local name, and namespace URI.

SYNTAX

 sword (*nsStartElement)(void *ctx, const oratext *qname, const oratext *local, const oratext *namespace, 
         const struct xmlattrs *attrs); 

PARAMETERS

ctx (IN) - client context

qname (IN) - element fully qualified name

local (IN) - element local name

namespace (IN) - element namespace (URI)

attrs (IN) - specified or defaulted attributes

COMMENTS
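
A namespace-aware handler would typically compare the namespace URI rather than the prefix in qname, because different prefixes can be bound to the same URI. The sketch below counts elements in one namespace. It assumes the URI may be null for elements in no namespace, which this page does not state, and it names the parameter nsp as in the callback structure. ns_filter, count_in_ns(), and demo_ns() are hypothetical, and the typedefs are stand-ins copied from this page.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in declarations copied from this page. */
typedef unsigned char oratext;
typedef signed int sword;
struct xmlattrs;                    /* opaque: contents are private */

/* Hypothetical context: which namespace to count, and the tally. */
typedef struct {
    const char *wanted_ns;
    unsigned    matches;
} ns_filter;

static sword count_in_ns(void *ctx, const oratext *qname,
                         const oratext *local, const oratext *nsp,
                         const struct xmlattrs *attrs)
{
    ns_filter *f = (ns_filter *)ctx;
    (void)qname;
    (void)local;
    (void)attrs;

    /* Defensive null check: the URI is assumed to be absent for
       elements that are not in any namespace. */
    if (nsp != NULL && strcmp((const char *)nsp, f->wanted_ns) == 0)
        f->matches++;
    return 0;                       /* assumption: 0 signals success */
}

/* Stand-in for the parser: reports three elements, one per case. */
unsigned demo_ns(void)
{
    ns_filter f = { "http://www.w3.org/1999/xhtml", 0 };

    count_in_ns(&f, (const oratext *)"h:p", (const oratext *)"p",
                (const oratext *)"http://www.w3.org/1999/xhtml", NULL);
    count_in_ns(&f, (const oratext *)"svg:g", (const oratext *)"g",
                (const oratext *)"http://www.w3.org/2000/svg", NULL);
    count_in_ns(&f, (const oratext *)"note", (const oratext *)"note",
                NULL, NULL);
    return f.matches;
}
```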