org.idoox.xml
Interface Tokenizer

All Known Implementing Classes:
TokenizerWrapper, XMLWriterReader

public interface Tokenizer

Tokenizes a stream containing XML into XML tokens. In other words, the tokenizer represents an XML document as a sequence of XML tokens (element starts, element ends, and text content).

Since:
4.0
Component:
Core

Field Summary
static byte CONTENT
          text data token
static byte END_DOCUMENT
          end of document reached token
static byte END_TOKEN
          element end token
static byte START_TOKEN
          element start token
static java.lang.String[] typeNames
          for debugging only; index this array by token type to get the type name
static byte UNKNOWN
          unknown token
 
Method Summary
 byte currentState()
          Returns the current state of the tokenizer.
 java.util.Map getCurrentPrefixMap()
          Returns a clone of the current prefix map.
 org.w3c.dom.Element getDOMRepresentation(org.w3c.dom.Document doc)
          Returns the DOM representation of the element that is being parsed.
 java.lang.String getLocalName()
          Returns the local name of the current element.
 java.lang.String getNamespace()
          Returns the namespace URI of the current element.
 java.lang.String getNamespaceForPrefix(java.lang.String prefix)
          Returns a namespace URI for a declared prefix.
 byte next()
          Parses the next part of the input XML document and returns the state of the tokenizer (one of UNKNOWN, START_TOKEN, END_TOKEN, CONTENT, END_DOCUMENT).
 QName parseQName(java.lang.String qName)
          Parses qName in the context of the opened element and returns the pair (namespaceURI, localName).
 int pushNewlyDeclaredPrefixes(DeclaredPrefixesStack prefixes)
          Adds prefixes newly declared in this token.
 java.lang.String readContent()
          Reads the content (PCDATA, CDATA).
 void readToken(Token stoken)
          Reads the start/end of an element.
 boolean whitespaceContent()
          Returns true if the content contains only whitespace.
 

Field Detail

UNKNOWN

public static final byte UNKNOWN
unknown token

See Also:
Constant Field Values

END_DOCUMENT

public static final byte END_DOCUMENT
end of document reached token

See Also:
Constant Field Values

START_TOKEN

public static final byte START_TOKEN
element start token

See Also:
Constant Field Values

END_TOKEN

public static final byte END_TOKEN
element end token

See Also:
Constant Field Values

CONTENT

public static final byte CONTENT
text data token

See Also:
Constant Field Values

typeNames

public static final java.lang.String[] typeNames
For debugging only; index this array by token type to get the type name.
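
The lookup described for typeNames can be sketched as follows. This is a minimal illustration, not the real interface: the class TypeNamesSketch, the numeric constant values, and the contents of the array are all assumptions, since the actual values are only given under Constant Field Values.

```java
final class TypeNamesSketch {
    // Assumed values, for illustration only; the real constants may differ.
    static final byte UNKNOWN = 0;
    static final byte END_DOCUMENT = 1;
    static final byte START_TOKEN = 2;
    static final byte END_TOKEN = 3;
    static final byte CONTENT = 4;

    // Hypothetical stand-in for Tokenizer.typeNames, ordered by the values above.
    static final String[] TYPE_NAMES = {
        "UNKNOWN", "END_DOCUMENT", "START_TOKEN", "END_TOKEN", "CONTENT"
    };

    /** Maps a token type to a printable name, guarding against out-of-range values. */
    static String describe(byte tokenType) {
        if (tokenType < 0 || tokenType >= TYPE_NAMES.length) {
            return "INVALID(" + tokenType + ")";
        }
        return TYPE_NAMES[tokenType];
    }

    public static void main(String[] args) {
        System.out.println(describe(START_TOKEN));
        System.out.println(describe((byte) 9));
    }
}
```

The range check is a design choice of this sketch; the source does not say whether typeNames covers every value a tokenizer can return.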

Method Detail

getLocalName

public java.lang.String getLocalName()
                              throws TokenizerException
Returns the local name of the current element. This function may be called only if the tokenizer is on START_TOKEN.

Returns:
the local name of the current element
Throws:
TokenizerException - if the tokenizer is not on the START_TOKEN

getNamespace

public java.lang.String getNamespace()
                              throws TokenizerException
Returns the namespace URI of the current element. This function may be called only if the tokenizer is on START_TOKEN.

Returns:
the namespace URI of the current element
Throws:
TokenizerException - if the tokenizer is not on the START_TOKEN

getNamespaceForPrefix

public java.lang.String getNamespaceForPrefix(java.lang.String prefix)
Returns a namespace URI for a declared prefix.

Parameters:
prefix - the declared prefix
Returns:
the namespace URI for the prefix or null if the prefix has not been declared

getDOMRepresentation

public org.w3c.dom.Element getDOMRepresentation(org.w3c.dom.Document doc)
                                         throws TokenizerException
Returns the DOM representation of the element that is being parsed. May be called only at the beginning of an element.

Parameters:
doc - the document within which the element should be created
Returns:
an element containing the DOM representation of the XML tag being parsed
Throws:
TokenizerException - if there is an error in tokenizing the XML document

parseQName

public QName parseQName(java.lang.String qName)
Parses qName in the context of the opened element and returns the pair (namespaceURI, localName).

Parameters:
qName - the qualified name
Returns:
the resolved (expanded) name
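
As a rough illustration of resolving a qualified name against the prefixes in scope, the sketch below splits the name at the first colon and looks the prefix up in a map. It does not use the real QName class or a real tokenizer: QNameSketch, resolve, and the String[] return shape are hypothetical. Treating an unprefixed name as belonging to the default namespace (stored here under the empty prefix) is also an assumption, since the source does not say how that case is handled.

```java
import java.util.HashMap;
import java.util.Map;

final class QNameSketch {
    /**
     * Returns {namespaceURI, localName} for a qualified name.
     * The namespace URI is null when the prefix is not in the map.
     */
    static String[] resolve(String qName, Map<String, String> prefixMap) {
        int colon = qName.indexOf(':');
        // Assumption: an unprefixed name resolves against the "" (default) entry.
        String prefix = colon < 0 ? "" : qName.substring(0, colon);
        String localName = colon < 0 ? qName : qName.substring(colon + 1);
        return new String[] { prefixMap.get(prefix), localName };
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new HashMap<>();
        prefixes.put("xsd", "http://www.w3.org/2001/XMLSchema");
        String[] pair = resolve("xsd:string", prefixes);
        System.out.println(pair[0] + " / " + pair[1]);
    }
}
```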

next

public byte next()
          throws TokenizerException,
                 java.io.IOException
Parses the next part of the input XML document and returns the state of the tokenizer (one of UNKNOWN, START_TOKEN, END_TOKEN, CONTENT, END_DOCUMENT).

Returns:
the state of the tokenizer
Throws:
TokenizerException - if there is an error in the XML document or the tokenizer is beyond the end of the document (the previous call to next() has returned END_DOCUMENT).
java.io.IOException - if an I/O error occurs while reading the input
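
The contract of next() suggests a pull-style loop: call next() until it returns END_DOCUMENT and dispatch on each returned state. The sketch below shows that loop against a stand-in, because the real org.idoox.xml classes are not reproduced here. MiniTokenizer, ScriptedTokenizer, drain, and the numeric constant values are assumptions made for illustration, and the stand-in throws unchecked exceptions where the real methods declare TokenizerException and java.io.IOException.

```java
import java.util.ArrayList;
import java.util.List;

final class PullLoopSketch {
    /** Hypothetical stand-in for the part of Tokenizer used here; not the real interface. */
    interface MiniTokenizer {
        // Assumed values, for illustration only; the real constants may differ.
        byte UNKNOWN = 0;
        byte END_DOCUMENT = 1;
        byte START_TOKEN = 2;
        byte END_TOKEN = 3;
        byte CONTENT = 4;

        byte next();
        String getLocalName();
        String readContent();
    }

    /** Toy tokenizer that replays a fixed script of states and texts. */
    static final class ScriptedTokenizer implements MiniTokenizer {
        private final byte[] states;
        private final String[] texts;
        private int pos = -1;

        ScriptedTokenizer(byte[] states, String[] texts) {
            this.states = states;
            this.texts = texts;
        }

        public byte next() {
            // Mirrors the documented rule: no next() after END_DOCUMENT was returned.
            if (pos >= 0 && states[pos] == END_DOCUMENT) {
                throw new IllegalStateException("beyond the end of the document");
            }
            pos++;
            return states[pos];
        }

        public String getLocalName() {
            if (states[pos] != START_TOKEN) {
                throw new IllegalStateException("not on START_TOKEN");
            }
            return texts[pos];
        }

        public String readContent() {
            if (states[pos] != CONTENT) {
                throw new IllegalStateException("not in CONTENT state");
            }
            return texts[pos];
        }
    }

    /** Drives the tokenizer to END_DOCUMENT and records one event string per token. */
    static List<String> drain(MiniTokenizer t) {
        List<String> events = new ArrayList<>();
        byte state;
        while ((state = t.next()) != MiniTokenizer.END_DOCUMENT) {
            switch (state) {
                case MiniTokenizer.START_TOKEN:
                    events.add("start:" + t.getLocalName());
                    break;
                case MiniTokenizer.END_TOKEN:
                    events.add("end");
                    break;
                case MiniTokenizer.CONTENT:
                    events.add("text:" + t.readContent());
                    break;
                default:
                    events.add("unknown");
            }
        }
        return events;
    }

    public static void main(String[] args) {
        // Script corresponding to the document <a>hi</a>
        byte[] states = {
            MiniTokenizer.START_TOKEN, MiniTokenizer.CONTENT,
            MiniTokenizer.END_TOKEN, MiniTokenizer.END_DOCUMENT
        };
        String[] texts = { "a", "hi", null, null };
        System.out.println(drain(new ScriptedTokenizer(states, texts)));
    }
}
```

Each accessor is called only in the state that permits it, which is the discipline the method descriptions on this page require.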

currentState

public byte currentState()
Returns the current state of the tokenizer. See next() for details.

Returns:
the current state of the tokenizer

readContent

public java.lang.String readContent()
                             throws TokenizerException
Reads the content (PCDATA, CDATA).

Returns:
the content
Throws:
TokenizerException - if the tokenizer is not in CONTENT state

readToken

public void readToken(Token stoken)
               throws TokenizerException,
                      java.io.IOException
Reads the start/end of an element. The result is stored in the out parameter stoken. This function may be called only if the tokenizer is on START_TOKEN or END_TOKEN.

Parameters:
stoken - structure containing name, namespace URI and attribute pairs; holder for result
Throws:
TokenizerException - if the tokenizer is not in the START_TOKEN or END_TOKEN state, or the document is not well-formed XML
java.io.IOException - if there was an error reading the input document

whitespaceContent

public boolean whitespaceContent()
                          throws TokenizerException
Returns true if the content contains only whitespace.

Returns:
true if the content contains only whitespace
Throws:
TokenizerException - if the tokenizer is not in CONTENT state
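
For orientation, a whitespace-only check can be sketched with the XML 1.0 definition of white space (space, tab, carriage return, line feed). This is an assumption about what the method tests, not the library's implementation. WhitespaceSketch and isXmlWhitespaceOnly are hypothetical names, and returning true for empty text is a choice of this sketch that the source does not confirm.

```java
final class WhitespaceSketch {
    /** True if every character is XML white space; also true for empty text. */
    static boolean isXmlWhitespaceOnly(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isXmlWhitespaceOnly(" \n\t "));
        System.out.println(isXmlWhitespaceOnly(" text "));
    }
}
```

A check like this is typically used to skip indentation between elements while keeping meaningful text.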

getCurrentPrefixMap

public java.util.Map getCurrentPrefixMap()
Returns a clone of the current prefix map. This method can be used, for example, to implement caching or multi-pass tokenizers.

Returns:
the current prefix to namespace map
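
Because the method returns a clone, a caller can keep the map as a snapshot that later declarations do not alter. The sketch below illustrates that property with a stand-in class. PrefixMapSnapshotSketch, declare, and the use of HashMap are assumptions for illustration; the real implementation classes may store prefixes differently.

```java
import java.util.HashMap;
import java.util.Map;

final class PrefixMapSnapshotSketch {
    // Live prefix-to-namespace bindings held by the hypothetical tokenizer.
    private final Map<String, String> live = new HashMap<>();

    /** Records a prefix declaration, as a tokenizer would on a start tag. */
    void declare(String prefix, String namespaceUri) {
        live.put(prefix, namespaceUri);
    }

    /** Returns a copy, so callers cannot observe or cause later changes. */
    Map<String, String> getCurrentPrefixMap() {
        return new HashMap<>(live);
    }

    public static void main(String[] args) {
        PrefixMapSnapshotSketch t = new PrefixMapSnapshotSketch();
        t.declare("a", "urn:example:a");
        Map<String, String> snapshot = t.getCurrentPrefixMap();
        t.declare("b", "urn:example:b");
        // The snapshot still holds only the binding that existed when it was taken.
        System.out.println(snapshot.keySet());
    }
}
```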

pushNewlyDeclaredPrefixes

public int pushNewlyDeclaredPrefixes(DeclaredPrefixesStack prefixes)
Adds prefixes newly declared in this token. May be called only when the tokenizer is in START_TOKEN state.

Parameters:
prefixes - the newly declared prefixes
Returns:
the number of prefixes added
Throws:
java.lang.IllegalStateException - if the tokenizer is not in the START_TOKEN state