Tokenizer (WSO2 SOA Enablement Server for Java 6.6 API)

org.idoox.xml
Interface Tokenizer

All Known Implementing Classes:: TokenizerWrapper, XMLWriterReader

public interface Tokenizer

Tokenizes a stream containing XML into XML tokens. In other words, a tokenizer represents an XML document as a sequence of XML tokens (element start, element end, and content tokens).

Since:: 4.0
Component:: Core

Field Summary
`static byte`	`CONTENT` text data token
`static byte`	`END_DOCUMENT` end of document reached token
`static byte`	`END_TOKEN` element end token
`static byte`	`START_TOKEN` element start token
`static java.lang.String[]`	`typeNames` just for debugging; index this by token type to get type name
`static byte`	`UNKNOWN` unknown token

Method Summary
`byte`	`currentState()` Returns the current state of the tokenizer.
`java.util.Map`	`getCurrentPrefixMap()` Returns a clone of the current prefix map.
`org.w3c.dom.Element`	`getDOMRepresentation(org.w3c.dom.Document doc)` Returns the DOM representation of the element that is being parsed.
`java.lang.String`	`getLocalName()` Returns the local name of the current element.
`java.lang.String`	`getNamespace()` Returns the namespace URI of the current element.
`java.lang.String`	`getNamespaceForPrefix(java.lang.String prefix)` Returns a namespace URI for a declared prefix.
`byte`	`next()` Parses the next part of the input XML document and returns the state of the tokenizer (one of UNKNOWN, START_TOKEN, END_TOKEN, CONTENT, END_DOCUMENT).
`QName`	`parseQName(java.lang.String qName)` Parses qName in the context of the opened element and returns the pair (namespaceURI, localName).
`int`	`pushNewlyDeclaredPrefixes(DeclaredPrefixesStack prefixes)` Adds prefixes newly declared in this token.
`java.lang.String`	`readContent()` Reads the content (PCDATA, CDATA).
`void`	`readToken(Token stoken)` Reads the start/end of an element.
`boolean`	`whitespaceContent()` Returns true if the content contains only whitespace.

Field Detail

UNKNOWN

public static final byte UNKNOWN

unknown token

See Also:: Constant Field Values

END_DOCUMENT

public static final byte END_DOCUMENT

end of document reached token

See Also:: Constant Field Values

START_TOKEN

public static final byte START_TOKEN

element start token

See Also:: Constant Field Values

END_TOKEN

public static final byte END_TOKEN

element end token

See Also:: Constant Field Values

CONTENT

public static final byte CONTENT

text data token

See Also:: Constant Field Values

typeNames

public static final java.lang.String[] typeNames

just for debugging; index this by token type to get type name
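
The lookup described above can be sketched as follows. This is a minimal illustration, not the real interface: the class name `TypeNamesSketch`, the helper `describe`, the numeric constant values, and the contents of the name table are all assumptions made for the example (the actual values are listed under Constant Field Values).

```java
// Sketch only: a stand-in for the typeNames lookup, not org.idoox.xml.Tokenizer itself.
final class TypeNamesSketch {
    // Placeholder values chosen for this example; the real constants may differ.
    static final byte UNKNOWN = 0;
    static final byte END_DOCUMENT = 1;
    static final byte START_TOKEN = 2;
    static final byte END_TOKEN = 3;
    static final byte CONTENT = 4;

    // Hypothetical name table, ordered to match the placeholder values above.
    static final String[] typeNames = {
        "UNKNOWN", "END_DOCUMENT", "START_TOKEN", "END_TOKEN", "CONTENT"
    };

    /** Returns a readable name for a token type, guarding against out-of-range values. */
    static String describe(byte type) {
        if (type < 0 || type >= typeNames.length) {
            return "type " + type;
        }
        return typeNames[type];
    }

    public static void main(String[] args) {
        System.out.println(describe(START_TOKEN));
        System.out.println(describe((byte) 42));
    }
}
```

The bounds check is a defensive choice for log output; code that only ever passes values returned by the tokenizer could index the array directly.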

Method Detail

getLocalName

public java.lang.String getLocalName()
                              throws TokenizerException

Returns the local name of the current element. This function may be called only if the tokenizer is on START_TOKEN.

Returns:: the local name of the current element
Throws:: TokenizerException - if the tokenizer is not on the START_TOKEN

getNamespace

public java.lang.String getNamespace()
                              throws TokenizerException

Returns the namespace URI of the current element. This function may be called only if the tokenizer is on START_TOKEN.

Returns:: the namespace URI of the current element
Throws:: TokenizerException - if the tokenizer is not on the START_TOKEN

getNamespaceForPrefix

public java.lang.String getNamespaceForPrefix(java.lang.String prefix)

Returns a namespace URI for a declared prefix.

Parameters:: prefix - the declared prefix
Returns:: the namespace URI for the prefix or null if the prefix has not been declared
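
Because an undeclared prefix yields null rather than an exception, callers may want to check the result before using it. The sketch below shows one way to do that. `PrefixResolver` is a hypothetical one-method stand-in for the tokenizer, backed here by a plain map, and `requireNamespace` is an illustrative helper, not part of the API.

```java
import java.util.HashMap;
import java.util.Map;

final class PrefixLookupSketch {
    /** Hypothetical stand-in exposing only the method discussed above. */
    interface PrefixResolver {
        String getNamespaceForPrefix(String prefix);
    }

    /** Resolves a prefix, turning the documented null result into an explicit error. */
    static String requireNamespace(PrefixResolver resolver, String prefix) {
        String uri = resolver.getNamespaceForPrefix(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("Undeclared prefix: " + prefix);
        }
        return uri;
    }

    public static void main(String[] args) {
        Map<String, String> declared = new HashMap<>();
        declared.put("ord", "urn:example:orders");
        // Map.get returns null for a missing key, mirroring the documented contract.
        PrefixResolver resolver = declared::get;
        System.out.println(requireNamespace(resolver, "ord"));
    }
}
```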

getDOMRepresentation

public org.w3c.dom.Element getDOMRepresentation(org.w3c.dom.Document doc)
                                         throws TokenizerException

Returns the DOM representation of the element that is being parsed. May be called only at the beginning of an element.

Parameters:: doc - the document within which the element should be created
Returns:: element containing DOM representation of the XML tag being parsed
Throws:: TokenizerException - if there is an error in tokenizing the XML document

parseQName

public QName parseQName(java.lang.String qName)

Parses qName in the context of the opened element and returns the pair (namespaceURI, localName).

Parameters:: qName - the qualified name
Returns:: the resolved (expanded) name
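
To illustrate what resolving a qualified name in the context of an element involves, here is a rough sketch that works on a plain prefix map. It is not the library's implementation. The class `QNameResolutionSketch`, the method `resolve`, the use of a two-element array for the (namespaceURI, localName) pair, and the convention of storing the default namespace under the empty-string key are all assumptions. Whether the real method applies the default namespace to unprefixed names is not stated in this page.

```java
import java.util.HashMap;
import java.util.Map;

final class QNameResolutionSketch {
    /**
     * Splits a qualified name at the first colon and looks the prefix up in the
     * in-scope declarations. Returns {namespaceURI, localName}. An unprefixed
     * name is looked up under the empty-string key (assumed default namespace).
     */
    static String[] resolve(String qName, Map<String, String> inScope) {
        int colon = qName.indexOf(':');
        String prefix = colon < 0 ? "" : qName.substring(0, colon);
        String local = colon < 0 ? qName : qName.substring(colon + 1);
        String uri = inScope.get(prefix); // null if the prefix is not declared
        return new String[] { uri, local };
    }

    public static void main(String[] args) {
        Map<String, String> inScope = new HashMap<>();
        inScope.put("xsd", "http://www.w3.org/2001/XMLSchema");
        String[] pair = resolve("xsd:string", inScope);
        System.out.println(pair[0] + " " + pair[1]);
    }
}
```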

next

public byte next()
          throws TokenizerException,
                 java.io.IOException

Parses the next part of the input XML document and returns the state of the tokenizer (one of UNKNOWN, START_TOKEN, END_TOKEN, CONTENT, END_DOCUMENT).

Returns:: the state of the tokenizer
Throws:: TokenizerException - if there is an error in the XML document or the tokenizer is beyond the end of the document (the previous call to next() has returned END_DOCUMENT).; java.io.IOException - if an I/O error has occurred
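
A typical way to consume such a tokenizer is a pull loop that calls next() until END_DOCUMENT is returned and dispatches on the state. The sketch below shows that pattern under several assumptions: `MiniTokenizer` is a hypothetical subset of the interface with checked exceptions omitted, `ScriptedTokenizer` replays a fixed list of states instead of parsing XML, the constant values are placeholders, and `IllegalStateException` stands in for `TokenizerException`. Whether the real implementations require readToken or readContent to be called before advancing is not stated in this page.

```java
import java.util.ArrayList;
import java.util.List;

final class PullLoopSketch {
    // Placeholder values chosen for this example; the real constants may differ.
    static final byte UNKNOWN = 0, END_DOCUMENT = 1, START_TOKEN = 2, END_TOKEN = 3, CONTENT = 4;

    /** Hypothetical subset of the interface; the real methods declare checked exceptions. */
    interface MiniTokenizer {
        byte next();
        String getLocalName();
        String readContent();
    }

    /** Toy tokenizer that replays a fixed script ending in END_DOCUMENT. */
    static final class ScriptedTokenizer implements MiniTokenizer {
        private final byte[] states;
        private final String[] values;
        private int pos = -1;

        ScriptedTokenizer(byte[] states, String[] values) {
            this.states = states;
            this.values = values;
        }

        public byte next() {
            // Mirrors the documented rule: no next() after END_DOCUMENT was returned.
            if (pos >= 0 && states[pos] == END_DOCUMENT) {
                throw new IllegalStateException("beyond end of document");
            }
            pos++;
            return states[pos];
        }

        public String getLocalName() { return values[pos]; }

        public String readContent() { return values[pos]; }
    }

    /** Drains the tokenizer and returns one short description per token. */
    static List<String> trace(MiniTokenizer t) {
        List<String> out = new ArrayList<>();
        for (byte state = t.next(); state != END_DOCUMENT; state = t.next()) {
            if (state == START_TOKEN) {
                out.add("start:" + t.getLocalName()); // only valid on START_TOKEN
            } else if (state == END_TOKEN) {
                out.add("end");
            } else if (state == CONTENT) {
                out.add("text:" + t.readContent());   // only valid on CONTENT
            }
        }
        return out;
    }

    public static void main(String[] args) {
        MiniTokenizer t = new ScriptedTokenizer(
            new byte[] { START_TOKEN, CONTENT, END_TOKEN, END_DOCUMENT },
            new String[] { "greeting", "hello", null, null });
        System.out.println(trace(t));
    }
}
```

The loop stops as soon as END_DOCUMENT is seen, so it never triggers the exception that a further call to next() would raise.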

currentState

public byte currentState()

Returns the current state of the tokenizer. See next() for details.

Returns:: the current token

readContent

public java.lang.String readContent()
                             throws TokenizerException

Reads the content (PCDATA, CDATA).

Returns:: the content
Throws:: TokenizerException - if the tokenizer is not in CONTENT state

readToken

public void readToken(Token stoken)
               throws TokenizerException,
                      java.io.IOException

Reads the start/end of an element. Result is stored in the out parameter stoken. This function may be called only if the tokenizer is on START_TOKEN or END_TOKEN.

Parameters:: stoken - structure containing name, namespace URI and attribute pairs; holder for result
Throws:: TokenizerException - if the tokenizer is not in the START_TOKEN or END_TOKEN state, or the document is not well-formed XML.; java.io.IOException - if there was an error reading the input document

whitespaceContent

public boolean whitespaceContent()
                          throws TokenizerException

Returns true if the content contains only whitespace.

Returns:: true if the content contains only whitespace
Throws:: TokenizerException - if the tokenizer is not in CONTENT state
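
As an illustration of what a whitespace-only check amounts to, the sketch below tests a string against the four characters the XML specification defines as white space (space, tab, carriage return, line feed). That the tokenizer uses exactly this definition is an assumption, and `WhitespaceSketch.isXmlWhitespaceOnly` is a hypothetical helper, not part of the API.

```java
final class WhitespaceSketch {
    /** True if every character is XML white space: space, tab, CR, or LF. */
    static boolean isXmlWhitespaceOnly(String content) {
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                return false;
            }
        }
        return true; // also true for the empty string
    }

    public static void main(String[] args) {
        System.out.println(isXmlWhitespaceOnly("\n    \t"));
        System.out.println(isXmlWhitespaceOnly("  text  "));
    }
}
```

A check like this is commonly used to skip indentation between elements while keeping meaningful text content.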

getCurrentPrefixMap

public java.util.Map getCurrentPrefixMap()

Returns a clone of the current prefix map. This function might be used, for example, to implement caching or multi-pass tokenizers.

Returns:: the current prefix to namespace map
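
The practical consequence of returning a clone is that the caller gets a snapshot: later prefix declarations do not show up in a map obtained earlier. The sketch below demonstrates this with a hypothetical `PrefixScope` class that stands in for the tokenizer's prefix bookkeeping. The class, its `declare` method, and the use of `HashMap` are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class PrefixSnapshotSketch {
    /** Hypothetical holder that hands out copies of its prefix map. */
    static final class PrefixScope {
        private final Map<String, String> prefixes = new HashMap<>();

        void declare(String prefix, String uri) {
            prefixes.put(prefix, uri);
        }

        Map<String, String> getCurrentPrefixMap() {
            return new HashMap<>(prefixes); // independent copy, not a live view
        }
    }

    public static void main(String[] args) {
        PrefixScope scope = new PrefixScope();
        scope.declare("a", "urn:example:a");

        Map<String, String> snapshot = scope.getCurrentPrefixMap();
        scope.declare("b", "urn:example:b"); // declared after the snapshot was taken

        System.out.println(snapshot.containsKey("a"));
        System.out.println(snapshot.containsKey("b"));
    }
}
```

A cache or a second pass can hold on to such a snapshot and keep resolving prefixes as they were at that point in the document.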

pushNewlyDeclaredPrefixes

public int pushNewlyDeclaredPrefixes(DeclaredPrefixesStack prefixes)

Adds prefixes newly declared in this token. May be called only when the tokenizer is in START_TOKEN state.

Parameters:: prefixes - the newly declared prefixes
Returns:: the number of prefixes added
Throws:: java.lang.IllegalStateException - if the tokenizer is not in START_TOKEN state.