org.idoox.xml
Class TokenizerWrapper

java.lang.Object
  extended by org.idoox.xml.TokenizerWrapper
All Implemented Interfaces:
Tokenizer

public class TokenizerWrapper
extends java.lang.Object
implements Tokenizer

This class simplifies wrapping Tokenizers, which is necessary when you want to process XML messages in a stream-based manner. To wrap a Tokenizer, subclass this class. It is usually sufficient to override only next(), but any other method can be overridden as well. Here is an example of a Tokenizer that leaves out the first (outermost) element of the wrapped Tokenizer.


 // overridden next() method; depth is an int field of the subclass, initially 0
 public byte next() throws TokenizerException, IOException {
   // ask TokenizerWrapper for the next token
   byte next = super.next();
   switch (next) {
       // check the start tokens
       case START_TOKEN:
           // if it is the first element, skip it and return the token that follows
           if (depth == 0) {
               depth++;
               return next();
           }
           depth++;
           break;
       case END_TOKEN:
           depth--;
           // if it is the end of the first element, skip it as well
           if (depth == 0) {
               return next();
           }
           break;
   }
   return next;
 }
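
The same pattern can be tried out without the org.idoox.xml classes. The following is a minimal, self-contained sketch rather than the library's code: TokenSource, FixedTokenSource, SourceWrapper and SkipOutermost are hypothetical stand-ins for Tokenizer, a concrete tokenizer, TokenizerWrapper and the subclass, and the numeric token values are arbitrary.

```java
// Hypothetical stand-in for Tokenizer, reduced to next(); token values are arbitrary.
interface TokenSource {
    byte START_TOKEN = 0, END_TOKEN = 1, CONTENT = 2, END_DOCUMENT = 3;

    byte next();
}

// Hypothetical tokenizer that replays a fixed sequence of token types.
class FixedTokenSource implements TokenSource {
    private final byte[] tokens;
    private int pos = 0;

    FixedTokenSource(byte... tokens) {
        this.tokens = tokens;
    }

    @Override
    public byte next() {
        return pos < tokens.length ? tokens[pos++] : END_DOCUMENT;
    }
}

// Hypothetical stand-in for TokenizerWrapper: forwards next() to the wrapped source.
class SourceWrapper implements TokenSource {
    private final TokenSource underlying;

    SourceWrapper(TokenSource underlying) {
        this.underlying = underlying;
    }

    @Override
    public byte next() {
        return underlying.next();
    }
}

// Subclass that hides the start and end tokens of the outermost element.
class SkipOutermost extends SourceWrapper {
    private int depth = 0;

    SkipOutermost(TokenSource underlying) {
        super(underlying);
    }

    @Override
    public byte next() {
        byte type = super.next();
        switch (type) {
            case START_TOKEN:
                // outermost start tag: skip it and report the following token
                if (depth++ == 0) {
                    return next();
                }
                break;
            case END_TOKEN:
                // outermost end tag: skip it and report the following token
                if (--depth == 0) {
                    return next();
                }
                break;
            default:
                break;
        }
        return type;
    }
}
```

Fed the token types of <a>text<b/></a>, this wrapper reports CONTENT, START_TOKEN, END_TOKEN and END_DOCUMENT, so the tokens of the outer element a are dropped.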

Since:
4.6
Component:
Core

Nested Class Summary
static class TokenizerWrapper.DefaultTokenizerState
          Default implementation of the tokenizer's internal state.
static interface TokenizerWrapper.TokenizerState
          Represents the internal state of a TokenizerWrapper.
 
Field Summary
 
Fields inherited from interface org.idoox.xml.Tokenizer
CONTENT, END_DOCUMENT, END_TOKEN, START_TOKEN, typeNames, UNKNOWN
 
Constructor Summary
TokenizerWrapper(Tokenizer tokenizer)
          Creates a new wrapper around the specified Tokenizer.
 
Method Summary
 byte currentState()
          Returns the current state of the tokenizer.
 java.util.Map getCurrentPrefixMap()
          Returns a clone of the current prefix map.
protected  TokenizerWrapper.TokenizerState getCurrentState()
          Returns the current state of the Tokenizer.
 org.w3c.dom.Element getDOMRepresentation(org.w3c.dom.Document doc)
          Returns the DOM representation of the element that is being parsed.
static org.w3c.dom.Element getDOMRepresentation(Tokenizer tokenizer, org.w3c.dom.Document doc)
           
 java.lang.String getLocalName()
          Returns the local name of the current element.
 java.lang.String getNamespace()
          Returns the namespace URI of the current element.
 java.lang.String getNamespaceForPrefix(java.lang.String prefix)
          Returns a namespace URI for a declared prefix.
protected  Tokenizer getTokenizer()
          Gets the underlying Tokenizer.
 byte next()
          Clears the current state of the tokenizer wrapper and calls Tokenizer.next() on the underlying tokenizer.
static byte nextElement(Tokenizer tokenizer)
          Moves the Tokenizer to the next element boundary, start or end (START_TOKEN, END_TOKEN or END_DOCUMENT).
static byte nextSibling(Tokenizer source)
          Moves the Tokenizer to the next sibling (START_TOKEN).
 QName parseQName(java.lang.String qName)
          Parses qName in the context of the opened element and returns the pair (namespaceURI, localName).
 int pushNewlyDeclaredPrefixes(DeclaredPrefixesStack prefixes)
          Adds prefixes newly declared in this token.
 java.lang.String readContent()
          Reads the content (PCDATA, CDATA).
 void readToken(Token stoken)
          Reads the start/end of an element.
protected  void setCurrentContent(java.lang.String content)
          Sets the current state of the tokenizer wrapper to the given content.
protected  void setCurrentState(TokenizerWrapper.TokenizerState state)
          Sets the current state of the Tokenizer.
protected  void setCurrentToken(Token currentToken)
          Sets the current state of the tokenizer wrapper to the given token.
protected  void setCurrentToken(Token currentToken, java.util.Map prefixMap, java.lang.String[] newPrefixes)
          Sets the current state of the tokenizer wrapper to the given token, prefix map and newly declared prefixes.
protected  void setTokenizer(Tokenizer tokenizer)
          Sets the underlying Tokenizer.
static java.lang.String tokenToString(byte token)
           
 boolean whitespaceContent()
          Returns true if the content contains only whitespace.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TokenizerWrapper

public TokenizerWrapper(Tokenizer tokenizer)
Creates a new wrapper around the specified Tokenizer.

Parameters:
tokenizer - Tokenizer that will be wrapped
Method Detail

getCurrentState

protected TokenizerWrapper.TokenizerState getCurrentState()
Returns the current state of the Tokenizer. If the state is null, TokenizerWrapper forwards all method calls to the underlying Tokenizer. If it is not null, the state is used to implement the methods inherited from Tokenizer.

Returns:
current state of TokenizerWrapper

setCurrentState

protected void setCurrentState(TokenizerWrapper.TokenizerState state)
Sets the current state of the Tokenizer. If the state is null, TokenizerWrapper forwards all method calls to the underlying Tokenizer. If it is not null, the state is used to implement the methods inherited from Tokenizer.

Parameters:
state - new current state of TokenizerWrapper
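
The null versus non-null rule described for getCurrentState and setCurrentState amounts to a delegation check in every query method. The sketch below illustrates that idea only and is not the org.idoox.xml implementation: NameSource and StateSketchWrapper are hypothetical stand-ins, reduced to a single query method.

```java
// Hypothetical stand-in for Tokenizer, reduced to one query method.
interface NameSource {
    String getLocalName();
}

// Hypothetical wrapper: answers from an override state when one is set,
// otherwise forwards the call to the wrapped source.
class StateSketchWrapper implements NameSource {
    private final NameSource underlying;
    private NameSource state; // null means "forward everything"

    StateSketchWrapper(NameSource underlying) {
        this.underlying = underlying;
    }

    void setCurrentState(NameSource state) {
        this.state = state;
    }

    NameSource getCurrentState() {
        return state;
    }

    @Override
    public String getLocalName() {
        return (state != null) ? state.getLocalName() : underlying.getLocalName();
    }

    // Mirrors the documented behaviour of next(): the override state is cleared.
    void next() {
        state = null;
    }
}
```

A subclass can use such a state to report a synthetic token for one step. After the following call to next(), queries reach the underlying tokenizer again.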

setCurrentToken

protected void setCurrentToken(Token currentToken)
Sets the current state of the tokenizer wrapper to the given token. Equivalent to:
 setCurrentState(new DefaultTokenizerState(currentToken));
 

Parameters:
currentToken - token
See Also:
DefaultTokenizerState#DefaultTokenizerState

setCurrentContent

protected void setCurrentContent(java.lang.String content)
Sets the current state of the tokenizer wrapper to the given content. Equivalent to:
 setCurrentState(new DefaultTokenizerState(content));
 

Parameters:
content - content
See Also:
DefaultTokenizerState#DefaultTokenizerState

setCurrentToken

protected void setCurrentToken(Token currentToken,
                               java.util.Map prefixMap,
                               java.lang.String[] newPrefixes)
Sets the current state of the tokenizer wrapper to the given token, prefix map and newly declared prefixes. Equivalent to:
 setCurrentState(new DefaultTokenizerState(currentToken,prefixMap,newPrefixes));
 

Parameters:
currentToken - token
prefixMap - namespace declarations
newPrefixes - newly declared prefixes
See Also:
DefaultTokenizerState#DefaultTokenizerState

getTokenizer

protected Tokenizer getTokenizer()
Gets the underlying Tokenizer.

Returns:
the underlying Tokenizer

setTokenizer

protected void setTokenizer(Tokenizer tokenizer)
Sets the underlying Tokenizer.

Parameters:
tokenizer - the new underlying Tokenizer

getLocalName

public java.lang.String getLocalName()
                              throws TokenizerException
Description copied from interface: Tokenizer
Returns the local name of the current element. This function may be called only if the tokenizer is on START_TOKEN.

Specified by:
getLocalName in interface Tokenizer
Returns:
the local name of the current element
Throws:
TokenizerException - if the tokenizer is not on the START_TOKEN

getNamespace

public java.lang.String getNamespace()
                              throws TokenizerException
Description copied from interface: Tokenizer
Returns the namespace URI of the current element. This function may be called only if the tokenizer is on START_TOKEN.

Specified by:
getNamespace in interface Tokenizer
Returns:
the namespace URI of the current element
Throws:
TokenizerException - if the tokenizer is not on the START_TOKEN

getNamespaceForPrefix

public java.lang.String getNamespaceForPrefix(java.lang.String prefix)
Description copied from interface: Tokenizer
Returns a namespace URI for a declared prefix.

Specified by:
getNamespaceForPrefix in interface Tokenizer
Parameters:
prefix - the declared prefix
Returns:
the namespace URI for the prefix or null if the prefix has not been declared

getDOMRepresentation

public org.w3c.dom.Element getDOMRepresentation(org.w3c.dom.Document doc)
                                         throws TokenizerException
Description copied from interface: Tokenizer
Returns the DOM representation of the element that is being parsed. May be called only at the start of an element.

Specified by:
getDOMRepresentation in interface Tokenizer
Parameters:
doc - the document within which the element should be created
Returns:
element containing DOM representation of the XML tag being parsed
Throws:
TokenizerException - if there is an error in tokenizing the XML document

getDOMRepresentation

public static org.w3c.dom.Element getDOMRepresentation(Tokenizer tokenizer,
                                                       org.w3c.dom.Document doc)
                                                throws TokenizerException
Throws:
TokenizerException

parseQName

public QName parseQName(java.lang.String qName)
Description copied from interface: Tokenizer
Parses qName in the context of the opened element and returns the pair (namespaceURI, localName).

Specified by:
parseQName in interface Tokenizer
Parameters:
qName - the qualified name
Returns:
the resolved (expanded) name
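
Resolving a qualified name means splitting it at the colon and looking the prefix up among the declarations in scope. The sketch below shows that step with a plain java.util.Map. QNameSketch.resolve is a hypothetical helper, not the library's parseQName. Storing the default namespace under the empty prefix and returning a null URI for an undeclared prefix are both assumptions made for the illustration.

```java
import java.util.Map;

final class QNameSketch {
    private QNameSketch() {
    }

    // Splits "prefix:local" and looks the prefix up in the in-scope map.
    // Assumption: the default namespace is stored under the empty prefix "".
    // Returns {namespaceURI, localName}; namespaceURI is null when the
    // prefix is not declared in the map.
    static String[] resolve(String qName, Map<String, String> prefixMap) {
        int colon = qName.indexOf(':');
        String prefix = (colon < 0) ? "" : qName.substring(0, colon);
        String localName = (colon < 0) ? qName : qName.substring(colon + 1);
        return new String[] { prefixMap.get(prefix), localName };
    }
}
```

With the prefix soap mapped to http://schemas.xmlsoap.org/soap/envelope/, resolving soap:Body yields that URI paired with the local name Body.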

next

public byte next()
          throws TokenizerException,
                 java.io.IOException
Clears the current state of the tokenizer wrapper and calls Tokenizer.next() on the underlying tokenizer. Override this method to change the token stream that the underlying tokenizer produces.

Specified by:
next in interface Tokenizer
Returns:
type of next token
Throws:
TokenizerException - if there is an error in the XML document or the tokenizer is beyond the end of the document (the previous call to next() has returned END_DOCUMENT).
java.io.IOException - if an I/O error has occurred

nextElement

public static byte nextElement(Tokenizer tokenizer)
                        throws java.io.IOException,
                               TokenizerException
Moves the Tokenizer to the next element boundary, start or end (START_TOKEN, END_TOKEN or END_DOCUMENT). Any whitespace is skipped.

Parameters:
tokenizer - Tokenizer to move
Returns:
a state of the Tokenizer
Throws:
java.io.IOException
TokenizerException
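
A helper of this kind is essentially a loop that keeps advancing until something other than content is reported. The following sketch shows the loop on a hypothetical stand-in, ArrayTokenSource, with arbitrary token values. It skips every CONTENT token, which is an assumption: the description above only states that whitespace is skipped.

```java
// Hypothetical stand-in for Tokenizer: replays a fixed sequence of token types.
class ArrayTokenSource {
    static final byte START_TOKEN = 0, END_TOKEN = 1, CONTENT = 2, END_DOCUMENT = 3;

    private final byte[] tokens;
    private int pos = 0;

    ArrayTokenSource(byte... tokens) {
        this.tokens = tokens;
    }

    byte next() {
        return pos < tokens.length ? tokens[pos++] : END_DOCUMENT;
    }
}

final class NextElementSketch {
    private NextElementSketch() {
    }

    // Advances until a START_TOKEN, END_TOKEN or END_DOCUMENT is reached.
    // Assumption: all CONTENT tokens are skipped, not only whitespace ones.
    static byte nextElement(ArrayTokenSource tokenizer) {
        byte type;
        do {
            type = tokenizer.next();
        } while (type == ArrayTokenSource.CONTENT);
        return type;
    }
}
```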

nextSibling

public static byte nextSibling(Tokenizer source)
                        throws TokenizerException,
                               java.io.IOException,
                               java.lang.IllegalStateException
Moves the Tokenizer to the next sibling (START_TOKEN). If no sibling is found, the Tokenizer is moved to the end of the parent element (END_TOKEN or END_DOCUMENT). If the Tokenizer is not on the start of an element (START_TOKEN), an IllegalStateException is thrown.

Parameters:
source - Tokenizer to move
Returns:
a state of the Tokenizer
Throws:
TokenizerException
java.io.IOException
java.lang.IllegalStateException - if the Tokenizer is not on the start of an element (START_TOKEN)
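
Moving to a sibling requires skipping the whole subtree of the current element, which can be done by counting nesting depth. The sketch below is one way to do that under stated assumptions. SiblingTokenSource and SiblingSketch are hypothetical stand-ins with arbitrary token values, and the library's actual algorithm may differ.

```java
// Hypothetical stand-in for Tokenizer: replays token types and remembers the current one.
class SiblingTokenSource {
    static final byte START_TOKEN = 0, END_TOKEN = 1, CONTENT = 2, END_DOCUMENT = 3, UNKNOWN = 4;

    private final byte[] tokens;
    private int pos = -1; // before the first token

    SiblingTokenSource(byte... tokens) {
        this.tokens = tokens;
    }

    byte next() {
        if (pos < tokens.length) {
            pos++;
        }
        return currentState();
    }

    byte currentState() {
        if (pos < 0) {
            return UNKNOWN;
        }
        return pos < tokens.length ? tokens[pos] : END_DOCUMENT;
    }
}

final class SiblingSketch {
    private SiblingSketch() {
    }

    // Skips the subtree of the current element, then stops on the next
    // START_TOKEN at the same level, or on the parent's END_TOKEN or on
    // END_DOCUMENT when there is no further sibling.
    static byte nextSibling(SiblingTokenSource source) {
        if (source.currentState() != SiblingTokenSource.START_TOKEN) {
            throw new IllegalStateException("tokenizer is not on START_TOKEN");
        }
        int depth = 1; // inside the current element
        while (true) {
            byte type = source.next();
            if (type == SiblingTokenSource.START_TOKEN) {
                if (depth == 0) {
                    return type; // sibling found
                }
                depth++;
            } else if (type == SiblingTokenSource.END_TOKEN) {
                if (depth == 0) {
                    return type; // end of the parent element
                }
                depth--;
            } else if (type == SiblingTokenSource.END_DOCUMENT) {
                return type;
            }
            // CONTENT tokens are ignored
        }
    }
}
```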

currentState

public byte currentState()
Description copied from interface: Tokenizer
Returns the current state of the tokenizer. See Tokenizer.next() for details.

Specified by:
currentState in interface Tokenizer
Returns:
the current token

readContent

public java.lang.String readContent()
                             throws TokenizerException
Description copied from interface: Tokenizer
Reads the content (PCDATA, CDATA).

Specified by:
readContent in interface Tokenizer
Returns:
the content
Throws:
TokenizerException - if the tokenizer is not in CONTENT state

readToken

public void readToken(Token stoken)
               throws TokenizerException,
                      java.io.IOException
Description copied from interface: Tokenizer
Reads the start/end of an element. Result is stored in the out parameter stoken. This function may be called only if the tokenizer is on START_TOKEN or END_TOKEN.

Specified by:
readToken in interface Tokenizer
Parameters:
stoken - structure containing name, namespace URI and attribute pairs; holder for result
Throws:
java.io.IOException - if there was an error reading the input document
TokenizerException - if the tokenizer is not on START_TOKEN or END_TOKEN, or the document is not well-formed XML

whitespaceContent

public boolean whitespaceContent()
                          throws TokenizerException
Description copied from interface: Tokenizer
Returns true if the content contains only whitespace.

Specified by:
whitespaceContent in interface Tokenizer
Returns:
true if the content contains only whitespace
Throws:
TokenizerException - if the tokenizer is not in CONTENT state

getCurrentPrefixMap

public java.util.Map getCurrentPrefixMap()
Description copied from interface: Tokenizer
Returns a clone of the current prefix map. This method can be used, for example, to implement caching or multi-pass tokenizers.

Specified by:
getCurrentPrefixMap in interface Tokenizer
Returns:
the current prefix to namespace map

pushNewlyDeclaredPrefixes

public int pushNewlyDeclaredPrefixes(DeclaredPrefixesStack prefixes)
Description copied from interface: Tokenizer
Adds prefixes newly declared in this token. May be called only when the tokenizer is in START_TOKEN state.

Specified by:
pushNewlyDeclaredPrefixes in interface Tokenizer
Parameters:
prefixes - the newly declared prefixes
Returns:
the number of prefixes added

tokenToString

public static java.lang.String tokenToString(byte token)