The DOM Level 2 Iterator, Filter, and TreeWalker interfaces extend the functionality of the DOM to allow simple and efficient traversal of document subtrees, node lists, or the results of queries.
This proposal contains Iterator, Filter, and TreeWalker interfaces, but no query interfaces. In the future, it is likely that a separate specification will be prepared for query interfaces, which may be query-language independent.
An iterator allows the nodes of a data structure to be returned sequentially. When an iterator is first created, calling nextNode() returns the first node. When no more nodes are present, nextNode() returns a null. Since DOM structures may change as a document is loaded, if nextNode() finds no more nodes, it is still quite possible that further nodes may be added in the next instant. An iterator may be active while the data structure it navigates is being edited, so an iterator must behave gracefully in the face of change. Additions and deletions in the underlying data structure do not invalidate an iterator; in fact, an iterator is never invalidated.
Using ordered set semantics, the position of the iterator is determined by the relative position in the ordered set. There is no current node. When an iterator is created for a list, the position is set before the first element:
->A B C D E F G H I
Each call to next() returns a node and advances the position. For instance, if we start with the above position, the first call to next() returns "A" and advances the iterator:
A ->B C D E F G H I
The relative position of the iterator remains valid when other nodes are added or deleted. For instance, if "A" is deleted, the position of the iterator is unchanged with respect to the remaining nodes:
->B C D E F G H I
Similarly, if the node ahead of the iterator is deleted or moved, the iterator "slides forward". Therefore, if "B" is deleted, the position remains unchanged with respect to the remaining nodes:
->C D E F G H I
For the same reason, moving the "C" node to the end of the set does not change the current position with respect to the remaining nodes:
->D E F G H I C
When nodes are added as children of the node to the left of the iterator, there is some difference of opinion as to what constitutes the most consistent behavior. Suppose the iterator is advanced past "D" using next(), then a new node is added as a child of "D" in the original tree. Since children of a node occur after the node in document order, one approach is to have the new child appear after the node, but before the current position of the iterator:
D a ->E F G H I C
The newly inserted node "a" occurs before the iterator, so it will not be encountered when the iterator is moved forward. This is convenient when an iterator is being used to add nodes to the tree, since the programmer does not need to skip over the newly inserted nodes. In this case, if the iterator were moved backward, this new node would be the first one encountered. A second consistent approach is to say that nodes added as children of the node to the left of the iterator appear after the current position of the iterator:
D ->a E F G H I C
Using this approach, the newly added nodes are encountered as the iterator moves forward. We believe either approach is justifiable, and we have not decided which of the two approaches is best.
Note that the relative position of the iterator is not the same as the absolute position within the set. The position of the iterator is relative to the node before it and the node after it, which is why the position floats gracefully when nodes are deleted or inserted before or after the position of the iterator.
Iterators are created using the createNodeIterator method found in Document. When an iterator is created, flags can be used to determine which nodes will be "visible" and which nodes will be "invisible" while traversing the tree. Nodes that are "invisible" are skipped over by the iterator as though they did not exist. These flags can be combined using OR:
NodeIterator iter=document.createNodeIterator(root, SHOW_ELEMENT | SHOW_PROCESSING_INSTRUCTION | SHOW_COMMENT | SHOW_ENTITY_REFERENCE, NULL);
Filters allow the user to "filter out" nodes. Each filter contains a user-written function that looks at a node and determines whether or not it should be filtered out. To use a filter, you create an iterator that uses the filter. The iterator applies the filter to each node, and if the filter rejects the node, the iterator skips over the node as though it were not present in the document. Filters are easy to write, since they need not know how to navigate the structure on which they operate, and they can be reused for different kinds of iterators that operate on different data structures.
Consider a filter that finds the named anchors in an HTML document. In HTML, an HREF can refer to any <A> element that has a NAME attribute. Here is a filter that looks at a node and determines whether it is a named anchor:
class NamedAnchorFilter implements NodeFilter { short acceptNode(Node n) { if (n instanceof Element) { Element e = n; if (e.getNodeName() != "A") return FILTER_SKIP; if (e.getNodeNameAttributeNode("NAME") != NULL) { return FILTER_ACCEPT; } } return FILTER_SKIP; } }
To use this filter, the user would create an instance of the filter and create an iterator using it:
NamedAnchorFilter myFilter; NodeIterator iter=document.creatNodeIterator(node, SHOW_ELEMENT, myFilter);
If SHOW_ENTITY_REFERENCE is not set, entities are expanded. If SHOW_ENTITY_REFERENCE is set,
entity references will be encountered by the iterator. There is no setting that
shows both the entity reference and its expansion.
The TreeWalker interface provides many of the same benefits as the Iterator interface. The main difference between these two interfaces is that the TreeWalker presents a tree-oriented view of the nodes in a subtree, and an Iterator presents a list-oriented view. In other words, an Iterator allows you to move forward or back, but a TreeWalker allows you to move to the parent of a node, to one of its children, or to a sibling.
Using a TreeWalker is quite similar to navigation using the Node directly, and the navigation methods for the two interfaces are analogous. For instance, here is a function that processes the nodes of a subtree in document order using the Node navigation methods:
processMe(Node n) { doSomething(n); if (n.firstChild != null) { processMe(n.firstChild); } if (n.nextSibling != null) { processMe(n.nextSibling); } }
Here is the code to do the same thing using a TreeWalker:
processMe(TreeWalker tw) { doSomething(tw.current()); if (tw.firstChild() != null) { processMe(tw); } if (tw.nextSibling() != null) { processMe(tw); } tw.parent(); }
The main difference between these two functions is that the TreeWalker version must take into account the fact that changing the internal position of the TreeWalker will also affect any calling function that continues to use the TreeWalker. Therefore, a function that uses a TreeWalker should be careful about the position after the function is finished.
The advantage of using a TreeWalker instead of direct Node navigation is that the TreeWalker allows the user to choose an appropriate view of the tree. Flags may be used to show or hide comments or processing instructions, entities may be expanded or left as entity references, and sequences of text nodes may be merged into a single virtual text node. In addition, Filters may be used to present a custom view of the tree. Suppose a program needs a view of a document that shows which tables occur in each chapter, listed by chapter. In this view, only the chapter elements and the tables that they contain are seen. The first step is to write an appropriate filter:
class TablesInChapters implements NodeFilter { short acceptNode(Node n) { if (n instanceof Element) { Element e = n; if (e.nodeName == "CHAPTER") return FILTER_ACCEPT; if (e.nodeName == "TABLE") return FILTER_ACCEPT; if (e.nodeName == "SECT1" || e.nodeName == "SECT2" || e.nodeName == "SECT3" || e.nodeName == "SECT4" || e.nodeName == "SECT5" || e.nodeName == "SECT6" || e.nodeName == "SECT7") return FILTER_SKIP; } return FILTER_REJECT; } }
Now the program can create an instance of this Filter, create a TreeWalker that uses it, and pass this TreeWalker to our ProcessMe() function:
TablesInChapters tablesInChapters; TreeWalker tw(root, SHOW_ELEMENT, TablesInChapters); ProcessMe(tw);
Without making any changes to the above ProcessMe() function, it now
processes only the <CHAPTER> and <TABLE> elements. The programmer can write
other filters or set other flags to choose different sets of nodes; if
functions use TreeWalker to navigate, they will support any view of the
document defined with a TreeWalker.
NodeIterators are used to step through a set of nodes, e.g. the set of nodes in a NodeList, the document subtree governed by a particular node, the results of a query, or any other set of nodes. The set of nodes to be iterated is determined by the factory that creates the iterator.
Any iterator that returns nodes may implement the NodeIterator interface. Users and vendor libraries may also choose to create iterators that implement the NodeIterator interface.
interface NodeIterator { readonly attribute long whatToShow; // Constants for whatToShow const unsigned long SHOW_ALL = 0xFFFF; const unsigned long SHOW_ELEMENT = 0x00000001; const unsigned long SHOW_ATTRIBUTE = 0x00000002; const unsigned long SHOW_TEXT = 0x00000004; const unsigned long SHOW_CDATA_SECTION = 0x00000008; const unsigned long SHOW_ENTITY_REFERENCE = 0x00000010; const unsigned long SHOW_ENTITY = 0x00000020; const unsigned long SHOW_PROCESSING_INSTRUCTION = 0x00000040; const unsigned long SHOW_COMMENT = 0x00000080; const unsigned long SHOW_DOCUMENT = 0x00000100; const unsigned long SHOW_DOCUMENT_TYPE = 0x00000200; const unsigned long SHOW_DOCUMENT_FRAGMENT = 0x00000400; const unsigned long SHOW_NOTATION = 0x00000800; readonly attribute NodeFilter filter; Node nextNode(); Node previousNode(); };
whatToShow
These are the available values for the whatToShow parameter. They are the same as the set of possible types for Node, and their values are derived by using a bit position corresponding to the value of NodeType for the equivalent node type.
SHOW_ALL |
Show all nodes. |
SHOW_ELEMENT |
Show element nodes. |
SHOW_ATTRIBUTE |
Show attribute nodes. This is meaningful only when creating an iterator with an attribute node as its root; in this case, it means that the attribute node will appear in the first position of the iteration. Since attributes are not part of the document tree, they do not appear when iterating over the document tree. |
SHOW_TEXT |
Show text nodes. |
SHOW_CDATA_SECTION |
Show CDATASection nodes. |
SHOW_ENTITY_REFERENCE |
Show Entity Reference nodes. |
SHOW_ENTITY |
Show Entity nodes. This currently has no effect. |
SHOW_PROCESSING_INSTRUCTION |
Show ProcessingInstruction nodes. |
SHOW_COMMENT |
Show Comment nodes. |
SHOW_DOCUMENT |
Show Document nodes. |
SHOW_DOCUMENT_TYPE |
Show DocumentType nodes. |
SHOW_DOCUMENT_FRAGMENT |
Show DocumentFragment nodes. |
SHOW_NOTATION |
Show Notation nodes. This currently has no effect. |
filter
nextNode
Node
in the set being iterated over, or NULL if there are no more members in that
set.
previousNode
Node
in the set being iterated over, or NULL if there are no more members in that
set.
Filters are simply objects that know how to "filter out" nodes. If an iterator is given a filter, before it returns the next node, it applies the filter. If the filter says to accept the node, the iterator returns it; otherwise, the iterator looks for the next node and pretends that the node that was rejected was not there.
The DOM does not provide any filters. Filter is just an interface that users can implement to provide their own filters. The introduction to this chapter gives an example of how a user can implement a filter to perform a specific function.
Filters do not need to know how to iterate, nor do they need to know anything about the data structure that is being iterated. This makes it very easy to write filters, since the only thing they have to know how to do is evaluate a single node. One filter may be used with a number of different kinds of iterators, encouraging code reuse.
If a filter is installed for a TreeWalker or Iterator, the system may use that filter for various tasks, especially during fix-up. Filters should make no assumptions about how frequently they will be called.
interface NodeFilter { // Constants returned by acceptNode const short FILTER_ACCEPT = 1; const short FILTER_REJECT = 2; const short FILTER_SKIP = 3; short acceptNode(in Node n); };
The following constants are returned by the acceptNode() method:
FILTER_ACCEPT |
Accept the node. Navigation methods defined for Iterator or TreeWalker will return this node. |
FILTER_REJECT |
Reject the node. Navigation methods defined for Iterator or TreeWalker will not return this node. For TreeWalker, the children of this node will also be rejected. Iterators treat this as a synonym for FILTER_SKIP. |
FILTER_SKIP |
Reject the node. Navigation methods defined for Iterator or TreeWalker will not return this node. For both Iterator and Treewalker, the children of this node will still be considered. |
acceptNode
n |
The node to check to see if it passes the filter or not. |
TreeWalkers are used to navigate a document tree or subtree using the view of the document defined by its whatToShow flags and any filters that are defined for the TreeWalker. Any function which performs navigation using a TreeWalker will automatically support any view defined by a TreeWalker.
interface TreeWalker { readonly attribute long whatToShow; // Constants for whatToShow const unsigned long SHOW_ALL = 0xFFFF; const unsigned long SHOW_ELEMENT = 0x00000001; const unsigned long SHOW_ATTRIBUTE = 0x00000002; const unsigned long SHOW_TEXT = 0x00000004; const unsigned long SHOW_CDATA_SECTION = 0x00000008; const unsigned long SHOW_ENTITY_REFERENCE = 0x00000010; const unsigned long SHOW_ENTITY = 0x00000020; const unsigned long SHOW_PROCESSING_INSTRUCTION = 0x00000040; const unsigned long SHOW_COMMENT = 0x00000080; const unsigned long SHOW_DOCUMENT = 0x00000100; const unsigned long SHOW_DOCUMENT_TYPE = 0x00000200; const unsigned long SHOW_DOCUMENT_FRAGMENT = 0x00000400; const unsigned long SHOW_NOTATION = 0x00000800; readonly attribute NodeFilter filter; Node current(); Node parentNode(); Node firstChild(); Node lastChild(); Node previousSibling(); Node nextSibling(); };
whatToShow
These are the available values for the whatToShow parameter. They are the same as the set of possible types for Node, and their values are derived by using a bit position corresponding to the value of NodeType for the equivalent node type.
SHOW_ALL |
Show all nodes. |
SHOW_ELEMENT |
Show element nodes. |
SHOW_ATTRIBUTE |
Show attribute nodes. This is meaningful only when creating an TreeWalker with an attribute node as its root; in this case, it means that the attribute node will appear in the first position of the iteration. Since attributes are not part of the document tree, they do not appear when iterating over the document tree. |
SHOW_TEXT |
Show text nodes. |
SHOW_CDATA_SECTION |
Show CDATASection nodes. |
SHOW_ENTITY_REFERENCE |
Show Entity Reference nodes. |
SHOW_ENTITY |
Show Entity nodes. This currently has no effect. |
SHOW_PROCESSING_INSTRUCTION |
Show ProcessingInstruction nodes. |
SHOW_COMMENT |
Show Comment nodes. |
SHOW_DOCUMENT |
Show Document nodes. |
SHOW_DOCUMENT_TYPE |
Show DocumentType nodes. |
SHOW_DOCUMENT_FRAGMENT |
Show DocumentFragment nodes. |
SHOW_NOTATION |
Show Notation nodes. This currently has no effect. |
filter
current
parentNode
firstChild
lastChild
previousSibling
nextSibling
Document contains methods that creates iterators to traverse a node and its children in document order (depth first, pre-order traversal, which is equivalent to the order in which the start tags occur in the text representation of the document).
interface DocumentIF { short createNodeIterator(in Node root, in short whatToShow, in NodeFilter filter); };
createNodeIterator
root |
The node which will be iterated together with its children. | |
whatToShow |
This flag determines whether entities are expanded, and whether comments, processing instructions, or text are presented via the iterator. See the description of Iterator for the set of possible values. These flags can be combined using OR: NodeIterator iter=doc.createNodeIterator(root, SHOW_ELEMENT | SHOW_PROCESSING_INSTRUCTION | SHOW_COMMENT | SHOW_ENTITY_REFERENCE, myFilter); If SHOW_ENTITY_REFERENCE is not set, entities are expanded. If SHOW_ENTITY_REFERENCE is set, entity references will be encountered by the iterator. There is no setting that shows both the entity reference and its expansion. (ED: Several people have suggested that the functionality of whatToShow be
implemented using filters. We feel that it is better to implement them using
iterators, since it makes it possible to provide a more efficient
implementation. A filter must examine each node individually; an iterator can
make use of internal data structures to examine only those nodes that are
desired.)
| |
filter |