XML - Managing Data Exchange/XPath

Introduction
Throughout the previous chapters you have learned the basic concepts of XSL and how you must refer to nodes in an XML document when performing an XSL transformation. Up to this point you have been using a straightforward syntax for referring to nodes in an XML document. Although the syntax you have used so far is XPath, the language has many more functions and capabilities, which you will learn in this chapter. As you begin to comprehend how a path language is used to refer to nodes in an XML document, your understanding of XML as a tree structure will begin to fall into place. This chapter contains examples that demonstrate many of the common uses of XPath, but for the full XPath specification, see the latest version of the standard at:

http://www.w3.org/TR/xpath

XSL uses XPath heavily.

XPath
When you go to copy a file or ‘cd’ into a directory at a command prompt you often type something along the lines of ‘/home/darnell/’ to refer to folders. This enables you to change into or refer to folders throughout your computer’s file system. XML has a similar way of referring to elements in an XML document. This special syntax is called XPath, which is short for XML Path Language.

XPath is a language for finding information in an XML document. XPath is used to navigate through elements and attributes in an XML document.

XPath, although used for referring to nodes in an XML tree, is not itself written in XML. This was a wise choice on the part of the W3C, because trying to specify path information in XML would be a very cumbersome task. Any characters that form XML syntax would need to be escaped so that they are not confused with XML markup when being processed. XPath is also very succinct, allowing you to call upon nodes in the XML tree with a great degree of specificity without being unnecessarily verbose.

XML as a tree structure
A great benefit of XML is that the document itself describes the structure of its data. If you have researched your family history, you have probably come across a family tree. At the top of the tree is some early ancestor and at the bottom of the tree are the latest children.

With a tree structure you can see which children belong to which parents, which grandchildren belong to which grandparents and many other relationships.

The neat thing about XML is that it also fits nicely into this tree structure, often referred to as an XML Tree.

Understanding node relationships
We will use the following example to demonstrate the different node relationships.


 * Parent
 * Each element and attribute has one parent.
 * The book element is the parent of the title, author, year, and price elements.


 * Children
 * Element nodes may have zero, one or more children.
 * The title, author, year, and price elements are all children of the book element.


 * Siblings
 * Nodes that have the same parent.
 * The title, author, year, and price elements are all siblings.


 * Ancestors
 * A node's parent, parent's parent, etc.
 * The ancestors of the title element are the book element and the bookstore element.


 * Descendants
 * A node's children, children's children, etc.
 * Descendants of the bookstore element are the book, title, author, year, and price elements.
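
The relationships listed above can be explored with a short Python sketch. It uses the standard xml.etree.ElementTree module, and the bookstore document below is a hypothetical stand-in that only assumes the element names mentioned in the list (bookstore, book, title, author, year, price). ElementTree elements do not remember their parent, so the sketch builds its own child-to-parent map; the helper names ancestors and siblings are illustrative, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, assumed to match the element names used in the list above.
BOOKSTORE_XML = """
<bookstore>
  <book>
    <title>Example Title</title>
    <author>Example Author</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# ElementTree elements do not store a parent pointer, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Return the node's parent, the parent's parent, and so on (nearest first)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def siblings(node):
    """Return the other elements that share the node's parent."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    return [child for child in parent if child is not node]


title = root.find("book/title")
book = parent_of[title]

children = [child.tag for child in book]
descendants = [node.tag for node in root.iter() if node is not root]

print(book.tag)                            # book
print(children)                            # ['title', 'author', 'year', 'price']
print([s.tag for s in siblings(title)])    # ['author', 'year', 'price']
print([a.tag for a in ancestors(title)])   # ['book', 'bookstore']
print(descendants)                         # ['book', 'title', 'author', 'year', 'price']
```

The parent map is only one possible design; libraries that implement full XPath usually expose the parent directly.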

It is also useful to think of an XML file as a serialized file, the way you would view it in an XML editor, so that you can understand the concepts of preceding and following nodes. A node is said to precede another if it comes before the other in document order. Likewise, a node follows another if it comes after that node in document order. Ancestors and descendants are not considered to be either preceding or following a node. This concept will come in handy later when discussing the concept of an axis.
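
Document order can be made concrete with a small Python sketch. It assumes the standard xml.etree.ElementTree module and a made-up document whose element names (a to f) carry no meaning; the functions preceding and following are hypothetical helpers written for this illustration, and they only consider element nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names carry no meaning.
root = ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>")

# Element.iter() walks the tree in document order (the order of the start tags).
order = list(root.iter())
parent_of = {child: parent for parent in order for child in parent}


def ancestors(node):
    """Return the node's parent, the parent's parent, and so on."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def preceding(node):
    """Elements before `node` in document order, excluding its ancestors."""
    skip = ancestors(node)
    return [n for n in order[:order.index(node)] if n not in skip]


def following(node):
    """Elements after `node` in document order, excluding its descendants."""
    skip = list(node.iter())  # the node itself plus all of its descendants
    return [n for n in order[order.index(node) + 1:] if n not in skip]


d = root.find("b/d")
b = root.find("b")

print([n.tag for n in order])          # ['a', 'b', 'c', 'd', 'e', 'f']
print([n.tag for n in preceding(d)])   # ['c']
print([n.tag for n in following(d)])   # ['e', 'f']
print([n.tag for n in preceding(b)])   # []
print([n.tag for n in following(b)])   # ['e', 'f']
```

Note that a and b come before d in the file, but they are ancestors of d, so they do not precede it.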

Abbreviated vs. Unabbreviated XPath syntax
XPath was created so that nodes can be referred to very succinctly, while retaining the ability to search on many options. Most uses of XPath will involve searching for child nodes, parent nodes, or attribute nodes of a particular node. Because these uses are so common, an abbreviated syntax can be used to refer to these commonly searched nodes. Following is an XML document that simulates a tree (the type that has leaves and branches). It will be used to demonstrate the different types of syntax.

Exhibit 9.2: tree.xml – Example XML page

Following are a few examples of XPath location paths in English, Abbreviated XPath, then Unabbreviated XPath.

Selection 1:

English: All <leaf> elements in this document that are children of <smallBranch> elements that are children of <bigBranch> elements that are children of the trunk, which is a child of the root.

Abbreviated:  /trunk/bigBranch/smallBranch/leaf

Unabbreviated: /child::trunk/child::bigBranch/child::smallBranch/child::leaf

Selection 2:

English: The <bigBranch> elements with ‘name’ attribute equal to ‘bb3’ that are children of the trunk element, which is a child of the root.

Abbreviated: /trunk/bigBranch[@name='bb3']

Unabbreviated: /child::trunk/child::bigBranch[attribute::name='bb3']

Notice how we can specify which bigBranch objects we want by using a predicate in the previous example. This narrows the search down to only bigBranch nodes that satisfy the predicate. The predicate is the part of the XPath statement that is in square brackets. In this case, the predicate is asking for bigBranch nodes with their ‘name’ attribute set to ‘bb3’.
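
As a rough illustration, the two abbreviated selections can be tried in Python with the standard xml.etree.ElementTree module, which supports a limited subset of abbreviated XPath (not the unabbreviated axis syntax). The XML below is a hypothetical stand-in for tree.xml, assuming only the element and attribute names used in the selections. Because ET.fromstring returns the <trunk> element itself, the absolute path /trunk/bigBranch/... is written here as a path relative to that element.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for tree.xml; names other than those in the text are assumptions.
TREE_XML = """
<trunk name="the_trunk">
  <bigBranch name="bb1">
    <smallBranch name="sb1">
      <leaf name="leaf1"/>
      <leaf name="leaf2"/>
    </smallBranch>
  </bigBranch>
  <bigBranch name="bb3">
    <smallBranch name="sb2">
      <leaf name="leaf3"/>
    </smallBranch>
  </bigBranch>
</trunk>
"""

trunk = ET.fromstring(TREE_XML)

# Selection 1: /trunk/bigBranch/smallBranch/leaf, written relative to <trunk>.
leaves = trunk.findall("bigBranch/smallBranch/leaf")

# Selection 2: /trunk/bigBranch[@name='bb3'], with the predicate in square brackets.
bb3 = trunk.findall("bigBranch[@name='bb3']")

print([leaf.get("name") for leaf in leaves])   # ['leaf1', 'leaf2', 'leaf3']
print([branch.get("name") for branch in bb3])  # ['bb3']
```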

The last two examples assume we want to specify the path from the root. Let’s now assume that we are specifying the path from a <smallBranch> node.

Selection 3:

English: The parent node of the current <smallBranch>. (Notice that this selection is relative to a <smallBranch>.)

Abbreviated:  ..

Unabbreviated: parent::node()

When using the unabbreviated syntax, you may notice that you are calling parent or child followed by two colons. Each of these is called an axis. You will learn more about axes shortly.
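
The parent step can also be sketched with Python's xml.etree.ElementTree, again using a hypothetical stand-in for tree.xml. One caveat is specific to ElementTree: a search cannot climb above the element it was started from, so the sketch starts at <trunk>, steps down to a <smallBranch>, and then uses .. to step back up.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for tree.xml; names other than those in the text are assumptions.
TREE_XML = """
<trunk name="the_trunk">
  <bigBranch name="bb1">
    <smallBranch name="sb1"><leaf name="leaf1"/></smallBranch>
  </bigBranch>
  <bigBranch name="bb3">
    <smallBranch name="sb2"><leaf name="leaf3"/></smallBranch>
  </bigBranch>
</trunk>
"""

trunk = ET.fromstring(TREE_XML)

# Step down to the <smallBranch> named sb2, then take the ".." step to its parent.
parent = trunk.find(".//smallBranch[@name='sb2']/..")
print(parent.tag, parent.get("name"))  # bigBranch bb3

# ElementTree limitation: ".." cannot reach above the element the search started from.
sb2 = trunk.find(".//smallBranch[@name='sb2']")
print(sb2.find(".."))                  # None
```

A full XPath processor, such as the one inside an XSL transformation, does not have this restriction.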

Also, this may be a good time to explain the concept of a location path. A location path is the series of location steps taken to reach the node/nodes being selected. Location steps are the parts of XPath statements separated by / characters. They are one step on the way to finding the nodes you would like to select.

Location steps are composed of three parts: an axis (child, parent, descendant, etc.), a node test (the name of a node, or a node-type test such as node() or text()), and zero or more predicates (tests on the retrieved nodes that narrow the results, eliminating nodes that do not pass the predicate’s test).

So, in a location path, each location step returns a node-set. If there are further steps on the path after a location step, the next step is executed on all the nodes returned by that step.
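
The way each step feeds the next can be sketched by hand in Python. The function child_step below is a hypothetical helper, not part of any library: it applies a single child step (with an optional predicate) to every node produced by the previous step. The document is a made-up stand-in for tree.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for tree.xml; names other than those in the text are assumptions.
TREE_XML = """
<trunk>
  <bigBranch name="bb1">
    <smallBranch name="sb1"><leaf name="leaf1"/><leaf name="leaf2"/></smallBranch>
  </bigBranch>
  <bigBranch name="bb3">
    <smallBranch name="sb2"><leaf name="leaf3"/></smallBranch>
  </bigBranch>
</trunk>
"""


def child_step(nodes, name, predicate=None):
    """Apply one child step to every node in `nodes` and collect the matches."""
    result = []
    for node in nodes:
        for child in node:
            if child.tag == name and (predicate is None or predicate(child)):
                result.append(child)
    return result


trunk = ET.fromstring(TREE_XML)

# bigBranch[@name='bb1']/smallBranch/leaf, evaluated one location step at a time.
step1 = child_step([trunk], "bigBranch", lambda e: e.get("name") == "bb1")
step2 = child_step(step1, "smallBranch")
step3 = child_step(step2, "leaf")

print([e.get("name") for e in step1])  # ['bb1']
print([e.get("name") for e in step2])  # ['sb1']
print([e.get("name") for e in step3])  # ['leaf1', 'leaf2']
```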

Relative vs. Absolute paths
When specifying a path with XPath, there are times when you will already be ‘in’ a node. But other times, you will want to select nodes starting from the root node. XPath lets you do both. If you have ever worked with websites in HTML, it works the same way as referring to other files in HTML hyperlinks. In HTML, you can specify an Absolute Path for the hyperlink, describing where another page is with the server name, folders, and filename all in the URL. Or, if you are referring to another file on the same site, you need not enter the server name or all of the path information. This is called a Relative Path. The concept can be applied similarly in XPath.

You can tell the difference by whether there is a ‘/’ character at the beginning of the XPath expression. If so, the path is being specified from the root, which makes it an Absolute Path. But if there is no ‘/’ at the beginning of the path, you are specifying a Relative Path, which describes where the other nodes are relative to the context node, that is, the node from which the next step is taken.
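
A small Python sketch shows why the context node matters for a relative path. It assumes the standard xml.etree.ElementTree module and a hypothetical stand-in for tree.xml; is_absolute is an illustrative helper that simply checks for the leading ‘/’.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for tree.xml; names other than those in the text are assumptions.
TREE_XML = """
<trunk>
  <bigBranch name="bb1">
    <smallBranch name="sb1"><leaf name="leaf1"/><leaf name="leaf2"/></smallBranch>
  </bigBranch>
  <bigBranch name="bb3">
    <smallBranch name="sb2"><leaf name="leaf3"/></smallBranch>
  </bigBranch>
</trunk>
"""


def is_absolute(path):
    """A location path is absolute when it starts with '/'."""
    return path.startswith("/")


trunk = ET.fromstring(TREE_XML)
bb1, bb3 = trunk.findall("bigBranch")

# The same relative path gives different results from different context nodes.
relative_path = "smallBranch/leaf"
from_bb1 = [leaf.get("name") for leaf in bb1.findall(relative_path)]
from_bb3 = [leaf.get("name") for leaf in bb3.findall(relative_path)]

print(is_absolute("/trunk/bigBranch"))  # True
print(is_absolute(relative_path))       # False
print(from_bb1)                         # ['leaf1', 'leaf2']
print(from_bb3)                         # ['leaf3']
```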

Below is an XSL stylesheet (Exhibit 9.3) for use with our tree.xml file above (Exhibit 9.2).

Exhibit 9.3: xsl_tree.xsl – Example of both a relative and absolute path

Four types of XPath location paths
In the last two sections you learned about two distinctions between location paths: Unabbreviated vs. Abbreviated and Relative vs. Absolute. Combining these two concepts can be helpful when talking about XPath location paths. Not to mention, it could make you sound really smart in front of your friends when you say things like:
 * Abbreviated Relative Location Paths: use of abbreviated syntax while specifying a relative path.
 * Abbreviated Absolute Location Paths: use of abbreviated syntax while specifying an absolute path.
 * Unabbreviated Relative Location Paths: use of unabbreviated syntax while specifying a relative path.
 * Unabbreviated Absolute Location Paths: use of unabbreviated syntax while specifying an absolute path.

I only mention this four-way distinction now because it could come in handy while reading the specification, or other texts on the subject.

XPath axes
In XPath, some node selections can only be expressed in the unabbreviated syntax. In these cases, you will use an axis to specify each location step on your way through the location path.

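From any node in the tree, there are 13 axes along which you can step. They are as follows:
 * ancestor – the parent of the context node, the parent's parent, and so on up to the root
 * ancestor-or-self – the context node and its ancestors
 * attribute – the attributes of the context node
 * child – the children of the context node
 * descendant – the children of the context node, the children's children, and so on
 * descendant-or-self – the context node and its descendants
 * following – all nodes after the context node in document order, excluding its descendants
 * following-sibling – the siblings that come after the context node
 * namespace – the namespace nodes of the context node
 * parent – the parent of the context node
 * preceding – all nodes before the context node in document order, excluding its ancestors
 * preceding-sibling – the siblings that come before the context node
 * self – the context node itself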
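
A few of these axes can be sketched in Python to show what stepping along an axis means. Python's xml.etree.ElementTree does not implement the axis syntax, so the functions below (ancestor_or_self, following_sibling, preceding_sibling) are hypothetical helpers written for this illustration, and the bookstore document is made up. They handle element nodes only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document used only to illustrate the axes.
root = ET.fromstring(
    "<bookstore><book><title/><author/><year/><price/></book></bookstore>"
)
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor_or_self(node):
    """ancestor-or-self axis: the node, its parent, the parent's parent, ..."""
    result = [node]
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def following_sibling(node):
    """following-sibling axis: siblings after the node in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    kids = list(parent)
    return kids[kids.index(node) + 1:]


def preceding_sibling(node):
    """preceding-sibling axis: siblings before the node in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    kids = list(parent)
    return kids[:kids.index(node)]


author = root.find("book/author")

print([n.tag for n in ancestor_or_self(author)])   # ['author', 'book', 'bookstore']
print([n.tag for n in following_sibling(author)])  # ['year', 'price']
print([n.tag for n in preceding_sibling(author)])  # ['title']
```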

XPath predicates and functions
Sometimes, you may want to use a predicate in an XPath Location Path to further filter your selection. Normally, you would get a set of nodes from a location path. A predicate is a small expression that gets evaluated for each node in a set of nodes. If the expression evaluates to ‘false’, then the node is not included in the selection. An example is as follows:

//p[@class='alert']

In the preceding example, every <p> element in the document is checked to see if its ‘class’ attribute is set to ‘alert’. Only those <p> elements with a ‘class’ attribute with value ‘alert’ are included in the set of nodes for this location path.
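
The same attribute predicate can be tried in Python with the standard xml.etree.ElementTree module. The page below is a hypothetical fragment invented for the illustration. ElementTree expects a search from an element to be relative, so //p is written as .//p here.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; only the <p> elements and the class attribute matter.
PAGE = """
<html>
  <body>
    <p class="alert">Disk almost full</p>
    <p>All good</p>
    <div>
      <p class="alert">Battery low</p>
      <p class="note">Remember to back up</p>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# .//p finds <p> elements at any depth; the predicate keeps those with class="alert".
alerts = root.findall(".//p[@class='alert']")
alert_texts = [p.text for p in alerts]

print(alert_texts)  # ['Disk almost full', 'Battery low']
```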

The following example uses a function, which can be used in a predicate to get information about the context node.

/book/chapter[position()=3]

This example selects only the chapter of the book in the third position. So, for something to be returned, the <book> element must have at least 3 <chapter> elements.
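
A positional predicate can be sketched in Python as well. The standard xml.etree.ElementTree module does not accept the position() function itself, but it does accept the abbreviated form chapter[3], which in XPath means the same as chapter[position()=3]. Both book documents below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical books: one with three chapters, one with only two.
long_book = ET.fromstring(
    "<book><chapter>One</chapter><chapter>Two</chapter><chapter>Three</chapter></book>"
)
short_book = ET.fromstring(
    "<book><chapter>One</chapter><chapter>Two</chapter></book>"
)

# chapter[3] is the abbreviated form of chapter[position()=3].
third = [c.text for c in long_book.findall("chapter[3]")]
missing = short_book.findall("chapter[3]")

# chapter[last()] selects the final chapter, however many there are.
last_of_short = [c.text for c in short_book.findall("chapter[last()]")]

print(third)          # ['Three']
print(missing)        # []
print(last_of_short)  # ['Two']
```

Positions in XPath start at 1, not 0, which is easy to get wrong when coming from Python lists.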

Also notice that the position() function returns a number. There are many functions in the XPath specification. For a complete list, see the W3C specification at http://www.w3.org/TR/xpath#corelib

Here are a few more functions that may be helpful:

number last() – the number of nodes in the current node set, which is also the position of the last node

number position() – position of the context node being tested

number count(node-set) – the number of nodes in a node-set

boolean starts-with(string, string) – returns true if the first argument starts with the second

boolean contains(string, string) – returns true if the first argument contains the second

number sum(node-set) – the sum of the numeric values of the nodes in the node-set

number floor(number) – the number, rounded down to the nearest integer

number ceiling(number) – the number, rounded up to the nearest integer

number round(number) – the number, rounded to the nearest integer
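
To get a feel for what these functions compute, here are rough Python equivalents applied to a made-up order document. These are not XPath calls: Python's standard xml.etree.ElementTree does not implement the XPath function library, so each result is computed with ordinary Python. The helper xpath_round is hypothetical and mimics the XPath 1.0 rule that ties are rounded toward positive infinity, which differs from Python's built-in round().

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical order document; element and attribute names are assumptions.
order_doc = ET.fromstring(
    "<order>"
    "<item price='2.50'>pen</item>"
    "<item price='4.00'>pencil case</item>"
    "<item price='1.25'>eraser</item>"
    "</order>"
)
items = order_doc.findall("item")

count = len(items)                                     # like count(item)
total = sum(float(i.get("price")) for i in items)      # like sum(item/@price)
starts = [i.text for i in items if i.text.startswith("pen")]  # like starts-with(., 'pen')
has_case = [i.text for i in items if "case" in i.text]        # like contains(., 'case')
last_item = items[-1].text                             # like item[last()]


def xpath_round(x):
    """Sketch of XPath 1.0 round(): nearest integer, ties toward positive infinity."""
    return math.floor(x + 0.5)


print(count)               # 3
print(total)               # 7.75
print(starts)              # ['pen', 'pencil case']
print(has_case)            # ['pencil case']
print(last_item)           # eraser
print(math.floor(total))   # 7
print(math.ceil(total))    # 8
print(xpath_round(total))  # 8
print(xpath_round(2.5))    # 3
print(round(2.5))          # 2 (Python rounds ties to the nearest even integer)
```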

Example
The following XML documents, XSD schemas, and XSL stylesheet are meant to help you put together everything you have learned in this chapter using real-life data. As you study this example you will notice how XPath can be used in the stylesheet to select specific information from the document and modify how it is output.

Below is an XML document (Exhibit 9.4)

Exhibit 9.4: movies_xpath.xml

Below is the second XML document (Exhibit 9.5)

Exhibit 9.5: cities_xpath.xml

Below is the Movies schema (Exhibit 9.6)

Exhibit 9.6: movies.xsd

Below is the Cities schema (Exhibit 9.7)

Exhibit 9.7: cities.xsd

Below is the XSL stylesheet (Exhibit 9.8)

Exhibit 9.8: movies.xsl

Answers