XML - Managing Data Exchange/XSLT and Style Sheets

In previous chapters, we have introduced the basics of using an XSL stylesheet to convert XML documents into HTML. This chapter will briefly review those concepts and introduce many new ones as well. It is a reference for creating stylesheets.

XML Stylesheets
The eXtensible Stylesheet Language (XSL) provides a means to transform and format the contents of an XML document for display. It has two parts: XSL Transformations (XSLT), for transforming XML documents, and XSL Formatting Objects (XSL-FO), for formatting or applying styles to them. XSLT transforms XML documents from one form to another, including new XML documents, HTML, XHTML, and plain text. XSL-FO can produce PDF documents, as well as other output formats, from XML. With XSLT you can effectively recycle content, redesigning it for use in new documents or adapting it to many other uses. For example, from a single XML source file, you could extract a document ready for print, one for the Web, one for a Unix manual page, and another for an online help system. You can also extract only the parts of a document written in a specific language from an XML source that stores text in many languages. The possibilities are endless!

An XSLT stylesheet is an XML document, complete with elements and attributes. It has two kinds of elements: top-level and instruction. Top-level elements fall directly under the <xsl:stylesheet> root element. Instruction elements represent a set of formatting instructions that dictate how the contents of an XML document will be transformed. During the transformation process, the XSLT processor parses the XML document into a source tree, a hierarchical representation of the entire document in which each node represents a piece of the document, such as an element, an attribute, or some text content. The stylesheet contains predefined “templates” with instructions on what to do with those nodes. The processor uses each template's match attribute to relate nodes in the source tree to templates, and it builds a new tree, the result tree, which is then written out as the result document.

Let's review the stylesheet, city.xsl from chapter 2, and examine it in a little more detail:

Exhibit 1: XML stylesheet for city entity

 * Since a stylesheet is an XML document, it begins with the XML declaration. This includes the pseudo-attributes version and encoding. They are called pseudo-attributes because they are not the same as element attributes. The standalone pseudo-attribute, when present, indicates whether the document depends on markup declarations in an external DTD.
 * The <xsl:stylesheet> tag declares the start of the stylesheet and identifies the version number and the official W3C namespace. Notice the conventional prefix for the XSLT namespace, xsl. Once a prefix is declared, it must be used for all the XSLT elements.
 * The <xsl:output> tag is an optional element that determines how to output the result tree.
 * The <xsl:template> element defines the start of a template and contains rules to apply when a specified node is matched. The match attribute is used to associate (match) the template with an XML element, in this case the root (/), or whole branch, of the XML source document. If no output method has been specified, the output would default to HTML in this case, since the first element in the result tree is <html>.
 * The <xsl:apply-templates> element is an empty element since it has no character content. It applies template rules to the current element or the element's child nodes. The select attribute contains a location path telling it which element's content to process.
 * The instruction element <xsl:value-of> extracts the string value of the selected node, in this case the text of that element's text node child.

The <xsl:template> element defines the rules that implement a change. When the pattern is matched, this can be any number of things, including a simple plain-text conversion, the addition or removal of XML elements, or simply a conversion to HTML. The pattern, defined in the element's match attribute, contains an abbreviated XPath location path. Here it is basically the name of the root element in the document, in our case, "tourGuide."
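Exhibit 1 is not reproduced here. As an illustration, here is a minimal sketch of a stylesheet with the same parts: an <xsl:output> element, a template matching the root (/), <xsl:apply-templates>, and <xsl:value-of>. The tourGuide, city, and cityName element names and the sample data are assumptions. To make it runnable, the stylesheet is embedded in a small Java program (Java 15+ for text blocks) that applies it with the JDK's built-in XSLT 1.0 processor, javax.xml.transform.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TemplateDemo {
    // Hypothetical stylesheet with the same parts as Exhibit 1; element names are assumptions.
    static final String XSL = """
        <?xml version="1.0" encoding="UTF-8"?>
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="html" indent="no"/>
          <!-- match="/" is the root node: this template writes the page shell -->
          <xsl:template match="/">
            <html>
              <body>
                <h2>Cities</h2>
                <!-- select is a location path, relative to the current node -->
                <xsl:apply-templates select="tourGuide/city"/>
              </body>
            </html>
          </xsl:template>
          <!-- instantiated once for every city node selected above -->
          <xsl:template match="city">
            <p><xsl:value-of select="cityName"/></p>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical source document.
    static final String XML = "<tourGuide><city><cityName>Belmopan</cityName></city>"
            + "<city><cityName>Kuala Lumpur</cityName></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        // Prints the serialized HTML page (the HTML method writes no XML declaration).
        System.out.println(run());
    }
}
```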

When transforming an XML document into HTML, the processor expects the elements in the stylesheet to be well-formed, just as with XML. This means that every element must have an end tag. In HTML, it is not unusual to see an empty element such as <br> written alone, but the XSLT processor requires that an element with a start tag close with an end tag. For <br>, this means writing either <br></br> or <br/>. As mentioned in Chapter 3, <br> is an empty element: it carries no content between tags, although it may have attributes. Even though the HTML output method does not write end tags for empty elements, they must still be closed in the stylesheet. For instance, you write <br/> (or <br></br>) in the stylesheet, and the HTML output drops the end tag so that it looks like this: <br>. On a side note, the HTML output method recognizes HTML element names regardless of case, so BODY, body, and Body are all interpreted the same. Because the stylesheet is XML, however, a start tag and its end tag must still use the same case.
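The rule about closing empty elements can be seen in a short sketch, assuming a hypothetical tourGuide document. The stylesheet writes <br/>, and the same stylesheet is serialized once with method="html" and once with method="xml". It runs on the JDK's built-in XSLT processor (javax.xml.transform) and needs Java 15 or later for text blocks and String.formatted.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class EmptyElementDemo {
    // One stylesheet, serialized two ways: %s is replaced by the output method.
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="%s" indent="no" omit-xml-declaration="yes"/>
          <xsl:template match="/">
            <html><body>
              <xsl:for-each select="tourGuide/city">
                <xsl:value-of select="cityName"/>
                <!-- must be closed here, because the stylesheet itself is XML -->
                <br/>
              </xsl:for-each>
            </body></html>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical source document.
    static final String XML = "<tourGuide><city><cityName>Belmopan</cityName></city>"
            + "<city><cityName>Kuala Lumpur</cityName></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run(String method) {
        return transform(XSL.formatted(method), XML);
    }

    public static void main(String[] args) {
        System.out.println(run("html")); // HTML method drops the end tag
        System.out.println(run("xml"));  // XML method keeps the empty-element form
    }
}
```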

Output
XSLT can be used to transform an XML source into many different types of documents. XHTML is also XML, if it is well formed, so it could also be used as the source or the result. However, transforming plain HTML into XML won't work unless it is first turned into XHTML so that it conforms to the XML 1.0 recommendation. Here is a list of all the possible type-to-type transformations performed by XSLT:

Exhibit 2: Type-To-Type Transformations

The <xsl:output> element in the stylesheet determines how to output the result tree. This element is optional, but it gives you more control over the output. If you do not include it, the output method defaults to XML, or to HTML if the first element in the result tree is the <html> element. Exhibit 3 lists its attributes.

Exhibit 3: Element output attributes (from Wiley: XSL Essentials by Michael Fitzgerald)

XML to XML
Since we have had a lot of practice transforming an XML document to HTML, we are going to transform city.xml, used in chapter 2, into another XML file, using host.xsd as the schema.

Exhibit 4: XML document for city entity

Exhibit 5: XSL document for city entity that lists cities by City ID

Exhibit 6: XML schema for host city entity

Exhibit 7: XML stylesheet for city entity


 * Although the method attribute of <xsl:output> is set to "xml", the output would default to XML anyway, since there is no <html> element as the root of the result tree.
 * <xsl:attribute-set> is a top-level element that creates a group of attributes named "date". This attribute set can be reused throughout the stylesheet. The <xsl:attribute-set> element itself also has a use-attribute-sets attribute, allowing you to chain several sets of attributes together.
 * The <xsl:processing-instruction> element produces the xml-stylesheet processing instruction.
 * The <xsl:comment> element creates a comment in the result tree.
 * The <xsl:attribute> element allows you to add an attribute to an element that is created in the result tree.

The stylesheet produces this result tree:

Exhibit 8: XML result tree for city entity

The processor automatically inserts the XML declaration at the top of the result tree. The processing instruction, or PI, is an instruction intended for use by a processing application. In this case, the href points to a local stylesheet that will be applied to the XML document when it is processed. We used <xsl:element> to create new content in the result tree and added attributes to it.
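Exhibit 7 is not reproduced here, so the following is only a sketch of the same instructions under stated assumptions. The hostCities and hostCity result elements, the cityID attribute, the month and year attributes in the "date" set, and the hostCity.xsl file name are all hypothetical. It runs from Java through the JDK's javax.xml.transform API (Java 15+). The XML declaration is omitted only to keep the printed result short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XmlToXmlDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" indent="no" omit-xml-declaration="yes"/>
          <!-- top-level, reusable group of attributes (names and values are hypothetical) -->
          <xsl:attribute-set name="date">
            <xsl:attribute name="month">January</xsl:attribute>
            <xsl:attribute name="year">2024</xsl:attribute>
          </xsl:attribute-set>
          <xsl:template match="/">
            <xsl:processing-instruction name="xml-stylesheet">href="hostCity.xsl" type="text/xsl"</xsl:processing-instruction>
            <xsl:comment> generated from city.xml </xsl:comment>
            <xsl:element name="hostCities" use-attribute-sets="date">
              <xsl:for-each select="tourGuide/city">
                <xsl:element name="hostCity">
                  <!-- copies the source attribute value onto the new element -->
                  <xsl:attribute name="cityID"><xsl:value-of select="@cityID"/></xsl:attribute>
                  <xsl:value-of select="cityName"/>
                </xsl:element>
              </xsl:for-each>
            </xsl:element>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical stand-in for city.xml.
    static final String XML = "<tourGuide><city cityID=\"c1\"><cityName>Belmopan</cityName></city>"
            + "<city cityID=\"c2\"><cityName>Kuala Lumpur</cityName></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```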

There are two other instruction elements for inserting nodes into a result tree: <xsl:copy> and <xsl:copy-of>. Unlike <xsl:value-of>, which only outputs the string value of the selected node (such as its child text node), these elements copy the nodes themselves. The following code shows how the copy element can be used to copy the city element in city.xml:

Exhibit 9: Copy element

The result looks like this:

Exhibit 10: Copy element result

The output isn't very interesting, because xsl:copy does not pick up the child nodes, only the current node. In our example, it picks up the two city nodes that are in the city.xml file. The copy element has an optional attribute, use-attribute-sets, which allows you to add attributes to the copied element. However, xsl:copy leaves behind the element's existing attributes; only its namespace nodes, if any, are copied. Here is the result if a namespace is declared in the source document, in this case, the default namespace:

Exhibit 11: Namespace result
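Here is a small sketch of the point that xsl:copy copies only the current node. The hypothetical city elements below carry a cityID attribute and a cityName child, and neither survives the copy. The stylesheet runs with the JDK's built-in XSLT 1.0 processor (javax.xml.transform, Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CopyDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" indent="no" omit-xml-declaration="yes"/>
          <xsl:template match="/">
            <copies><xsl:apply-templates select="tourGuide/city"/></copies>
          </xsl:template>
          <!-- xsl:copy is shallow: only the city element itself is copied,
               not its cityID attribute and not its cityName child -->
          <xsl:template match="city">
            <xsl:copy/>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical stand-in for city.xml.
    static final String XML = "<tourGuide><city cityID=\"c1\"><cityName>Belmopan</cityName></city>"
            + "<city cityID=\"c2\"><cityName>Kuala Lumpur</cityName></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```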

If you want to copy more from the source file than just one node, use <xsl:copy-of>. It copies the selected node together with all of its attribute and namespace nodes and all of its descendants, such as text nodes, child element nodes, comments, and processing instructions. When we apply the <xsl:copy-of> element to city.xml, the result is almost an exact replica of city.xml! You can also copy comments and processing instructions using <xsl:copy-of select="comment()"/> and <xsl:copy-of select="processing-instruction(name)"/>, where name is the target (name) of the processing instruction you wish to retrieve.

Why would this be useful, you ask? Sometimes you want to just grab nodes and go! For example, if you want to place a copy of city.xml into a SOAP envelope, you can easily do it using <xsl:copy-of>. If you don't already know, the Simple Object Access Protocol, or SOAP, is a protocol for packaging XML documents for exchange. This is really useful in a B2B environment because it provides a standard way to package XML messages. You can read more about SOAP at www.w3.org/tr/soap.

Use an XML editor to create the above XML stylesheets, and experiment with the <xsl:copy> and <xsl:copy-of> elements.
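The SOAP idea can be sketched as follows, assuming a hypothetical tourGuide document and the SOAP 1.1 envelope namespace (http://schemas.xmlsoap.org/soap/envelope/). A real message may also need whatever headers the receiving service expects. Because xsl:copy-of makes a deep copy, the attribute, the comment, and the child elements all end up inside soap:Body. The program uses the JDK's javax.xml.transform API (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SoapEnvelopeDemo {
    // Wraps a deep copy of the source document element in a SOAP 1.1 envelope.
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
            xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
          <xsl:output method="xml" indent="no" omit-xml-declaration="yes"/>
          <xsl:template match="/">
            <soap:Envelope>
              <soap:Body>
                <!-- copy-of is deep: attributes, comments and descendants come along -->
                <xsl:copy-of select="*"/>
              </soap:Body>
            </soap:Envelope>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical stand-in for city.xml.
    static final String XML = "<tourGuide><!-- sample --><city cityID=\"c1\">"
            + "<cityName>Belmopan</cityName></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```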

Templates
Since templates define the rules for changing nodes, it makes sense to reuse them, either in the same stylesheet or in other stylesheets. This can be accomplished by naming a template and then calling it with an <xsl:call-template> element. Named templates from other stylesheets can also be included. You can quickly see how this is useful in practical applications. Here is an example using named templates:

Exhibit 110: Named templates
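Exhibit 110 is not reproduced here. This is a sketch of a named template under assumed element names (tourGuide, city, cityName) and a hypothetical template name, cityLine. A named template has a name attribute and is invoked with <xsl:call-template>, and the current node does not change when it is called. The program uses the JDK's built-in XSLT processor (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NamedTemplateDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <xsl:for-each select="tourGuide/city">
              <!-- the current node (a city) stays the same inside the called template -->
              <xsl:call-template name="cityLine"/>
            </xsl:for-each>
          </xsl:template>
          <!-- hypothetical named template: called by name, not matched by a pattern -->
          <xsl:template name="cityLine">
            <xsl:text>City: </xsl:text>
            <xsl:value-of select="cityName"/>
            <xsl:text>&#xa;</xsl:text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical source document.
    static final String XML = "<tourGuide><city><cityName>Belmopan</cityName></city>"
            + "<city><cityName>Kuala Lumpur</cityName></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```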

Templates also have a mode attribute. This allows you to process a node more than once, producing a different result each time, depending on the template. Let's create a stylesheet to practice modes.

Exhibit 12: XML template modes


 * <xsl:apply-templates> with a mode attribute tells the processor to look for a template that has the same mode attribute value.
 * <xsl:value-of select="."/> returns the current node, converted to a string by value-of. Using select="current()" will also return the current node.

The result isn't very flattering since we didn't do much with the file, but it gets the point across.

Exhibit 13: Result from above stylesheet
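Here is a minimal sketch of modes, assuming a hypothetical tourGuide document. The same city nodes are processed twice, and the mode attribute on <xsl:apply-templates> decides which of two templates with the same match pattern is used. The "upper" and "plain" mode names are illustrative only. It runs through the JDK's javax.xml.transform API (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ModeDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <!-- the same city nodes are processed twice, once per mode -->
            <xsl:apply-templates select="tourGuide/city" mode="upper"/>
            <xsl:apply-templates select="tourGuide/city" mode="plain"/>
          </xsl:template>
          <xsl:template match="city" mode="upper">
            <xsl:text>NAME: </xsl:text>
            <xsl:value-of select="translate(cityName, 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ')"/>
            <xsl:text>&#xa;</xsl:text>
          </xsl:template>
          <xsl:template match="city" mode="plain">
            <xsl:text>name: </xsl:text>
            <!-- "." is the current node; value-of converts it to its string value -->
            <xsl:value-of select="."/>
            <xsl:text>&#xa;</xsl:text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical source document (no whitespace, so "." is just the city name).
    static final String XML = "<tourGuide><city cityID=\"c1\"><cityName>Belmopan</cityName></city>"
            + "<city cityID=\"c2\"><cityName>Kuala Lumpur</cityName></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```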

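XSLT processors have built-in template rules. If no template in your stylesheet matches a node, the built-in rules are applied automatically. For the root node and for elements, they process the child nodes. For text and attribute nodes, they copy the string value to the output. For comments and processing instructions, they output nothing. The net effect is that the text content of all the elements is output.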
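The effect of the built-in rules is easy to see with a stylesheet that contains no templates at all. In this sketch (hypothetical sample data, JDK's built-in XSLT 1.0 processor, Java 15+), only the text of the elements reaches the output. The cityID attributes and the comment are dropped, because the built-in rules never select attributes and output nothing for comments.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BuiltInRulesDemo {
    // No templates at all: only the built-in template rules run.
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
        </xsl:stylesheet>
        """;

    // Hypothetical source document with attributes and a comment.
    static final String XML = "<tourGuide><city cityID=\"c1\"><cityName>Belmopan</cityName>"
            + "<!-- not copied --></city><city cityID=\"c2\"><cityName>Kuala Lumpur</cityName></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```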

Sorting
Writing well-formed XML is vital. At times, however, simply displaying information (the most elementary level of data management) is not all that is needed. As information technology specialists, we must understand that order is vital for interpretation. Putting data in a readable order makes it quickly usable: comparing items, or looking for a specific name, musical artist, title, or musical type, becomes very easy. As an information specialist, you will often need to sort information. The basis of sorting in XSLT is the xsl:sort element. The xsl:sort element represents a sort key component. A sort key component identifies how a sort key value is to be computed for each item in the sequence being sorted. A sort key value is defined as “the value computed for an item by using the Nth sort key component”. The value of a sort key component is specified either by its select attribute or by its contained sequence constructor. A sequence constructor is defined as a “sequence of zero or more sibling nodes in the stylesheet that can be evaluated to return a sequence of nodes and atomic values”. When neither is present, the default is select=".", which has the effect of sorting on the actual value of the item if it is an atomic value, or on the typed value of the item if it is a node. If a select attribute is present, its value must be an XPath expression. (These definitions come from the XSLT 2.0 specification. In XSLT 1.0, xsl:sort is always empty, and the key is given only by select, which defaults to the string value of the node.)

The following shows how the <xsl:sort> element is used to sort the output. Sorting output is quite easy: add an <xsl:sort> element as the first child of the <xsl:for-each> element in the XSL file.

Exhibit 14: Stylesheet with sort function

This example sorts the file alphabetically by artist name. The select attribute indicates which XML element to sort on. Here the information can be selected and sorted by “title” or by “artist”, the categories that the XML document displays within the body of the file.

We have used the <xsl:sort> element to sort the results of an <xsl:for-each> statement before. The sort element has many other uses as well. Essentially, it instructs the processor to sort nodes based on certain criteria, known as the sort key. By default, it sorts nodes as text, in ascending order. Here is a short list of the different attributes that sort takes:

Exhibit 15: Sort attributes

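The sort element can be used in either the <xsl:apply-templates> or the <xsl:for-each> element. Several xsl:sort elements can also be listed one after another inside the same element to create sub-ordering levels. The first is the primary sort key, the second is used only to order nodes that tie on the first, and so on.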
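Sub-ordering with several sort keys can be sketched like this, assuming a hypothetical city_hotel-style document in which each city lists its hotels. The first xsl:sort orders by city name, and the second, with order="descending", orders hotels only within each city. The program runs on the JDK's built-in XSLT processor (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SortDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <xsl:for-each select="tourGuide/city/hotel">
              <!-- xsl:sort must come first; the first sort is the primary key -->
              <xsl:sort select="../cityName"/>
              <!-- secondary key: only orders hotels within the same city -->
              <xsl:sort select="." order="descending"/>
              <xsl:value-of select="../cityName"/>
              <xsl:text>: </xsl:text>
              <xsl:value-of select="."/>
              <xsl:text>&#xa;</xsl:text>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical city/hotel data, deliberately out of order.
    static final String XML = "<tourGuide>"
            + "<city><cityName>Kuala Lumpur</cityName><hotel>Mandarin Oriental Kuala Lumpur</hotel>"
            + "<hotel>Pan Pacific Kuala Lumpur</hotel></city>"
            + "<city><cityName>Belmopan</cityName><hotel>Bull Frog Inn</hotel>"
            + "<hotel>Pook's Hill Lodge</hotel></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```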

Numbering
The <xsl:number> instruction element allows you to insert numbers into your results. Combined with a sort element, it lets you easily create numbered lists. When this simple stylesheet, hotelNumbering.xsl, is applied to city_hotel.xml, we get the result listed below:

Exhibit 16: Sorting and numbering lists

Exhibit 17: Result – hotelNumbering.xsl
1. Bull Frog Inn
2. Mandarin Oriental Kuala Lumpur
3. Pan Pacific Kuala Lumpur
4. Pook's Hill Lodge

The expression in the value attribute is evaluated, and the value of position() is based on the sorted node list. To improve the look, we add the format attribute with a linefeed character reference (&#xa;), a zero digit to indicate that the number will be an integer, and a period and a space to make it look nicer. The format list can be based on the following sequences:

Exhibit 17: Numbering formats
 * Uppercase letters (format="A")
 * Lowercase letters (format="a")
 * Uppercase Roman numerals (format="I")
 * Lowercase Roman numerals (format="i")
 * Numeral prefix
 * Integer prefix / hyphen prefix

To specify different levels of numbering, such as sections and subsections of the source document, the level attribute is used. It tells the processor which levels of the source tree should be considered. By default, it is set to single, as seen in the example above. It can also take the values multiple and any. The count attribute is a pattern that tells the processor which nodes to count (for numbering purposes). If it is not specified, it defaults to a pattern matching nodes of the same type, and with the same name, as the current node. The from attribute can also be used to specify the node where the counting should start.
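Since Exhibit 16 is not reproduced here, this is a sketch, under assumptions (the element names and document structure are hypothetical), of how output like Exhibit 17 can be produced. It uses an xsl:sort inside xsl:for-each and xsl:number with value="position()", so that the numbers follow the sorted order. The format string starts with a line-feed character reference, followed by the token 1 and then ". ". It runs on the JDK's built-in XSLT 1.0 processor (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NumberDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <xsl:for-each select="tourGuide/city/hotel">
              <xsl:sort select="."/>
              <!-- position() counts within the sorted list; format = line feed, number, ". " -->
              <xsl:number value="position()" format="&#xa;1. "/>
              <xsl:value-of select="."/>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical stand-in for city_hotel.xml.
    static final String XML = "<tourGuide>"
            + "<city><cityName>Belmopan</cityName><hotel>Pook's Hill Lodge</hotel>"
            + "<hotel>Bull Frog Inn</hotel></city>"
            + "<city><cityName>Kuala Lumpur</cityName><hotel>Pan Pacific Kuala Lumpur</hotel>"
            + "<hotel>Mandarin Oriental Kuala Lumpur</hotel></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```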

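When level is set to single, the processor looks for the first node, either the current node or one of its ancestors, that matches the count pattern. If count is not present, it looks for the first node of the same type as the current node. When it finds the match, it counts that node together with its preceding siblings that also match. If the from attribute is listed, only ancestors below the nearest ancestor matching from are searched, so counting starts again at each such node rather than covering all nodes.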

When the level is multiple, the processor doesn't just count a list of one node type. It creates a list of the current node and all of its ancestors, in their actual order in the source document. From this list, it selects all the nodes that match the count pattern and maps each of them to one plus the number of its preceding siblings that match count. In effect, multiple numbers each level separately, producing numbers such as 1.2. This is where any is different: it numbers all the matching nodes sequentially through the whole document, instead of counting them in multiple levels. As with the other two values, you can use the from attribute to tell the processor where to start counting from, which in effect separates the numbering into ranges.

This is a modification of the example above using level="multiple":

Exhibit 18: Sorting and numbering lists

Exhibit 19: Result – hotelNumbering2.xsl
1.1 Bull Frog Inn
1.2 Pook's Hill Lodge
2.1 Pan Pacific Kuala Lumpur
2.2 Mandarin Oriental Kuala Lumpur

The first template matches the root node and then selects the nodes to be numbered, creating a node-list. The next template processes each selected element and gives it a number at every counted level. Each level's number is found by counting the matching preceding siblings at that level, plus 1.
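The difference between level="multiple" and level="any" can be sketched on hypothetical city and hotel data, this time processed in document order. With count="city|hotel", multiple produces a city.hotel number similar to Exhibit 19, while any simply counts every hotel that has appeared so far in the document. The program runs through the JDK's javax.xml.transform API (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NumberLevelsDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <xsl:apply-templates select="tourGuide/city/hotel"/>
          </xsl:template>
          <xsl:template match="hotel">
            <!-- multiple: one number per matching ancestor-or-self level (city, then hotel) -->
            <xsl:number level="multiple" count="city|hotel" format="1.1 "/>
            <xsl:value-of select="."/>
            <xsl:text> (hotel </xsl:text>
            <!-- any: counts every hotel so far in the document, ignoring levels -->
            <xsl:number level="any" count="hotel"/>
            <xsl:text>)&#xa;</xsl:text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical city/hotel data in document order.
    static final String XML = "<tourGuide>"
            + "<city><cityName>Belmopan</cityName><hotel>Bull Frog Inn</hotel>"
            + "<hotel>Pook's Hill Lodge</hotel></city>"
            + "<city><cityName>Kuala Lumpur</cityName><hotel>Pan Pacific Kuala Lumpur</hotel>"
            + "<hotel>Mandarin Oriental Kuala Lumpur</hotel></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```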

Formatting
Formatting numbers is a simple process, so this section is a brief overview of what can be done. Placed within the XML stylesheet, functions can be used to manipulate data during the transformation. To make numbers easier to read, we need to be able to separate the digits into groups, or add commas or decimals. To do this we use the format-number() function. It converts a numeric value into a string using specified patterns that control the number of leading zeroes, the separator between thousands, and so on. The basic syntax of this function is format-number(number, format), with an optional third argument naming an xsl:decimal-format:
 * number is the numeric value to be formatted.
 * format is a string that lays out the general representation of a number. Each character in the string represents either a digit from number or some special punctuation such as a comma or minus sign.

The following are the characters, and their meanings, used to represent the number format when using the format-number function within a stylesheet:

Exhibit 20: Format-number function

Symbol        Meaning
0             A digit.
#             A digit, zero shows as absent.
. (period)    Placeholder for decimal separator.
,             Placeholder for grouping separator.
;             Separates formats (for positive and negative numbers).
-             Default prefix for negative.
%             Multiply by 100 and show as a percentage.
X             Any other characters can be used in the prefix or suffix.
'             Used to quote special characters in a prefix or suffix.
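A few of the pattern characters from Exhibit 20 can be tried directly. This sketch uses arbitrary sample numbers and the JDK's built-in XSLT processor (javax.xml.transform, Java 15+) with the default decimal format, where the period is the decimal separator and the comma is the grouping separator. XSLT 1.0 defines these patterns in terms of the JDK 1.1 DecimalFormat class.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FormatNumberDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <!-- , groups thousands; 0 forces a digit; . marks the decimal separator -->
            <xsl:value-of select="format-number(1234567.891, '#,##0.00')"/>
            <xsl:text>&#xa;</xsl:text>
            <!-- % multiplies by 100 -->
            <xsl:value-of select="format-number(0.256, '0.0%')"/>
            <xsl:text>&#xa;</xsl:text>
            <!-- ; separates the positive and negative sub-patterns -->
            <xsl:value-of select="format-number(-42, '#,##0;(#,##0)')"/>
            <xsl:text>&#xa;</xsl:text>
            <!-- each 0 is a required digit, so small numbers are zero-padded -->
            <xsl:value-of select="format-number(7, '000')"/>
            <xsl:text>&#xa;</xsl:text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // The numbers are literals in the stylesheet, so any source document will do.
    static final String XML = "<empty/>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```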

Conditional Processing
There are times when it is necessary to display output based on a condition. Two instruction elements let you decide, based on certain tests, which part of a template is instantiated: the <xsl:if> and <xsl:choose> elements.

The test condition for an <xsl:if> statement must be contained within the test attribute of the <xsl:if> element. Because the stylesheet is XML, a less-than operator in an expression must be written as “&lt;”. A greater-than operator may be written literally or escaped as “&gt;”. The XPath not() function is a Boolean function that evaluates to true if its argument is false, and vice versa. The and and or operators can be used to combine multiple tests, but an <xsl:if> statement can test at most one expression, and it has no "else" branch: its content is either instantiated or not.

The <xsl:choose> element is similar to a switch statement (or an if/else-if chain) in Java. By using <xsl:when> elements, <xsl:choose> can offer many alternative expressions. A choose element must contain at least one when element, but it can have as many as it needs, and only the first when whose test is true is instantiated. The choose element can also contain one otherwise element, which works like the final else in a Java program: it contains the template to use if none of the other expressions are true.

The <xsl:for-each> element is another conditional processing element. We have used it in previous chapter exercises, so this will be a quick review. <xsl:for-each> is an instruction element, which means it must appear inside a template. It evaluates its select attribute to a node-set and processes each node in document order, or in sorted order if <xsl:sort> is used.
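The sketch below combines both conditional elements, assuming a hypothetical document in which each city lists empty hotel elements. xsl:if adds a note when a city has fewer than two hotels (note the escaped &lt;), and xsl:choose picks exactly one of three branches. It runs with the JDK's built-in XSLT processor (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ConditionalDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <xsl:for-each select="tourGuide/city">
              <xsl:value-of select="cityName"/>
              <!-- the less-than sign must be escaped as &lt; inside an attribute value -->
              <xsl:if test="count(hotel) &lt; 2">
                <xsl:text> (fewer than 2)</xsl:text>
              </xsl:if>
              <xsl:choose>
                <!-- &gt; is accepted too, although a literal greater-than sign is legal -->
                <xsl:when test="count(hotel) &gt; 1">: several</xsl:when>
                <xsl:when test="count(hotel) = 1">: one</xsl:when>
                <xsl:otherwise>: none</xsl:otherwise>
              </xsl:choose>
              <xsl:text>&#xa;</xsl:text>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical data: only the number of hotel elements matters here.
    static final String XML = "<tourGuide>"
            + "<city><cityName>Belmopan</cityName><hotel/><hotel/></city>"
            + "<city><cityName>Athens</cityName><hotel/></city>"
            + "<city><cityName>Sydney</cityName></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```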

Parameters and Variables
XSLT offers two similar elements, <xsl:variable> and <xsl:param>. Both have a required name attribute and an optional select attribute, and you declare them like this:

Exhibit 21: Variable and parameter declaration
<xsl:variable name="var1" select=" '' "/>
<xsl:param name="par1" select=" '' "/>

The above declarations bind each name to an empty string, which has the same effect as leaving off the select attribute. With parameters, this value is considered only a default, or initial, value. It can be changed either from the command line or from another template using the <xsl:with-param> element. A variable's value, however, cannot be changed once it is bound. When making declarations, remember that variables can be declared anywhere within a template, but a parameter must be declared at the beginning of the template.

Both elements can have global or local scope, depending on where they are defined. If they are defined at the top level, as children of the <xsl:stylesheet> element, they are global and can be used anywhere in the stylesheet. If they are defined in a template, they are local and visible only to the elements that follow them within that template (their following siblings and those siblings' descendants). Values can spill down from the top level into any template, but a local variable cannot go back up. It is also not visible in other templates, not even ones called from where it is declared. To hand a value to another template, pass it as a parameter.

We are going to hard-code a value for the parameter in its declaration element using the select attribute.

Exhibit 22: HTML results

The value that you pass in does not have to be enclosed in quotes, unless you are passing a value with more than one word. For example, we could have passed either country="United States" or country=Belize without getting an error.

The value of a variable can also be used to set an attribute value. Here is an example setting the countryName element with an attribute of countryCode equal to the value in the variable:

Exhibit 23: Attribute of countryCode

This is known as an attribute value template. Notice the braces around the reference. They tell the processor to evaluate the content as an expression and convert the result to a string in the result tree. Some attributes cannot be set with an attribute value template:
 * Attributes that contain patterns (such as match in <xsl:template>)
 * Attributes of top-level elements
 * Attributes that refer to named objects (such as the name attribute of <xsl:call-template>)
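Here is a minimal sketch of an attribute value template. It assumes a hypothetical global variable named code that holds the country code BZ (the ISO code for Belize). The braces around $code cause it to be evaluated, while doubled braces, {{ and }}, produce literal braces. It runs through the JDK's javax.xml.transform API (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class AttributeValueTemplateDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" indent="no" omit-xml-declaration="yes"/>
          <!-- hypothetical country code held in a global variable -->
          <xsl:variable name="code" select="'BZ'"/>
          <xsl:template match="/">
            <!-- the braced reference is evaluated; doubled braces are copied as literal braces -->
            <countryName countryCode="{$code}" note="{{not evaluated}}">Belize</countryName>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // The output does not depend on the source, so a trivial document is used.
    static final String XML = "<tourGuide/>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```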

Parameters, though not variables, can be passed between templates using the <xsl:with-param> element. This element has two attributes: name, which is required, and select, which is optional. This next example uses with-param as a child of the <xsl:call-template> element, although it can also be used as a child of <xsl:apply-templates>.

Exhibit 24: XSL With-Param


 * Here we match the  nodes that were returned in the   node set.
 * <xsl:call-template>, as discussed earlier, calls a template by its name.
 * The <xsl:with-param> element tells the called template to use the named parameter, and its select attribute sets the expression that will be evaluated.
 * Notice the declaration for the parameter is in the first line of the template. It initializes the parameter to an empty string, because the value will be replaced by the value of the expression in the <xsl:with-param> element's select attribute.
 * An <xsl:text> element containing a line-feed character reference (&#xa;) outputs a line feed in the result tree to make the output look nicer.

Exhibit 25: Text results – withParam.xsl
City Name: Belmopan
Number of hotels: 2
City Name: Kuala Lumpur
Number of hotels: 2
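Exhibit 24 is not reproduced here, so this sketch produces output in the style of Exhibit 25 under assumptions. The element names, the template name hotelCount, the parameter name hotels, and the sample data are hypothetical. The parameter is declared first in the named template with an empty-string default, and the caller replaces it through <xsl:with-param>. It runs with the JDK's built-in XSLT processor (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class WithParamDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <xsl:for-each select="tourGuide/city">
              <xsl:call-template name="hotelCount">
                <!-- the select expression is evaluated here, in the caller's context -->
                <xsl:with-param name="hotels" select="count(hotel)"/>
              </xsl:call-template>
            </xsl:for-each>
          </xsl:template>
          <xsl:template name="hotelCount">
            <!-- declared first; the empty string is only a default -->
            <xsl:param name="hotels" select="''"/>
            <xsl:text>City Name: </xsl:text>
            <xsl:value-of select="cityName"/>
            <xsl:text>&#xa;Number of hotels: </xsl:text>
            <xsl:value-of select="$hotels"/>
            <xsl:text>&#xa;</xsl:text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical city/hotel data.
    static final String XML = "<tourGuide>"
            + "<city><cityName>Belmopan</cityName><hotel>Bull Frog Inn</hotel>"
            + "<hotel>Pook's Hill Lodge</hotel></city>"
            + "<city><cityName>Kuala Lumpur</cityName><hotel>Pan Pacific Kuala Lumpur</hotel>"
            + "<hotel>Mandarin Oriental Kuala Lumpur</hotel></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```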

The Muenchian Method
The Muenchian Method, developed by Steve Muench, is a technique for grouping nodes efficiently using keys. Keys work by assigning a key value to a node and giving you access to that node through the key value. If many nodes have the same key value, all of them are retrieved when you use that key value. Effectively, this means that if you want to group a set of nodes according to a particular property, you can use keys to group them together. One of the more common uses for the Muenchian method is grouping items and counting the number of occurrences in each group, such as the number of occurrences of a city.

Text Results – muenchianMethod.xsl
City Name: Atlanta
Number of Occurrences: 1
City Name: Athens
Number of Occurrences: 1
City Name: Sydney
Number of Occurrences: 1
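The Muenchian method can be sketched as follows, assuming a hypothetical list of trips that each name a city. The data are made up, so the counts differ from the results above. An xsl:key indexes city elements by their text. A city starts a new group only if it is the first node that key() returns for its value, which is tested by comparing generate-id() values, and count(key(...)) gives the size of the group. It runs on the JDK's built-in XSLT 1.0 processor (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class MuenchianDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <!-- index every city element by its text content -->
          <xsl:key name="cityKey" match="city" use="."/>
          <xsl:template match="/">
            <!-- keep only the first city of each group (same generated id as key()[1]) -->
            <xsl:for-each select="//city[generate-id() = generate-id(key('cityKey', .)[1])]">
              <xsl:text>City Name: </xsl:text>
              <xsl:value-of select="."/>
              <xsl:text>&#xa;Number of Occurrences: </xsl:text>
              <xsl:value-of select="count(key('cityKey', .))"/>
              <xsl:text>&#xa;</xsl:text>
            </xsl:for-each>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical trips; Atlanta appears twice.
    static final String XML = "<trips><trip><city>Atlanta</city></trip>"
            + "<trip><city>Sydney</city></trip><trip><city>Atlanta</city></trip>"
            + "<trip><city>Athens</city></trip></trips>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```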

Datatypes
There are five different datatypes in XSLT: Node-set, String, Number, Boolean, and Result tree fragment. Variables and parameters can be bound to each of these, but the last type is specific to them.

Node-sets are returned everywhere in XSLT. We've seen them returned from the select attributes of <xsl:apply-templates> and <xsl:for-each> elements, and from variables. Now we will see how a variable can be bound to a node-set. Examine the following code:

Exhibit 26: Variable bound to a node-set

Here, we are binding the variable to a node-set selected from the source tree. The selected element is a child of city, so the output generated by apply-templates is that element's text node. Remember, you can use variable references in expressions but not in patterns. This means we cannot use a variable reference as the value of a match attribute.

String types are useful if you are interested only in the text of nodes, rather than in the whole node-set. String types use XPath functions, most notably string(). Here is a simple example:

Exhibit 27: String types

This is, in fact, a longer way of saying:

Exhibit 28: Shorter version of above

It is also possible to declare a variable that has a number value. You do this by using the XPath number() function.

Exhibit 29: Declaration of variable with number value

You can use the numeric operators +, -, *, and div to perform mathematical operations on numbers (XPath uses div for division, because / is the path operator), as well as built-in XPath functions such as sum(), floor(), ceiling(), and round().
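The node-set, string, and number types can be seen side by side in a short sketch over hypothetical city and hotel data. It binds a node-set with select, converts with string() and number(), and computes with count(), div, and round(). Note that string() applied to a node-set returns the string value of only the first node. It runs with the JDK's built-in XSLT processor (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class DatatypesDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="/">
            <!-- node-set: every hotel element -->
            <xsl:variable name="hotels" select="tourGuide/city/hotel"/>
            <!-- string: string() of a node-set uses only the first node -->
            <xsl:variable name="firstCity" select="string(tourGuide/city/cityName)"/>
            <!-- numbers: count() of the node-set, and number() of a string -->
            <xsl:variable name="hotelCount" select="count($hotels)"/>
            <xsl:variable name="doubled" select="number('3.5') * 2"/>
            <xsl:value-of select="concat('First city: ', $firstCity)"/>
            <xsl:text>&#xa;Hotels: </xsl:text>
            <xsl:value-of select="$hotelCount"/>
            <xsl:text>&#xa;Hotels per city: </xsl:text>
            <!-- div, not /, is the division operator -->
            <xsl:value-of select="round($hotelCount div count(tourGuide/city))"/>
            <xsl:text>&#xa;Doubled: </xsl:text>
            <xsl:value-of select="$doubled"/>
            <xsl:text>&#xa;</xsl:text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical city/hotel data.
    static final String XML = "<tourGuide>"
            + "<city><cityName>Belmopan</cityName><hotel>Bull Frog Inn</hotel>"
            + "<hotel>Pook's Hill Lodge</hotel></city>"
            + "<city><cityName>Kuala Lumpur</cityName><hotel>Pan Pacific Kuala Lumpur</hotel>"
            + "<hotel>Mandarin Oriental Kuala Lumpur</hotel></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```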

The Boolean type has only two possible values, true and false. As an example, we are going to use a Boolean variable to test whether a parameter has been passed into the stylesheet.

Exhibit 30: Boolean variable to test

We start with an empty-string declaration for the parameter. In the test attribute of <xsl:if>, the boolean() function tests the value of the parameter. If the value is an empty string, as we defined by default, boolean() evaluates to false, and the template is not instantiated. If it has a value, and it can be any value at all, boolean() evaluates to true.
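Here is a minimal sketch of the Boolean test described above, assuming a global parameter named country and a variable named hasCountry (both hypothetical). From Java, Transformer.setParameter() plays the role of passing a value on the command line. When nothing is passed, the empty-string default makes boolean() return false and nothing is output. It uses the JDK's built-in XSLT processor (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BooleanParamDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <!-- default value: the empty string -->
          <xsl:param name="country" select="''"/>
          <!-- boolean('') is false; any non-empty string is true -->
          <xsl:variable name="hasCountry" select="boolean($country)"/>
          <xsl:template match="/">
            <xsl:if test="$hasCountry">
              <xsl:text>Selected country: </xsl:text>
              <xsl:value-of select="$country"/>
            </xsl:if>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static final String XML = "<tourGuide/>";

    // Applies the stylesheet, optionally passing the country parameter from outside.
    static String run(String country) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            if (country != null) {
                t.setParameter("country", country); // like passing country=... on the command line
            }
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("[" + run(null) + "]");
        System.out.println("[" + run("Belize") + "]");
    }
}
```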

The final datatype is the result tree fragment. Essentially it is a chunk of text (a string) that can contain markup. Let's look at an example before we dive into the details:

Exhibit 31: Result tree fragment datatype

Notice we didn't use the select attribute to define the variable. We aren't selecting a node and getting its value; rather, we are creating arbitrary content, which we declared as the content of the element. The content between the opening and closing variable tags is the actual result tree fragment. In general, if you use the select attribute as we did earlier, you don't specify content when declaring the variable, so the variable element is empty. If you don't use select and you do specify content, the content is a result tree fragment. You can perform operations on it as if it were a string, but unlike a node-set, you can't use operators such as / or // to get to its nodes. To retrieve the content from the variable and place it into the result tree, you use the copy-of element. Let's see how we would do this:

Exhibit 32: Retrieve and place into result tree

The result tree would now contain two elements: a copy of the city element and the added element, description.
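Here is a sketch of a result tree fragment, under assumptions: the description text is a hypothetical example, and city.xml is replaced by a one-city sample. The variable has content instead of a select attribute. xsl:copy-of writes the fragment, markup included, into the result, and string-length() shows that string operations see only its text. It runs with the JDK's built-in XSLT 1.0 processor (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ResultTreeFragmentDemo {
    static final String XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" indent="no" omit-xml-declaration="yes"/>
          <!-- no select: the content becomes a result tree fragment (text is hypothetical) -->
          <xsl:variable name="description">
            <description>Capital of Belize</description>
          </xsl:variable>
          <xsl:template match="/">
            <result>
              <xsl:copy-of select="tourGuide/city"/>
              <!-- copy-of writes the fragment, markup included -->
              <xsl:copy-of select="$description"/>
              <!-- string operations see only the fragment's text -->
              <length><xsl:value-of select="string-length($description)"/></length>
            </result>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical one-city stand-in for city.xml.
    static final String XML = "<tourGuide><city><cityName>Belmopan</cityName></city></tourGuide>";

    // Compiles the stylesheet with the JDK's built-in XSLT 1.0 processor and applies it.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XML);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```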

EXSLT
EXSLT is a set of community-developed extensions to XSLT. Its modules include facilities for handling dates and times, math, and strings.

Multiple Stylesheets
In previous chapters, we have imported and used multiple XML and schema documents. It is also possible to use multiple stylesheets with the <xsl:include> and <xsl:import> elements, which should look familiar. You can also process multiple XML documents at a time, in one stylesheet, by using the XSLT document() function.

Including an external stylesheet is very similar to what we have done in earlier chapters with schemas. The <xsl:include> element has only one attribute, href. It is required and always contains a URI (Uniform Resource Identifier) reference to the location of the file, which can be local (in the same local directory) or remote. You can include as many stylesheets as you need, as long as the include elements are at the top level. They can be scattered all over the stylesheet if you want, as long as they are children of the <xsl:stylesheet> element. When the processor encounters an instance of include, it replaces the instance with all the elements from the included document, including template rules and top-level elements, but not the root <xsl:stylesheet> element. All the items simply become part of the stylesheet tree itself, and the processor treats them all the same. Here are declarations for including a local and a remote stylesheet:

Exhibit 33: Declarations for local and remote stylesheet

Since <xsl:include> pulls in all the elements of the included stylesheet, you need to make sure that the stylesheet you are including does not include your own stylesheet. For example, city.xsl cannot include city_hotel.xsl if city_hotel.xsl has an include element that includes city.xsl. When including multiple files, you also need to make sure that you are not including another stylesheet multiple times. If city_hotel.xsl includes amenity.xsl, country.xsl includes amenity.xsl, and city.xsl includes both city_hotel.xsl and country.xsl, then city.xsl has indirectly included amenity.xsl twice. This can cause duplicate template rules and errors. These rules may seem confusing, but the problems are easy to avoid if you carefully examine the stylesheets before they are included.

The difference between importing stylesheets and including them is that imported template rules each have a different import precedence, while included stylesheet templates are merged into one tree and processed normally. Imported stylesheets form an import tree, each keeping its root <xsl:stylesheet> element, so the processor can track the order in which they were imported. Just like include, import has one attribute, href, which is required and should contain the URI reference for the document. It is also a top-level element and can be used as many times as needed. However, every <xsl:import> must be an immediate child of the <xsl:stylesheet> element and must come before all other top-level elements; otherwise there will be errors. This code demonstrates importing a local stylesheet:

Exhibit 34: Importing local stylesheet

The order of the <xsl:import> elements dictates the precedence that matching templates have over one another. Templates imported later have higher precedence than those imported earlier, and templates in the importing stylesheet itself have higher precedence than any imported template. The <xsl:template> element also has a priority attribute, where a higher number means a higher priority, but it only decides between templates with the same import precedence. Import precedence only comes into effect when templates collide; otherwise, importing stylesheets is not much different from including them. Another way to handle colliding templates is the <xsl:apply-imports> element. When a template in the importing stylesheet overrides a template in an imported stylesheet, it can use <xsl:apply-imports> to invoke the overridden imported template for the current node.
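Import precedence and <xsl:apply-imports> can be sketched with two in-memory stylesheets. The file names base.xsl and main.xsl are hypothetical, and a URIResolver set on the TransformerFactory serves base.xsl from a string instead of the file system. The importing stylesheet's city template wins because it has higher import precedence, and it calls <xsl:apply-imports> to run the imported rule inside its own brackets. This relies on the JDK's built-in XSLT processor (Java 15+).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ImportDemo {
    // Hypothetical imported stylesheet (base.xsl) with a default rule for city.
    static final String BASE_XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="city">
            <xsl:text>City: </xsl:text>
            <xsl:value-of select="cityName"/>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Hypothetical importing stylesheet (main.xsl).
    static final String MAIN_XSL = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <!-- xsl:import must precede every other top-level element -->
          <xsl:import href="base.xsl"/>
          <xsl:output method="text"/>
          <xsl:template match="/">
            <xsl:apply-templates select="tourGuide/city"/>
          </xsl:template>
          <!-- higher import precedence than the city rule in base.xsl -->
          <xsl:template match="city">
            <xsl:text>[</xsl:text>
            <!-- runs the overridden imported rule for the current node -->
            <xsl:apply-imports/>
            <xsl:text>]&#xa;</xsl:text>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static final String XML = "<tourGuide><city><cityName>Belmopan</cityName></city>"
            + "<city><cityName>Kuala Lumpur</cityName></city></tourGuide>";

    static String run() {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // Resolve the import from memory instead of the file system.
            factory.setURIResolver((href, base) -> href.endsWith("base.xsl")
                    ? new StreamSource(new StringReader(BASE_XSL), "base.xsl")
                    : null);
            Transformer t = factory.newTransformer(
                    new StreamSource(new StringReader(MAIN_XSL), "main.xsl"));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(run());
    }
}
```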

The document() function allows you to process additional XML documents and their nodes. The function can be called from any attribute that takes an expression, such as the select attribute. For example:

Exhibit 35: Document function

When applied to an XML document that only contains an empty hotel element, such as <hotel/>, the result tree will contain a new element called amenityList, holding all the content from amenity.xml (except the XML declaration). The document function can take other kinds of arguments, such as a remote URI or a node-set, just to name a few. For more information on using document(), visit http://www.w3.org/TR/xslt#document

XSL-FO
XSL-FO stands for Extensible Stylesheet Language Formatting Objects and is a language for formatting XML data. When it was created, XSL was originally split into two parts, XSL and XSL-FO. Both parts are now formally named XSL. XSL-FO documents define a number of rectangular areas for displaying output. XSL-FO is used for the formatting of XML data for output to screen, paper or other media, such as PDF format. For more information, visit http://www.w3schools.com/xslfo/default.asp

Reference Section
Exhibit 36: XSL Elements (from http://www.w3schools.com/xsl/xsl_w3celementref.asp and http://www.w3.org/TR/xslt#element-syntax-summary)

Exhibit 37: XSLT Functions (from http://www.w3schools.com/xsl/xsl_functions.asp)

Exhibit 38: Inherited XPath Functions (from http://www.w3schools.com/xsl/xsl_functions.asp)

Node Set Functions

String Functions

Number Functions

Boolean Functions

Exercises
In order to learn more about XSL and stylesheets, exercises are provided.

Answers
In order to learn more about XSL and stylesheets, answers are provided.
