XQuery/Typeswitch Transformations

Motivation
You have an XML document that you want to transform into a different format of XML. You want to control and customize the transformation process, and you want a modular way to store the transformation rules so that you or others can easily modify and maintain them.

Background on using XQuery vs. XSLT for Document Transformation
You may have heard the conventional wisdom that "XQuery is best for querying or selecting XML, and XSLT is best for transforming it." In reality, both methods are capable of transforming XML. Despite XSLT's somewhat longer history and larger install base, the "XQuery typeswitch" method of transforming XML provides numerous advantages. These are covered in more detail in XQuery Benefits.

Method
We will use XQuery's typeswitch expression to transform an XML document from one form into another. The basic approach is straightforward: for each XML node in the input document, we specify what should be created in the output document. The typeswitch expression performs this core function of deciding what happens to each node in the source document. We will write an XQuery function that takes a node, tests it using a typeswitch expression, and dispatches it to the appropriate handler function. The handler transforms the node into the new format and sends any child nodes back to the main function using the passthru function. This recursive routine walks through a node and all of its descendants, transforming them into the target format. Once the structure has been set up, the transform is easy to modify, even if the tags within the input document are nested in complex ways. (The recursive technique will be familiar to users of XSLT, but there is absolutely no XSLT prerequisite for this article.)

Example Data
Suppose you have a simple XML document that you would like to transform:
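For illustration, a minimal input of this kind could look like the sketch below. The element names bill, betitle, and section-id and the title text are the ones mentioned later in this article; the exact structure and the section number are assumptions made for this example.

```xquery
(: Hypothetical source document, bound to a variable so that it can be
   evaluated as a query on its own. :)
let $input :=
    <bill>
        <betitle>This is the Bill title</betitle>
        <section-id>1</section-id>
    </bill>
return $input
```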

Sample Output Document
Here is the format that you would like to turn the source input into:
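As an illustration only, the target could be a small XHTML-like fragment such as the one below. The div, h1, and span element names and the class attribute values are assumptions chosen for this sketch, not a format prescribed by the article.

```xquery
(: Hypothetical target format for a bill with a title and a section id. :)
<div class="bill">
    <h1>This is the Bill title</h1>
    <span class="section-id">1</span>
</div>
```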

Options for Typeswitch Transforms
There are two important options when you are creating a typeswitch transform. The first is whether the transform function takes a single node or a sequence of nodes as its parameter.
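The difference between the two parameter styles can be sketched as follows. The function names local:dispatch-one and local:dispatch-many and the tiny input are hypothetical; both variants simply copy elements and text so that only the signatures differ.

```xquery
(: Variant 1: the parameter is exactly one node, so the caller
   (or the function itself) must loop over child nodes. :)
declare function local:dispatch-one($node as node()) as item()* {
    typeswitch ($node)
        case text() return $node
        case element() return
            element { node-name($node) } {
                for $child in $node/node()
                return local:dispatch-one($child)
            }
        default return ()
};

(: Variant 2: the parameter is a sequence of nodes, so the loop lives
   inside the function and a whole child sequence can be passed at once. :)
declare function local:dispatch-many($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case text() return $node
            case element() return
                element { node-name($node) } {
                    local:dispatch-many($node/node())
                }
            default return ()
};

let $input := <a><b>x</b><c>y</c></a>
return (
    local:dispatch-one($input),      (: one node in :)
    local:dispatch-many($input/*)    (: a sequence of two nodes in :)
)
```

With the sequence style, an empty sequence is also a valid argument, which removes the need for a separate check when an element has no children.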

The second is what the default action should be. The default clause can be written either to pass through or to remove all nodes that are not matched by the other clauses of your typeswitch expression.
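The two default behaviours can be compared side by side in a small sketch. The function names local:keep and local:drop, the note element, and the div target element are assumptions made for this example.

```xquery
(: Default = pass through: an unmatched node is not copied itself,
   but its children are still processed. :)
declare function local:keep($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case text() return $node
            case element(bill) return
                <div class="bill">{ local:keep($node/node()) }</div>
            default return local:keep($node/node())
};

(: Default = remove: an unmatched node is discarded together with
   everything inside it. :)
declare function local:drop($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case text() return $node
            case element(bill) return
                <div class="bill">{ local:drop($node/node()) }</div>
            default return ()
};

let $input := <bill><note>internal remark</note>Bill text</bill>
return (local:keep($input), local:drop($input))
```

In the first result the words inside the unmatched note element survive; in the second they are gone, and only the text that sits directly inside bill remains.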

Example Transformation With Typeswitch
The most effective way to use the typeswitch expression to transform XML is to create a series of XQuery functions. In this way, we can cleanly separate the major actions of the transformation into modular functions. (In fact, the library of functions can be saved into an XQuery library module, which can then be reused by other XQueries.) The "magic" of this typeswitch-style transformation is that once you understand the basic pattern and structure of the functions, you can adapt them to your own data. You'll find that the structure is so modular and straightforward that it's even possible to teach others the basics of the pattern in a short period of time and empower them to maintain and update the transformation rules themselves.

The first function in our module is where the typeswitch expression is located. This function is conventionally called the "dispatch" function:
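A minimal sketch of such a dispatch function, together with the handler and passthru functions it needs in order to run, might look like this. The function names follow the ones used in this article; the target elements (div, h1, span) and the sample input are assumptions made for illustration.

```xquery
(: Main dispatch function: one case per node type or element name. :)
declare function local:dispatch($node as node()) as item()* {
    typeswitch ($node)
        case text() return $node
        case comment() return $node
        case element(bill) return local:bill($node)
        case element(betitle) return local:betitle($node)
        case element(section-id) return local:section-id($node)
        default return local:passthru($node)
};

(: Handler functions: each one builds the (hypothetical) target element
   and sends the children of the source element back through passthru. :)
declare function local:bill($node as node()) as element() {
    <div class="bill">{ local:passthru($node) }</div>
};

declare function local:betitle($node as node()) as element() {
    <h1>{ local:passthru($node) }</h1>
};

declare function local:section-id($node as node()) as element() {
    <span class="section-id">{ local:passthru($node) }</span>
};

(: Hand every child node back to the dispatch function. :)
declare function local:passthru($node as node()) as item()* {
    for $child in $node/node()
    return local:dispatch($child)
};

let $input :=
    <bill>
        <betitle>This is the Bill title</betitle>
        <section-id>1</section-id>
    </bill>
return local:dispatch($input)
```

Adding support for a new element then means adding one case clause and one handler function, without touching the rest of the module.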

Notice that the typeswitch expression tests the input node against a list of criteria: is the node a text node, a comment node, a bill element, a betitle element, a section-id element, and so on? If it is a text node (e.g. "This is the Bill title"), we simply return the text unmodified. (The text node test comes first because text is likely to be the most plentiful node type in a text-rich document, and testing for the most common type first can improve performance.) If instead the node is a bill element, we pass the node to the aptly named local:bill function for bill-specific handling. The local:bill function (see below) turns the bill element into its counterpart in the target format and then passes the contents of the bill element to the local:passthru function.

If the node doesn't match any of the pre-defined rules, the typeswitch expression falls back to the required final "default" clause (think: "fallback"), which is used for all nodes that don't match any of the preceding tests. In our example, the default clause sends unmatched nodes to the local:passthru function.

(Typeswitch isn't limited to matching text and element nodes; it can also match the other node types, such as processing-instruction and comment nodes, but typically not attributes. Attributes are conventionally dealt with inside the handler function of the attribute's parent element, rather than in the core typeswitch function.)
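The convention of handling attributes inside the parent element's handler can be sketched as follows. The id and status attributes and the div target element are hypothetical; the point is only that $node/node() selects children, not attributes, so a handler has to copy any attribute it wants to keep.

```xquery
declare function local:dispatch($node as node()) as item()* {
    typeswitch ($node)
        case text() return $node
        case element(bill) return local:bill($node)
        default return local:passthru($node)
};

(: Handler for bill: the id attribute is copied explicitly, before any
   child content; the status attribute is deliberately left out. :)
declare function local:bill($node as node()) as element() {
    <div class="bill">{ $node/@id, local:passthru($node) }</div>
};

(: Only child nodes are dispatched; attributes are not children. :)
declare function local:passthru($node as node()) as item()* {
    for $child in $node/node()
    return local:dispatch($child)
};

let $input := <bill id="b1" status="draft">Bill text</bill>
return local:dispatch($input)
```

Attribute nodes placed in element content must come before text and element content, which is why $node/@id is listed first inside the braces.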

The Passthru Function
The passthru function recurses through a given node's children, handing each of them back to the main typeswitch operation. (Note: This is such a simple function that it may appear extraneous. Why not simply replace instances of local:passthru($node) with local:dispatch($node/node())? Its primary benefit is that it simplifies the code, relieving you of the burden of typing an extra "/node()" for each recursion. A secondary benefit is that it introduces the possibility of filtering nodes before they are sent to the typeswitch routine.)
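A sketch of a passthru function that also uses the filtering opportunity mentioned above is shown here. The choice to filter out comments, and the div target element, are assumptions made for this example; removing the predicate gives the plain version.

```xquery
declare function local:dispatch($node as node()) as item()* {
    typeswitch ($node)
        case text() return $node
        case comment() return $node
        case element(bill) return
            <div class="bill">{ local:passthru($node) }</div>
        default return local:passthru($node)
};

(: Hand every child of $node back to local:dispatch. The predicate is the
   optional filtering step: comments are dropped before dispatching. :)
declare function local:passthru($node as node()) as item()* {
    for $child in $node/node()[not(self::comment())]
    return local:dispatch($child)
};

let $input := <bill><!-- internal -->Bill text</bill>
return local:dispatch($input)
```

Because every handler recurses through local:passthru, the filter applies everywhere in the document without any change to the dispatch function.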

The Alternative Passthru Function
The above local:passthru function will remove all attributes from your nodes. If you have attributes in your input XML which you would like to retain, use the following passthru function as an alternative.
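One possible shape for an attribute-preserving passthru is sketched below. It assumes that unmatched elements should be copied under their own names, which is a design choice of this example; because this variant rebuilds the element itself, it is meant to be called from the default branch rather than from handlers that construct their own target element.

```xquery
declare function local:dispatch($node as node()) as item()* {
    typeswitch ($node)
        case text() return $node
        case element(bill) return
            <div class="bill">{
                for $child in $node/node()
                return local:dispatch($child)
            }</div>
        (: any other element: copy it, keeping its attributes :)
        case $e as element() return local:passthru($e)
        default return $node
};

(: Copy the element under its own name, keep all of its attributes,
   and dispatch its children. :)
declare function local:passthru($node as element()) as element() {
    element { node-name($node) } {
        $node/@*,
        for $child in $node/node()
        return local:dispatch($child)
    }
};

let $input := <bill><section-id type="numeric">1</section-id></bill>
return local:dispatch($input)
```

Here the section-id element has no rule of its own, so it is copied into the output with its type attribute intact.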

Execute the transformation
We can now write a query that takes the source XML and uses the local:dispatch function to transform the input into the target format:

Compact approach
While the above approach is recommended as the most modular and extensible, it is perfectly acceptable to express the same transformation using a more compact, self-contained function. Such a function is entirely self-contained: it begins with a FLWOR expression and uses $node/node() to recurse through child nodes. It also uses computed element constructors to accomplish the transformation.
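A compact function of this kind could be sketched as follows. The function name local:transform, the target element names, and the sample input are assumptions made for illustration.

```xquery
(: Single self-contained function: the FLWOR expression loops over the
   nodes it is given, and $node/node() recurses into child nodes. :)
declare function local:transform($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case text() return $node
            case comment() return $node
            case element(bill) return
                element div {
                    attribute class { "bill" },
                    local:transform($node/node())
                }
            case element(betitle) return
                element h1 { local:transform($node/node()) }
            case element(section-id) return
                element span {
                    attribute class { "section-id" },
                    local:transform($node/node())
                }
            default return local:transform($node/node())
};

let $input :=
    <bill>
        <betitle>This is the Bill title</betitle>
        <section-id>1</section-id>
    </bill>
return local:transform($input)
```

The trade-off is that all rules now live in one function body, which is convenient for small transformations but harder to maintain as the number of element types grows.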

Conclusion
This is the heart of the XQuery Typeswitch approach to XML document transformation. On the basis of this simple pattern, entire libraries have been written to transform source formats like TEI, DocBook, and Office OpenXML documents into other formats like XHTML, XSL-FO, and each other.

While we can create typeswitch modules by hand, building them up element by element, we can also use XQuery to generate a skeleton typeswitch module; see this article's companion article, XQuery/Generating_Skeleton_Typeswitch_Transformation_Modules. In addition to the "skeleton generator", the companion article provides examples of more complex transformation patterns with XQuery typeswitch: changing an element's name, ignoring an element, transforming an element differently based on its context, and reordering elements. It also provides a detailed comparison of the XQuery and XSLT approaches to the same example transformation, so it is useful for readers coming from the world of XSLT.