XQuery/Using Intermediate Documents

Processing XML often involves the creation of intermediate XML fragments for subsequent processing. Here is an example of two approaches, one using multiple passes on the same data, the other a constructed intermediate view of the data.

MusicXML
MusicXML is an XML application for recording music scores. There is a range of software which produces and consumes MusicXML.

There are two styles of MusicXML with two related schemas, one in which measures are within parts (partwise), the other in which parts are within measures (timewise).

An example of a MusicXML partwise score is Mozart's Piano Sonata in A Major, K. 331.

Here is a sample definition of a note:
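For illustration, a pitched note in MusicXML carries a pitch (a step, an optional alter and an octave) together with a duration and a type. The fragment below is a hypothetical note, not an excerpt from the sonata. It is bound to a variable so that its parts can be read with XPath.

```xquery
(: Hypothetical note: C sharp in octave 5, an eighth note.
   The values are illustrative and are not taken from the Mozart score. :)
let $note :=
  <note>
    <pitch>
      <step>C</step>
      <alter>1</alter>
      <octave>5</octave>
    </pitch>
    <duration>2</duration>
    <voice>1</voice>
    <type>eighth</type>
  </note>
(: Read the components of the pitch back out of the note. :)
return
  <pitch-parts
    step="{$note/pitch/step}"
    alter="{$note/pitch/alter}"
    octave="{$note/pitch/octave}"/>
```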

Notes Range
The Recordare site has some sample code that demonstrates the use of XQuery to process MusicXML. The first script finds the lowest and highest notes in the score. The script shown on the site does not conform to the current XQuery standard, but a few minor changes bring it up to date.
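A minimal sketch of a script in this style follows. It is not the Recordare code: the function name `local:midi-note`, the inline sample score and the shape of the `range` result are all assumptions made for illustration. It shows the multiple-pass pattern, in which every pitch is converted once to find the extremes and the source is then scanned and converted again to find where they occur.

```xquery
(: Convert a pitch to a MIDI note number using nested conditionals.
   Assumes alter, when present, is a whole number of semitones. :)
declare function local:midi-note($pitch as element(pitch)) as xs:integer {
  let $step := string($pitch/step)
  let $alter := if (empty($pitch/alter)) then 0 else xs:integer($pitch/alter)
  let $octave := xs:integer($pitch/octave)
  let $pitchstep :=
    if ($step = "C") then 0
    else if ($step = "D") then 2
    else if ($step = "E") then 4
    else if ($step = "F") then 5
    else if ($step = "G") then 7
    else if ($step = "A") then 9
    else if ($step = "B") then 11
    else 0
  return 12 * ($octave + 1) + $pitchstep + $alter
};

(: Hypothetical partwise fragment standing in for a stored score. :)
let $score :=
  <score-partwise>
    <part id="P1">
      <measure number="1">
        <note><pitch><step>A</step><octave>3</octave></pitch></note>
        <note><rest/></note>
      </measure>
      <measure number="2">
        <note><pitch><step>C</step><alter>1</alter><octave>6</octave></pitch></note>
      </measure>
    </part>
  </score-partwise>
(: Pass 1: convert every pitch to find the extremes. :)
let $midi := for $p in $score//pitch return local:midi-note($p)
let $low := min($midi)
let $high := max($midi)
(: Pass 2: scan the source again, converting each pitch again,
   to find the measures in which the extremes occur. :)
return
  <range>
    <low midi="{$low}"
         measure="{$score//note[pitch][local:midi-note(pitch) = $low]/../@number}"/>
    <high midi="{$high}"
          measure="{$score//note[pitch][local:midi-note(pitch) = $high]/../@number}"/>
  </range>
```

The parent step `..` in the second pass only reaches the measure when the score is partwise, and each pitch is converted more than once.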

With output:

Ancestor access
The path from a note to the measure in which it is located uses a fixed set of steps back up the hierarchy. This limits the script to one type of MusicXML schema, because the position of the measure in the hierarchy differs between the two schemas. When the script was written the ancestor axis was not supported; it is now, so those lines can be expressed more generally with a step such as ancestor::measure.
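The difference between the two forms can be sketched on a small fragment. The score below is hypothetical, and the `location` element is just a convenient way to show the results side by side. In a partwise score the measure is the parent of the note, so the fixed path and the ancestor step agree; in a timewise score only the ancestor step would still find the measure.

```xquery
(: Hypothetical partwise fragment: measure is the parent of note. :)
let $score :=
  <score-partwise>
    <part id="P1">
      <measure number="1">
        <note><pitch><step>A</step><octave>4</octave></pitch></note>
      </measure>
      <measure number="2">
        <note><pitch><step>E</step><octave>5</octave></pitch></note>
      </measure>
    </part>
  </score-partwise>
let $note := ($score//note)[2]
return
  <location
    fixed="{$note/../@number}"
    general="{$note/ancestor::measure/@number}"
    part="{$note/ancestor::part/@id}"/>
```

The same `ancestor::part` step also locates the part regardless of whether parts contain measures or measures contain parts.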

Note-to-midi
The function to convert notes to MIDI numbers uses nested if-then-else expressions. XQuery 1.0 lacks a switch expression (one was added in XQuery 3.0), but a clearer approach is to use a look-up table, defined either locally in the script or stored in the database.

Here a sequence of notes is created as a look-up table. This is bound to a global variable which is used in a revised note-to-midi function:
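A sketch of this arrangement, with assumed names (`$noteStep` for the table, `local:midi-note` for the function) and an assumed attribute layout for the table entries:

```xquery
(: Look-up table: semitone offset of each natural step above C. :)
declare variable $noteStep :=
  (
    <note name="C" step="0"/>,
    <note name="D" step="2"/>,
    <note name="E" step="4"/>,
    <note name="F" step="5"/>,
    <note name="G" step="7"/>,
    <note name="A" step="9"/>,
    <note name="B" step="11"/>
  );

(: Convert a pitch to a MIDI note number by table look-up.
   Assumes alter, when present, is a whole number of semitones. :)
declare function local:midi-note($pitch as element(pitch)) as xs:integer {
  let $alter := if (empty($pitch/alter)) then 0 else xs:integer($pitch/alter)
  let $octave := xs:integer($pitch/octave)
  let $pitchstep := xs:integer($noteStep[@name = $pitch/step]/@step)
  return 12 * ($octave + 1) + $pitchstep + $alter
};

(: A in octave 4 is MIDI note 69: 12 * (4 + 1) + 9. :)
local:midi-note(<pitch><step>A</step><octave>4</octave></pitch>)
```

Because the table is ordinary XML, it could equally be read from a stored document instead of being declared in the prolog.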

Intermediate XML
The original script required repeated access to the original MusicXML source. An alternative approach is to create an intermediate structure to hold the MIDI notes and to use this in subsequent analysis. This structure is a computed view of the original notes, augmented with derived data: the MIDI note and the measure.
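One way such a view might be constructed is sketched below. The element names `midi` and `measure`, the variable `$midiNotes` and the sample score are assumptions for illustration. The look-up table and conversion function are included so that the example runs on its own.

```xquery
(: Look-up table and conversion function. :)
declare variable $noteStep :=
  (
    <note name="C" step="0"/>, <note name="D" step="2"/>,
    <note name="E" step="4"/>, <note name="F" step="5"/>,
    <note name="G" step="7"/>, <note name="A" step="9"/>,
    <note name="B" step="11"/>
  );

declare function local:midi-note($pitch as element(pitch)) as xs:integer {
  let $alter := if (empty($pitch/alter)) then 0 else xs:integer($pitch/alter)
  let $octave := xs:integer($pitch/octave)
  return 12 * ($octave + 1)
         + xs:integer($noteStep[@name = $pitch/step]/@step)
         + $alter
};

(: Hypothetical partwise fragment standing in for a stored score. :)
let $score :=
  <score-partwise>
    <part id="P1">
      <measure number="1">
        <note><pitch><step>A</step><octave>3</octave></pitch></note>
        <note><rest/></note>
      </measure>
      <measure number="2">
        <note><pitch><step>C</step><alter>1</alter><octave>6</octave></pitch></note>
      </measure>
    </part>
  </score-partwise>
(: Single pass: one view element per pitched note, carrying the
   original pitch plus the derived MIDI number and measure number. :)
let $midiNotes :=
  for $note in $score//note[pitch]
  return
    <note>
      {$note/pitch}
      <midi>{local:midi-note($note/pitch)}</midi>
      <measure>{string($note/ancestor::measure/@number)}</measure>
    </note>
return <notes>{$midiNotes}</notes>
```

Each pitch is converted exactly once, and rests are filtered out by the `[pitch]` predicate.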

This view is then used to locate the high and low notes and their positions in the score.
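A sketch of that second step, starting from a hand-written view of the kind described above. The element names `midi` and `measure` and the `range` result are assumptions for illustration.

```xquery
(: Hypothetical view: each note carries its pitch, its MIDI number
   and the number of the measure in which it occurs. :)
let $midiNotes :=
  (
    <note>
      <pitch><step>A</step><octave>3</octave></pitch>
      <midi>57</midi><measure>1</measure>
    </note>,
    <note>
      <pitch><step>E</step><octave>5</octave></pitch>
      <midi>76</midi><measure>1</measure>
    </note>,
    <note>
      <pitch><step>C</step><alter>1</alter><octave>6</octave></pitch>
      <midi>85</midi><measure>2</measure>
    </note>
  )
(: Cast explicitly so that min and max compare integers. :)
let $midi := for $m in $midiNotes/midi return xs:integer($m)
(: Take the first note at each extreme in case of ties. :)
let $low := $midiNotes[xs:integer(midi) = min($midi)][1]
let $high := $midiNotes[xs:integer(midi) = max($midi)][1]
return
  <range>
    <low midi="{$low/midi}" measure="{$low/measure}">{$low/pitch}</low>
    <high midi="{$high/midi}" measure="{$high/measure}">{$high/pitch}</high>
  </range>
```

No conversion function is needed at this stage, because the derived values are already present in the view.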

Revised script

Discussion
Although arguably a cleaner, more direct design, the second script relies on constructing temporary XML nodes which then become the subject of XPath expressions. Implementations handle these temporary nodes differently. In older versions of eXist, each is written to a temporary document in the database, which adds performance overhead and creates garbage-collection problems. In the 1.3 release, intermediate XML nodes remain in memory, resulting in a major performance improvement.

There is, however, another problem with this approach. The size of the intermediate node may exceed pre-set, but configurable, limits on the size of constructed nodes.