XQuery/Splitting Files

Motivation
You have a single large XML document with many consistent records in it. You want to split it into many smaller documents so that each can be edited by a separate user. There are many good reasons to split large files up. Some have to do with how much data you want to load into an editor at a time or how you want to publish individual files to a remote site.

eXist and many other systems version documents and keep date/time stamps for each file. With one record per file, these features apply to each record individually rather than to the whole data set.

Method
We will create an XQuery that iterates through all the records in the document. For each record we will call eXist's xmldb:store() function to store a new document in a collection. The format of this function is xmldb:store($collection, $filename, $data).

Where:
 * $collection is a string that holds the path to the collection where we will store the data for each record. For example '/db/test/data'.
 * $filename is the name of the file. The name can either be derived from the data or be generated by a sequence counter in the split query. For example 'Hello.xml' or '1.xml'.
 * $data is the data we will store in the file.
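As a minimal sketch of a single call, assuming eXist-db, an existing collection at '/db/test/data', and a made-up record (the row, Term and Definition content here is purely illustrative):

```xquery
(: eXist-db specific: the xmldb module provides the store function :)
import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: assumption: this collection already exists and is writable :)
let $collection := '/db/test/data'
let $filename := 'Hello.xml'
(: hypothetical record, used only for illustration :)
let $data :=
    <row>
        <Term>Hello</Term>
        <Definition>A greeting.</Definition>
    </row>
(: stores the record as /db/test/data/Hello.xml and returns the path of the stored resource :)
return xmldb:store($collection, $filename, $data)
```

If a file with the same name already exists in the collection, it is overwritten.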

Sample Input XML Document
One way to get started is to put your data such as business terms into a spreadsheet and convert it to XML. The oXygen XML editor has some very good tools for converting spreadsheets to XML format. Make sure you put "Term" and "Definition" in the first row and use this row to define the element names.
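The following sketch shows what such a converted document might look like and how it could be split using the term itself as the file name. The root and row element names and the sample terms are assumptions; the actual names depend on how the spreadsheet was converted.

```xquery
import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: hypothetical input, inlined here; normally it would be loaded with doc() :)
let $input :=
    <root>
        <row>
            <Term>Broker</Term>
            <Definition>A person who arranges transactions between buyers and sellers.</Definition>
        </row>
        <row>
            <Term>Purchase Order</Term>
            <Definition>A document a buyer sends to a seller to request goods.</Definition>
        </row>
    </root>
(: assumption: this collection already exists :)
let $collection := '/db/test/data'
for $row in $input/row
(: derive the file name from the term, replacing characters that are awkward in resource names :)
let $name := replace(normalize-space($row/Term), '[^A-Za-z0-9_\-]', '-')
let $filename := concat($name, '.xml')
return xmldb:store($collection, $filename, $row)
```

With this input the query stores Broker.xml and Purchase-Order.xml. Two records with the same term would map to the same file name, and the second would overwrite the first, which is one reason to fall back on a counter.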

Using A Sequence Counter for Artificial Keys
Sometimes no element in the imported record is unique, or the elements that are unique are not appropriate to use as a key. In this case you can use a counter to give each new XML document a unique number. The generated sequence number is called an "artificial key" since it is not related to any data element in the record.

You can achieve this by adding a positional variable to your for loop. To do this, add at $count after the loop variable, as in for $record at $count in $records. The $count variable then holds 1 for the first record, 2 for the second, and so on.

The store function can then use the $count variable to build a file name from this number, for example concat($count, '.xml').
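Putting the counter and the store call together, a split query might look like the sketch below. It assumes eXist-db; the source path '/db/test/terms.xml' and the root/row element names are hypothetical and should be replaced with your own.

```xquery
import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: assumption: the target collection already exists :)
let $collection := '/db/test/data'
(: hypothetical source document and record path :)
let $records := doc('/db/test/terms.xml')/root/row
for $record at $count in $records
(: 1.xml, 2.xml, 3.xml, ... :)
let $filename := concat($count, '.xml')
return xmldb:store($collection, $filename, $record)
```

Because the counter restarts at 1 on every run, running the query again against the same collection overwrites the files stored by the previous run.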

Adding an ID to each item using the XQuery update Operator
Once you have inserted the data into a collection you may want to assign each item a unique ID. This is also an artificial key: it is created by the import process and assigned by the system that stores the data, not derived from the data inside the item.

You can add an ID to each item automatically with eXist's update extension:

for $item at $count in $items return update insert <id>{$count}</id> preceding $item/person-name

After this update, each item has a new id element immediately before its person-name element.

It is a best practice to skip items that already have an id element:

for $item at $count in $items[not(id)] return update insert <id>{$count}</id> preceding $item/person-name

This prevents duplicate ids from being added if the script is run twice. You can also modify it so that numbering starts one higher than the largest id already in the collection.

(: get the largest ID in the collection, or 0 if no item has an id yet :)
let $largest-id := (max(collection($my-collection)/*/id/xs:integer(.)), 0)[1]
(: $count starts at 1, so the first new id is $largest-id + 1 :)
for $item at $count in $items[not(id)]
return
    update insert <id>{$largest-id + $count}</id> preceding $item/person-name