XQuery/TEI Document Timeline

Motivation
You want to create a timeline of the dates within a single TEI document.

Approach
TEI documents may include date elements in any section of the document: in the metadata, in the publication details, in the front and back matter, and in the body of the text. Let's assume that we want a timeline showing the dates in the text body.

We will use the Simile Timeline JavaScript API to create a browsable timeline in an HTML page.

Extracting timeline dates
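TEI documents mark up dates with the date element. The element content holds the date as it is written in the text, for example March 16 or 1861, and the when attribute can hold the same date in a normalized form.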

We will write an XQuery script that will extract all of the date elements in the body of a TEI document and generate a Simile Timeline.

Getting the dates
Dates are used throughout the sections of a TEI document, but we are most likely to be interested in dates in the body of the text.

let $dates := doc($tei-document)//tei:body//tei:date

For example:
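A minimal sketch of this selection, run against an inline sample rather than a stored document. The sample TEI fragment and its dates are invented placeholders, and the namespace binding assumes a TEI P5 document.

```xquery
xquery version "1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Invented sample standing in for doc($tei-document); the dates are placeholders. :)
let $tei :=
  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
      <fileDesc>
        <publicationStmt><date when="2004">2004</date></publicationStmt>
      </fileDesc>
    </teiHeader>
    <text>
      <body>
        <p>The ship left port on <date when="1700-01-02">2 January 1700</date>
           and returned in <date when="1701">1701</date>.</p>
      </body>
    </text>
  </TEI>
(: Only dates inside tei:body are selected; the date in the header is skipped. :)
let $dates := $tei//tei:body//tei:date
return $dates
```

The query returns the two date elements from the body and ignores the publication date in the header.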

Transforming to Simile events
We can then transform this sequence of date elements into the format that is needed by Simile.
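The transformation might look like the following sketch. It assumes the Simile XML event format, in which a data element wraps event elements carrying start and title attributes; the sample body and its dates are invented placeholders.

```xquery
xquery version "1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Invented sample body; the dates are placeholders. :)
let $body :=
  <body xmlns="http://www.tei-c.org/ns/1.0">
    <p>The ship left port on <date when="1700-01-02">2 January 1700</date>
       and returned in <date when="1701">1701</date>.</p>
  </body>
let $dates := $body//tei:date
return
  <data>{
    for $date in $dates
    return
      (: start takes the normalized date, title the date as written in the text :)
      <event start="{$date/@when}" title="{$date/text()}"/>
  }</data>
```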

Note that there are two path expressions in the above query. The first expression, $date/@when, extracts the when attribute of the date element. The second, $date/text(), extracts the text content of the date element, i.e. the text between its start and end tags:

13 August 1642

Sample XQuery to Extract Dates from TEI File
For example, the script can be run against the TEI document "The Discovery of New Zealand" by J. C. Beaglehole, produced by the New Zealand Electronic Text Centre.
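A complete script could be sketched as follows, assuming it is hosted in eXist-db and saved as dates.xq. The request:get-parameter function and the exist:serialize option are specific to eXist-db, so this version will not run unchanged on other processors. The parameter name file matches the URL used later when the events are loaded.

```xquery
xquery version "1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Assumption: eXist-db host. This option sets the media type of the response. :)
declare option exist:serialize "method=xml media-type=text/xml";

(: Assumption: the document path arrives as the "file" URL parameter. :)
let $tei-document := request:get-parameter("file", ())
let $dates := doc($tei-document)//tei:body//tei:date
return
  <data>{
    for $date in $dates
    return <event start="{$date/@when}" title="{$date/text()}"/>
  }</data>
```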

Discussion
 * TEI dates are generally XML dates, which are recognised by the Simile Timeline API. However, TEI also supports the encoding of relative dates, so dates really need filtering, for example with a suitable regular expression. One option is to check the date format with the XQuery "castable" expression.
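A small sketch of filtering with castable. The sample values are invented for illustration, and the choice to accept full dates, year-month values and plain years is an assumption; adjust the accepted types to whatever the timeline should display.

```xquery
xquery version "1.0";

(: Invented sample values: three usable dates, a month-day value with no year,
   and a date written out in words. :)
let $values := ("1642-08-13", "1642-08", "1642", "--08-13", "13 August 1642")
return
  for $v in $values
  where $v castable as xs:date
     or $v castable as xs:gYearMonth
     or $v castable as xs:gYear
  return $v
```

Only the first three values pass the test. Applied to the timeline, the same where clause would test string($date/@when) for each date element.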

Providing Context
We can enhance the timeline by providing some context for the date in the timeline bubble. One approach is to include some of the preceding and following text.

Each date node is a child of some parent node, e.g. a paragraph that mixes text and date elements. We need to access the mixture of elements and text nodes on either side of the target date. For example, preceding the target date there may be a text node ("Cook left.."), a date node and another text node ("He returned .."). Following the target date is a text node ("and remained ..."). We can select these nodes using the preceding-sibling and following-sibling axes:
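A sketch of the two axes, using an invented paragraph with placeholder dates. The sample is in no namespace to keep the paths short; with a real TEI document the tei: prefix would be needed.

```xquery
xquery version "1.0";

(: Invented sample paragraph; the dates are placeholders. :)
let $p :=
  <p>The ship left port on <date when="1700-01-02">2 January 1700</date>. It returned on <date when="1701-03-04">4 March 1701</date> and remained there until the following spring.</p>
(: The second date is the target. :)
let $date := $p/date[2]
let $before := $date/preceding-sibling::node()
let $after := $date/following-sibling::node()
return
  <context>
    <before count="{count($before)}">{
      string-join(for $n in $before return string($n), "")
    }</before>
    <after count="{count($after)}">{
      string-join(for $n in $after return string($n), "")
    }</after>
  </context>
```

Here the preceding siblings are a text node, the first date element and another text node, while the only following sibling is a single text node.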

A crude approach to constructing a context string is to join the node strings and extract a suitable substring, once for the text after the date and once for the text before it. We can then create an XML fragment with the target date in bold.
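One way to sketch this crude approach is shown here. The sample paragraph is invented, $scope is taken to be the number of characters kept on each side, and the div and b elements are one possible choice for the fragment.

```xquery
xquery version "1.0";

(: Invented sample paragraph; the dates are placeholders. :)
let $p :=
  <p>The ship left port on <date when="1700-01-02">2 January 1700</date>. It returned on <date when="1701-03-04">4 March 1701</date> and remained there until the following spring.</p>
let $date := $p/date[2]
(: Assumption: $scope is the number of characters of context on each side. :)
let $scope := 20
let $beforeString :=
  string-join(for $n in $date/preceding-sibling::node() return string($n), "")
let $afterString :=
  string-join(for $n in $date/following-sibling::node() return string($n), "")
(: Keep the last $scope characters before the date and the first $scope after it. :)
let $before :=
  concat("...", substring($beforeString, string-length($beforeString) - $scope + 1))
let $after := concat(substring($afterString, 1, $scope), "...")
return
  <div>{$before}<b>{$date/text()}</b>{$after}</div>
```

If the text before the date is shorter than $scope, the start position falls below 1 and substring simply returns the whole string.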

Finally, the element needs to be serialized and added to the event:
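A sketch of the serialization step, assuming a processor that supports XQuery 3.0 or later and therefore provides fn:serialize; older processors need a vendor-specific function instead. The context fragment and the event attributes are invented placeholders.

```xquery
xquery version "3.0";
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";

(: Invented context fragment with the target date in bold. :)
let $context :=
  <div>... It returned on <b>4 March 1701</b> and remained there...</div>
(: Serialization parameters: leave out the XML declaration. :)
let $params :=
  <output:serialization-parameters>
    <output:omit-xml-declaration value="yes"/>
  </output:serialization-parameters>
return
  (: The markup becomes a string, so it is escaped inside the event element. :)
  <event start="1701-03-04" title="4 March 1701">{
    serialize($context, $params)
  }</event>
```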

Improved Context
The context is extracted from the parent node without regard to word or sentence boundaries. Splitting on word boundaries would be better.

Similarly, the text before the target date:
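Splitting on word boundaries could be sketched with tokenize, for both the text after and the text before the date. The function names local:first-words and local:last-words are hypothetical helpers, the sample strings are invented, and $scope is now taken to be a number of words.

```xquery
xquery version "1.0";

(: Hypothetical helper: the first $scope words of a string. :)
declare function local:first-words($s as xs:string, $scope as xs:integer) as xs:string {
  let $words := tokenize(normalize-space($s), "\s+")
  return string-join(subsequence($words, 1, $scope), " ")
};

(: Hypothetical helper: the last $scope words of a string. :)
declare function local:last-words($s as xs:string, $scope as xs:integer) as xs:string {
  let $words := tokenize(normalize-space($s), "\s+")
  return string-join(subsequence($words, count($words) - $scope + 1), " ")
};

(: Invented sample strings on either side of the target date. :)
let $beforeString := "The ship left port on 2 January 1700. It returned on "
let $afterString := " and remained there until the following spring."
let $scope := 3
return
  <div>{
    concat("... ", local:last-words($beforeString, $scope), " ")
  }<b>4 March 1701</b>{
    concat(" ", local:first-words($afterString, $scope), " ...")
  }</div>
```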

Splitting on sentence boundaries would be even better. We can use the pattern '\. ' as the marker. This may not be entirely accurate, but a false positive merely shortens the context. The ellipsis is no longer needed, and $scope is now the number of sentences on either side.
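A sketch of the sentence-based version for the text after the date. The sample string is invented. Because tokenize discards the '\. ' separators, they are put back when the chosen sentences are joined.

```xquery
xquery version "1.0";

(: Invented sample text following the target date. :)
let $afterString :=
  " and remained there until the following spring. A second voyage followed. It was shorter."
(: $scope is now a number of sentences. :)
let $scope := 2
let $sentences := tokenize($afterString, "\. ")
return
  string-join(subsequence($sentences, 1, $scope), ". ")
```

The first item is the remainder of the sentence that contains the date, so a $scope of 1 keeps the context within that sentence.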

Similarly for the beforeString.

Discussion
In addition, each event could link into the full text of the document. (to do)

Generating an HTML page
Since the event stream is parameterised by the source document, the HTML page containing the timeline also needs to be parameterised, so we will generate it using another XQuery script.

Simile API
The timeline layout is defined with the SIMILE Timeline JavaScript API, starting with the basic bands. The bands are set for YEAR and DECADE, which are appropriate for historical texts. The function that sets up the timeline has two parameters: the source file and the start year.

The events are generated by a call to the transformation script in the previous section.

Timeline.loadXML("dates.xq?file="+file, function(xml, url) { eventSource.loadXML(xml, url); });
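The page generator might be sketched as follows, with the JavaScript assembled as a string and placed in a script element. This is an illustration under several assumptions: the script path timeline-api.js, the element id my-timeline, the band widths and pixel sizes, and the sample values of $file and $start are all placeholders, and the JavaScript follows the pattern of the SIMILE Timeline tutorial rather than any particular deployed page.

```xquery
xquery version "1.0";

(: Placeholder inputs; a real script would take them from the request and the document. :)
let $file := "/db/tei/sample.xml"
let $start := "1700"
(: JavaScript defining a YEAR band and a DECADE band, then loading the events. :)
let $script := string-join((
    'function onLoad(file, start) {',
    '  var eventSource = new Timeline.DefaultEventSource();',
    '  var bandInfos = [',
    '    Timeline.createBandInfo({',
    '      eventSource: eventSource,',
    '      date: start,',
    '      width: "70%",',
    '      intervalUnit: Timeline.DateTime.YEAR,',
    '      intervalPixels: 100',
    '    }),',
    '    Timeline.createBandInfo({',
    '      eventSource: eventSource,',
    '      date: start,',
    '      width: "30%",',
    '      intervalUnit: Timeline.DateTime.DECADE,',
    '      intervalPixels: 200',
    '    })',
    '  ];',
    '  bandInfos[1].syncWith = 0;',
    '  bandInfos[1].highlight = true;',
    '  Timeline.create(document.getElementById("my-timeline"), bandInfos);',
    '  Timeline.loadXML("dates.xq?file=" + file,',
    '    function(xml, url) { eventSource.loadXML(xml, url); });',
    '}'
  ), '&#10;')
return
  <html>
    <head>
      <title>Timeline</title>
      <script type="text/javascript" src="timeline-api.js"></script>
      <script type="text/javascript">{$script}</script>
    </head>
    <body onload="onLoad('{$file}', '{$start}')">
      <div id="my-timeline" style="height: 400px"></div>
    </body>
  </html>
```

Building the JavaScript from string literals avoids having to double every curly brace, which would be necessary if the script were written directly as element content.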

Setting the Start date
The start date is the earliest date in the sequence of dates. We can find this by ordering the dates using the order by clause and then selecting the first item in the sequence.
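A sketch of finding the start year. The sample dates are invented placeholders, and sorting the when values as strings assumes they all begin with a four-digit year.

```xquery
xquery version "1.0";

(: Invented sample dates, deliberately out of order. :)
let $dates := (
    <date when="1769-10-06"/>,
    <date when="1642-12-13"/>,
    <date when="1770"/>
  )
let $ordered :=
  for $d in $dates
  order by string($d/@when)
  return $d
(: The first four characters of the earliest date give the start year. :)
let $start := substring($ordered[1]/@when, 1, 4)
return $start
```

For this sample the query returns 1642.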

We can also retrieve the document title and author from the TEI header.
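A sketch of the lookup, assuming the title and author are recorded in the titleStmt of the TEI header, as is usual in TEI P5. The sample header and its values are invented placeholders.

```xquery
xquery version "1.0";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Invented sample header standing in for the stored document. :)
let $tei :=
  <TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title>Sample Title</title>
          <author>Sample Author</author>
        </titleStmt>
      </fileDesc>
    </teiHeader>
  </TEI>
let $titleStmt := $tei/tei:teiHeader/tei:fileDesc/tei:titleStmt
(: Take the first title and author in case several are recorded. :)
let $title := string($titleStmt/tei:title[1])
let $author := string($titleStmt/tei:author[1])
return
  <h1>{$title} by {$author}</h1>
```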

Examples

 * Beaglehole Timeline
 * Buck: dates in this encoding are confined to the Bibliography and are publication rather than subject events.

Discussion

 * Simile Timeline has a problem displaying many events on closely related dates, so not all events may appear on the timeline.
