XQuery/DocBook to ePub

Motivation
You want to convert a DocBook 5 document into ePub format.

Method
We will create an XQuery typeswitch transformation that will perform this conversion. Note that no XSLT will be needed. If you are familiar with XQuery you will not need to learn any new transformation languages.

The basis of this transformation will be a central XQuery module with a main dispatch function that uses a typeswitch expression to implement the dispatch pattern. The main function looks at each element and calls the appropriate function for it. This makes the transform easy to write and easy to maintain. The main function will create a single large XML file that will then be converted into a zip file with some additional book metadata. This zip file can then be checked against the ePub formatting rules using an ePub validation tool.
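The dispatch pattern described above can be sketched as follows. This is only a minimal illustration, not the module used for the actual conversion: the function names (local:dispatch, local:chapter, local:title, local:para) and the choice of XHTML output elements are assumptions made for this example.

```xquery
xquery version "1.0";

declare namespace db = "http://docbook.org/ns/docbook";

(: Hypothetical dispatcher: routes each node to a function for its element type. :)
declare function local:dispatch($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case text() return $node
            case element(db:chapter) return local:chapter($node)
            case element(db:title) return local:title($node)
            case element(db:para) return local:para($node)
            (: elements without a rule: drop the wrapper but keep processing the children :)
            default return local:dispatch($node/node())
};

declare function local:chapter($node as element(db:chapter)) as element() {
    <div xmlns="http://www.w3.org/1999/xhtml" class="chapter">{ local:dispatch($node/node()) }</div>
};

declare function local:title($node as element(db:title)) as element() {
    <h1 xmlns="http://www.w3.org/1999/xhtml">{ local:dispatch($node/node()) }</h1>
};

declare function local:para($node as element(db:para)) as element() {
    <p xmlns="http://www.w3.org/1999/xhtml">{ local:dispatch($node/node()) }</p>
};

(: a small inline chapter used as test input :)
local:dispatch(
    <db:chapter><db:title>Intro</db:title><db:para>Hello <db:emphasis>world</db:emphasis>.</db:para></db:chapter>
)
```

Adding support for a new DocBook element then means adding one case clause and one small function, which is what keeps this style of transform easy to maintain.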

The zip function we will use is the compression:zip function, which is documented in the eXist function documentation. One might think that the way to go about this is to put all the correct documents in an eXist collection and then pass this collection to the zip function. Unfortunately there are two problems with this approach. The first is that the current implementation of the zip function does not allow you to specify relative paths in the collection setting. The second is that the ePub format is very strict about the order in which the files appear in the ePub file. For example, a text file that indicates the mime type MUST be the first file in the zip container.

For these reasons we must pass the zip function a sequence of <entry> elements that must be in a very strict order. The format is:

let $entries := (<entry/>, ...) return compression:zip($entries, true())

Note: The final step of this transformation will only work on eXist 1.5. It relies on new features of the "zip" compression function that are not available in eXist 1.4.

Sample ePub File Generator
To demonstrate the exact format of a sample ePub file, here is a "serialization" of the entire file in a single XML document:

File Entries for ePub Zip File Package:
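A minimal sketch of such an entry sequence is shown below. It assumes the entry element attributes (name, type, method) accepted by eXist's compression:zip function and the usual ePub 2 package layout. The book title, identifier, ids and file names are placeholders invented for this illustration.

```xquery
xquery version "1.0";

(: Sketch of the entry sequence. All titles, ids and file names are placeholders. :)
let $entries :=
(
    (: the mimetype entry must come first and is stored without compression :)
    <entry name="mimetype" type="text" method="store">application/epub+zip</entry>,
    (: constant file that points the reader at the package file :)
    <entry name="META-INF/container.xml" type="xml">
        <container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">
            <rootfiles>
                <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
            </rootfiles>
        </container>
    </entry>,
    (: package file: metadata, manifest of all files, and reading order (spine) :)
    <entry name="OEBPS/content.opf" type="xml">
        <package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
            <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
                <dc:title>Sample Book</dc:title>
                <dc:language>en</dc:language>
                <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
            </metadata>
            <manifest>
                <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
                <item id="chapter-1" href="chapter-1.xhtml" media-type="application/xhtml+xml"/>
            </manifest>
            <spine toc="ncx">
                <itemref idref="chapter-1"/>
            </spine>
        </package>
    </entry>,
    (: table of contents used for navigation :)
    <entry name="OEBPS/toc.ncx" type="xml">
        <ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
            <head>
                <meta name="dtb:uid" content="urn:uuid:00000000-0000-0000-0000-000000000000"/>
            </head>
            <docTitle><text>Sample Book</text></docTitle>
            <navMap>
                <navPoint id="chapter-1" playOrder="1">
                    <navLabel><text>Chapter 1</text></navLabel>
                    <content src="chapter-1.xhtml"/>
                </navPoint>
            </navMap>
        </ncx>
    </entry>,
    (: one XHTML file per chapter :)
    <entry name="OEBPS/chapter-1.xhtml" type="xml">
        <html xmlns="http://www.w3.org/1999/xhtml">
            <head><title>Chapter 1</title></head>
            <body>
                <h1>Chapter 1</h1>
                <p>First paragraph of the sample chapter.</p>
            </body>
        </html>
    </entry>
)
return $entries
```

The query returns only the entry sequence, so it can be inspected as plain XML before it is passed to the zip function.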

We will not spend much time in this article explaining the exact format of the ePub file. Suffice it to say that there are several "constants", such as the mime type file and the container.xml file, that will not change. The other files describe how the zip file should be uncompressed and what the table of contents for the file should look like. Beyond that, each chapter in a book is essentially an XHTML file with standard elements for head, body, headers and paragraphs. The example above does not include a CSS file, but one can also be included.

Storing your ePub file in a Collection
Once you have created your entry list you can store the entries directly in a single zip file. Here is a small utility function that will store the entries to a file in a collection:

This version checks that the file name has a suffix of .epub and also makes sure the file is stored with the correct mime-type.
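A minimal sketch of a store function along these lines is shown below. It assumes eXist's compression:zip and xmldb:store functions. The function name local:store-epub, the /db/epub collection and the decision to append a missing .epub suffix (rather than reject the name) are assumptions made for this illustration.

```xquery
xquery version "1.0";

import module namespace compression = "http://exist-db.org/xquery/compression";
import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: Hypothetical helper: zips the entries and stores the result in a collection.
   Returns the path of the stored resource as reported by xmldb:store. :)
declare function local:store-epub(
    $collection as xs:string,
    $name as xs:string,
    $entries as element(entry)*
) as xs:string? {
    (: assumption: append the .epub suffix when it is missing :)
    let $file-name :=
        if (ends-with($name, '.epub')) then $name else concat($name, '.epub')
    let $zip := compression:zip($entries, true())
    (: the fourth argument sets the mime-type of the stored binary :)
    return xmldb:store($collection, $file-name, $zip, 'application/epub+zip')
};

(: usage: the /db/epub collection is a placeholder and must already exist :)
local:store-epub(
    '/db/epub',
    'sample',
    <entry name="mimetype" type="text" method="store">application/epub+zip</entry>
)
```

The single mimetype entry in the usage example only exercises the function. A real call would pass the full, ordered entry sequence.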

Rendering the ePub to your browser
There is no need to store your ePub file in a binary file in the database. You can dynamically render any ePub file directly to your web browser on demand, just like generating any web page.

The following XQuery can then be used to view the ePub file in your web browser:

Note that you must not put a return type on this function. It returns binary data, and the result must not be declared or cast as item or node. This is very important.

This function not only compresses the file but returns a binary stream to the browser that has the mime-type set so that if your browser has an ePub viewer it will be rendered directly in the viewer.
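A minimal sketch of such a rendering function, assuming eXist's response:stream-binary function, might look like this. The function name local:render-epub and the sample file name are made up for this illustration.

```xquery
xquery version "1.0";

import module namespace compression = "http://exist-db.org/xquery/compression";
import module namespace response = "http://exist-db.org/xquery/response";

(: Hypothetical helper. No return type is declared on purpose. :)
declare function local:render-epub($file-name as xs:string, $entries as element(entry)*) {
    let $zip := compression:zip($entries, true())
    (: streams the zipped binary to the browser with the ePub mime-type set :)
    return response:stream-binary($zip, 'application/epub+zip', $file-name)
};

(: usage: a real call would pass the full, ordered entry sequence :)
local:render-epub(
    'sample.epub',
    <entry name="mimetype" type="text" method="store">application/epub+zip</entry>
)
```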

This turns out to be a very efficient way of delivering documentation to the user. All of the chapters are packed into a single compressed file and then uncompressed directly in the browser.

Screen Image
The following image is a screen image of the test ePub file being rendered in Firefox after the free EPUBReader plugin has been installed.

Example: Transforming DocBook Chapters
Although there is some work that must be done to convert the front and back portions of a book to ePub format, the heart of book creation in this example will be per-chapter processing to build an ePub "book". Note, however, that you do not have to use the DocBook chapter element to create the various sections of an ePub. This can be done with book parts, section, sect1 or any other DocBook elements you want to use. If you do use chapters as in this example, here is the main logic of the conversion to the ePub format:

For each chapter we must add:
 * 1) to the OEBPS/content.opf file, an <item> XML element in the <manifest> section and an <itemref> in the <spine> element
 * 2) to the OEBPS/toc.ncx file, a <navPoint> XML element for navigation
 * 3) to the main sequence, one <entry> for each chapter containing an XHTML file for that chapter

Here is the pseudocode for these three items:

In the db2epub:package-entry function:

The following will add one item per chapter:

The following will reference this chapter if the chapter will be listed in the table of contents of the spine.

In the chapter-navmap function:

In the chapter-entries function:

Note that you must supply a function that will convert each chapter to an XHTML file. Such a function has frequently been written already as part of a docbook-to-html module.
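The three per-chapter steps can be sketched as below. This is an illustration under stated assumptions, not the db2epub module itself: the local: function names, the chapter-N id and file naming scheme, and the very small local:chapter-to-xhtml stand-in (which only handles titles and top-level paragraphs) are all invented for this example.

```xquery
xquery version "1.0";

declare namespace db = "http://docbook.org/ns/docbook";

(: assumed naming scheme: chapter-1, chapter-2, ... :)
declare function local:chapter-id($n as xs:integer) as xs:string {
    concat('chapter-', $n)
};

declare function local:chapter-title($chapter as element(db:chapter)) as xs:string {
    string(($chapter/db:title, $chapter/db:info/db:title)[1])
};

(: 1a) one manifest item per chapter for OEBPS/content.opf :)
declare function local:manifest-items($chapters as element(db:chapter)*) as element()* {
    for $chapter at $n in $chapters
    return
        <item xmlns="http://www.idpf.org/2007/opf" id="{local:chapter-id($n)}"
            href="{local:chapter-id($n)}.xhtml" media-type="application/xhtml+xml"/>
};

(: 1b) one spine reference per chapter for OEBPS/content.opf :)
declare function local:spine-itemrefs($chapters as element(db:chapter)*) as element()* {
    for $chapter at $n in $chapters
    return <itemref xmlns="http://www.idpf.org/2007/opf" idref="{local:chapter-id($n)}"/>
};

(: 2) one navigation point per chapter for OEBPS/toc.ncx :)
declare function local:chapter-navmap($chapters as element(db:chapter)*) as element() {
    <navMap xmlns="http://www.daisy.org/z3986/2005/ncx/">{
        for $chapter at $n in $chapters
        return
            <navPoint id="{local:chapter-id($n)}" playOrder="{$n}">
                <navLabel><text>{ local:chapter-title($chapter) }</text></navLabel>
                <content src="{local:chapter-id($n)}.xhtml"/>
            </navPoint>
    }</navMap>
};

(: stand-in for a real docbook-to-html conversion :)
declare function local:chapter-to-xhtml($chapter as element(db:chapter)) as element() {
    <html xmlns="http://www.w3.org/1999/xhtml">
        <head><title>{ local:chapter-title($chapter) }</title></head>
        <body>
            <h1>{ local:chapter-title($chapter) }</h1>
            { for $para in $chapter/db:para return <p>{ string($para) }</p> }
        </body>
    </html>
};

(: 3) one zip entry per chapter for the main entry sequence :)
declare function local:chapter-entries($chapters as element(db:chapter)*) as element(entry)* {
    for $chapter at $n in $chapters
    return
        <entry name="OEBPS/{local:chapter-id($n)}.xhtml" type="xml">{
            local:chapter-to-xhtml($chapter)
        }</entry>
};

let $book :=
    <db:book>
        <db:chapter><db:title>One</db:title><db:para>First chapter.</db:para></db:chapter>
        <db:chapter><db:title>Two</db:title><db:para>Second chapter.</db:para></db:chapter>
    </db:book>
let $chapters := $book/db:chapter
return (
    local:manifest-items($chapters),
    local:spine-itemrefs($chapters),
    local:chapter-navmap($chapters),
    local:chapter-entries($chapters)
)
```

Deriving the id and the file name from the same helper keeps the manifest, the spine, the navigation map and the zip entries in agreement with each other.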

Getting a List of Distinct Elements in Each Chapter
We will next show how the elements within each chapter can be transformed into an XHTML ePub section. Our first step is to get a list of all of the element names used in the chapters of the source DocBook document. This can be done using the following XPath expression:

let $distinct-chapter-element-names := distinct-values(/db:book//db:chapter/descendant-or-self::*/name(.))

The "descendant-or-self" XPath axis expression is very similar to using //*/name(.) but also includes the context node, which here is the chapter element itself. You can sort this list by putting it in a FLWOR expression with an order by clause added:

let $sorted-element-names := for $element-name in $distinct-chapter-element-names order by $element-name return $element-name

This report is your inventory of the XML elements that your typeswitch transform must handle. Note that some elements in the front and back matter of the book are not included in this list. Note also that attribute transforms are handled inside the element-level functions.
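To try the inventory query without a stored document, the two steps can be combined and run against a small inline sample. The sample book below is invented for this illustration, and the db prefix is assumed to be bound to the DocBook 5 namespace.

```xquery
xquery version "1.0";

declare namespace db = "http://docbook.org/ns/docbook";

(: invented sample input with two short chapters :)
let $book :=
    <db:book>
        <db:chapter>
            <db:title>One</db:title>
            <db:para>Some <db:emphasis>text</db:emphasis>.</db:para>
        </db:chapter>
        <db:chapter>
            <db:title>Two</db:title>
            <db:para>More text.</db:para>
        </db:chapter>
    </db:book>
(: collect every element name used in or below a chapter, then sort :)
for $element-name in distinct-values($book//db:chapter/descendant-or-self::*/name(.))
order by $element-name
return $element-name
```

For this sample the result is the four names db:chapter, db:emphasis, db:para and db:title. Each name in the list corresponds to one case clause that the typeswitch transform needs.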