XQuery/Generating PDF from XSL-FO files

Motivation
You want to generate documents with precise page layout, such as PDF files, from XML documents.

Approach
Typically, the steps required to generate a PDF document are:

 * retrieve or compute the base XML document
 * transform the XML file to XSL-FO markup, perhaps using XQuery typeswitch or XSLT
 * transform the XSL-FO to PDF using the free Apache FOP or a commercial XSL-FO rendering engine such as RenderX (http://www.renderx.com/) or Antenna House

Method
We will use a built-in eXist function to convert an XSL-FO document into PDF. (See Installing the XSL-FO module if this module is not installed and configured.)

Using the xslfo:render function
The function is xslfo:render. It has the following structure:

let $pdf-binary := xslfo:render($input-xml-fo-document, 'application/pdf', $parameters)

or, if you use an XSL-FO configuration file:

let $pdf-binary := xslfo:render($input-xml-fo-document, 'application/pdf', $parameters, $fo-config-file)

The resulting PDF can be saved directly in the eXist database, where it is stored as a non-searchable binary document.
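As a minimal sketch of storing the result, the query below renders an XSL-FO document and writes the PDF with xmldb:store. The collection /db/reports and the document sample.fo are hypothetical names chosen for illustration; substitute paths that exist in your own database.

```xquery
xquery version "1.0";

import module namespace xslfo = "http://exist-db.org/xquery/xslfo";
import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: Hypothetical paths: adjust to a collection and XSL-FO document in your own database :)
let $collection := "/db/reports"
let $fo := doc(concat($collection, "/sample.fo"))

(: Render to PDF; the empty sequence means no extra parameters are passed :)
let $pdf-binary := xslfo:render($fo, "application/pdf", ())

(: Store the binary with an explicit MIME type; xmldb:store returns the path of the stored resource :)
return xmldb:store($collection, "output.pdf", $pdf-binary, "application/pdf")
```

The user running the query needs write access to the target collection.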

You can then view this directly by providing a link to the file or you can send it directly to the browser by using the response:stream-binary function as follows:

return response:stream-binary($pdf-binary, 'application/pdf', 'output.pdf')

Example XQuery to Generate PDF
The following program will generate a PDF document with the text "Hello World".
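A minimal sketch of such a program is shown here. The page size, margin and output file name are assumptions chosen for illustration, not values taken from the original example.

```xquery
xquery version "1.0";

import module namespace xslfo = "http://exist-db.org/xquery/xslfo";
import module namespace response = "http://exist-db.org/xquery/response";

(: A one-page XSL-FO document; the page dimensions are illustrative assumptions :)
let $fo :=
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
        <fo:layout-master-set>
            <fo:simple-page-master master-name="page"
                page-height="11in" page-width="8.5in" margin="1in">
                <fo:region-body/>
            </fo:simple-page-master>
        </fo:layout-master-set>
        <fo:page-sequence master-reference="page">
            <fo:flow flow-name="xsl-region-body">
                <fo:block font-size="18pt">Hello World</fo:block>
            </fo:flow>
        </fo:page-sequence>
    </fo:root>

(: Render the XSL-FO to PDF with no extra parameters :)
let $pdf := xslfo:render($fo, "application/pdf", ())

(: Send the PDF to the browser :)
return response:stream-binary($pdf, "application/pdf", "hello-world.pdf")
```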


Enabling the XSL-FO Module
You will need a module that converts XSL-FO to PDF. Examples of these are:
 * 1) The Apache FOP processor (free open source)
 * 2) The Antenna House XSL-FO processor (commercial) http://www.antennahouse.com/
 * 3) The RenderX XSL-FO processor (commercial) http://www.renderx.com/

Make sure that the module extension is loaded. You can do this by going to the $EXIST_HOME/conf.xml file and un-commenting the following line (around line 769):
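For orientation, the XSL-FO module entry in conf.xml typically looks like the sketch below. The exact attributes may differ between eXist releases, so treat this as an assumption and compare it with the commented-out entry in your own conf.xml rather than pasting it in.

```xml
<!-- Sketch of the XSL-FO module entry in $EXIST_HOME/conf.xml; verify against your own file -->
<module uri="http://exist-db.org/xquery/xslfo"
        class="org.exist.xquery.modules.xslfo.XSLFOModule">
    <!-- Selects which XSL-FO processor the module delegates to -->
    <parameter name="processorAdapter"
               value="org.exist.xquery.modules.xslfo.ApacheFopProcessorAdapter"/>
</module>
```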

Where the possible values for the processorAdapter parameter are:

org.exist.xquery.modules.xslfo.ApacheFopProcessorAdapter for Apache's FOP

If the module is correctly loaded then you should see it in the function documentation.

Make sure that you have edited $EXIST_HOME/extensions/build.properties to set the XSL-FO module to true:

Change:

# XSL FO transformations (Uses Apache FOP)
include.module.xslfo = false

To be:

include.module.xslfo = true

After you change BOTH these files you will need to run the "build.sh" or "build.bat" program in your $EXIST_HOME to get the new FOP binaries in the jar files.

Make sure that the build file can get access to the correct fop.jar file from the Apache web site.

Automatically Downloading The Apache XSL-FO Jar Files
eXist comes with a sample ant task that can automatically download the FOP distribution zip file, extract the three jar files we need and remove the rest. Here is the ant target from the eXist 1.4 $EXIST_HOME/modules/build.xml:

Note that fop 1.0 is now available so you can change this task to be the following:

Sample Transcript
The following is a sample transcript:

At the end of this process you should see the following three jar files in your $EXIST_HOME/lib/extensions folder:

$ cd $EXIST_HOME/lib/extensions
$ ls -l
-rwxrwxrwx+ 1 Dan McCreary None 3318083 2010-12-10 09:23 batik-all-1.7.jar
-rwxrwxrwx+ 1 Dan McCreary None 3079811 2010-12-10 09:23 fop.jar
-rwxrwxrwx+ 1 Dan McCreary None  569113 2010-12-10 09:23 xmlgraphics-commons-1.4.jar

If you do not see these files you can manually copy them from a download of the XSL-FO binaries.

Now go to the $EXIST_HOME directory and type "build". You should not see any error messages. If you do, go to the build file and fix the errors.

After you restart eXist, the XSL-FO module should be able to convert the file into a PDF file.

Notes on installing RenderX XSL-FO Processors
RenderX is a commercial XSL-FO processor that can be used in place of the Apache FOP processor.

Edit Config files
On eXist 1.4 you must enable include.module.xslfo = true in extensions/build.properties and run "build.sh" or "build.bat". This step is not necessary if you run the 2.0 release.

Edit conf.xml and comment out the reference to the default Apache xslfo module. Change the module to use RenderX as follows:
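A sketch of what the changed module entry might look like follows. The adapter class name is given from memory of the eXist source tree and should be treated as an assumption; confirm it against the classes shipped with your eXist release.

```xml
<!-- Sketch only: XSL-FO module entry in conf.xml switched to the RenderX XEP adapter -->
<module uri="http://exist-db.org/xquery/xslfo"
        class="org.exist.xquery.modules.xslfo.XSLFOModule">
    <!-- Assumed adapter class name; verify it exists in your release -->
    <parameter name="processorAdapter"
               value="org.exist.xquery.modules.xslfo.RenderXXepProcessorAdapter"/>
</module>
```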

Copy RenderX jar files
Copy all .jar files from XEP/lib into $EXIST_HOME/lib/user.

Restart eXist-db
Restart the eXist database.

Test
Change your XQuery to include the XEP configuration as an XML element and pass it to the render function:

In $config you need to make sure the paths to the license and fonts point to the correct locations on your disk.

Using Config File for External References
When you reference an image you must either use an absolute reference and make sure that the server has read access or you must use a relative path reference. The root of relative path references can be set in the xslfo config file.

You may not want to hardcode your hostname, port and context. To make this work on any host, port and context you can use the following code to build your FOP base:
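One way to build the base is sketched here, using the request module to read the host, port and context from the current HTTP request. The collection path in $db-path is hypothetical, and the "/rest" segment assumes the default eXist REST servlet mapping.

```xquery
xquery version "1.0";

import module namespace request = "http://exist-db.org/xquery/request";

(: Hypothetical collection that holds the images referenced by the XSL-FO document :)
let $db-path := "/db/apps/myapp/images/"

(: Build the base URL from the current HTTP request rather than hardcoding it.
   The /rest segment assumes the default eXist REST servlet mapping. :)
let $fop-base := concat(
    "http://", request:get-server-name(), ":", request:get-server-port(),
    request:get-context-path(), "/rest", $db-path)

(: Wrap the base URL in a FOP configuration element :)
return
    <fop version="1.0">
        <base>{$fop-base}</base>
    </fop>
```

The returned element can be passed as the fourth argument of xslfo:render so that relative image references resolve against it.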

Now you can use the following FOP template to generate your PDF.

Including SVG Images in your PDF files
When you create PDF documents you have the ability to include "line art" in SVG format directly in the PDF files.

There are some translation issues from SVG to PDF but much of the line-art converts very well.

FOP renders SVG with Batik, which depends on the Java AWT libraries. If you reference SVG images on a server that has no graphical display, AWT must run in headless mode. See:

http://xmlgraphics.apache.org/fop/0.95/graphics.html#batik

which says you must pass the following option to Java when the JVM starts up:

-Djava.awt.headless=true

In your $EXIST_HOME/startup.bat or $EXIST_HOME/startup.sh you will need to add this option to JAVA_OPTS. For example, in startup.bat (Windows syntax):

set JAVA_OPTS="-Xms128m -Xmx512m -Dfile.encoding=UTF-8 -Djava.endorsed.dirs=%JAVA_ENDORSED_DIRS% -Djava.awt.headless=true"

If you are using the "wrapper" tool to start your server you will need to add the following lines to $EXIST_HOME/tools/wrapper/conf/wrapper.conf:

# make AWT load the fonts for SVG rendering inside of XSLFO
wrapper.java.additional.6=-Djava.awt.headless=true

Using Inline SVG
One of the easiest ways to test your configuration is to use an inline SVG image. You can do this by using the fo:instream-foreign-object element. The following is an example of this.
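A minimal sketch of an inline SVG block is given here. The drawing itself (a circle with a text label) is an arbitrary illustration; the block is meant to be placed inside an fo:flow of a complete XSL-FO document.

```xquery
xquery version "1.0";

declare namespace fo = "http://www.w3.org/1999/XSL/Format";
declare namespace svg = "http://www.w3.org/2000/svg";

(: A block holding a small inline drawing; place it inside an fo:flow :)
<fo:block>
    <fo:instream-foreign-object>
        <svg:svg width="120" height="120" viewBox="0 0 120 120">
            <svg:circle cx="60" cy="60" r="50" fill="none" stroke="black" stroke-width="2"/>
            <svg:text x="60" y="65" text-anchor="middle" font-size="12">SVG test</svg:text>
        </svg:svg>
    </fo:instream-foreign-object>
</fo:block>
```

If the circle and label appear in the rendered PDF, the Batik and AWT setup is working.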

Sample External SVG Reference
Note: this assumes you have configured the base URL in the FOP configuration file.
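As a sketch, an external SVG file can be referenced with fo:external-graphic. The file name logo.svg and the width are hypothetical; the relative URL is resolved against the base set in the FOP configuration.

```xquery
xquery version "1.0";

declare namespace fo = "http://www.w3.org/1999/XSL/Format";

(: logo.svg is a hypothetical file name, resolved against the configured base URL :)
<fo:block>
    <fo:external-graphic src="url(logo.svg)" content-width="2in"/>
</fo:block>
```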

Formatting LaTeX Equations in XSL-FO
LaTeX is a non-XML language used for typesetting documents that contain mathematical equations. Despite its unusual non-markup syntax, LaTeX is still popular in many mathematics and physics publications. The JLaTeXMath extension package allows LaTeX equations to be added to XSL-FO documents. To use the package you must add two jar files to $EXIST_HOME/lib/extensions, restart eXist and then add the appropriate syntax to your XSL-FO document.

Installation Steps
Download the following files from http://forge.scilab.org/index.php/p/jlatexmath/downloads/ and copy them into your $EXIST_HOME/lib/extensions folder:


 * jlatexmath-1.0.3.jar
 * jlatexmath-fop-1.0.3.jar

Then restart your eXist server so the jar files are loaded.

Then add the following code to your FO

Note that the XSL-FO software does not automatically pull fonts out of the config file. To force fonts to load into RAM you will need to add the following auto-detect element to your fop configuration file.
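A sketch of a FOP configuration with the auto-detect element is shown here. Only the fonts section matters for this step; the surrounding elements follow the standard FOP configuration layout, and the version attribute is an assumption that should match your FOP release.

```xml
<!-- Sketch of a FOP configuration that asks FOP to detect the fonts installed on the system -->
<fop version="1.0">
    <renderers>
        <renderer mime="application/pdf">
            <fonts>
                <!-- Scan the operating system's font directories at startup -->
                <auto-detect/>
            </fonts>
        </renderer>
    </renderers>
</fop>
```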



MathML Equation Support
Note: this item is not complete yet.

Although LaTeX is a common way to represent equations, the Mathematical Markup Language (MathML) can also be used.

There are also hints that JEuclid (http://jeuclid.sourceforge.net/) works.

This has not yet been tested.

Instructions for RenderX
Updated steps from Wolfgang on Jan 6th 2014:


 * copy license.xml to EXIST_HOME
 * copy x4u.jar, xep.jar and xt.jar from xep into EXIST_HOME/lib/user
 * edit conf.xml to change the XSL FO driver:


 * restart eXist for the jars to be loaded
 * upload xep.xml from the xep directory into a collection in eXist (e.g. /db)
 * edit xep.xml and change the base directory for xep fonts. It should point to the xep install directory:
 * If you are on a Mac, you may have to change the directory for the Arial font further down in the file:
 * call xep in your XQuery as follows:

The util:expand trick is required because xslfo:render expects an in-memory DOM element for $config (this should probably be fixed).
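The call can be sketched as follows, assuming xep.xml was uploaded to /db as in the steps above. The XSL-FO document path is hypothetical.

```xquery
xquery version "1.0";

import module namespace xslfo = "http://exist-db.org/xquery/xslfo";
import module namespace util = "http://exist-db.org/xquery/util";
import module namespace response = "http://exist-db.org/xquery/response";

(: Hypothetical XSL-FO source document stored in the database :)
let $fo := doc("/db/reports/sample.fo")

(: util:expand turns the stored configuration into an in-memory element :)
let $config := util:expand(doc("/db/xep.xml")/*)

(: Pass the configuration as the fourth argument :)
let $pdf := xslfo:render($fo, "application/pdf", (), $config)

return response:stream-binary($pdf, "application/pdf", "output.pdf")
```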

Note: xep prints error messages to stdout, so you usually don’t see them. I was running eXist via the launcher, so I opened the „Tool Window“ via the system tray menu and clicked on „Show console messages“.

Updated steps from Kevin Brown (RenderX) on May 17th 2017:

Instead of pulling apart the installation of RenderX, you can edit the master configuration file of RenderX to resolve all other files you may need, including the license file. So, an installation for RenderX is easy if you follow these steps:


 * Install RenderX in any directory you wish or if RenderX is already installed, make note of the directory. Example: an installation on Windows may be in "C:\Program Files\RenderX\XEP"
 * Copy "xep.jar" from the RenderX installation to the "/lib/user" directory of the exist-db installation. Unlike the earlier installation notes, you do not need "x4u.jar"; only "xep.jar" is required. If you want validation of the XSL-FO to be reported, then you also need "xt.jar". The files "xep.jar" and "xt.jar" are located in the "/lib" directory of the RenderX installation. In the above example this would be "C:\Program Files\RenderX\XEP\lib"
 * Insert "xep.xml" into your database. "xep.xml" is the RenderX configuration file located in the root of the RenderX installation.
 * Edit "xep.xml" in the database and change the "config" element to add an "xml:base" attribute that points to the installation of RenderX on disk. This one step allows all the other files (like "license.xml", "rolemap.xml", hyphenation files and fonts) to be found, as they are all relative to the "xml:base" of the configuration. Given the above example install, I would have the following root element in the "xep.xml" that is inside the database:

<config xmlns="http://www.renderx.com/XEP/config" xml:base="file:/C:/Program Files/RenderX/XEP/">


 * Optionally, if you did not copy "xt.jar" or do not want validation, add the following to "xep.xml":

<option name="VALIDATE" value="false"/>


 * Edit "conf.xml" as covered above, restart the database and format away.

With an installation such as this, the external installation of RenderX and the one used by exist-db are essentially the same, sharing the same fonts, hyphenation files, license and other files. Of course, if you update "xep.xml" on disk, you need to update it in the database as well, and if you update RenderX, you need to copy the updated "xep.jar" into the exist-db installation.

Acknowledgments
The user Dmitriy has been helpful in the creation of the procedure for installation on systems that do not have source code. Wolfgang has also added feedback on the RenderX instructions for eXist 2.1. Josef Karthauser helped with getting LaTeX equations to render correctly within PDF documents.

Discussion
The steps to enable the FOP module should be listed somewhere in the eXist administrative site and removed from this Wikibook.

The RenderX instructions can be greatly simplified. The root element of the RenderX configuration file ("xep.xml") takes "xml:base" as an argument. If you have RenderX installed in some directory -- let's say for example in "C:\Program Files\RenderX\XEP" on your system, then after importing "xep.xml", you can just edit the root element in the imported version like this:



All other references in "xep.xml" are relative to this base, so you do not need to move anything at all (such as putting license.xml in EXIST_HOME, or moving image directories or fonts).

Also note that "X4U.jar" should not be required as it only is for the GUI and "xt.jar" is not required unless you set validate to true in "xep.xml". Only "xep.jar" is required.