XQuery/XQuery Batch Jobs

Motivation
You want to run an XQuery job at regular intervals.

Method
We will use the eXist job scheduler, which is built around the Quartz scheduling library. eXist provides an XQuery API to the scheduler for adding and removing jobs.

Method 1: Modify the conf.xml file
If you have a job that needs to run on a regular basis, you can just add a single line to your $EXIST_HOME/conf.xml file. For example, if you have a simple XQuery script that writes a dateTime stamp to the log file, you could add the following line:

Sample Addition to conf.xml
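A minimal sketch of such an entry, placed inside the scheduler element of conf.xml. The script path and job name are hypothetical and only serve as an illustration. Quartz requires a ? in one of the two day fields, so the day-of-week field is written as ? here.

```xml
<!-- Hypothetical job entry: runs the named XQuery at second 0 of every minute -->
<job type="user"
     name="log-current-time"
     xquery="/db/system/jobs/log-current-time.xq"
     cron-trigger="0 * * * * ?"/>
```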
This line says that when seconds=0 for every minute of every hour of every day-of-month for every month for each day of week, run this job.

Sample Output in log file
(Line: 7) Current date-time: 2011-07-18T15:58:00-05:00
(Line: 7) Current date-time: 2011-07-18T15:59:00-05:00
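A script that produces log entries of this form could be as small as the following sketch. It assumes the util:log function of eXist's util module; the "info" priority and the exact layout are illustrative, and the line number reported in the log depends on where the call sits in your own script.

```xquery
xquery version "1.0";

(: util is normally pre-bound in eXist; the explicit import keeps the script self-contained :)
import module namespace util = "http://exist-db.org/xquery/util";

(: Write the current dateTime to the eXist log at priority "info" :)
util:log("info", concat("Current date-time: ", current-dateTime()))
```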

Sample Weekly Lucene Optimize
Add the following line to your $EXIST_HOME/conf.xml file.
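A sketch of a weekly entry, assuming the optimize script is stored at /db/system/jobs/optimize-lucene-indexes.xq. The job name and the chosen time (2:00 AM every Sunday) are illustrative assumptions; pick a quiet period for your own system.

```xml
<!-- Illustrative weekly job: second 0, minute 0, hour 2, any day of month, every month, Sunday -->
<job type="user"
     name="optimize-lucene-indexes"
     xquery="/db/system/jobs/optimize-lucene-indexes.xq"
     cron-trigger="0 0 2 ? * SUN"/>
```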

Sample XQuery to Optimize Lucene Indexes
Contents of /db/system/jobs/optimize-lucene-indexes.xq
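A minimal sketch of what such a script might contain, assuming your eXist version provides the ft:optimize() function in the Lucene module. The log message is an illustrative addition.

```xquery
xquery version "1.0";

import module namespace ft = "http://exist-db.org/xquery/lucene";
import module namespace util = "http://exist-db.org/xquery/util";

(: Merge the Lucene index segments, then record that the job ran :)
(
    ft:optimize(),
    util:log("info", concat("Lucene optimize finished at ", current-dateTime()))
)
```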

Method 2: Use the XQuery API
In this method we will use the XQuery API to add, view and remove jobs from the job scheduler.

To enable the XQuery scheduler module you may have to set the following line in the $EXIST_HOME/extensions

include.module.scheduler = true

Then type "build" to recompile the code.

Also make sure that the line for the scheduler module in $EXIST_HOME/conf.xml is un-commented.
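The module entry in the builtin-modules section of conf.xml typically looks like the sketch below. The class name is given from memory of the eXist distribution and should be treated as an assumption; check it against the commented-out entry in your own conf.xml.

```xml
<!-- Assumed form of the scheduler module registration; verify against your conf.xml -->
<module uri="http://exist-db.org/xquery/scheduler"
        class="org.exist.xquery.modules.scheduler.SchedulerModule"/>
```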



Here are the two functions to add and delete jobs.

scheduler:schedule-xquery-cron-job($xquery-path, $cron-string, $job-id)
scheduler:delete-scheduled-job($job-id)

Note: You must make sure that the XQuery job scheduler module is enabled in your system. You can verify this by the following XQuery:
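One way to run this check, sketched under the assumption that util:registered-modules() is available and returns the namespace URIs of all registered modules. The result element is a hypothetical name chosen for illustration.

```xquery
xquery version "1.0";

import module namespace util = "http://exist-db.org/xquery/util";

(: Namespace URI of the scheduler module :)
let $scheduler-ns := "http://exist-db.org/xquery/scheduler"
(: The general comparison is true if any registered module URI matches :)
let $enabled := util:registered-modules() = $scheduler-ns
return
    <scheduler-module enabled="{$enabled}"/>
```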

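The cron string uses the Quartz cron expression format, with fields for seconds, minutes, hours, day of month, month, day of week and an optional year.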

Listing Scheduled Jobs
You can get a list of all scheduled jobs by using the scheduler:get-scheduled-jobs XQuery function. This returns a document that has the following format:
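A minimal call looks like the sketch below. Depending on your configuration, the query may need to be run as a user in the dba group.

```xquery
xquery version "1.0";

import module namespace scheduler = "http://exist-db.org/xquery/scheduler";

(: Returns an XML description of the currently scheduled jobs and their triggers :)
scheduler:get-scheduled-jobs()
```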

Adding and Removing Jobs with XQuery
The following is a sample of the system calls to add and remove jobs:
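A sketch of both calls in one query, using the two functions listed earlier. The job id, script path and cron string are hypothetical values for illustration; the cron string fires every five minutes.

```xquery
xquery version "1.0";

import module namespace scheduler = "http://exist-db.org/xquery/scheduler";

(: Hypothetical job id and script path :)
let $job-id := "log-current-time"
let $xquery-path := "/db/system/jobs/log-current-time.xq"
(: Second 0 of every fifth minute :)
let $cron-string := "0 0/5 * * * ?"

(: Schedule the job, then remove it again by its id :)
let $added := scheduler:schedule-xquery-cron-job($xquery-path, $cron-string, $job-id)
let $removed := scheduler:delete-scheduled-job($job-id)
return
    <results>
        <added>{$added}</added>
        <removed>{$removed}</removed>
    </results>
```

In practice the two calls would live in separate queries, one to install the job and one to remove it later.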

Avoiding Concurrent Jobs
Sometimes you want to run a job frequently, for example one that polls a remote site once every five minutes and transfers any files it finds. If a transfer takes longer than the polling interval, the scheduler starts the job again while the previous run is still in progress.

There are two ways around this. One is to configure eXist not to run jobs concurrently, but this option is not available in eXist 2.1. The other is to set a flag that each run tests to see whether the prior run has finished. You can use the cache module to keep one flag per job.

The following shows how you can use the put/get and remove functions to manage global state across queries:
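A sketch of the flag pattern, assuming the three-argument forms cache:put($cache, $key, $value), cache:get($cache, $key) and cache:remove($cache, $key) with the cache identified by name. The cache module's signatures differ between eXist versions, so check the function documentation for your release. The cache name, job id and local:transfer-files function are hypothetical placeholders.

```xquery
xquery version "1.0";

import module namespace cache = "http://exist-db.org/xquery/cache";

(: Hypothetical placeholder for the real polling and transfer work :)
declare function local:transfer-files() as element() {
    <transferred at="{current-dateTime()}"/>
};

(: Hypothetical cache name and job id; use one key per job :)
let $cache-name := "job-flags"
let $job-id := "poll-remote-site"
return
    if (exists(cache:get($cache-name, $job-id))) then
        (: A previous run set the flag and has not yet removed it :)
        <skipped job="{$job-id}">previous run still active</skipped>
    else
        (: Set the flag, do the work, then clear the flag :)
        let $set := cache:put($cache-name, $job-id, current-dateTime())
        let $result := local:transfer-files()
        let $clear := cache:remove($cache-name, $job-id)
        return $result
```

The test and the put are not atomic, so two runs starting at almost the same moment could both proceed. If the work raises an error the flag is never cleared; storing the start time as the flag value lets a later run detect and discard a stale flag.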