XQuery/UK shipping forecast

Motivation
The UK shipping forecast is prepared by the UK Met Office four times a day. It is published on the radio, on the Met Office web site and [no longer] on the BBC web site. However, it is not available in a computer-readable form.

Tim Duckett recently blogged about creating a Twitter stream of the forecast, using Ruby to parse the text forecast. The textual form of the forecast is included on both the Met Office and BBC sites. However, as Tim points out, the format is designed for speech, groups areas with similar forecasts together to save air time, and is hard to parse. The approach taken here is instead to scrape a JavaScript file containing the raw area forecast data.

eXist-db Modules
The following scripts use these eXist modules:
 * request - to get HTTP request parameters
 * httpclient - to make HTTP GET and POST requests
 * scheduler - to schedule scraping tasks
 * datetime - to format dateTimes
 * util - for base64 conversions
 * xmldb - for database access

Other

 * UK Met Office web site

Met Office page
[This approach is no longer viable because the JavaScript file it retrieves is no longer being updated.]

The Met Office page shows an area-by-area forecast. This part of the page is generated by JavaScript from data held in a separate, generated JavaScript file. In this file, the data is assigned to multiple arrays. A typical section looks like this:

JavaScript conversion
The first function fetches the current JavaScript data using the eXist httpclient module and converts the base64-encoded response to a string:

The second function picks out an area forecast from the JavaScript and parses the code to generate an XML structure using the JavaScript array names.
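The following is a minimal sketch of this parsing step. It assumes, hypothetically, that every line of the script has the form name[index] = "value"; and that an area is identified by its array index. Each array name becomes an element name. The function name local:parse-area is illustrative:

```xquery
xquery version "1.0";

(: Sketch only: the line format  name[index] = "value";  is an assumption. :)
declare function local:parse-area($js as xs:string, $index as xs:integer) as element(area)
{
  let $pattern := '^\s*([A-Za-z_][A-Za-z0-9_]*)\[([0-9]+)\]\s*=\s*"(.*)";\s*$'
  return
    <area index="{$index}">
    {
      (: keep only lines that look like array assignments :)
      for $line in tokenize($js, '\n')[matches(., $pattern)]
      (: ... and only those for the requested array index :)
      where xs:integer(replace($line, $pattern, '$2')) eq $index
      (: the array name becomes the element name, the string value its content :)
      return element { replace($line, $pattern, '$1') } { replace($line, $pattern, '$3') }
    }
    </area>
};

let $js := concat(
  'weather[0] = "Rain at times";', '&#10;',
  'wind[0] = "Southwest veering northeast, 5 or 6";', '&#10;',
  'weather[1] = "Fair";')
return local:parse-area($js, 0)
```

For the sample script, this returns an area element containing weather and wind children for index 0 only.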

To fetch an area forecast:

For example, the output for one selected area is:

Format the forecast as text
The forecast data needs to be formatted into a string:
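Here is a sketch of the formatting. It assumes the parsed area has hypothetical child elements named weather, wind, visibility and seastate, and that local:forecast-text is an illustrative name:

```xquery
xquery version "1.0";

(: Sketch: build the spoken-style sentence for one area.
   The child element names are hypothetical. :)
declare function local:forecast-text($f as element()) as xs:string
{
  concat(
    $f/weather, '. ',
    'Wind ', $f/wind, '. ',
    'Visibility ', $f/visibility, '. ',
    'Sea ', $f/seastate, '.')
};

local:forecast-text(
  <area>
    <weather>Rain at times</weather>
    <wind>Southwest veering northeast, 5 or 6</wind>
    <visibility>Good, occasionally poor</visibility>
    <seastate>Moderate or rough</seastate>
  </area>)
```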

which returns "Rain at times. Wind Southwest veering northeast, 5 or 6. Visibility Good, occasionally poor. Sea Moderate or rough."

Area Forecast
Finally, these functions can be used in a script that accepts a shipping area name and returns an XML message:

Message abbreviation
To create a message suitable for texting (160-character limit) or tweeting (140-character limit), the message can be compressed by abbreviating common words.

Abbreviation dictionary
A dictionary of words and abbreviations is created and stored locally. The dictionary has been developed using some of the abbreviations in Tim Duckett's Ruby implementation.

The full dictionary

Abbreviation function
The abbreviation function breaks the text into words, replaces words with their abbreviations and builds the text up again:
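Here is a sketch of such a function. It uses a handful of illustrative dictionary entries, not the full dictionary, held in a variable rather than a stored document. Trailing full stops and commas are split off before lookup and put back afterwards, and lookup ignores case. The names $dictionary and local:abbreviate are hypothetical:

```xquery
xquery version "1.0";

(: Illustrative entries only; the real dictionary is stored in the database. :)
declare variable $dictionary :=
  <dictionary>
    <word full="southwest" abbr="SW"/>
    <word full="northeast" abbr="NE"/>
    <word full="visibility" abbr="Vis"/>
    <word full="occasionally" abbr="occ"/>
    <word full="moderate" abbr="mod"/>
  </dictionary>;

(: Abbreviate one word, keeping any trailing full stop or comma. :)
declare function local:abbreviate-word($token as xs:string) as xs:string
{
  let $punct := if (matches($token, '[.,]$'))
                then substring($token, string-length($token)) else ''
  let $word := substring($token, 1, string-length($token) - string-length($punct))
  let $abbr := $dictionary/word[@full = lower-case($word)]/@abbr
  return concat(if (exists($abbr)) then string($abbr[1]) else $word, $punct)
};

(: Split into words, abbreviate each, and rejoin with single spaces. :)
declare function local:abbreviate($text as xs:string) as xs:string
{
  string-join(
    for $t in tokenize(normalize-space($text), ' ') return local:abbreviate-word($t),
    ' ')
};

local:abbreviate("Rain at times. Wind Southwest veering northeast, 5 or 6. Visibility Good, occasionally poor. Sea Moderate or rough.")
```

With these entries, the example sentence becomes "Rain at times. Wind SW veering NE, 5 or 6. Vis Good, occ poor. Sea mod or rough."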

Abbreviated Message

 * Lundy
 * Fastnet

All Areas forecast
This function extends the area forecast to all areas. It uses the comment separators to break up the script, ignoring the first and last sections and the area name in each comment.
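Here is a sketch of the splitting step. It assumes, hypothetically, that each area's data is preceded by a block comment such as /* Lundy */. Tokenizing on the comment pattern discards the comments, and the area names inside them, while subsequence drops the first and last sections:

```xquery
xquery version "1.0";

(: Sketch: split the script at each comment and keep only the area sections.
   The comment style  /* Area name */  is an assumption. :)
declare function local:area-sections($js as xs:string) as xs:string*
{
  let $parts := tokenize($js, '/\*[^*]*\*/')
  (: the first part is the script header and the last is the trailer :)
  return subsequence($parts, 2, count($parts) - 2)
};

local:area-sections('header /* Lundy */ a1 /* Fastnet */ a2 /* end */ trailer')
```

Each remaining section could then be parsed in the same way as a single area.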

XML version of forecast
This script returns the full shipping forecast in XML:


 * Execute

RSS version of forecast
XSLT would be suitable for transforming this XML to RSS format ...

SMS service
One possible use of this data would be to provide an on-request SMS service that takes an area name and returns the abbreviated forecast. The complete set of forecasts is created, and the forecast for the area named in the message is selected and returned in abbreviated form.
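The selection step might look like this minimal sketch. The forecast structure, with area elements that have a name child, is assumed. Matching ignores case and surrounding spaces because SMS messages are typed by hand:

```xquery
xquery version "1.0";

(: Sketch: pick one area's forecast, using the SMS text as the area name.
   The forecast/area/name structure is hypothetical. :)
declare function local:select-area($forecast as element(), $message as xs:string) as element()?
{
  $forecast/area[lower-case(normalize-space(name)) = lower-case(normalize-space($message))][1]
};

let $forecast :=
  <forecast>
    <area><name>Lundy</name><weather>Rain at times</weather></area>
    <area><name>Fastnet</name><weather>Fair</weather></area>
  </forecast>
return string(local:select-area($forecast, " lundy ")/weather)
```

The selected area would then be formatted and abbreviated as described earlier.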

The calling protocol is determined by the SMS service installed at UWE, which is described here.


 * Execute

Caching
Fetching the JavaScript on every request is neither efficient nor acceptable net behaviour. Since the forecast times are known, it is better to fetch the data on a schedule, convert it to XML and save it in the eXist database. Later requests can then be served from this cached XML.

Store XML forecast
The timestamp used on the source data is converted to an xs:dateTime for ease of later processing.
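Here is a sketch of the conversion. It assumes a hypothetical source timestamp of the form dd/mm/yyyy hh:mm. No timezone is attached because the times are local:

```xquery
xquery version "1.0";

(: Sketch: convert "dd/mm/yyyy hh:mm" (an assumed format) to xs:dateTime. :)
declare function local:to-dateTime($stamp as xs:string) as xs:dateTime
{
  (: split on slash, space and colon: day, month, year, hour, minute :)
  let $p := tokenize(normalize-space($stamp), '[/ :]')
  return xs:dateTime(concat($p[3], '-', $p[2], '-', $p[1], 'T', $p[4], ':', $p[5], ':00'))
};

local:to-dateTime("23/04/2009 11:30")
```

The resulting forecast document can then be saved with the xmldb module listed at the start.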

Reducing the forecast data
The raw data contains redundant elements (several versions of the area name) and elements that are normally empty (all the gale-related elements when there is no gale warning). However, it lacks a case-normalised area name to use as a key. The following function performs this restructuring:
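Here is a sketch of the restructuring with hypothetical element names. Here, areaname stands for a redundant copy of the area name, and galewarning stands for a normally empty gale element. The lower-cased name becomes an id attribute:

```xquery
xquery version "1.0";

(: Sketch of the reduction; element names are hypothetical.
   - adds a lower-case id to use as the key
   - drops the redundant copies of the area name
   - drops empty elements, such as gale fields when there is no warning :)
declare function local:reduce($area as element()) as element(area)
{
  <area id="{lower-case(normalize-space($area/name))}">
    <name>{string($area/name)}</name>
    {
      for $e in $area/*[not(self::name or self::areaname)]
      where normalize-space($e) ne ''
      return $e
    }
  </area>
};

local:reduce(
  <area>
    <name>Lundy</name>
    <areaname>LUNDY</areaname>
    <weather>Rain at times</weather>
    <galewarning/>
  </area>)
```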

There is a case for using XSLT for this transformation. The caching script applies it to the forecast before saving.

SMS via cache
The revised SMS script can now access the cache. First, a function to get the stored forecast:

In this script, the forecast for the input area, selected by the met function, is a reference to the database element rather than a copy. It is therefore still possible to navigate back to the parent element containing the timestamp.
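The difference between a reference and a copy can be sketched with an in-memory document standing in for the stored forecast. The names local:select-forecast, forecast, area and timestamp are hypothetical:

```xquery
xquery version "1.0";

(: Sketch: the selected area is the node inside the forecast, not a copy,
   so its parent, and hence the timestamp, can still be reached. :)
declare function local:select-forecast($forecast as element(forecast), $area as xs:string)
  as element(area)?
{
  $forecast/area[@id = lower-case($area)]
};

let $forecast :=
  <forecast timestamp="2009-04-23T11:30:00">
    <area id="lundy"><weather>Rain at times</weather></area>
  </forecast>
let $selected := local:select-forecast($forecast, "Lundy")
(: a newly constructed copy, by contrast, has no parent :)
let $copy := element area { $selected/@*, $selected/node() }
return (string($selected/../@timestamp), empty($copy/..))
```

This returns the timestamp string followed by true, showing that only the copy has lost its context.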

The eXist datetime functions wrap the Java class java.text.SimpleDateFormat, which defines the date-formatting pattern syntax.
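For comparison, here is a sketch that uses the standard XQuery 3.0 fn:format-dateTime function instead of eXist's datetime module. Its picture-string syntax differs from SimpleDateFormat patterns. The function name local:format-issued is illustrative:

```xquery
xquery version "3.0";

(: Sketch with the standard fn:format-dateTime. The picture string plays the
   role of a SimpleDateFormat pattern such as "HH:mm d MMMM yyyy". :)
declare function local:format-issued($issued as xs:dateTime) as xs:string
{
  format-dateTime($issued, "[H01]:[m01] [D] [MNn] [Y]")
};

local:format-issued(xs:dateTime("2009-04-23T11:30:00"))
```

In an English-language environment, this should produce something like 11:30 23 April 2009.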


 * Lundy

Job scheduling
eXist includes a scheduler module which is a wrapper for the Quartz scheduler. Jobs can only be created by a DBA user.

For example, to schedule a job that fetches the shipping forecast every hour, on the hour:

where "0 0 * * * ?" means to run at 0 seconds, 0 minutes past every hour of every day of every month, ignoring the day of the week.

To check the set of scheduled jobs, including system-scheduled jobs:

It would be better to schedule jobs according to the forecast's update times: 0015, 0505, 1130 and 1725. These times cannot be expressed as a single cron pattern, so multiple jobs are required. Because jobs are identified by their path, the same URL cannot be used for every job, so a dummy parameter is added to make each one distinct.
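Here is a sketch that derives one Quartz cron expression per update time and gives each job a distinct path through a dummy parameter. The script path /db/shipping/cache.xq and the parameter name t are hypothetical. Registering the jobs with the scheduler module is not shown:

```xquery
xquery version "1.0";

(: Sketch: one Quartz cron expression per update time given as "hhmm".
   Quartz fields: seconds, minutes, hours, day-of-month, month, day-of-week. :)
declare function local:cron($hhmm as xs:string) as xs:string
{
  concat('0 ', xs:integer(substring($hhmm, 3, 2)), ' ',
         xs:integer(substring($hhmm, 1, 2)), ' * * ?')
};

(: Jobs are identified by their path, so each one gets a dummy parameter.
   The script path is hypothetical. :)
for $t in ('0015', '0505', '1130', '1725')
return <job path="{concat('/db/shipping/cache.xq?t=', $t)}" cron="{local:cron($t)}"/>
```

For example, 1130 gives the expression "0 30 11 * * ?".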

Discussion: The times are one minute later than the published times. This may not be enough slack to allow for clock discrepancies on either side. Clearly, a push from the UK Met Office would be better than pull scraping. Both the scheduler clock and the publication times are in local time (BST).

Sea area coordinates
The UK Met Office provides a clickable map of forecasts, but a KML map would be nice. The coordinates of the sea areas can be captured and manually converted to XML.

The boundary for an area is accessed by two functions. In this idiom, one function hides the document location and returns the root of the document. Subsequent functions use this base function to get the document and then apply further predicates to filter as required.

The centre of an area can be roughly computed by averaging the latitudes and longitudes:
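Here is a sketch of this rough centre. It assumes, hypothetically, that a boundary is stored as point elements with lat and lon attributes:

```xquery
xquery version "1.0";

(: Sketch: a rough centre as the mean of the boundary points.
   The point/@lat/@lon format is hypothetical. If the stored polygon repeats
   its first point to close it, that point is counted twice. :)
declare function local:centre($boundary as element()) as element(centre)
{
  <centre lat="{avg(for $p in $boundary/point return xs:double($p/@lat))}"
          lon="{avg(for $p in $boundary/point return xs:double($p/@lon))}"/>
};

local:centre(
  <boundary area="lundy">
    <point lat="50" lon="-5"/>
    <point lat="52" lon="-5"/>
    <point lat="52" lon="-3"/>
    <point lat="50" lon="-3"/>
  </boundary>)
```

Simple averaging is adequate here because the sea areas are small and none crosses the 180° meridian.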

KML Placemark
We can generate a KML Placemark from a forecast:
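Here is a minimal sketch of such a Placemark. It takes the area name, forecast text and centre as plain parameters, and the function name local:placemark is hypothetical. Note that KML writes coordinates as longitude,latitude:

```xquery
xquery version "1.0";

declare namespace kml = "http://www.opengis.net/kml/2.2";

(: Sketch: a KML Placemark for one area, pinned at its (rough) centre. :)
declare function local:placemark($name as xs:string, $text as xs:string,
                                 $lat as xs:double, $lon as xs:double)
  as element(kml:Placemark)
{
  <Placemark xmlns="http://www.opengis.net/kml/2.2">
    <name>{$name}</name>
    <description>{$text}</description>
    <!-- KML coordinate order is longitude,latitude -->
    <Point><coordinates>{concat($lon, ',', $lat)}</coordinates></Point>
  </Placemark>
};

local:placemark("Lundy", "Rain at times. Wind SW veering NE, 5 or 6.", 51, -4)
```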

KML area boundary
Since we have the area coordinates, we can also generate the boundaries as lines in KML.
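Here is a sketch of a boundary as a KML LineString. It again assumes point elements with lat and lon attributes. The first point is appended at the end so that the line closes:

```xquery
xquery version "1.0";

declare namespace kml = "http://www.opengis.net/kml/2.2";

(: Sketch: draw an area boundary as a closed KML LineString.
   The boundary/point format is hypothetical. Omit the repeated first point
   if the stored boundary already closes itself. :)
declare function local:boundary-line($boundary as element()) as element(kml:Placemark)
{
  let $points := ($boundary/point, $boundary/point[1])
  (: KML tuples are lon,lat separated by whitespace :)
  let $coords := string-join(for $p in $points return concat($p/@lon, ',', $p/@lat), ' ')
  return
    <Placemark xmlns="http://www.opengis.net/kml/2.2">
      <name>{string($boundary/@area)}</name>
      <LineString><coordinates>{$coords}</coordinates></LineString>
    </Placemark>
};

local:boundary-line(
  <boundary area="lundy">
    <point lat="50" lon="-5"/>
    <point lat="52" lon="-5"/>
    <point lat="52" lon="-3"/>
  </boundary>)
```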

Generate the KML file

 * raw kml
 * on GoogleMap

Push messages
An alternative use of this data is to provide a channel that pushes the forecasts out as soon as they are received. The channel could be an SMS alert to subscribers or a dedicated Twitter stream that users could follow.

Subscription SMS
This service should allow a user to request an alert for a specific area or areas. The application requires:


 * a data structure to record subscribers and their areas
 * a web service to register a user, their mobile phone number and initial area [to do]
 * an SMS service to change the required area and turn messaging on or off
 * a scheduled task to push the SMS messages when the new forecast has been obtained

XML Schema
(to be completed)

Access control
Access to the subscription document needs to be controlled.

The first level of access control is to place the file in a collection that is not accessible via the web. On the UWE server, the web root is mapped (via mod_rewrite) to the collection /db/Wiki. Resources in this collection and its subcollections are therefore accessible, subject to the access settings on each file, but files in parent or sibling collections are not. This document is therefore stored in the collection /db/Wiki2. Its URL relative to the external root would be http://www.cems.uwe.ac.uk/xmlwiki/../Wiki2/shippingsubscriptions.xml, but access via this URL fails.

The second level of control is to set the owner and permissions on the file. This is needed because a user on a client behind the firewall, using the internal server address, would otherwise be able to access this file. By default, world permissions are set to read and update. Once this access is removed, the script must log in as the owner or as a member of the group in order to read the file. Ownership and permissions can be set either via the web client or by functions in the eXist xmldb module.

SMS push
This function takes a subscription, formulates a text message and calls a general sms:send function to send it. That function interfaces with our SMS service provider.
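Here is a sketch of the message-building half of this function. The subscription and cached forecast structures, and the function name local:sms-text, are assumptions. The result would be handed to sms:send:

```xquery
xquery version "1.0";

(: Sketch: formulate the text for one subscription. Assumed structures:
   <subscription area="lundy" .../> and a cached <forecast> whose
   <area id="..."> elements hold <name> and an abbreviated <summary>. :)
declare function local:sms-text($subscription as element(subscription),
                                $forecast as element(forecast)) as xs:string
{
  let $area := $forecast/area[@id = $subscription/@area][1]
  let $text := concat($area/name, ': ', $area/summary)
  (: a single SMS holds 160 characters :)
  return substring($text, 1, 160)
};

local:sms-text(
  <subscription area="lundy" status="on"/>,
  <forecast>
    <area id="lundy"><name>Lundy</name><summary>Rain at times. Wind SW veering NE, 5 or 6.</summary></area>
  </forecast>)
```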

SMS push subscriptions
First, we need to get the active subscriptions. The functions follow the same idiom as was used for the boundaries:
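Here is a sketch of the idiom applied to subscriptions. The document path is the one given in the access-control discussion. The subscriptions/subscription structure and the status attribute are assumptions:

```xquery
xquery version "1.0";

(: Base function: the only place that knows where the document lives.
   The subscriptions/subscription structure is hypothetical. :)
declare function local:subscriptions() as element(subscription)*
{
  let $uri := "/db/Wiki2/shippingsubscriptions.xml"
  return
    if (doc-available($uri))
    then doc($uri)/subscriptions/subscription
    else ()
};

(: Filter: keep subscriptions whose status is "on". :)
declare function local:active($subscriptions as element(subscription)*)
  as element(subscription)*
{
  $subscriptions[@status = "on"]
};

(: The active subscriptions in the stored document. :)
declare function local:active-subscriptions() as element(subscription)*
{
  local:active(local:subscriptions())
};

local:active-subscriptions()
```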

and then to iterate through the active subscriptions and report the result:

This script iterates through the subscriptions currently active and calls the push-SMS function for each one.

This task could be scheduled to run after the caching task, or the caching script could be modified to invoke the subscription task when it completes. Alternatively, since eXist supports triggers, the task could be triggered by the database event raised when the forecast file has been stored.

Subscription editing by SMS
A message format is required to edit the status of the subscription and to change the subscription area:

metsub [ on | off | area ]

If the area is changed, the status is also set to on.

The area is validated against a list of area codes. These are extracted from the boundary data:
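Here is a sketch of parsing and validating such a message. The list of valid area codes is passed in and is assumed to be lower-case area names taken from the boundary data. The result element names update and error are hypothetical:

```xquery
xquery version "1.0";

(: Sketch: interpret "metsub on", "metsub off" or "metsub <area>".
   Returns the change to apply, or an error element. :)
declare function local:parse-command($sms as xs:string, $codes as xs:string*) as element()
{
  let $words := tokenize(lower-case(normalize-space($sms)), ' ')
  (: names such as "german bight" contain spaces, so rejoin the rest :)
  let $arg := string-join(subsequence($words, 2), ' ')
  return
    if (not($words[1] = 'metsub')) then <error>unknown command</error>
    else if ($arg = ('on', 'off')) then <update status="{$arg}"/>
    (: changing the area also switches the subscription on :)
    else if ($arg = $codes) then <update status="on" area="{$arg}"/>
    else <error>unknown area: {$arg}</error>
};

local:parse-command("metsub Lundy", ("lundy", "fastnet", "german bight"))
```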

Twitter
Twitter has a simple REST API for updating a user's status, which we can use to tweet the forecasts to a Twitter account. Twitter uses Basic Access Authentication. A suitable XQuery function that sends a message as a given username and password, using the eXist httpclient module, is:
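The sending function itself depends on the eXist httpclient module and on Twitter's API at the time. This sketch covers one small, self-contained part of it: preparing the form-encoded request body, truncated to 140 characters. The parameter name status is an assumption. The Basic authentication header (the word Basic followed by the base64 encoding of username:password) would be built with the util module's base64 support:

```xquery
xquery version "1.0";

(: Sketch: form-encoded body for a status update of at most 140 characters.
   The parameter name "status" is an assumption. :)
declare function local:status-body($area as xs:string, $text as xs:string) as xs:string
{
  let $status := substring(concat($area, ': ', $text), 1, 140)
  (: percent-encode everything outside the unreserved URI characters :)
  return concat('status=', encode-for-uri($status))
};

local:status-body("Lundy", "Rain at times. Wind SW veering NE, 5 or 6.")
```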

A script is needed to access the stored forecast and tweet the forecast for an area. Different Twitter accounts could be set up for each shipping area. The script will need to be scheduled to run after the full forecast has been acquired.

In this example, the forecast for a given area is tweeted to a hard-coded Twitter account:

Chris Wallace's Twitter

Creating and editing subscriptions
This task is ideal for XForms.

Triggers
Use a trigger to push the SMS messages once the forecast update has completed.