XQuery/String Analysis

XQuery analyze-string
XSLT 2.0 includes the analyze-string construct, which captures the groups (the parenthesised sub-expressions) matched by a regular expression. Surprisingly, XQuery 1.0 has no equivalent. It is possible to use the XSLT construct by wrapping an XQuery function round a generated XSLT stylesheet, even though this is rather cumbersome. The wrapper takes the string, the regular expression and the number of groups to return. In this installation of eXist, the XSLT engine is Saxon 8.

declare function str:analyze-string($string as xs:string, $regex as xs:string, $n as xs:integer) {
    transform:transform (,                                                                                             ,  )
};
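XQuery 3.0 added a standard fn:analyze-string function. Where the processor supports it (an assumption, not something this page relies on), the groups can be read directly from the result without generating a stylesheet. A minimal sketch, in which local:groups is a hypothetical helper name:

```xquery
xquery version "3.0";

(: Hypothetical helper: returns the captured groups of the first match, in group order.
   Assumes an XQuery 3.0 processor, where fn:analyze-string is a standard function. :)
declare function local:groups($string as xs:string, $regex as xs:string) as xs:string* {
  for $group in analyze-string($string, $regex)/fn:match[1]//fn:group
  return string($group)
};

(: returns the sequence ("WP", "05") :)
local:groups("WP05LNU", "^([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]$")
```

Unlike the wrapper above, this version does not need to be told how many groups to expect, because each fn:group element in the result carries its own number in the nr attribute.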

UK Vehicle Registration numbers
To illustrate the use of this function, here is a decoder for UK vehicle license plates. The format has changed several times, so the script must first decide which format a number uses, then analyze the number to extract the codes for the area and date of registration. Each pattern is defined in XML, giving the regular expression to use and the meaning of each matched group.

Problem: Passing repetition modifiers through is failing
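One possible cause, offered as an assumption rather than a diagnosis: in XSLT 2.0 the regex attribute of xsl:analyze-string is an attribute value template, so a quantifier such as {2} is read as an embedded expression unless its braces are doubled. If the generated stylesheet places $regex in that attribute, a helper along the lines of the hypothetical local:escape-avt below could double the braces before the stylesheet is built:

```xquery
(: Hypothetical helper: doubles curly braces so that a regular expression
   survives being placed in an XSLT attribute value template. :)
declare function local:escape-avt($regex as xs:string) as xs:string {
  replace(replace($regex, "\{", "{{"), "\}", "}}")
};

(: returns "[A-Z]{{2}}\d{{2}}[A-Z]{{3}}" :)
local:escape-avt("[A-Z]{2}\d{2}[A-Z]{3}")
```

The escaped form would be used only inside the stylesheet. A direct call to matches() still needs the original expression with single braces.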

import module namespace str = "http://www.cems.uwe.ac.uk/string" at "../lib/string.xqm";

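declare variable $patterns :=
    <patterns>
        <pattern version="01" regexp="([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]"> <field>Area</field> <field>Date</field> </pattern>
        <pattern version="83" regexp="([A-Z])\d+[A-Z]([A-Z][A-Z])"> <field>Date</field> <field>Area</field> </pattern>
        <pattern version="63" regexp="([A-Z][A-Z])[A-Z]?\d+([A-Z])"> <field>Area</field> <field>Date</field> </pattern>
    </patterns>;

declare function local:decode-regno($regno) {
    (: normalise: upper case, no spaces :)
    let $regno := upper-case($regno)
    let $regno := replace($regno, " ", "")
    return
        for $pattern in $patterns/pattern
        let $regexp := concat("^", $pattern/@regexp, "$")
        return
            if (matches($regno, $regexp))
            then
                let $analysis := str:analyze-string($regno, $regexp, count($pattern/field))
                return
                    <regno version="{$pattern/@version}">
                        {
                        for $field at $i in $pattern/field
                        let $value := string($analysis[position() = $i])
                        (: the decode table is named after the field and the pattern version :)
                        let $table := concat($field, $pattern/@version)
                        let $entry := /CodeList[@id = $table]/Entry[Code = $value]
                        return element {$field} {$entry/*}
                        }
                    </regno>
            else ()
};

let $regno := request:get-parameter("regno", ())
return local:decode-regno($regno)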
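The two steps described above (choose the format, then label the captured groups) can be sketched without the decode tables or the XSLT wrapper. This is a minimal, self-contained illustration that assumes an XQuery 3.0 processor. local:split-regno is a hypothetical name, the wrapper element name patterns is an assumption, and only two of the patterns are included:

```xquery
xquery version "3.0";

declare variable $patterns :=
  <patterns>
    <pattern version="01" regexp="([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]">
      <field>Area</field><field>Date</field>
    </pattern>
    <pattern version="83" regexp="([A-Z])\d+[A-Z]([A-Z][A-Z])">
      <field>Date</field><field>Area</field>
    </pattern>
  </patterns>;

(: Hypothetical function: returns the raw codes, not the decoded locations or dates. :)
declare function local:split-regno($regno as xs:string) as element(regno)* {
  let $clean := replace(upper-case($regno), " ", "")
  for $pattern in $patterns/pattern
  let $regexp := concat("^", $pattern/@regexp, "$")
  where matches($clean, $regexp)
  return
    let $groups := analyze-string($clean, $regexp)/fn:match[1]/fn:group
    return
      <regno version="{$pattern/@version}">{
        (: the i-th field names the i-th captured group :)
        for $field at $i in $pattern/field
        return element {string($field)} {string($groups[@nr = $i])}
      }</regno>
};

local:split-regno("wp05 lnu")
```

For the input shown, only the "01" pattern matches, so the result is a single regno element with version "01" containing an Area of WP and a Date of 05.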

Decode tables
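Separate tables map codes to date ranges or areas. These tables are plain XML, created from CSV files via Excel. The script selects the CodeList whose id is the field name followed by the pattern version, then the Entry whose Code matches the captured value. The pre-83 area codes are currently incorrect.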

e.g.

<Entry> <Location>Bournemouth</Location> </Entry>
<Entry> <Location>Worcester</Location> </Entry>
<Entry> <Location>Coventry</Location> </Entry>
...

Examples

 * A current number plate: WP05LNU
 * One from the previous series: L162BAY

Location Mapping
One use of this conversion is to display the locations on a map. Here we take a file of observed registration numbers, decode them all, group by location and generate a KML file with the locations geocoded through the Google API.

<NumberList>
    <Regno>H251GBU</Regno>
    <Regno>WRA870Y</Regno>
    <Regno>ENB427T</Regno>
    <Regno>C406OUY</Regno>
    <Regno>N62VNF</Regno>
    <Regno>R895KCV</Regno>
    <Regno>C758HOV</Regno>
    <Regno>H541HEM</Regno>
    ...

(: this script plots the registration locations of a set of  UK vehicle license plates using kml.  :)

import module namespace geo="http://www.cems.uwe.ac.uk/exist/geo" at "../lib/geo.xqm";

import module namespace str = "http://www.cems.uwe.ac.uk/string" at "../lib/string.xqm";

declare namespace reg = "http://www.cems.uwe.ac.uk/wiki/reg";

declare option exist:serialize "method=xml media-type=application/vnd.google-earth.kml+xml indent=yes  omit-xml-declaration=yes";

declare variable $reg:icon := "http://maps.google.com/mapfiles/kml/paddle/ltblu-blank.png";

declare variable $reg:patterns :=
    <patterns>
        <pattern version="01" regexp="([A-Z][A-Z])(\d\d)[A-Z][A-Z][A-Z]"> <field>Area</field> <field>Date</field> </pattern>
        <pattern version="83" regexp="([A-Z])\d+[A-Z]([A-Z][A-Z])"> <field>Date</field> <field>Area</field> </pattern>
        <pattern version="63" regexp="([A-Z][A-Z])[A-Z]?\d+([A-Z])"> <field>Area</field> <field>Date</field> </pattern>
    </patterns>;

declare function reg:decode-regno($regno) {
    let $regno := upper-case($regno)
    let $regno := replace($regno, " ", "")
    return
        for $pattern in $reg:patterns/pattern
        let $regexp := concat("^", $pattern/@regexp, "$")
        return
            if (matches($regno, $regexp))
            then
                let $analysis := str:analyze-string($regno, $regexp, count($pattern/field))
                return
                    <regno version="{$pattern/@version}">
                        {
                        for $field at $i in $pattern/field
                        let $value := string($analysis[position() = $i])
                        let $table := concat($field, $pattern/@version)
                        let $entry := /CodeList[@id = $table]/Entry[Code = $value]
                        return element {$field} {$entry/*}
                        }
                    </regno>
            else ()
};

declare function reg:regno-locations($regnos) {
    (: returns one location name per registration number that could be decoded :)
    for $regno in $regnos
    let $analysis := reg:decode-regno($regno)
    return
        if (exists($analysis//Location))
        then string($analysis//Location)
        else ()
};

let $url := request:get-parameter("url", ())
let $x := response:set-header('Content-Disposition', 'inline;filename=regnos.kml;')
return
    <Document>
        <name>Reg nos</name>
        {
        (: one icon style per size, 1 to 10 :)
        for $i in (1 to 10)
        return
            <Style id="size{$i}">
                <IconStyle>
                    <scale>{$i}</scale>
                    <Icon> <href>{$reg:icon}</href> </Icon>
                </IconStyle>
            </Style>
        }
        {
        let $locations := reg:regno-locations(doc($url)//Regno)
        let $max := count($locations)
        for $place in distinct-values($locations)
        let $latlong := geo:geocode(concat($place, ',UK'))
        let $count := count($locations[. = $place])
        let $scale := max((round($count div $max * 10), 1))
        order by $count descending
        return
            <Placemark>
                <name>{$place} ({$count})</name>
                <styleUrl>#size{$scale}</styleUrl>
                <Point> {geo:position-as-kml($latlong)} </Point>
            </Placemark>
        }
    </Document>
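The icon size in the script is derived from each location's share of all decoded observations, rounded to a value from 1 to 10. The sketch below isolates that calculation. The place names are made-up sample data, not results from the real file, and the output is plain strings rather than KML:

```xquery
(: Hypothetical sample data standing in for the decoded locations. :)
let $locations := ("Bristol", "Bristol", "Leeds", "Bristol", "Leeds", "York")
(: As in the script, $max is the total number of observations. :)
let $max := count($locations)
for $place in distinct-values($locations)
let $count := count($locations[. = $place])
(: share of all observations on a 1 to 10 scale, never below 1 :)
let $scale := max((round($count div $max * 10), 1))
order by $count descending
return concat($place, " count=", $count, " scale=", $scale)
```

With this sample the result is "Bristol count=3 scale=5", "Leeds count=2 scale=3" and "York count=1 scale=2". Because the divisor is the total rather than the largest group, a size of 10 is reached only when every observation comes from one place.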

Generate Map

SMS service
The Department of Information Science and Digital Media supports an SMS service with facilities to send and receive text messages. The service is paid for by the University of the West of England, Bristol and all traffic is logged.

A decoder for UK vehicle license numbers is one of the demonstration services which are supported for mobile-originated (MO) text messages.

The format of the text message is the prefix REG followed by the registration number, e.g. REG L052

A text message in this format sent to our SMS mobile number 447624803759 passes through a PHP script, which acts as a switch so that multiple SMS services can be supported. The script uses the first word of the message to identify the associated service endpoint, then invokes that endpoint via HTTP, passing the prefix as the parameter code, the rest of the message as text and the originating mobile number as from.
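The splitting step performed by the switch can be sketched in XQuery. This is an illustration only, not the PHP script itself. The function name local:parse-sms and the wrapper element request are hypothetical, while code, text and from are the parameter names given above:

```xquery
(: Hypothetical sketch: splits a message into the service prefix and the remaining text. :)
declare function local:parse-sms($message as xs:string, $from as xs:string) as element(request) {
  let $words := tokenize(normalize-space($message), " ")
  return
    <request>
      <code>{$words[1]}</code>
      <text>{string-join(subsequence($words, 2), " ")}</text>
      <from>{$from}</from>
    </request>
};

(: the sender number here is a placeholder, not a real subscriber :)
local:parse-sms("REG WP05LNU", "447000000000")
```

For this input the code is REG and the text is WP05LNU. A real switch would then look up the endpoint registered for the code and forward the three values to it.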

For the prefix REG, the associated endpoint is an XQuery script: http://www.cems.uwe.ac.uk/xmlwiki/regno/smsregno.xq

The smsregno.xq script is essentially the parseregno script above.

The SMS switch then sends the Reply on to the originating mobile phone.

To do

 * solve problem with repetition modifiers (or function support for analyze-string)
 * Pre-83 area code data
 * Switch implementation in XQuery to replace the PHP application - awaits switch to eXist v2