XQuery/World Temperature records

Introduction
The Met office recently released the temperature records for about 1600 stations world-wide. Each station record is available online as a text file, for example Stornoway.

This case study describes a project to make this data available as XML. The home page is http://www.cems.uwe.ac.uk/xmlwiki/Climate/index.html

Parsing temperature record to XML
The first task is to convert the plain text to XML. The main page explains the format of this text file. The code 030260 is the station code defined by the World Meteorological Organisation. It appears that the files are stored in country code directories. (actually these are Blocks in WMO parlance)

Remote data files
The task of using HTTP to GET a remote data file is a common task for which functions already exist in an XQuery module.

This module declares a constant used in the parsing: declare variable $csv:newline:= "&amp;#10;";

And the basic function to get text, which may be plain text or base64encoded:

declare function csv:get-data ($uri as xs:string, $binary as xs:boolean) as xs:string? { (:~  :  Get a file via HTTP and convert the body of the HTTP response to text   :  force the script to get the latest version using the HTTP Pragma header   : @param uri  - URI of the text file to read   : @param binary - true if data is base64 encoded   : @return  -  the body of the response as text or null

let $headers := element headers { element header {attribute name {"Pragma" }, attribute value {"no-cache"}}} let $response := httpclient:get(xs:anyURI($uri), true, $headers) return if ($response/@statusCode eq "200") then let $raw := $response/httpclient:body return if ($binary) then util:binary-to-string($raw) else xmldb:decode($raw) else };

Parsing Function
We will create an XQuery module containing functions to carry out the parsing:

module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met";

The csv module needs to be imported: import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../lib/csv.xqm";

Now the function to parse the MET climate data: declare function met:station-to-xml ($station as xs:string) as element(TemperatureRecord)? { (:~  : GET and parse a MET office temperature record as documented in    :  http://www.metoffice.gov.uk/climatechange/science/monitoring/subsets.html   : @param the station number   : @return  the temperature record as an adhoc XML structure matched closely to the terms used in the original record

let $country := substring($station,1,2) (: this is the directory for all temperature records in a country :) (: construct the URI for the corresponding record :) let $uri := concat("http://www.metoffice.gov.uk/climatechange/science/monitoring/reference/",$country,"/",$station) (:GET and convert to plain text :) let $data := csv:get-data($uri,false)

return if (empty($data)) then else (: split into two sections :) let $headertext := substring-before($data,"Obs:") (: the first section contains the meta data in the form of name=value statements :) let $headers := tokenize($headertext,$csv:nl)

(: the second section is the temperature record, year by year :) let $temperatures := substring-after ($data,"Obs:") let $years := tokenize($temperatures, $csv:nl)

return element TemperatureRecord { element sourceURI {$uri},   (: the original temperature record :) for $header in $headers   (: split each line into a name and its value :) let $name := replace(substring-before($header,"=")," ","")    (: to create a valid XML name, just remove any spaces :) let $value := normalize-space(substring-after ($header,"=")) where $name ne "" return element {$name} {   (:create an XML element with the name :) if ($name = ("Normals","Standarddeviations"))  (: these names have values which are a list of temperatures :) then for $temp in tokenize($value,"\s+") (: temperatures are space-separated :) return element temp_C {$temp} else if ($name = ("Name","Country"))  (: these names contain redundant hyphens :) then replace ($value,"-","") else if ($name = "Long")  (: the convention for signing longitudes in this data is the reverse of the usual E +, W - convention :) then - xs:decimal($value) else $value },    for $year in $years let $value := tokenize($year,"\s+") where $year ne "" return element monthlyAverages { attribute year {$value[1]},   (: the first value in the row is the year :) for $i in (2 to 13)                (: the remainder are the temperatures for the months Jan to Dec :) let $temp := $value[$i] return element temp_C { if ($temp ne '-99.0')       (: generate all months, but those with no reading indicated by -99  will be empty :) then $temp else }        }  } };

Main Script
The main script uses these functions to convert a given station's record:

(:~   : convert climate  file to XML    : @param  station  id of station

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $station := request:get-parameter("station",) return local:station-to-xml($station,false)

Stornoway

WMO stations
The station ids are based on those defined by the World Meteorological Organisation. There is a full list of all stations available online as a text file with supporting documentation.

A typical record is

00;000;PABL;Buckland, Buckland Airport;AK;United States;4;65-58-56N;161-09-07W;;;7;;

The format of these record is


 * 1) Block Number 	2 digits representing the WMO-assigned block.
 * 2) Station Number 	3 digits representing the WMO-assigned station.
 * 3) ICAO Location Indicator 	4 alphanumeric characters, not all stations in this file have an assigned location indicator. The value "" is used for stations that do not have an assigned location indicator.
 * 4) Place Name 	Common name of station location.
 * 5) State 	2 character abbreviation (included for stations located in the United States only).
 * 6) Country Name 	Country name is ISO short English form.
 * 7) WMO Region 	digits 1 through 6 representing the corresponding WMO region, 7 stands for the WMO Antarctic region.
 * 8) Station Latitude 	DD-MM-SSH where DD is degrees, MM is minutes, SS is seconds and H is N for northern hemisphere or S for southern hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
 * 9) Station Longitude 	DDD-MM-SSH where DDD is degrees, MM is minutes, SS is seconds and H is E for eastern hemisphere or W for western hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
 * 10) Upper Air Latitude 	DD-MM-SSH where DD is degrees, MM is minutes, SS is seconds and H is N for northern hemisphere or S for southern hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
 * 11) Upper Air Longitude 	DDD-MM-SSH where DDD is degrees, MM is minutes, SS is seconds and H is E for eastern hemisphere or W for western hemisphere. The seconds value is omitted for those stations where the seconds value is unknown.
 * 12) Station Elevation (Ha) 	The station elevation in meters. Value is omitted if unknown.
 * 13) Upper Air Elevation (Hp) 	The upper air elevation in meters. Value is omitted if unknown.
 * 14) RBSN indicator 	P if station is defined by the WMO as belonging to the Regional Basic Synoptic Network, omitted otherwise.

Conversion to XML
A function is needed to convert from the DD-MM-SSH format of latitudes and longitudes. This is complicated by the variations in this format. These variations all appear in the data:


 * DD-MMH
 * DD-MH
 * DD-MM-SH
 * DD-MM-SSH

Because this format occurs in other data, it has been added to a general module of geographic functions.

declare function geo:lz ($n as xs:string?) as xs:integer { xs:integer(concat (string-pad("0",2 - string-length($n)),$n)) };

declare function geo:dms-to-decimal($s as xs:string) as xs:decimal { (:~   : @param $s  - input string in the format of      DD-MMH, DD-MH, DD-MM-SH,* DD-MM-SSH     :  where H is NSE or W    : @return decimal degrees

let $hemi := substring($s,string-length($s),1) let $rest := substring($s,1, string-length($s)-1) let $f := tokenize($rest,"-") let $deg := geo:lz($f[1]) let $min:= geo:lz($f[2]) let $sec := geo:lz($f[3]) let $dec :=$deg + ($min + $sec div 60) div 60 let $dec := round-half-to-even($dec,6) return if ($hemi = ("S","W")) then - $dec else $dec };

The geo module has to be imported: import module namespace geo = "http://www.cems.uwe.ac.uk/xmlwiki/geo" at "../lib/geo.xqm";

Parsing the station data.

declare function met:WMO-to-xml ($station as xs:string ) as element (station) { (:~  : @param  $station  string describing a station    :  Upper Air data is ignored at present.

let $f := tokenize(normalize-space($station),";") let $cid := concat($f[1],$f[2],"0") (: this constructs the equivalent id used in the temperature records :) return element station{ element block {$f[1]}, element number {$f[2]}, element id {$cid}, if ($f[3] eq "")  then    else element ICAO {$f[3]}, element placeName {$f[4]}, if ($f[5] ne "")  then element state {$f[5]}  else , element country {$f[6]}, element WMORegion {$f[7]}, element latitude {geo:dms-to-dec($f[8])}, element longitude {geo:dms-to-dec($f[9])}, if ($f[12] ne "")  then element elevation {$f[12]}   else , if ($f[14] = "P")  then element RBSN {} else } };

Generating the WMO XML file
The XQuery script GETs the text file and converts each line to an XML station element. The elements are then inserted into an empty XML file one by one.

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm"; import module namespace csv = "http://www.cems.uwe.ac.uk/xmlwiki/csv" at "../csv.xqm";

{

(: create the empty XML document :) let $login := xmldb:login("/db/Wiki/Climate","user","password") let $store := xmldb:store("/db/Wiki/Climate/Stations","metstations.xml", ) let $doc := doc($store)/stations

(: get the text list of stations and convert :) let $station-list := "http://weather.noaa.gov/data/nsd_bbsss.txt" let $csv := csv:get-data($station-list,false)

for $data in tokenize($csv,$nl) where $station ne "" return let $station := met:WMO-station-to-xml($data) let $update := update insert $station into $doc return {$xml/id} }

Indexing
There are 11000 odd stations in total. These need to be indexed for efficient access. In eXist indexes are defined in a configuration file, one per collection (directory). For the collection in which the station XML document is to be written, the configuration file is:

  

This means that all XML documents in the collection will be indexed on the qnames id and country wherever these appear in the XML structure. Indexing will be performed when a document is added to the collection or an existing document is updated. A re-index can be forced if required.

If the station data is stored in the collection /db/Wiki/Climate/Stations, this configuration file will be stored in /db/system/config/db/Wiki/Climate/Stations as configuration.xconf

WMO Station set binding
Since the code will reference this collection in a number of places, we add a constant to reference the set of stations to the library module:

declare variable $met:WMOStations := doc ("/db/Wiki/Climate/Stations/metstations.xml")//station;

Temperature Station list
A full listing of stations is needed to provide an index. This data is not provided as a simple file, but they are encoded on the HTML page as a JavaScript array.

locations[1]=["409380|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Herat"",409480|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Kabul Airport","409900|Afghanistan, Islamic State Of / Afghanistan, Etat Islamique D'|Kandahar Airport"]; ...

However there is no location data here, so we will get that from the WMO station list:

The approach taken to converting this to XML was:


 * 1) View source on the HTML page
 * 2) Locate the station list
 * 3) Copy the text
 * 4) Save as a text file in the eXIst data base
 * 5) A script reads this file and parses it to XML
 * 6) The resultant XML is augmented with latitude and longitude from the WMO station data.
 * 7) The final XML document is stored in the database in the same Station directory

(:~  :  convert  the text representation of MET stations  from the WMO list to XML

 {

(: get the raw data from a text file stored as base64 in the eXist dataabse :) let $text := util:binary-to-string(util:binary-doc("/db/Wiki/Climate/cstations.txt"))

(: ;  separates the stations in each country :) for $country in tokenize($text,";")

(: the station list is the array element content i.e. the string  between  =[  and ]  :) let $stationlist := substring-before(substring-after($country,"=["),"]")

(: The stations in each country are comma-separated, but commas are also used within the names of countries and stations. However a comma followed by a double quote is the required separator. :) let $stations := tokenize($stationlist,',"') for $station in $stations (:  some cleanup of names is needed :) let $data :=replace ( replace($station,'"',""),"&#xA;","")

(:  Each station is in the format of        Stationid | English name / French name

let $f := tokenize($data,"\|") let $id := $f[1] let $country := tokenize($f[2],"/") let $WMOStation := $met:WMOStations[id=$id]

(: create a station element containing the id, country and english station name :) return element station { element id  {$f[1]}, element country {normalize-space($country[1])}, element location {$f[3]}, $WMOStation/latitude, $WMOStation/longitude } } 

Storing this file in the same Stations collection means that it will be indexed on the same element names, id and country,as the full WMO station data.

Temperature station list

Climate station set binding
This set of stations will also be referenced in several places so we define a variable:

declare variable $met:tempStations := doc ("/db/Wiki/Climate/Stations/tempstations.xml")//station;

Visualizing the data
We will use XSLT to transform this XML to a presentation of the location of the station and charts of the temperatures. The initial stylesheet was developed by Dave Challender.

( explanation to be added )

         Station Number:&#160;   <xsl:template match="station"> <xsl:value-of select="placeName"/> <xsl:text>, </xsl:text> <xsl:value-of select="country"/> <xsl:text> </xsl:text> <a href="http://maps.google.com/maps?q={latitude},{longitude}"> <img src="http://maps.google.com/maps/api/staticmap?zoom=11&amp;maptype=hybrid&amp;size=400x300&amp;sensor=false&amp;key=ABQIAAAAVehr0_0wqgw_UOdLv0TYtxSGVrvsBPWDlNZ2fWdNTHNT32FpbBR1ygnaHxJdv-8mkOaL2BJb4V_yOQ&amp;markers=color:blue|{latitude},{longitude}" alt="{placeName}"/> </a> </xsl:template> <xsl:template match="@* | node"> <xsl:copy> <xsl:apply-templates select="@* | node"/> </xsl:copy> </xsl:template> </xsl:stylesheet>

Multiple formats
We would like to present either the original XML or the HTML visualisation page. We could use two scripts, or combine them into one script with a parameter to indicate how the output is to be rendered. eXist functions allow the serialization of the output and the mime-type to be set dynamically.

import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

let $id := request:get-parameter("station",) let $render := request:get-parameter("render",) let $station := doc ("/db/Wiki/Climate/Stations/metstations.xml")//station[id =  $id] let $tempStation := doc("/db/Wiki/Climate/Stations/tempstations.xml")//station[id = $id] let $temp := if ($tempStaion) then met:station-to-xml($id) else let $station := <Station> {$station} {$temp} </Station>

return if ($render="HTML") then let $ss := doc("/db/Wiki/Climate/FullHTMLMet-V2.xsl") let $options := util:declare-option("exist:serialize","method=xhtml media-type=text/html") let $start-year := request:get-parameter("start","1000") let $end-year := request:get-parameter("end","2100") let $params := <param name="start-year" value="{$start-year}"/> <param name="end-year" value="{$end-year}"/> return transform:transform($station,$ss,$params) else let $header := response:set-header("Access-Control-Allow-Origin","*") return $station

Stornoway HTML Stornoway XML

Simple HTML index
We can use the stored station list to create a simple HTML index. import module namespace met = "http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

declare option exist:serialize "method=xhtml media-type=text/html";

Index of Temperature Record Stations Index of Temperature Record Stations {  for $country  in distinct-values($met:tempStations/country) order by $country return {$country} {for $station in $met:tempStations[country=$country] let $id := $station/id order by $station/location return <a href="station.xq?station={$id}&amp;render=HTML">{string($station/location)}</a> }  }

Temperature Station list

Station Map
We can also generate a (large) KML overlay, with links to each station's page.

We need a function transform a station into a PlaceMark with a link to the HTML station page: declare function met:station-to-placemark ($station) { let $description := <a href="http://www.cems.uwe.ac.uk/xmlwiki/Climate/station.xq?station={$station/id}&amp;render=HTML">Temperature Record</a> return <Placemark> {string($station/location)}, {string($station/country)} {util:serialize($description,"method=xhtml")} <Point> {string($station/longitude)},{string($station/latitude)},0 </Point> </Placemark> };

Then the main script iterates over all the temperature stations to generate the full KML file. import module namespace met ="http://www.cems.uwe.ac.uk/xmlwiki/met" at "met.xqm";

declare option exist:serialize "method=xml media-type=application/vnd.google-earth.kml+xml indent=yes  omit-xml-declaration=yes"; let $x := response:set-header('Content-Disposition','attachment;filename=country.kml')

return <kml xmlns="http://www.opengis.net/kml/2.2"> <Folder> Stations { for $station in $met:tempStations return met:station-to-placemark($station) }  </Folder>

Full KML

KML rendered via GoogleMaps

Work in progress

 * Resource URIs
 * RDF