XQuery/Wikipedia Lookup

Page scraping is one way to retrieve a specific fact from a page, provided the page's structure is stable.

Here the task is to use Wikipedia to find the Latin name for a bird, given its common name.

declare namespace h = "http://www.w3.org/1999/xhtml";

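let $name := request:get-parameter("name", "")
(: escape-uri is from an older XQuery draft; its second argument must be the call false() :)
let $url := escape-uri(concat("http://en.wikipedia.org/wiki/", $name), false())
let $page := doc($url)
(: genus and species: second cell of the row labelled 'Genus:' or 'Species:' :)
let $genus := $page//h:tr[h:td[. = 'Genus:']]/h:td[2]
let $species := $page//h:tr[h:td[. = 'Species:']]/h:td[2]
(: binomial name: bold text in the rows after the 'Binomial name' header row :)
let $binomial := string($page//h:tr[h:th//h:a[. = 'Binomial name']]/following-sibling::h:tr//h:b)
(: the body of the return clause is missing from this copy; this wrapper element is an assumed placeholder :)
return
  <bird>
    <genus>{string($genus)}</genus>
    <species>{string($species)}</species>
    <binomial>{$binomial}</binomial>
  </bird>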

Here, locating the required data involves complex XPath expressions, and they work only if the page follows the Bird page format. For example, the genus is the second cell in a table row whose first cell is 'Genus:'.
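The row-and-cell selection can be tried without fetching a live page. The following is a minimal sketch that applies the same path to an inline fragment; the fragment is a hypothetical, heavily simplified stand-in for a bird page's classification table, not the markup Wikipedia actually serves.

```xquery
declare namespace h = "http://www.w3.org/1999/xhtml";

(: Hypothetical stand-in for the classification table of a bird page. :)
let $page :=
  <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
      <table>
        <tr><td>Genus:</td><td>Cygnus</td></tr>
        <tr><td>Species:</td><td>C. atratus</td></tr>
      </table>
    </body>
  </html>
(: Pick the row that has a cell whose text is exactly 'Genus:',
   then take the second cell of that row. :)
let $genus := $page//h:tr[h:td[. = 'Genus:']]/h:td[2]
let $species := $page//h:tr[h:td[. = 'Species:']]/h:td[2]
return (string($genus), string($species))
```

With this fragment the query should return the two strings from the second cells. Strictly, the predicate matches a row in which any cell reads 'Genus:', not only the first, so the path relies on the label never appearing elsewhere in a row.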

Example: Black Swan (Wikipedia)

The script often fails because:
 * 1) the name is ambiguous, e.g. Thrush (Wikipedia)
 * 2) the name is too broad, e.g. Kiwi (Wikipedia)
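One way to make such failures visible is to test whether the path selected anything before using the result. The sketch below assumes that an ambiguous or overly broad name leads to a page with no 'Genus:' row; the inline page and the error element are hypothetical.

```xquery
declare namespace h = "http://www.w3.org/1999/xhtml";

(: Hypothetical stand-in for a page that has no classification table. :)
let $page :=
  <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
      <p>This name may refer to several different birds.</p>
    </body>
  </html>
let $genus := $page//h:tr[h:td[. = 'Genus:']]/h:td[2]
return
  (: An empty sequence means the expected row was not found. :)
  if (empty($genus))
  then <error>No genus row found; the name may be ambiguous or too broad.</error>
  else <genus>{string($genus)}</genus>
```

This only reports that the lookup failed. It cannot tell which of the two causes applies, because both leave the same gap in the page.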

It is not hard to see that more semantic markup with ontological relationships would be preferable to these uncertain contortions.