XQuery/SPARQLing Country Calling Codes

Motivation
Stimulated by Henry Story's blog entry, the following script works on the same problem. It uses the functions defined in the previous module to execute a SPARQL query on the dbpedia server and to convert the SPARQL query results to tuples.

First attempt

In this script, the fr:clean function parses the resource URI to extract its local name.
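As a rough illustration of that parsing step, the sketch below takes whatever follows the last "/" of a resource URI and turns underscores into spaces. The function local:clean is a hypothetical stand-in, since the body of fr:clean is not shown here, and the sample URI is only an example.

```xquery
xquery version "1.0";

(: Hypothetical stand-in for fr:clean: the original function body is not shown,
   so this only illustrates taking the local name from a resource URI. :)
declare function local:clean($uri as xs:string) as xs:string {
  (: the local name is whatever follows the last "/" :)
  let $local := tokenize($uri, "/")[last()]
  (: dbpedia local names use "_" where the page title has a space :)
  return replace($local, "_", " ")
};

local:clean("http://dbpedia.org/resource/United_Kingdom")
```

For the sample URI this yields United Kingdom. Percent-encoded characters in the local name are not decoded by this sketch.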

A sounder alternative is to filter the multilingual rdfs:label property for the English label:

SELECT * WHERE { ?resource p:callingCode ?callingCode. ?resource rdfs:label ?name. FILTER (lang(?name) = 'en') }


However, this query is much slower.

Discussion
This query returns a set of dbpedia resources that have a callingCode property. However, it includes resources that are not countries, and it proves quite difficult to identify which resources are countries. It might be expected that either the skos:subject or rdf:type predicates would identify countries, but this is not the case.

Of course, which entities are classified as countries is a debatable issue, as is currently illustrated by Kosova and by the documentation on ISO 3166. Perhaps countries are better identified by properties. There is a property, countryCode, which looks promising.

The SPARQL query becomes:

PREFIX :  PREFIX p:  PREFIX rdfs:  SELECT * WHERE { ?resource p:callingCode ?callingCode. ?resource p:countryCode ?countryCode. }
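To show how such a query might be sent to the server, here is a minimal sketch that builds a GET request URL. The prefix IRI for p: and the endpoint address are assumptions (neither is given in the text), and local:request-url is a hypothetical helper, not one of the functions from the previous module.

```xquery
xquery version "1.0";

(: Assumed values: the endpoint address and the prefix IRI are not given in the text. :)
declare variable $local:endpoint := "http://dbpedia.org/sparql";

declare variable $local:query := concat(
  "PREFIX p: <http://dbpedia.org/property/> ",
  "SELECT * WHERE { ",
  "?resource p:callingCode ?callingCode. ",
  "?resource p:countryCode ?countryCode. }");

(: Hypothetical helper: build a GET request URL with the query in the "query" parameter.
   encode-for-uri escapes spaces, "?", "{", "<" and other reserved characters. :)
declare function local:request-url($endpoint as xs:string, $query as xs:string) as xs:string {
  concat($endpoint, "?query=", encode-for-uri($query))
};

local:request-url($local:endpoint, $local:query)
```

The resulting URL can then be fetched with whatever HTTP function the XQuery processor provides. Escaping the query matters because SPARQL syntax is full of characters that are not legal in a URL.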


However, this shows that many countries have incomplete data in dbpedia, or that the coding of this property is inconsistent. This is not surprising, because there are several types of country code, each implying a different definition of country:


 * ISO 3166-1 alpha-3
 * ISO 3166-1 alpha-2
 * ISO 3166-1 numeric
 * IOC country codes
 * License plate numbers
 * Top-level domain codes

Wikipedia scraping
In fact, international calling codes are listed in a Wikipedia entry. Thus a more direct approach would be to generate the table by scraping Wikipedia directly. However, now we err in the opposite direction, in that there are calling codes for telecom services as well as for countries, and the format of numbers and names is inconsistent: some entries have multiple numbers, some numbers have a leading +, some country names have appended synonyms, and so on.
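The number inconsistencies described above could be smoothed with a small normalising function. This is only a sketch: local:codes is a hypothetical helper, the separators it splits on (comma, slash, the word "and") are assumptions about the table text, and the sample input is invented.

```xquery
xquery version "1.0";

(: Hypothetical helper: split a scraped cell into separate codes and keep only digits.
   Assumes multiple codes are separated by ",", "/" or the word "and". :)
declare function local:codes($raw as xs:string) as xs:string* {
  for $part in tokenize($raw, ",|/|\s+and\s+")
  (: drop the leading "+", spaces and any other non-digit characters :)
  let $digits := replace($part, "[^0-9]", "")
  where $digits ne ""
  return $digits
};

local:codes("+1 809, +1 829")
```

For the invented input this returns the two strings 1809 and 1829. Joining the digit groups inside one code is a design choice; a script that needs to keep the country code apart from an area code would have to split on the inner space instead.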

In this script, the path expression finds the anchor "Alphabetical_Listing" and then finds the following table.

Jan 2010 - the page layout had changed, so the previous path to this table:

let $section := $wikipage//h:a[@name="Alphabetical_Listing"]/../following-sibling::h:table[1]

had to be replaced by the current one:

let $section := $wikipage//h:table[@class="wikitable sortable"][2]
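The effect of the current path expression can be tried on a small inline document. In the sketch below the miniature page and its rows are invented, local:listing is a hypothetical wrapper around the path, and h is assumed to be bound to the XHTML namespace.

```xquery
xquery version "1.0";
declare namespace h = "http://www.w3.org/1999/xhtml";

(: Hypothetical wrapper: select the second sortable wikitable among its siblings. :)
declare function local:listing($wikipage as node()) as element(h:table)* {
  $wikipage//h:table[@class = "wikitable sortable"][2]
};

(: Invented miniature page; the real page has many more rows and columns. :)
let $wikipage :=
  <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
      <table class="wikitable sortable"><tr><td>Zone 1</td><td>+1</td></tr></table>
      <table class="wikitable sortable"><tr><td>France</td><td>+33</td></tr></table>
    </body>
  </html>
for $row in local:listing($wikipage)/h:tr
return <country name="{ string($row/h:td[1]) }" code="{ string($row/h:td[2]) }"/>
```

Only the row of the second table is returned. Note that the positional predicate [2] counts matching tables within each parent element, which is one reason a path like this breaks when the page layout changes.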

Export as RDF
An alternative is to export this table as RDF. Here the resource is the dbpedia resource and the property is defined in the dbpedia property namespace.

Similarly, the structure of this table changed, so this code needed to be updated.
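The shape of such an export might look like the following sketch, which is not the original script. The property namespace IRI, the way the resource URI is built from the country name, and the intermediate country elements are all assumptions made for illustration.

```xquery
xquery version "1.0";
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
(: Assumed IRI for the dbpedia property namespace :)
declare namespace p = "http://dbpedia.org/property/";

(: Hypothetical helper: one rdf:Description per country, keyed by its dbpedia resource. :)
declare function local:to-rdf($countries as element(country)*) as element(rdf:RDF) {
  <rdf:RDF>{
    for $c in $countries
    (: assumed resource naming: dbpedia resource base plus the name with "_" for spaces :)
    let $uri := concat("http://dbpedia.org/resource/", replace($c/@name, " ", "_"))
    return
      <rdf:Description rdf:about="{ $uri }">
        <p:callingCode>{ string($c/@code) }</p:callingCode>
      </rdf:Description>
  }</rdf:RDF>
};

local:to-rdf((
  <country name="France" code="33"/>,
  <country name="United Kingdom" code="44"/>
))
```

Each description uses the dbpedia resource as its subject, so the exported triples could be merged with data already published for the same resource.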
