XQuery/XML to RDF

For the emp-dept case study, RDF must be generated from the underlying XML files. An XQuery script generates the RDF, using a configuration file that defines how the columns of a table are mapped into RDF and which namespaces are used. This mapping needs a little more work to support composite keys and user-defined transformations. An interactive tool to create this map would be useful.

Issues in mapping to RDF
The main guide to publishing linked data on the web is How to Publish Linked Data on the Web. The work described in this Wikibook entry consists of progressively applying the principles set out there.

This conversion illustrates a few of the differences between local datasets, whether SQL or XML, and a dataset designed to fit into a global database. Some decisions remain unclear.
 * tables are implicitly within an organisational context. This context has to be added in RDF by creating a namespace for the local properties and identifiers
 * the scope of queries is implicitly within organisational boundaries, but in RDF this scope needs to be explicit. In the SQL query select * from emp; emp is ambiguously either the class of employees or the set of employees in the company. In RDF this needs to be explicit, so two kinds of triples need to be added:
   * triples to type employees to a company definition of employee
   * triples to relate the employee to the company (to be added)
 * linkage to the global database requires two kinds of links:
   * local properties need to be mapped to global predicates. Here the employee name is mapped to foaf:surname (but the case probably needs changing). Alternatively, a local predicate f:name could be defined and equated to the FOAF predicate with owl:equivalentProperty.
   * local identifiers of resources need to be replaced by global URIs. Here location is mapped to a DBpedia resource URI. Alternatively, the local URI f:location/Dallas could be equated to the DBpedia resource with owl:sameAs. (where? and why delay this?)
 * foreign keys are replaced by full URIs, pointing directly to the linked resource. The name of this property is no longer the name of the foreign key (e.g. MgrNo) but rather the name of the related resource (Manager). However, the foreign key itself might also need to be retained.
 * primary keys are also replaced by URIs, but the local primary key value, for example the employee number, will need to be retained as a literal if it is not purely a surrogate key. This should perhaps be mapped to rdfs:label.
 * datatypes are preferably explicit in the data to avoid conversion in queries although this increases the size of the RDF graph.
 * namespaces have been expanded in full where they occur in RDF attribute values. An alternative would be to define entities in a DTD prolog as shorthand for these namespaces, but not all processors of the RDF would do the expansion. xml:base can be used to default one namespace.
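
Several of the decisions listed above can be seen together in one small example. The following is a minimal sketch, not the script used for the case study: the f namespace URI, the element names of the source rows (Emp, EmpNo, Dept and so on), the sample values and the local predicate names are all assumptions made for illustration.

```xquery
xquery version "1.0";

declare namespace rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace rdfs = "http://www.w3.org/2000/01/rdf-schema#";
declare namespace foaf = "http://xmlns.com/foaf/0.1/";
(: assumed namespace for the local classes and predicates :)
declare namespace f    = "http://www.cems.uwe.ac.uk/empdept/concept/";

declare variable $base  := "http://www.cems.uwe.ac.uk/empdept/";
declare variable $vocab := "http://www.cems.uwe.ac.uk/empdept/concept/";
declare variable $xsd   := "http://www.w3.org/2001/XMLSchema#";

(: hypothetical source rows; element names and values are assumptions :)
let $emp :=
  <Emp><EmpNo>7566</EmpNo><Ename>JONES</Ename><MgrNo>7839</MgrNo>
       <Sal>2975</Sal><DeptNo>20</DeptNo></Emp>
let $dept :=
  <Dept><DeptNo>20</DeptNo><Dname>RESEARCH</Dname><Location>Dallas</Location></Dept>
return
  <rdf:RDF>
    <!-- primary key becomes the resource URI -->
    <rdf:Description rdf:about="{$base}emp/{$emp/EmpNo}">
      <!-- explicit typing against the company's definition of employee -->
      <rdf:type rdf:resource="{$vocab}Emp"/>
      <!-- the key value is also kept as a literal -->
      <rdfs:label>{string($emp/EmpNo)}</rdfs:label>
      <!-- local column mapped to a global predicate -->
      <foaf:surname>{string($emp/Ename)}</foaf:surname>
      <!-- explicit datatype on the literal -->
      <f:Sal rdf:datatype="{$xsd}integer">{string($emp/Sal)}</f:Sal>
      <!-- foreign keys become URIs, named after the related resource -->
      <f:Manager rdf:resource="{$base}emp/{$emp/MgrNo}"/>
      <f:Dept rdf:resource="{$base}dept/{$emp/DeptNo}"/>
    </rdf:Description>
    <rdf:Description rdf:about="{$base}dept/{$dept/DeptNo}">
      <rdf:type rdf:resource="{$vocab}Dept"/>
      <!-- local identifier replaced by a global (DBpedia) URI -->
      <f:Location rdf:resource="http://dbpedia.org/resource/{$dept/Location}"/>
    </rdf:Description>
  </rdf:RDF>
```

Note that the namespace URIs are written out in full inside the attribute values, as described in the last point of the list, because prefixes are not expanded there.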

[The choices made here are those of a novice and review would be welcome.]

Some issues not yet addressed:
 * metadata about the dataset as a whole (its origin, and when and how it was converted). Could these be Dublin Core (DC) properties of a document, with each entity tied to that document as a part?
 * an alternative approach to mapping would be to start with an ontology and add mapping information to it rather than generating it from the ad-hoc configuration file.

Configuration file
To facilitate the conversion from XML to RDF, a separate configuration file is defined. Here is the configuration file for the emp-dept data.

Database conversion functions
One function, row-to-rdf, generates the RDF for a row of a table; another, map-to-schema, generates RDFS descriptions of the predicates used in a table.
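
To give an idea of how a map-driven row conversion can work, here is a minimal sketch. It is not the row-to-rdf function of the case study: the map format (a map element with base, table and key attributes and column children), the function signature and the source element names are hypothetical, chosen only to keep the example self-contained.

```xquery
xquery version "1.0";

declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

(: hypothetical map: one entry per column, giving the predicate to use :)
declare variable $local:map :=
  <map base="http://www.cems.uwe.ac.uk/empdept/" table="emp" key="EmpNo">
    <column name="Ename" predicate="foaf:surname"
            ns="http://xmlns.com/foaf/0.1/"/>
    <column name="Sal" predicate="f:Sal"
            ns="http://www.cems.uwe.ac.uk/empdept/concept/"
            datatype="http://www.w3.org/2001/XMLSchema#integer"/>
    <!-- ref marks a foreign key and names the table it points to -->
    <column name="MgrNo" predicate="f:Manager"
            ns="http://www.cems.uwe.ac.uk/empdept/concept/" ref="emp"/>
  </map>;

(: Build one rdf:Description for a row, using the map to pick predicates. :)
declare function local:row-to-rdf($row as element(), $map as element(map))
    as element(rdf:Description) {
  <rdf:Description
      rdf:about="{concat($map/@base, $map/@table, '/', $row/*[name() = $map/@key])}">
  {
    for $col in $map/column
    let $value := $row/*[name() = $col/@name]
    (: skip columns that are absent or empty (nulls) :)
    where exists($value) and string($value) ne ''
    return
      element { QName($col/@ns, $col/@predicate) } {
        if ($col/@ref)
        then
          (: foreign key: emit a link to the referenced resource :)
          attribute rdf:resource { concat($map/@base, $col/@ref, '/', $value) }
        else (
          (: literal: add the datatype when the map supplies one :)
          if ($col/@datatype)
          then attribute rdf:datatype { string($col/@datatype) }
          else (),
          string($value)
        )
      }
  }
  </rdf:Description>
};

local:row-to-rdf(
  <Emp><EmpNo>7566</EmpNo><Ename>JONES</Ename><MgrNo>7839</MgrNo><Sal>2975</Sal></Emp>,
  $local:map
)
```

The sketch assumes a single-column key and copies values unchanged, so it shares the limitations noted for the real mapping: composite keys and conversion functions would need extra attributes in the map and extra branches in the function.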

Full database conversion
The script to generate the RDF for the full database:

Links

 * Get RDF
 * Cached RDF
 * Validate RDF

Resource RDF
In addition, each resource can be retrieved individually as RDF. In this simple example, the request for a resource URI like:

http://www.cems.uwe.ac.uk/empdept/emp/7839

is re-written by Apache to http://www.cems.uwe.ac.uk/xmlwiki/RDF/empdeptrdf.xq?emp=7839

and the script retrieves the rdf:Description of the selected resource directly from the RDF file.
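
A lookup of this kind can be sketched as follows. This is an illustration rather than the actual empdeptrdf.xq: it assumes the script runs under eXist-db (for the request module) and that the generated RDF is stored at the hypothetical path shown.

```xquery
xquery version "1.0";

declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
(: eXist-db request module; the platform is an assumption :)
import module namespace request = "http://exist-db.org/xquery/request";

(: the emp parameter is supplied by the Apache rewrite, e.g. emp=7839 :)
let $empno := request:get-parameter("emp", ())
let $uri   := concat("http://www.cems.uwe.ac.uk/empdept/emp/", $empno)
(: hypothetical location of the generated RDF file :)
let $rdf   := doc("/db/Wiki/RDF/empdept.rdf")/rdf:RDF
return
  <rdf:RDF>
    { $rdf/rdf:Description[@rdf:about = $uri] }
  </rdf:RDF>
```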

This mechanism does not conform to the recommended practice of distinguishing between information resources (such as the information about employee 7839) and the real-world entity being represented. At present, the resource URI dereferences directly to the RDF, rather than redirecting to it with the recommended 303 (See Other) mechanism.
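
One possible way to add the redirect is for the script behind the resource URI to answer with a 303 status and a Location header that points to a separate document URI. The sketch below assumes eXist-db's request and response modules; the data/emp/ document URI pattern is hypothetical and would need its own rewrite rule to serve the RDF.

```xquery
xquery version "1.0";

(: eXist-db modules; the platform is an assumption :)
import module namespace request  = "http://exist-db.org/xquery/request";
import module namespace response = "http://exist-db.org/xquery/response";

let $empno   := request:get-parameter("emp", ())
(: hypothetical URI of the document describing the employee :)
let $doc-uri := concat("http://www.cems.uwe.ac.uk/empdept/data/emp/", $empno)
return (
  (: 303 See Other: the resource itself is not a document :)
  response:set-status-code(303),
  response:set-header("Location", $doc-uri)
)
```

With this in place the resource URI identifies the employee, while the document URI identifies the RDF description of that employee.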

To Do

 * compound primary keys
 * conversion functions, for example to convert the case of strings, reformat dates
 * added resources and relationships - here, a company entity and links from departments to the company