XML Schema

Welcome to the XML Schema book. It describes the structure of an XML Schema and explains how XML Schemas are used to validate XML documents.

Prerequisites
Readers of this book should already be familiar with the fundamental principles of XML and have some background in data types.

Course Guidelines
Only the sections on the prefix, the root element and elements are required to build XML Schemas. The introductory sections (the history of XML Schema, what XML Schemas are used for, their limits when validating complex rules, and the full example) can be skipped. The sections following the Elements section provide additional information, ordered from most to least important.

History of the XML Schema
XML Schema is a standard created by the World Wide Web Consortium (W3C), http://www.w3c.org. Unlike DTDs, XML Schemas are themselves written in XML. Once you know the structure of an XML file, you do not have to learn another syntax.

What are XML Schemas used for?
XML Schemas are primarily used to validate the structure and data types of an XML document. By structure we mean that an XML Schema defines:


 * what data elements are expected
 * what order the data elements are expected in
 * what nesting these data elements have
 * what data elements are optional and what data elements are required

XML Schemas can also be used by XML data mapping tools to quickly extract data from databases and transfer it into XML files.

One of the best analogies is the blueprint analogy. Just like there are architectural blueprints that describe the structural design of a house, an XML Schema provides the "structural design" of a file.

When XML Schemas become inefficient at validating complex rules
Although XML Schemas are excellent at sequential validation of data elements and data types, they tend to be cumbersome at expressing highly complex business rules. For example, it is difficult to state a rule saying that if a data element near the end of a large file has a certain value, then another data element near the beginning of the file must have had a particular value. Such rules can be checked using XML transforms and XPath expressions.

XML Schema Example
Here is a full example of a complete XML Schema file of personal contacts.
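As an illustrative sketch only (not necessarily the book's original listing), a small contacts schema might look like the following. The element and attribute names (Contacts, Person, FirstName, LastName, Email, id, LastUpdate) and the target prefix are assumptions chosen for illustration; the namespace URI is the one used later in this book.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:target="http://www.example.org/contactExample"
           targetNamespace="http://www.example.org/contactExample"
           elementFormDefault="qualified">

  <!-- Root element: a list of zero or more Person entries -->
  <xs:element name="Contacts">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="target:Person" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- Optional attribute holding the date of the last update -->
      <xs:attribute name="LastUpdate" type="xs:date"/>
    </xs:complexType>
  </xs:element>

  <!-- One contact: two required names, an optional e-mail, a required id -->
  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string"/>
        <xs:element name="LastName" type="xs:string"/>
        <xs:element name="Email" type="xs:string" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:integer" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```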

Structure of an XML Schema Document
Like any other XML file, XML Schema files normally begin with an XML declaration, which is followed by a root element (always the schema element, written <xs:schema> with the conventional prefix).

The prefix
Although any prefix can be used to refer to the namespace http://www.w3.org/2001/XMLSchema, the most common convention is to use "xs". Some people prefer "xsd"; some prefer to use the default namespace (which means no prefix is necessary). All XML Schema elements are in this namespace.

The root element: <xs:schema>
All XML Schema files must begin with the <xs:schema> start markup and must end with the </xs:schema> end markup.

An XML Schema defines elements and attributes which are available in a namespace (e.g. http://www.example.org/contactExample ). In the XML Schema, this namespace is defined using the targetNamespace attribute.

In an XML file, a namespace can be imported using the xmlns attribute (xmlns stands for XML NameSpace). The xmlns attribute name can be followed by a colon and a prefix (e.g. xmlns:xs ). In this case, the imported tags must be used with this prefix. Prefixes are used to distinguish tags with the same name imported from different namespaces.
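To illustrate, here is a sketch of an XML file that imports a namespace with a prefix. The prefix c and the element names are hypothetical; the sketch assumes a schema whose target namespace is http://www.example.org/contactExample and which declares these elements as qualified.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xmlns:c binds the prefix "c" to the namespace; every tag from it uses "c:" -->
<c:Contacts xmlns:c="http://www.example.org/contactExample">
  <c:Person id="1">
    <c:FirstName>Ada</c:FirstName>
    <c:LastName>Lovelace</c:LastName>
  </c:Person>
</c:Contacts>
```

With xmlns="http://www.example.org/contactExample" (no prefix) on the root element instead, the same tags could be written without any prefix.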

You can see that the target namespace we are defining in the example is one of the namespaces imported in the XML file. You can also see that the schema imports its own target namespace with a prefix of its own, so that the elements and types defined in the document can be referenced in the document itself using that prefix.
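The following sketch shows this arrangement on a schema root element. The prefix target and the names Person and Name are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:target="http://www.example.org/contactExample"
           targetNamespace="http://www.example.org/contactExample"
           elementFormDefault="qualified">

  <!-- A type defined in the target namespace -->
  <xs:complexType name="Person">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- The schema refers to its own type through the "target" prefix -->
  <xs:element name="Person" type="target:Person"/>
</xs:schema>
```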

<xs:sequence> is used for an ordered group of elements. For an unordered group of elements use <xs:all>.

Elements
Elements are defined in XML Schema using the <xs:element> markup:
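As a minimal sketch, the schema below declares a single element. The name Note is a hypothetical choice; since no type is given, the element accepts any content.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Declares an element called Note with no type constraint -->
  <xs:element name="Note"/>
</xs:schema>
```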

Elements with text body
For elements with a text body, the type of the text can be defined with the type attribute:
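For instance, assuming a hypothetical FirstName element whose body is plain text:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The text body of FirstName must be a string -->
  <xs:element name="FirstName" type="xs:string"/>
</xs:schema>
```

An XML file such as <FirstName>Ada</FirstName> would then be valid against this schema.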

Here are the common XML Schema primitive data types:
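The sketch below uses a selection of frequently used built-in types; it is not an exhaustive list, and the element names are hypothetical. Strictly speaking, xs:integer is derived from the primitive type xs:decimal rather than being primitive itself.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Name"      type="xs:string"/>   <!-- any text -->
  <xs:element name="IsActive"  type="xs:boolean"/>  <!-- true, false, 1 or 0 -->
  <xs:element name="Price"     type="xs:decimal"/>  <!-- e.g. 12.50 -->
  <xs:element name="Quantity"  type="xs:integer"/>  <!-- e.g. 42 -->
  <xs:element name="BirthDate" type="xs:date"/>     <!-- e.g. 1815-12-10 -->
  <xs:element name="OpensAt"   type="xs:time"/>     <!-- e.g. 09:30:00 -->
  <xs:element name="CreatedAt" type="xs:dateTime"/> <!-- e.g. 2001-05-02T09:30:00 -->
  <xs:element name="Homepage"  type="xs:anyURI"/>   <!-- e.g. http://www.example.org/ -->
</xs:schema>
```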

Restrictions and facets can be applied to a data type using the <xs:simpleType> and <xs:restriction> markups. For instance, the text body of a string-typed element can be fixed to a length of 5 characters using the <xs:length> markup:
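A sketch of such a restriction, assuming a hypothetical ZipCode element:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ZipCode">
    <xs:simpleType>
      <!-- Start from xs:string and narrow it down -->
      <xs:restriction base="xs:string">
        <!-- The text body must be exactly 5 characters long -->
        <xs:length value="5"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
```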

Here are all the schema restrictions and facets that can be used:
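XML Schema 1.0 defines the facets length, minLength, maxLength, pattern, enumeration, whiteSpace, minInclusive, minExclusive, maxInclusive, maxExclusive, totalDigits and fractionDigits. The sketch below shows a few of them in use; the type names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- enumeration: the value must be one of the listed strings -->
  <xs:simpleType name="Size">
    <xs:restriction base="xs:string">
      <xs:enumeration value="small"/>
      <xs:enumeration value="medium"/>
      <xs:enumeration value="large"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- pattern: the value must match a regular expression (five digits) -->
  <xs:simpleType name="FiveDigits">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- minInclusive / maxInclusive: the value must lie between 0 and 100 -->
  <xs:simpleType name="Percentage">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```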

Complex element
Any element type that can have sub-elements and/or attributes is considered a complex element. Complex element types are defined using the <xs:complexType> markup.

Sub-elements of a complex element
Sub-elements are defined with different appearance rules.

Sub-elements can be defined inside an <xs:all> markup: the sub-elements must all exist and can appear in any order.

Sub-elements can be defined inside an <xs:sequence> markup: the sub-elements must appear in the same order as in the schema.

Sub-elements can be defined inside an <xs:choice> markup: one and only one of the sub-elements must appear.
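The three rules can be sketched side by side as follows. All element names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- xs:all: FirstName and LastName must both appear, in any order -->
  <xs:element name="Name">
    <xs:complexType>
      <xs:all>
        <xs:element name="FirstName" type="xs:string"/>
        <xs:element name="LastName" type="xs:string"/>
      </xs:all>
    </xs:complexType>
  </xs:element>

  <!-- xs:sequence: Street must come first, then City -->
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Street" type="xs:string"/>
        <xs:element name="City" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- xs:choice: exactly one of Email or Phone must appear -->
  <xs:element name="ContactMethod">
    <xs:complexType>
      <xs:choice>
        <xs:element name="Email" type="xs:string"/>
        <xs:element name="Phone" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```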

Some appearance rules can be nested inside other appearance rules: an <xs:sequence> markup or an <xs:choice> markup can be included in an <xs:sequence> markup or an <xs:choice> markup.
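As a sketch of such nesting, the hypothetical Address element below requires a Street, then either a ZipCode or a PostBox, then a City:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Street" type="xs:string"/>
        <!-- A choice nested inside the sequence -->
        <xs:choice>
          <xs:element name="ZipCode" type="xs:string"/>
          <xs:element name="PostBox" type="xs:string"/>
        </xs:choice>
        <xs:element name="City" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```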

The number of occurrences can be changed with the minOccurs and maxOccurs attributes of the <xs:element> markup.

By default, the minimum occurrence is 1 and the maximum occurrence is 1. To allow an unlimited number of occurrences, specify maxOccurs="unbounded".
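The occurrence attributes might be combined as in this sketch, where the element names are hypothetical:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <!-- Defaults apply: exactly one FirstName -->
        <xs:element name="FirstName" type="xs:string"/>
        <!-- Optional: zero or one MiddleName -->
        <xs:element name="MiddleName" type="xs:string" minOccurs="0"/>
        <!-- One to three Email elements -->
        <xs:element name="Email" type="xs:string" maxOccurs="3"/>
        <!-- Any number of Phone elements, including none -->
        <xs:element name="Phone" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```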

Complex element with sub-elements and text
Complex elements can contain text in their body before, between and after their sub-elements by setting the mixed attribute to true.

By default, the mixed attribute is set to false.
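A sketch of mixed content, assuming hypothetical Paragraph and Bold elements:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- mixed="true" allows text around the Bold sub-elements, e.g.
       <Paragraph>This is <Bold>important</Bold> text.</Paragraph> -->
  <xs:element name="Paragraph">
    <xs:complexType mixed="true">
      <xs:sequence>
        <xs:element name="Bold" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```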

Attributes of a complex element
The attributes of an element can be defined with the <xs:attribute> markup.

If the element contains both attributes and sub-elements, the <xs:attribute> markups must be defined after (below) the <xs:all>, <xs:sequence> or <xs:choice> markup. Data types, restrictions and facets can be defined for attributes in the same way as for text-body-only elements.
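The sketch below shows an element with both sub-elements and attributes; in the XML Schema grammar, attribute declarations follow the group of sub-elements inside <xs:complexType>. The names are hypothetical.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Person">
    <xs:complexType>
      <!-- Sub-elements come first -->
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string"/>
        <xs:element name="LastName" type="xs:string"/>
      </xs:sequence>
      <!-- Attribute declarations follow the sub-element group -->
      <xs:attribute name="id" type="xs:integer"/>
      <!-- An attribute with an inline restricted type -->
      <xs:attribute name="gender">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="female"/>
            <xs:enumeration value="male"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>
```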

By default, attributes are optional. This can be changed with the use attribute.
 * If its value is optional, the attribute can be left out.
 * If its value is required, the attribute must be present.

A default value can be defined with the default attribute. If the attribute is absent from the XML document, a validating parser treats it as present with the default value.

The attribute can be restricted to a constant value with the fixed attribute. As the fixed attribute also acts as a default value, you must not define a default attribute as well.
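The use, default and fixed attributes can be sketched together as follows, with hypothetical attribute names:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Person">
    <xs:complexType>
      <!-- Must always be present -->
      <xs:attribute name="id" type="xs:integer" use="required"/>
      <!-- May be left out (this is also the default behaviour) -->
      <xs:attribute name="nickname" type="xs:string" use="optional"/>
      <!-- If left out, it is treated as country="unknown" -->
      <xs:attribute name="country" type="xs:string" default="unknown"/>
      <!-- If present, its value must be exactly "1.0" -->
      <xs:attribute name="version" type="xs:string" fixed="1.0"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```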

Complex element with attributes and text body
Elements can contain both attributes and a text body using the <xs:simpleContent> markup (remember that an element of simple type cannot have attributes).
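As a sketch, the hypothetical Price element below has a decimal text body and a currency attribute, so that <Price currency="EUR">12.50</Price> would be valid:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Price">
    <xs:complexType>
      <xs:simpleContent>
        <!-- The text body is an xs:decimal, extended with one attribute -->
        <xs:extension base="xs:decimal">
          <xs:attribute name="currency" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
```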

Type definition
Simple and complex types can be defined beside the element tree. In this case, the <xs:element> markup has no body, keeps its name attribute and has a type attribute. The <xs:complexType> or <xs:simpleType> markup is then defined outside the root element, directly under <xs:schema>, with a name attribute containing the type name. This makes no difference to the validation of XML files. Let's take this XML Schema:
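A sketch of a schema with inline (anonymous) type definitions is given here. The structure is an assumption made for illustration; only the names Person and LastUpdate come from the text of this section.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Contacts">
    <xs:complexType>
      <xs:sequence>
        <!-- Person carries its complex type inline -->
        <xs:element name="Person" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="FirstName" type="xs:string"/>
              <xs:element name="LastName" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <!-- LastUpdate carries its simple type inline -->
        <xs:element name="LastUpdate">
          <xs:simpleType>
            <xs:restriction base="xs:date">
              <xs:minInclusive value="2000-01-01"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```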

Now let's define the Person complex type and the LastUpdate simple type beside the root element tree:
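A sketch of such a schema with named types follows. The sub-elements and the date restriction are assumptions made for illustration; a Contacts element containing Person elements and a LastUpdate element is validated the same way whether these types are named or written inline.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Contacts">
    <xs:complexType>
      <xs:sequence>
        <!-- No body: each element only points to a named type -->
        <xs:element name="Person" type="Person" maxOccurs="unbounded"/>
        <xs:element name="LastUpdate" type="LastUpdate"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Named complex type, defined directly under xs:schema -->
  <xs:complexType name="Person">
    <xs:sequence>
      <xs:element name="FirstName" type="xs:string"/>
      <xs:element name="LastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Named simple type, defined directly under xs:schema -->
  <xs:simpleType name="LastUpdate">
    <xs:restriction base="xs:date">
      <xs:minInclusive value="2000-01-01"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```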

Complex and simple types can be defined in any order. A defined type can be reused in different elements of the schema, so its description is not duplicated. It also keeps the XSD file from becoming too deeply indented. Moreover, with type definitions, elements have not only a name but also a type name, which can be used as a class name too. Some tools that parse XML content according to an XML Schema require a type name for complex type elements.

Element and attribute reference
Elements and attributes can be reused using references. In this case, the <xs:element> markup or <xs:attribute> markup has no body, no name attribute and has a ref attribute. This ref attribute contains the name of another element or another attribute. Let's use a reference to the Person element in the previous example:
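A sketch of an element reference is shown below; the sub-elements of Person are assumptions made for illustration. Note that the referenced element must be declared globally, that is, directly under <xs:schema>.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Contacts">
    <xs:complexType>
      <xs:sequence>
        <!-- No name and no body: ref points to the global Person element -->
        <xs:element ref="Person" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Global element declaration that is being referenced -->
  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="FirstName" type="xs:string"/>
        <xs:element name="LastName" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```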

The difference between the separate type definition above and a reference is that any element or attribute can be referenced. Moreover, with a reference, the link is made using element or attribute names instead of type names. This means that we are not linking classes but instances of classes.

Extension
Defined complex types can be reused by adding sub-elements or attributes to them. The complex type is then said to be extended. This is done using the <xs:complexContent> and <xs:extension> markups. The name of the type being extended is given in the base attribute of the <xs:extension> markup. Here is an example where a complex type is extended with an additional attribute for one particular element:
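The following sketch shows one way this could look. The names Person, Employee and company are hypothetical and are not taken from the book's original example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- The common complex type -->
  <xs:complexType name="Person">
    <xs:sequence>
      <xs:element name="FirstName" type="xs:string"/>
      <xs:element name="LastName" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Employee has everything Person has, plus a company attribute -->
  <xs:complexType name="Employee">
    <xs:complexContent>
      <xs:extension base="Person">
        <xs:attribute name="company" type="xs:string"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:element name="Employee" type="Employee"/>
</xs:schema>
```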

Various elements with common and different sub-elements or attributes can be defined in this way. The common items are defined in a common complex type, and the differing items are defined in separate complex types that extend the first one.