XML - Managing Data Exchange/A single entity

Introduction
In this chapter, we start to practice working with XML using XML documents, schemas, and stylesheets. An XML document organizes data and information in a structured, hierarchical format. An XML schema provides standards and rules for the structure of a given XML document. An XML schema also enables data transfer. An XSL (XML stylesheet) allows unique presentations of the material found within an XML document.

In the first chapter, Introduction to XML, you learned what XML is, why it is useful, and how it is used. So, now you want to create your very own XML documents. In this chapter, we will show you the basic components used to create an XML document. This chapter is the foundation for all subsequent chapters--it is a little lengthy, but don't be intimidated. We will take you through the fundamentals of XML documents.

This chapter is divided into three parts:
 * XML Document
 * XML Schema
 * XML Stylesheets (XSL)

As you learned in the previous chapter, the XML Schema and Stylesheet are essentially specialized XML Documents. Within each of these three parts we will examine the layout and components required to create the document. There are links at the end of the XML document, schema, and stylesheet sections that show you how to create the documents using an XML editor. At the bottom of the page there is a link to Exercises for this chapter and a link to the Answers.

The first thing you will need before starting to create XML documents is a problem--something you want to solve by using XML to store and share data or information. You need some entity you can collect information about and then access in a variety of formats. So, we created one for you.

To develop an XML document and schema, start with a data model depicting the reality of the actual data that is exchanged. Once a high fidelity model has been created, the data model can be readily converted to an XML document and schema. In this chapter, we start with a very simple situation and in successive chapters extend the complexity to teach you more features of XML.

Our starting point is a single entity, CITY, which is shown in the following figure. While our focus is on this single entity, to map CITY to an XML schema, we need to have an entity that contains CITY. In this case, we have created TOURGUIDE. Think of a TOURGUIDE as containing many cities; in this case TOURGUIDE has neither attributes nor an identifier. It is just a container for data about cities.

Exhibit 1: Data model - Tourguide

XML document
An XML document is a file containing XML code and syntax. XML documents have an .xml file extension.

We will examine the features & components of the XML document.


 * Prologue (XML Declaration)
 * Elements
 * Attributes
 * Rules to follow
 * Well-formed & Valid XML documents

Below is a sample XML document using our TourGuide model. We will refer to it as we describe the parts of an XML document.

Exhibit 2: XML document for city entity
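A minimal sketch of what such a document might look like, using only element names mentioned in this chapter (tourGuide, city, cityName, adminUnit, country, population, area, elevation). The choice of cities, the numeric values, and the selection of child elements are illustrative assumptions, and the schema-reference attributes on the root element are left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: every value below is an illustrative placeholder -->
<tourGuide>
  <city>
    <cityName>Belmopan</cityName>
    <adminUnit>Cayo</adminUnit>
    <country>Belize</country>
    <population>10000</population>
    <area>5</area>
    <elevation>100</elevation>
  </city>
  <city>
    <cityName>Boston</cityName>
    <adminUnit>Massachusetts</adminUnit>
    <country>United States</country>
    <population>600000</population>
    <area>200</area>
    <elevation>40</elevation>
  </city>
</tourGuide>
```

The two city elements have exactly the same child elements in the same order, which is what lets a schema describe them once and reuse the description.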

Prologue (XML declaration)
The XML document starts off with the prologue. The prologue informs both a reader and the computer of certain specifications that make the document XML compliant. In this basic XML document, the prologue consists of a single line, the XML declaration.

Exhibit 3: XML document - prologue

xml  =   this is an XML document

version="1.0"  =   the XML version (XML 1.0 is the W3C-recommended version)

encoding="UTF-8"  =   the character encoding used in the document. UTF-8 is a variable-length encoding of Unicode that uses one to four 8-bit bytes per character, and it is the standard way to encode international documents. Unicode provides a unique number for every character.

Another potential attribute of the XML declaration:

standalone="yes"  =   the dependency of the document ('yes' indicates that the document does not rely on markup declarations held in another document, such as an external DTD, to complete its content)
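Written out, a declaration carrying all three attributes could look like the sketch below. The attributes must appear in this order (version, encoding, standalone); the empty tourGuide root is included only so that the fragment is a complete document.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tourGuide/>
```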

Elements
The majority of what you see in the XML document consists of XML elements. Elements are identified by their tags, which open with < or </ and close with > or />. The start tag looks like this: <element attribute="value">, with a left angle bracket (<) followed by the element type name, optional attributes, and finally a right angle bracket (>). The end tag looks like this: </element>, similar to the start tag, but with a slash (/) between the left angle bracket and the element type name, and no attributes.

When there's nothing between a start tag and an end tag, XML allows you to combine them into an empty element tag, which can include everything a start tag can: <element attribute="value" />. This one tag must be closed with a slash and right angle bracket (/>), so that it can be distinguished from a start tag.

The XML document is designed around a major theme, an umbrella concept covering all other items and subjects; this theme is analyzed to determine its component parts, creating categories and subcategories. The major theme and its component parts are described by elements. In our sample XML document, 'tourGuide' is the major theme; 'city' is a category; 'population' is a subcategory of 'city'; and the hierarchy may be carried even further: 'males' and 'females' could be subcategories of 'population'. Elements follow several rules of syntax that will be described in the Rules to Follow section.

We left out the attributes within the <tourGuide> start tag; that part will be explained in the XML Schema section.

Exhibit 4: Elements of the city entity XML document

Element hierarchy

 * root element -   This is the XML document's major theme element. Every document must have exactly one root element. All other elements are contained within this one root element. The root element follows the XML declaration. In our example, <tourGuide> is the root element.

 * parent element -   This is any element that contains other elements, the child elements. In our example, <city> is a parent element.

 * child element -   This is any element that is contained within another element, the parent element. In our example, <cityName> is a child element of <city>.

 * sibling element -   These are elements that share the same parent element. In our example, the child elements of <city>, such as <cityName>, <adminUnit>, <country>, and <population>, are all sibling elements.
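The four roles can be read off a small annotated document. This is a sketch that assumes a single city with three of the child elements named in this chapter; the values are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<tourGuide>                           <!-- root element; also the parent of city -->
  <city>                              <!-- child of tourGuide, parent of the elements below -->
    <cityName>Belmopan</cityName>     <!-- child of city -->
    <adminUnit>Cayo</adminUnit>       <!-- sibling of cityName -->
    <country>Belize</country>         <!-- sibling of cityName and adminUnit -->
  </city>
</tourGuide>
```

Note that the same element can play several roles at once: city is a child with respect to tourGuide and a parent with respect to cityName.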

Attributes
Attributes aid in modifying the content of a given element by providing additional or required information. They are contained within the element's opening tag. In our sample XML document code we could have taken advantage of attributes to specify the unit of measure used to determine the area and the elevation (it could be feet, yards, meters, kilometers, etc.); in this case, we could have called the attribute 'measureUnit' and defined it within the opening tag of 'area' and 'elevation'.

(Two elements, with the contents Cayo and Selangor, each carrying an attribute in its opening tag.)

The above attribute example can also be written:

 1. using child elements
 2. using an empty element
Attributes can be used to:


 * provide more information that is not defined in the data
 * define a characteristic of the element (size, color, style)
 * ensure the inclusion of information about an element in all instances

Attributes can, however, be a bit more difficult to manipulate and they have some constraints. Consider using a child element if you need more freedom.
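To make the trade-off concrete, here is a sketch of the 'measureUnit' idea from the start of this section written three ways. The wrapper element 'variants', the child element 'value', the attribute 'value', and the unit string "km2" are hypothetical names chosen for illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- 'variants' is a hypothetical wrapper so all three forms fit in one well-formed document -->
<variants>
  <!-- Form A: the unit of measure as an attribute of the element -->
  <area measureUnit="km2">5</area>

  <!-- Form B: the same information using child elements -->
  <area>
    <value>5</value>
    <measureUnit>km2</measureUnit>
  </area>

  <!-- Form C: an empty element that carries everything in attributes -->
  <area measureUnit="km2" value="5"/>
</variants>
```

Form B is the easiest to extend later (a child element can gain children of its own), which is the extra freedom the paragraph above refers to.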

Rules to follow
These rules are designed to aid the computer reading your XML document.

 * If the document has an XML declaration (the prologue), it must be the very first line of the document.

 * The main theme of the XML document is established in the root element and all other elements must be contained within the opening and closing tags of this root element (e.g. <rootElement> data stuff </rootElement>).

 * Every element must have an opening tag and a closing tag, with no exceptions; an empty element tag such as <element/> serves as both.

 * Tags must be nested in a particular order: the parent element's opening and closing tags must contain all of its child elements' tags; in this way, you close first the tag that was opened last: <parentElement> <childElement1>data</childElement1> <childElement2> <subChildElementA>data</subChildElementA> <subChildElementB>data</subChildElementB> </childElement2> <childElement3>data</childElement3> </parentElement>

 * Attribute values must have quotation marks around them (single or double). Attribute names may not contain spaces.

 * Empty tags or empty elements must have a slash (/) at the end of the tag, just before the right angle bracket; a space before the slash is optional.

 * Comments in the XML language begin with "<!--" and end with "-->".
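The rules above can be seen together in one small document. This is a sketch: the element names follow the generic parent/child naming used in this section, and 'emptyChild' and the 'note' attribute are hypothetical additions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The XML declaration comes first; this line shows the comment syntax -->
<parentElement>
  <childElement1 note="quoted">data</childElement1>
  <childElement2>
    <subChildElementA>data</subChildElementA>
    <subChildElementB>data</subChildElementB>
  </childElement2>
  <emptyChild/>
</parentElement>
<!-- Not well-formed, because the tags overlap: <a><b>data</a></b> -->
```

Every closing tag matches the most recently opened tag that is still open, which is all that "nested in a particular order" requires.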

XML Element Naming Convention
Any name can be used but the idea is to make names meaningful to those who might read the document.


 * XML element names may only start with either a letter or an underscore character.

 * The name must not start with the string "xml" (in any combination of upper and lower case), which is reserved for the XML specification.

 * The name may not contain spaces.

 * The ":" should not be used in element names because it is reserved to be used for namespaces (This will be covered in more detail in a later chapter).

 * After the first character, the name may contain a mixture of letters, digits, hyphens, underscores, and periods.

XML documents often have a corresponding database. The database will contain fields which correspond to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.
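A short sketch of names that follow these conventions. The element names '_internalNote' and 'admin-unit.v2' are hypothetical and exist only to show which characters are permitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<tourGuide>
  <!-- starts with a letter -->
  <cityName>Boston</cityName>
  <!-- starts with an underscore -->
  <_internalNote>draft</_internalNote>
  <!-- digits, a hyphen, and a period are fine after the first character -->
  <admin-unit.v2>Massachusetts</admin-unit.v2>
</tourGuide>
<!-- Names a parser would reject: "2ndCity" (starts with a digit), "city name" (contains a space) -->
```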

Simple Internal DTD
Every element that will be used MUST be included in the DTD. Don’t forget to include the root element, even though you have already specified it at the beginning of the DTD. You must specify it again, in an <!ELEMENT> tag.

<!ELEMENT cdCollection (cd)>

The root element, <cdCollection>, contains all the other elements of the document, but only one direct child element: <cd>. Therefore, you need to specify the child element (only direct child elements need to be specified) in the parentheses.

<!ELEMENT cd (title, artist, year)>

With this line, we define the <cd> element. Note that this element contains the child elements <title>, <artist>, and <year>. These are spelled out in a particular order. This order must be followed when creating the XML document. If you change the order of the elements (with this particular DTD), the document won’t validate.

<!ELEMENT title (#PCDATA)>

The remaining three tags, <title>, <artist>, and <year>, don’t actually contain other tags. They do, however, contain some text that needs to be parsed. You may remember from an earlier lecture that this data is called Parsed Character Data, or #PCDATA. Therefore, #PCDATA is specified in the parentheses. So this simple DTD outlines exactly what you see in the XML file. Nothing can be added or taken away, as long as we stick to this DTD. The only thing you can change is the #PCDATA text part between the tags.
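Putting the declarations together, a complete document with an internal DTD might look like the sketch below. The DOCTYPE wrapper and the declarations for artist and year are filled in by analogy with the title declaration, and the CD data is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cdCollection [
  <!-- the root element is named in the DOCTYPE and declared again below -->
  <!ELEMENT cdCollection (cd)>
  <!ELEMENT cd (title, artist, year)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
]>
<cdCollection>
  <cd>
    <title>Example Album</title>
    <artist>Example Artist</artist>
    <year>1999</year>
  </cd>
</cdCollection>
```

Swapping the artist and year elements, or adding a second cd, would leave the document well-formed but no longer valid against this DTD.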

Adding complexity
There may be times when you will want to put more than just character data, or more than just child elements, into a particular element. This is referred to as mixed content. For example, let’s say you want to be able to put character data OR a child element, such as the <b> tag, into a <description> element:

<!ELEMENT description (#PCDATA | b | i )*>

This particular arrangement allows us to use PCDATA, the <b> tag, or the <i> tag all at once. One particular caveat, though, is that if you are going to mix PCDATA and other elements, the grouping must be followed by the asterisk (*) suffix. With this declaration in place (and after defining the individual elements, of course), a <description> element in the XML document may mix plain text with <b> and <i> child elements.
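As a sketch of mixed content in use, the document below declares description together with b and i and then mixes text with both. The sentence inside the element is a placeholder, and description is used as the root only to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE description [
  <!-- #PCDATA must come first in a mixed-content group, and the group ends with * -->
  <!ELEMENT description (#PCDATA | b | i)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
]>
<description>A <b>classic</b> album, <i>remastered</i> from the original tapes.</description>
```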

Attributes are declared a little differently from elements. Suppose, for example, that a <cd> element carries a remaster_date attribute.

In order for this to validate, the attribute must be specified in the DTD. Attribute content models are specified with:

<!ATTLIST element_name attribute_name attribute_type default_value>

Let’s use this to validate our CD example:

<!ATTLIST cd remaster_date CDATA #IMPLIED>
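A sketch of the CD example with the attribute declared and used. Because the attribute is #IMPLIED it is optional, so a cd element without it would validate as well; the date value and CD data are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cdCollection [
  <!ELEMENT cdCollection (cd)>
  <!ELEMENT cd (title, artist, year)>
  <!-- CDATA: any character data; #IMPLIED: the attribute may be omitted -->
  <!ATTLIST cd remaster_date CDATA #IMPLIED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
]>
<cdCollection>
  <cd remaster_date="1992">
    <title>Example Album</title>
    <artist>Example Artist</artist>
    <year>1975</year>
  </cd>
</cdCollection>
```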

Choices
<!ATTLIST person gender (male|female) "male">

Grouping Attributes for an Element
If a particular element is to have many different attributes, group them together like so:

<!ATTLIST car horn CDATA #REQUIRED seats CDATA #REQUIRED steeringwheel CDATA #REQUIRED price CDATA #IMPLIED>

Adding STATIC validation, for items that must have a certain value
<!ATTLIST classList  classNumber CDATA #IMPLIED building (UWINNIPEG_DCE|UWINNIPEG_MAIN) "UWINNIPEG_MAIN" originalDeveloper CDATA #FIXED "Khal Shariff">

Suffixes
So what happens with our last example with the CD collection, when we want to add more CDs? With the current DTD, we cannot add any more CDs without getting an error. Try it and see. When you specify a child element (or elements) the way we did, only one of each child element can be used. Not very suitable for a CD collection is it? We can use something called suffixes to add functionality to the <!ELEMENT> tag. Suffixes are added to the end of the specified child element(s). There are 3 main suffixes that can be used:


 * ( No suffix ): Only 1 child can be used.
 * ( + ): One or more elements can be used.
 * ( * ): Zero or more elements can be used.
 * ( ? ): Zero or one element may be used.

Validating for multiple children with a DTD
So in the case of our CD collection XML file, we can add more CDs to the list by adding a + suffix: <!ELEMENT cdCollection (cd+)>
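With the + suffix in place, a collection holding more than one CD validates. The following is a sketch with placeholder CD data; replacing + with * would additionally allow an empty collection.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE cdCollection [
  <!-- cd+ : one or more cd elements -->
  <!ELEMENT cdCollection (cd+)>
  <!ELEMENT cd (title, artist, year)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
]>
<cdCollection>
  <cd>
    <title>First Example Album</title>
    <artist>Example Artist</artist>
    <year>1975</year>
  </cd>
  <cd>
    <title>Second Example Album</title>
    <artist>Another Artist</artist>
    <year>1999</year>
  </cd>
</cdCollection>
```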

Using more internal formatting tags
Formatting tags, such as <b> for bold, are also defined in the DTD as elements. They can be made optional like this:

<!ELEMENT notes (#PCDATA | b | i)*>
<!ELEMENT b (#PCDATA)*>
<!ELEMENT i (#PCDATA)*>
]>
_______________
<classList classNumber="303" building="UWINNIPEG_DCE" originalDeveloper="Khal Shariff">
<firstName>Kenneth </firstName>
<lastName>Branaugh </lastName>
<studentNumber> </studentNumber>
<notes><b>Excellent</b>, Kenneth is doing well.</notes>
etc

Case Study on BMEcat
One of the first major national projects for the use of XML as a B2B exchange format was initiated by the federal association for material management, purchasing and logistics (BME) in cooperation with leading German companies, e.g. Bayer, BMW, SAP and Siemens. Together they created a standard for the exchange of product catalogues. This project was named BMEcat (http://www.bmecat.org/English/index.asp?). The result of this initiative is a DTD collection for the description of product catalogues and related transactions (new catalogue, updating of product data and updating of prices).

Companies operating in electronic commerce (suppliers, purchasing companies and market places) exchange increasingly large amounts of data. The variety of data exchange formats quickly pushes them to their limits. The BMEcat solution creates a basis for a straightforward transfer of catalogue data from various data formats. This lays the foundation for advancing the trade in goods over the Internet in Germany. The use of BMEcat reduces the costs for all parties, as standard interfaces can be used.

The XML-based standard BMEcat has been successfully implemented in many projects. Nowadays a variety of companies use BMEcat, as an established standard, to exchange their product catalogues.

A BMEcat catalogue (Version 1.2) consists of the following main elements:

CATALOG This element contains the essential information of a shopping catalog, e.g. language version and validity. BMEcat expects exactly one language per catalog.

SUPPLIER This element includes the identification and address of the catalog supplier. BMEcat expects exactly one supplier per catalog.

BUYER This element contains the name and address of the catalogue recipient. BMEcat expects no more than one recipient per catalog.

AGREEMENT This element contains one or more framework agreement IDs, each associated with its validity period. BMEcat expects all prices in a catalogue to belong to the agreements named here.

CLASSIFICATION SYSTEM This element allows the full transfer of one or more classification systems, including feature definitions and key words.

CATALOG GROUP SYSTEM This element originates from version 1.0. It is mainly used for the transfer of tree structures, which help a user navigate in the target system (browser).

ARTICLE (since 2005 PRODUCT) This element represents a product. It contains a set of standard attributes.

ARTICLE PRICE (since 2005 PRODUCT PRICE) This element represents a price. Compared with other exchange formats, the support for different pricing models is very powerful: seasonal prices, country prices, different currencies, different validity periods, etc. are supported.

ARTICLE FEATURE (since 2005 PRODUCT FEATURE) This element allows the transfer of characteristic values. You can either record predefined group characteristics or individual product characteristics.

VARIANT This element allows listing of product variants, without having to duplicate them. However, variants in BMEcat only cover individual changes in value that lead to a change of Article ID. Beyond that, a variant cannot depend on other attributes (prices in particular).

MIME This element includes any number of additional documents such as product images, data sheets, or websites.

ARTICLE REFERENCE (since 2005 REFERENCE PRODUCT) This element allows cross-referencing between articles within a catalogue as well as between catalogues. These references may be used, to a limited extent, for mapping product bundles.

USER DEFINED EXTENSION This element enables the transport of data that lies outside the BMEcat standard. The sender and receiver have to agree on such extensions in advance.

You can find a typical BMEcat file here.

Online validator
Many free online XML validators are available; a web search will find one.

Well-formed and valid XML
Well-formed XML -  An XML document that correctly abides by the rules of XML syntax.

Valid XML -  An XML document that adheres to the rules of an XML schema (which we will discuss shortly). To be valid an XML document must first be well-formed.

A Valid XML Document must be Well-formed. But, a Well-formed XML Document might not be valid - in other words, a well-formed XML document, that meets the criteria for XML syntax, might not meet the criteria for the XML schema, and will therefore be invalid.

For example, think of the situation where your XML document contains the following (for this schema):

<cityName>Boston</cityName>
<country>United States</country>
<adminUnit>Massachusetts</adminUnit>
 :
 :
 :

Notice that the elements do not appear in the correct sequence according to the schema (cityName, adminUnit, country). The XML document can be validated (using validation software) against its declared schema – the validation software would then catch the out of sequence error.
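For comparison, this is the same fragment with the elements in the sequence the schema expects (cityName, adminUnit, country). It is a sketch: the enclosing city element is assumed, and the remaining child elements required by the schema are left out for brevity.

```xml
<city>
  <cityName>Boston</cityName>
  <adminUnit>Massachusetts</adminUnit>
  <country>United States</country>
  <!-- the remaining child elements would follow here -->
</city>
```

Both orderings are well-formed; only this one can be valid.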

Using an XML Editor
Check chapter ../XML Editor/ for instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML document and paste it into the XML editor. Then check your results. Is the XML document well-formed? Is the XML document valid? (you will need to have copied and pasted the schema in order to validate - we will look at schemas next)

XML schema
An XML schema is an XML document. XML schemas have an .xsd file extension.

An XML schema is used to govern the structure and content of an XML document by providing a template for XML documents to follow in order to be valid. It is a guide for how to structure your XML document as well as indicating your XML document's components (elements and attributes - and their relationships). An XML editor will examine an XML document to ensure that it conforms to the specifications of the XML schema it is written against - to ensure it is valid.

XML schemas engender confidence in data transfer. With schemas, the receiver of data can feel confident that the data conforms to expectations. The sender and the receiver have a mutual understanding of what the data represent.

Because an XML schema is an XML document, you use the same language - standard XML markup syntax - with elements and attributes specific to schemas.

A schema defines:


 * the structure of the document
 * the elements
 * the attributes
 * the child elements
 * the number of child elements
 * the order of elements
 * the names and contents of all elements
 * the data type for each element

For more detailed information on XML schemas and reference lists of: Common XML Schema Primitive Data Types, Summary of XML Schema Elements, Schema Restrictions and Facets for data types, and Instance Document Attributes, click on this wikibook link => http://en.wikibooks.org/wiki/XML_Schema

Schema reference
This is the part of the XML Document that references an XML Schema:

Exhibit 5: XML document's schema reference

This is the part we left out when we described the root element in the basic XML document from the previous section. The additional attributes of the root element <tourGuide> reference the XML schema (through the xsi:noNamespaceSchemaLocation attribute).

xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'  -  references the W3C Schema-instance namespace

xsi:noNamespaceSchemaLocation='city.xsd'  -  references the XML schema document (city.xsd)
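Combined in the root element's start tag, the two attributes would look roughly like this. The sketch uses double quotes, which are equivalent to the single quotes shown above, and a comment stands in for the city elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<tourGuide xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="city.xsd">
  <!-- city elements go here -->
</tourGuide>
```

The value city.xsd is a relative location, so the schema file is expected to sit in the same folder as the XML document.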

Schema document
Below is a sample XML schema using our TourGuide model. We will refer to it as we describe the parts of an XML schema.

Exhibit 6: XML schema document for city entity

Prolog
Remember that the XML schema is essentially an XML document and therefore must begin with the prolog, which in the case of a schema includes:

 * the XML declaration
 * the schema element declaration

The schema element is similar to a root element - it contains all other elements in the schema.

Attributes of the schema element include:

xmlns -  XML NameSpace - the URI that identifies the vocabulary of XML elements and data types used in the schema.

You can find more about namespaces here => ../Namespace/.

xmlns:xsd -  All the elements and attributes with the 'xsd' prefix adhere to the vocabulary designated in the given namespace.

elementFormDefault -  indicates whether locally declared elements from the target namespace must be qualified with the namespace in the XML document ('qualified') or not ('unqualified'). This is mostly useful when more than one namespace is referenced. In that case, 'elementFormDefault' should be 'qualified', because you must indicate which namespace you are using for each element. If you are referencing only one namespace, then 'elementFormDefault' can be 'unqualified'. Using 'qualified' as the default is perhaps most prudent; this way you do not accidentally forget to indicate which namespace you are referencing.
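A sketch of a schema prolog that uses the attributes just described. The 'xsd' prefix matches the one used in this chapter; the choice of "unqualified" is an assumption made for a schema with no target namespace, where the setting has no practical effect.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="unqualified">
  <!-- element and type declarations go here -->
</xsd:schema>
```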

Element declarations
Define the elements in the schema.

Include:


 * the element name
 * the element data type (optional)

Basic element declaration format:

Simple type
declares elements that:


 * do NOT have Child Elements
 * do NOT have Attributes

example:

Default Value

If an element is not assigned a value then the default value is assigned.

example:

Fixed Value

An element or attribute that is defined as fixed must be empty or contain the specified fixed value. No other values are allowed.

example:
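The three kinds of simple element declaration described above can be sketched as follows. The element names are taken from or modelled on this chapter, while the default value "Belize" and the fixed value "km2" are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Basic format: an element name plus a data type -->
  <xsd:element name="cityName" type="xsd:string"/>

  <!-- Default value: used when the element appears but is left empty -->
  <xsd:element name="country" type="xsd:string" default="Belize"/>

  <!-- Fixed value: the element must be empty or contain exactly this value -->
  <xsd:element name="measureUnit" type="xsd:string" fixed="km2"/>
</xsd:schema>
```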

Complex type
declares elements that:


 * can have Child Elements
 * can have Attributes

examples:

1. The root element 'tourGuide' contains a child element 'city'. This is shown here:

Nameless complex type

Occurrence Indicators:


 * minOccurs = the minimum number of times an element can occur (here it is 1 time)
 * maxOccurs = the maximum number of times an element can occur (here it is an unlimited number of times, 'unbounded')

2. The parent element 'city' contains many child elements: 'cityName', 'adminUnit', 'country', 'population', etc. Why does this complex element set not start with an <xsd:element> declaration for 'city'? The element 'city' was already defined above within the complex element 'tourGuide' and it was given the type 'cityDetails'. This data type, 'cityDetails', is utilized here in identifying the sequence of child elements for the parent element 'city'.

Named Complex Type - and therefore can be reused in other parts of the schema

The <xsd:sequence> tag indicates that the child elements must appear in the order, the sequence, specified here.
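The nameless and named complex types described in examples 1 and 2 can be put together in one schema. This is a sketch rather than the chapter's own exhibit: the names tourGuide, city, cityDetails and the child elements come from the text, while the selection of six child elements and their data types are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="unqualified">

  <!-- Nameless complex type: defined inline, usable only by tourGuide -->
  <xsd:element name="tourGuide">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="city" type="cityDetails"
                     minOccurs="1" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Named complex type: can be reused wherever type="cityDetails" appears -->
  <xsd:complexType name="cityDetails">
    <xsd:sequence>
      <xsd:element name="cityName" type="xsd:string"/>
      <xsd:element name="adminUnit" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string"/>
      <xsd:element name="population" type="xsd:integer"/>
      <xsd:element name="area" type="xsd:integer"/>
      <xsd:element name="elevation" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

A document is valid against this sketch only if every city contains these six child elements, once each, in exactly this order.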

Compare the sample XML Schema and the sample XML Document - try to observe patterns in the code and how the XML Schema sets up the XML Document.

3. Elements that have attributes are also designated as complex type.

Each XML document line whose element carries attributes needs a matching complex type definition in the XML schema that declares those attributes.

Attribute declarations
Attribute declarations are used in complex type definitions. We saw some attribute declarations in the third example of the Complex Type Element.
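As a sketch of an attribute declaration, the schema fragment below defines an 'area' element that holds a number and carries the 'measureUnit' attribute suggested earlier in this chapter. The data types and the decision to make the attribute required are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="area">
    <xsd:complexType>
      <!-- simpleContent: text content only, no child elements, but attributes allowed -->
      <xsd:simpleContent>
        <xsd:extension base="xsd:integer">
          <xsd:attribute name="measureUnit" type="xsd:string" use="required"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

An instance such as an area element with measureUnit="km2" and the content 5 would satisfy this declaration.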

Data type declarations
These are contained within element and attribute declarations as:  type=" " .

Common XML Schema Data Types

XML schema has a lot of built-in data types. The most common types are:

For an entire list of built-in simple data types see http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
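Six of the built-in types that appear most often are string, decimal, integer, boolean, date, and time. The sketch below attaches each to an element; cityName, area, and population come from this chapter, while isCapital, dateFounded, and museumOpens are hypothetical names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="cityName" type="xsd:string"/>      <!-- text, e.g. Boston -->
  <xsd:element name="area" type="xsd:decimal"/>         <!-- e.g. 125.4 -->
  <xsd:element name="population" type="xsd:integer"/>   <!-- e.g. 600000 -->
  <xsd:element name="isCapital" type="xsd:boolean"/>    <!-- true or false -->
  <xsd:element name="dateFounded" type="xsd:date"/>     <!-- YYYY-MM-DD -->
  <xsd:element name="museumOpens" type="xsd:time"/>     <!-- hh:mm:ss -->
</xsd:schema>
```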

Using an XML Editor => ../XML Editor/

This link will take you to instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML schema document and paste it into the XML editor. Then check your results. Is the XML schema well-formed? Is the XML schema valid?

XML stylesheet (XSL)
An XML Stylesheet is an XML Document. XML Stylesheets have an .xsl file extension.

The eXtensible Stylesheet Language (XSL) provides a means to transform and format the contents of an XML document for display. Since an XML document does not contain tags a browser understands, such as HTML tags, browsers cannot present the data without a stylesheet that contains the presentation information. By separating the data and the presentation logic, XSL allows people to view the data according to their different needs and preferences.

The XSL Transformation Language (XSLT) is used to transform an XML document from one form to another, such as creating an HTML document to be viewed in a browser. An XSLT stylesheet consists of a set of formatting instructions that dictate how the contents of an XML document will be displayed in a browser, with much the same effect as Cascading Stylesheets (CSS) do for HTML. Multiple views of the same data can be created using different stylesheets. The output of a stylesheet is not restricted to a browser.

During the transformation process, XSLT analyzes the XML document and converts it into a node tree – a hierarchical representation of the entire XML document. Each node represents a piece of the XML document, such as an element, attribute or some text content. The XSL stylesheet contains predefined “templates” that contain instructions on what to do with the nodes. XSLT will use the match attribute to relate XML element nodes to the templates, and transform them into the resulting document.

Exhibit 7: XML stylesheet document for city entity

The output of the city.xsl stylesheet in Exhibit 7 will look like the following:

You will notice that the stylesheet consists of HTML to inform the media tool (a web browser) of the presentation design. If you do not already know HTML this may seem a little confusing. Online resources such as the W3Schools tutorials can help with the basic understanding you will need =>(http://www.w3schools.com/html/default.asp).

Incorporated within the HTML is the XML that supplies the data, the information, contained within our XML document. The XML of the stylesheet indicates what information will be displayed and how. So, the HTML constructs a display and the XML plugs in values within that display. XSL is the tool that transforms the information into presentational form, but at the same time keeps the meaning of the data.

Prolog

The prolog of a stylesheet includes:

 * the XML declaration;
 * the stylesheet declaration;
 * the namespace declaration;
 * the output document format.

The XML declaration

The stylesheet & namespace declarations
 * identifies the document as an XSL style sheet;
 * identifies the version number;
 * refers to the W3C XSL namespace - the URI that identifies the XML elements used in the stylesheet. You can find more about namespaces here => ../Namespace/.  Every time the xsl: prefix is used it references the given namespace.

The output document format

This element (<xsl:output>) designates the format of the output document and must be a child element of <xsl:stylesheet>.
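Taken together, the parts of the stylesheet prolog could be written as in the sketch below. It assumes XSLT version 1.0 and HTML output; a comment stands in for the templates.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- the output format is declared as a direct child of xsl:stylesheet -->
  <xsl:output method="html"/>
  <!-- templates go here -->
</xsl:stylesheet>
```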

Templates
The <xsl:template> element is used to create templates that describe how to display elements and their content. Above, in the XSL introduction, we mentioned that XSL breaks up the XML document into nodes and works on individual nodes. This is done with templates. Each template within an XSL describes a single node. To identify which node a given template is describing, use the 'match' attribute. The value given to the 'match' attribute is called a pattern. Remember:  (node tree – a hierarchical representation of the entire XML document.  Each node represents a piece of the XML document, such as an element, attribute or some text content). Wherever there is branching in the node tree, there is a node. <xsl:template> defines the start of a template and contains rules to apply when a specified node is matched.

the match attribute

<xsl:template match="/">

This template match attribute associates the XML document root (/), the whole branch of the XML source document, with the HTML document root. Contained within this template element is the typical HTML markup found at the beginning of any HTML document. This HTML is written to the output. The XSL looks for the root match and then outputs the HTML, which the browser understands.

<xsl:template match="tourGuide"> This template match attribute associates the element 'tourGuide' with the display rules described within this element.
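The two templates described above can be combined into a small working stylesheet. This is a sketch rather than the chapter's own exhibit: the page title, the heading, and the decision to print only cityName and country are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Matches the document root and writes the HTML skeleton -->
  <xsl:template match="/">
    <html>
      <head><title>Tour Guide</title></head>
      <body>
        <h2>Cities</h2>
        <!-- hand over to the template that matches tourGuide -->
        <xsl:apply-templates select="tourGuide"/>
      </body>
    </html>
  </xsl:template>

  <!-- Matches the tourGuide element and writes one paragraph per city -->
  <xsl:template match="tourGuide">
    <xsl:for-each select="city">
      <p>
        <xsl:value-of select="cityName"/>
        <xsl:text>, </xsl:text>
        <xsl:value-of select="country"/>
      </p>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

The literal HTML tags are copied to the output as they are, while each xsl:value-of is replaced by the text of the selected element, which is the division of labour between HTML and XML that this section describes.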

Elements
Elements specific to XSL:

For more XSL elements => http://www.w3schools.com/xsl/xsl_w3celementref.asp.

PHP Methods of XML Dom Validation
You can use the DOM (Document Object Model) in PHP on a server to validate an XML document against a DTD (Document Type Definition). For this and more, see http://wiki.cc/php/Dom_validation

Browser Methods
Place this line of code in your .xml document after the XML declaration (prologue).
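The line in question is an xml-stylesheet processing instruction. In the sketch below the href value city.xsl is assumed from the stylesheet name used in this chapter, and a comment stands in for the city elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="city.xsl"?>
<tourGuide>
  <!-- city elements go here -->
</tourGuide>
```

When a browser opens the .xml file, it loads the stylesheet named in href and displays the transformed result instead of the raw XML.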

PHP Methods of XSLT Transformation
This one is good for PHP5 and wampserver (latest). Please ensure that *xsl* is NOT commented out in the php.ini file.

Example 1, using XSLT within PHP itself (use the phpinfo() function to check that the XSLT extension is available; enable it if needed). This example might produce XHTML. Please note it could produce anything defined by the XSL.

Example 2:

XML Colors
For use in your stylesheet: these colors can be used for both background and font

http://www.w3schools.com/html/html_colors.asp

http://www.w3schools.com/html/html_colorsfull.asp

http://www.w3schools.com/html/html_colornames.asp

Using an XML Editor => ../XML Editor/

This link will take you to instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML stylesheet document and paste it into the XML editor. Then check your results. Is the XML stylesheet well-formed?

/Definitions/
 * XML
 * SGML
 * Dan Connelly
 * RSS
 * XML Declaration
 * parent
 * child
 * sibling
 * element
 * attribute
 * Well-formed XML
 * PCDATA

/Exercises/
Exercise 1.

a) Using "tourguide" above as a good example, create an XML document whose root is "classlist". This CLASSLIST is created from a starting point of a single entity, STUDENT. The document may hold any number of students, each containing the elements firstname, lastname, and emailaddress.