FOSS Open Standards/Some Important Open Standards

This section discusses some of the more important open standards that are either already available or actively being developed. The list is by no means exhaustive, but it does represent the standards most widely used in the industry today.

Internet Networking and Applications/Services
The Internet is what it is today mainly because of the almost universal accessibility of the applications and services offered over it and its seamless connectivity. This is a direct result of the widespread use of open standards in the implementation of the Internet, both historically and currently. The standards mainly responsible for the Internet infrastructure and for the popular World Wide Web and Internet email services are highlighted here.

Transmission Control Protocol/Internet Protocol
The TCP/IP suite of networking standards provides the foundation for the network infrastructure of the Internet. All major services and applications on the Internet ride on top of TCP/IP. These protocols were originally developed by the pioneers of the Internet, the engineers and scientists from universities, research institutions and companies who collaborated on the US Department of Defence's Advanced Research Projects Agency Network (ARPANET) project. This evolved to become the Internet as we know it today, and TCP/IP became a de facto standard. It is now an IETF Standard and IETF is charged with its continued development.

TCP/IP is a two-layered packet-switching specification in which data to be communicated between two end-points on a network is first broken up into smaller data packets that are then individually routed through the network from the source to the destination points. The higher layer, Transmission Control Protocol (TCP), manages the disassembling of the data into smaller packets at the source and the reassembling at the destination point upon receipt of the data packets. The lower layer, Internet Protocol (IP), handles the addressing and routing of each packet so that it gets to the correct destination.
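The division of labour described above can be illustrated with a toy model. The following is only a minimal sketch, not real TCP: the `Packet` class, the `disassemble` and `reassemble` functions and the 8-byte packet size are all hypothetical names and values chosen for illustration, and real TCP adds acknowledgements, retransmission and flow control on top of this idea.

```python
import random
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int        # position of this chunk in the original data (illustrative)
    payload: bytes  # the slice of data carried by this packet


def disassemble(data: bytes, size: int) -> list[Packet]:
    """Break the data into numbered packets of at most `size` bytes."""
    return [
        Packet(seq=i // size, payload=data[i:i + size])
        for i in range(0, len(data), size)
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Put the packets back in order and join their payloads."""
    ordered = sorted(packets, key=lambda p: p.seq)
    return b"".join(p.payload for p in ordered)


message = b"All major services on the Internet ride on top of TCP/IP."
packets = disassemble(message, size=8)
random.shuffle(packets)  # packets are routed individually and may arrive out of order
restored = reassemble(packets)
```

Because each packet carries its own sequence number, the destination can rebuild the original data regardless of the order in which the packets arrive.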

TCP/IP just provides the transport mechanism for sending data across the Internet or an IP network. For this to be useful, a service or application has to be specified and implemented. Again, IETF is mainly responsible for overseeing and setting the specifications for most of these services. The widespread implementation and acceptance of these specifications, coupled with open standards bodies like IETF and W3C, make the Internet the best showcase for open standards at work. Some of these standards are listed below.

Hypertext Transfer Protocol
HTTP is perhaps the most widely used Internet service protocol. It is the primary method used to access the WWW. Web content, in the form of HTML pages and possibly also other multimedia formats, is transferred from a Web server to a user's Web accessing agent using the HTTP protocol. HTTP was developed by W3C in co-operation with IETF working groups. The standard most widely deployed and supported on the Web today is HTTP version 1.1 or HTTP/1.1.

The HTTP protocol is a request-response protocol using a client-server model: an HTTP client, e.g. a Web browser, initiates a request by establishing a TCP connection to the server computer, which then responds to the request commands sent by the client. The commands that must be supported, as well as the expected behaviour of both client and server, are spelt out in the HTTP specification.
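To make the request-response exchange more concrete, the sketch below builds the text of a simple HTTP/1.1 GET request and picks apart a canned response. It is an illustration under stated assumptions rather than a complete client: no network connection is opened, the host `www.example.org`, the path `/index.html` and the canned response are invented for the example, and the helper names `build_get_request` and `parse_response` are hypothetical.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the bytes a client would send for a simple GET request."""
    lines = [
        f"GET {path} HTTP/1.1",   # request line: command, resource, protocol version
        f"Host: {host}",          # HTTP/1.1 requires the Host header
        "Connection: close",
        "",                       # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a response into status code, reason phrase, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


# A made-up response standing in for what a server might send back.
CANNED_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)

request = build_get_request("www.example.org", "/index.html")
status, reason, headers, body = parse_response(CANNED_RESPONSE)
```

Both the request and the response are plain text lines followed by an optional body, which is one reason the protocol has been straightforward to implement on many platforms.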

It is through this universal acceptance of the HTTP protocol standard that the Web has become the ubiquitous information dissemination and exchange medium that it is now. One major factor in its wide acceptance by all the stakeholders and players on the Internet is its open standard status.

Hypertext Markup Language
While HTTP defines how the contents of a Web page can be transmitted between a Web server and a client, HTML is an open standard specifying the structure and presentation of the content. A document composed with HTML consists of the contents intermingled with symbols and tags that tell the software interpreting and displaying the document about the structure and presentation of the content. The HTML specification is now being maintained by W3C. It has undergone several revisions and the most current specification is HTML 4.01. HTML is also available as an ISO standard, which is a subset of HTML 4.

In its simplest form, an HTML document consists of the text of the document as well as tags that specify the markup needed to be performed on it. For example, in the sample below:

<h3>My Work Experience</h3>
<img src="mypic.png">
<p>
<b>Work Experience</b>
<br>
<br>
1990 - 1995 System engineer<br>
1995 - 2005 Network manager<br>
</p>

The tags <h3> and </h3> specify that the text enclosed within them is to be rendered as a third-level heading, while the tag <img src="mypic.png"> specifies the display of a graphics file. The tags <b> and </b> specify that the text enclosed within them should be rendered as bold, and the tag <br> signifies a line break.
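A program reading this sample sees the same tags. As a rough sketch, the snippet below uses the `HTMLParser` class from Python's standard library to list the opening tags and the image file referenced in the sample; the class name `TagLister` is a hypothetical name chosen for this illustration, and a real browser does far more than this.

```python
from html.parser import HTMLParser

SAMPLE = """<h3>My Work Experience</h3>
<img src="mypic.png">
<p>
<b>Work Experience</b>
<br>
<br>
1990 - 1995 System engineer<br>
1995 - 2005 Network manager<br>
</p>"""


class TagLister(HTMLParser):
    """Record every opening tag and the source of every image."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
        if tag == "img":
            # attrs is a list of (name, value) pairs
            self.images.append(dict(attrs).get("src"))


lister = TagLister()
lister.feed(SAMPLE)
lister.close()
```

After parsing, `lister.start_tags` holds the tags in document order and `lister.images` holds the file name given in the `src` attribute.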

HTML user agent software is needed to render a document made up of HTML, and the most common such agent is a Web browser. If the W3C HTML specifications are adhered to, an HTML document can be displayed properly by any user agent that conforms to the specifications, and this can form the basis for a standard format for textual document information exchange. One major limitation of using HTML to display a document, though, is that page breaks are not easily represented or controlled.

The use of HTML in email has gained popularity as it enables one to impose some simple formatting on the text as well as embed graphics and multimedia content into the message. However, security-conscious users generally consider it poor practice to use HTML in mail messages, as some popular HTML-enabled email programs have been known to possess vulnerabilities. This leaves them open to potential exploitation by a rogue HTML email message, which may result in the compromise of a user's system.

Email Protocols
Internet email has become almost as important as the telephone service. Every time we send an email, we assume that the mail will be relayed correctly by the mail server to its destination. When we send attachments or incorporate some non-textual content into our email we just assume that the attachment will be incorporated correctly and when the recipient gets it s/he will be able to get it back into its original form. All this works seamlessly irrespective of the hardware and software deployed because Internet email makes use of several important open standards in its mail transmission as well as in the encoding of email messages.

Simple Mail Transfer Protocol
SMTP enables the transport and routing of email from the sender to the recipient using their email addresses. This standard is client-server based: the SMTP client (usually the user's email software or mail user agent) initiates a TCP connection to the SMTP server (the mail relay host). Communication between the server and client is carried out using the SMTP protocol. This is a simple text-based protocol where, essentially, the client informs the server of the email addresses of the sender and the recipient(s).

After that, if all goes well and the server allows it (based on its mail relay policy), the client will transmit the mail message to the server. The server will then attempt to deliver it to the computer housing the recipient's mailbox or, if necessary, forward the email to another server for delivery to the recipient's mailbox.
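The exchange described above can be sketched as the sequence of text lines that a client sends. This is only an outline under simplifying assumptions: no connection is opened, the server's numeric replies (which a real client must wait for after each command) are left out, and the function name `smtp_client_lines`, the host name and the addresses are hypothetical.

```python
def smtp_client_lines(client_host, sender, recipients, message):
    """Return the lines a client would send to submit one message."""
    lines = [
        f"HELO {client_host}",       # the client identifies itself
        f"MAIL FROM:<{sender}>",     # address of the sender
    ]
    # One RCPT TO command per recipient address.
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]
    lines.append("DATA")             # the message itself follows
    for line in message.splitlines():
        # A leading dot is doubled so it cannot be mistaken for the end marker.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")                # a line with a single dot ends the message
    lines.append("QUIT")
    return lines


lines = smtp_client_lines(
    "client.example.org",
    "alice@example.org",
    ["bob@example.org", "carol@example.org"],
    "Subject: Hello\n\nSee you at 10.\n.hidden line\n",
)
```

In practice the server may refuse any of these steps, for example rejecting a recipient because of its mail relay policy, so a real client checks each reply before continuing.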

The SMTP protocol started out supporting only 7-bit ASCII (American Standard Code for Information Interchange) text in the messages, effectively limiting it to the transmission of English-based text. Non-English language texts that use characters outside the 7-bit ASCII set, as well as binary file attachments, have to be encoded by the email user agent software before transmission. The message format of this text-based mail is specified by another IETF standard, RFC 2822. The SMTP standard has been extended to support 8-bit text, permitting the transmission without encoding of text messages in more languages.

Multipurpose Internet Mail Extensions
As Internet email became more and more popular, users found it a convenient, economical and efficient way to send information to one another. Users tried to send other types of content, e.g. audio, video, images, software programs, besides text messages via email. However, since the original Internet email specifications were meant primarily for English-based text messages, some new sets of specifications had to be drawn up to allow interoperability and seamless transmission of multipurpose content. This resulted in IETF producing the Multipurpose Internet Mail Extensions (MIME) standard.

MIME is an extension of the basic text-based Internet mail standard. It defines mechanisms for sending other kinds of information in email. These include non-English text using character sets beyond ASCII and binary file content such as multimedia files and computer software. To support these while retaining backward compatibility with the simple ASCII-based mail format, MIME defines a set of email headers for specifying additional attributes of a message, e.g. content type, and a set of transfer encodings that represent 8-bit data using characters from the 7-bit ASCII set. MIME also caters for the encoding of non-ASCII characters in mail message headers, allowing non-English characters to be used in them. The standard further specifies a means to register new content types and transfer encodings, making it flexible enough to support new multimedia types in the future.
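As an illustration of these mechanisms, the sketch below uses the `email` package from Python's standard library to build a message with a non-ASCII subject, a non-English text body and a binary attachment. The addresses, the subject, the body text and the file name `data.bin` are made up for the example; the point is only to show the content type headers and transfer encodings at work.

```python
import email
from email import policy
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.org"
msg["To"] = "bob@example.org"
msg["Subject"] = "Grüße"  # non-ASCII characters in a header

# Text body in UTF-8, represented on the wire with the quoted-printable encoding.
msg.set_content("Übung macht den Meister.\n", charset="utf-8", cte="quoted-printable")

# Arbitrary binary content, represented with the base64 encoding (the default for bytes).
binary_data = bytes(range(256))
msg.add_attachment(
    binary_data,
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

wire = msg.as_bytes()  # what would be handed to the mail transport
text_part, attachment = list(msg.iter_parts())

# The recipient's software reverses the encodings when it parses the message.
received = email.message_from_bytes(wire, policy=policy.default)
```

Although the message contains accented letters and all 256 byte values, the serialized form in `wire` should consist solely of 7-bit ASCII characters, which is what allows it to pass through mail systems that only handle the original text format.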

MIME is also an important standard for the Web as the HTTP protocol makes use of mail-like MIME formatting rules and syntax for its data formatting.

The Extensible Markup Language
The Extensible Markup Language (XML) is a Recommendation from W3C that specifies a meta markup language (a meta language is a language for describing other languages) for the creation of other markup languages for use on the WWW. HTML is a single predefined markup language and hence possesses severe limitations to describe and represent all sorts of data for dissemination, exchange and interaction. XML, being a markup specification language, is capable of being used to design markups for describing many different kinds of data for storage, transmission, or processing by a program. It describes the data but it does not tell you what you should do with the data.

One should note that XML and HTML were designed with different goals in that XML was designed to store, carry, and exchange data whereas HTML was designed to display data and to focus on how data looks. XML was created for deployment on the Web by using a subset of an existing, widely used international standard for text document markup - the Standard Generalized Markup Language (SGML).

Due to its design goals, XML is well suited for data transfer and exchange and as a format for document storage and processing. This and the fact that it is under the charge of an open specifications/standards body, W3C, has resulted in XML being used as the base for specifying many other data formats and exchange protocols. According to the community-based XML portal, XML.ORG, it is now viewed as the standard way for information exchange in environments that do not share common platforms.

Special purpose languages and standards developed using XML for specific environments or activities are announced almost daily and several hundred have been adopted since XML 1.0 was released in February 1998. In particular, the e-government and e-commerce segments are very active in developing and implementing XML-based specifications.

A simple XML document is shown below:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="bookcollection.xsl"?>
<BookCollection>
  <Book>
    <Title>Chronicles: Volume One</Title>
    <Author>Bob Dylan</Author>
    <Publisher>Simon and Schuster</Publisher>
    <Year>2004</Year>
  </Book>
  <Book>
    <Title>Harry Potter and the Goblet of Fire</Title>
    <Author>J.K. Rowling</Author>
    <Publisher>Bloomsbury Publishing</Publisher>
    <Year>2000</Year>
  </Book>
</BookCollection>

Note that while XML uses syntax tags to identify various types of data in a document file, these tags are not predefined. So the document creator has to define and describe them using what is called an XML schema and associate the document with the schema. To create the schema, an XML schema language is used, e.g. Document Type Definition (DTD), XML Schema and RELAX NG. The purpose of the schema is to define the legal building blocks of the XML document, i.e. the elements, data attributes, tags, etc., that can appear in the document. DTD has limitations with respect to its extensibility and lack of support of several useful features, e.g. data types and namespaces. XML Schema, which is also another W3C Recommendation, is more suitable for use in many practical Web applications.

While the schema may define the legal components of the XML document, it does not carry information about how to display the data. So for the data in an XML document to be displayed properly by, say, a Web browser, a display style has to be specified. The Extensible Stylesheet Language (XSL) is used for this purpose. Styling is about transforming and formatting information, and the W3C specifications separate these processes. In addition, the components in an XML document have to be navigated in order to extract and process them. Hence, the XSL Recommendation from W3C consists of three parts:
 * XSL Transformations (XSLT): a language for transforming XML documents
 * XSL Formatting Objects (XSL-FO): a language for formatting XML documents
 * XML Path Language (XPath): a language for navigating in XML documents
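To give a feel for the navigation part, the sketch below applies two path expressions to the book collection document using the `xml.etree.ElementTree` module from Python's standard library. This module supports only a limited subset of XPath, so the example should be read as an approximation of what a full XPath processor offers; the variable names are chosen for illustration.

```python
import xml.etree.ElementTree as ET

XML_DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<BookCollection>
  <Book>
    <Title>Chronicles: Volume One</Title>
    <Author>Bob Dylan</Author>
    <Publisher>Simon and Schuster</Publisher>
    <Year>2004</Year>
  </Book>
  <Book>
    <Title>Harry Potter and the Goblet of Fire</Title>
    <Author>J.K. Rowling</Author>
    <Publisher>Bloomsbury Publishing</Publisher>
    <Year>2000</Year>
  </Book>
</BookCollection>"""

root = ET.fromstring(XML_DOC)  # root is the BookCollection element

# Path expression: every Title element that is a child of a Book element.
all_titles = [title.text for title in root.findall("Book/Title")]

# Path expression with a predicate: only Book elements whose Year is 2004.
titles_2004 = [book.findtext("Title") for book in root.findall("Book[Year='2004']")]
```

Here `all_titles` contains both book titles, while `titles_2004` contains only the title of the book published in 2004.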

An example of an XSLT transformation of the XML example document above to a Web browser displayable HTML output is:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
  <html>
  <body>
    <h3>Book Collection</h3>
    <table>
      <tr bgcolor="#ff0000">
        <th align="center">Title</th>
        <th align="center">Author</th>
        <th align="center">Publisher</th>
        <th align="center">Year</th>
      </tr>
      <xsl:for-each select="BookCollection/Book">
      <tr>
        <td><xsl:value-of select="Title"/></td>
        <td><xsl:value-of select="Author"/></td>
        <td><xsl:value-of select="Publisher"/></td>
        <td><xsl:value-of select="Year"/></td>
      </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>
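To show what this stylesheet asks an XSLT processor to do, the sketch below carries out the same steps by hand in Python. It does not execute the stylesheet (Python's standard library has no XSLT processor); the function `books_to_html` is a hypothetical stand-in that emits the heading and header row, then loops over each Book element and writes one table row per book, mirroring the `xsl:for-each` and `xsl:value-of` instructions.

```python
import xml.etree.ElementTree as ET

XML_DOC = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<BookCollection>
  <Book>
    <Title>Chronicles: Volume One</Title>
    <Author>Bob Dylan</Author>
    <Publisher>Simon and Schuster</Publisher>
    <Year>2004</Year>
  </Book>
  <Book>
    <Title>Harry Potter and the Goblet of Fire</Title>
    <Author>J.K. Rowling</Author>
    <Publisher>Bloomsbury Publishing</Publisher>
    <Year>2000</Year>
  </Book>
</BookCollection>"""

FIELDS = ["Title", "Author", "Publisher", "Year"]


def books_to_html(xml_bytes: bytes) -> str:
    """Hand-written equivalent of the stylesheet: XML in, HTML table out."""
    root = ET.fromstring(xml_bytes)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    ET.SubElement(body, "h3").text = "Book Collection"
    table = ET.SubElement(body, "table")

    # Header row, as in the literal <tr bgcolor="#ff0000"> of the stylesheet.
    header = ET.SubElement(table, "tr", bgcolor="#ff0000")
    for name in FIELDS:
        ET.SubElement(header, "th", align="center").text = name

    # Counterpart of <xsl:for-each select="BookCollection/Book">.
    for book in root.findall("Book"):
        row = ET.SubElement(table, "tr")
        for name in FIELDS:
            # Counterpart of <xsl:value-of select="..."/>.
            ET.SubElement(row, "td").text = book.findtext(name)

    return ET.tostring(html, encoding="unicode", method="html")


html_output = books_to_html(XML_DOC)
```

The advantage of XSLT over such hand-written code is that the transformation itself is expressed in a standard, declarative form that any conforming processor can apply.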

Computer Graphics and Multimedia
In the old days of computing, the display was predominantly text-based and any graphics displayed were, at best, line graphics implemented using special line-drawing character sets. Computer terminals that could display full-fledged graphics were expensive and used only for special purposes or applications. Today, with the proliferation of inexpensive personal computers that have the power to process and display graphics and multimedia, even the user interface is graphics-based. One of the main attractions of the Web is its widespread support and usage of graphic images and multimedia to make the content interesting and lively.

It is important that open standards are followed as much as possible in graphics and multimedia data storage, processing and retrieval to enable diverse devices and computing platforms to offer the same degree of Web experience.

Portable Network Graphics
In the early days of the Web when Internet links and connections were relatively slow, many simple images and animations displayed in Web pages made use of a graphics format called Graphics Interchange Format (GIF), as this format resulted in small graphic file sizes. The GIF format included the use of the Lempel-Ziv-Welch (LZW) compression algorithm, which was patented in the USA by Unisys. Unisys eventually decided to ask for royalty payments for all software that utilized GIF. This led to the creation of the Portable Network Graphics (PNG) format to replace GIF for use as a single-image Web format. The PNG format later became a W3C Recommendation as well as an ISO international standard (ISO/IEC 15948).

PNG is an extensible file format for the lossless, portable, well-compressed storage of raster images. Indexed-colour, greyscale, and true colour images are supported, plus an optional alpha channel for transparency. It is fully streamable with a progressive display option making it useful for online graphics display in Web pages. It also boasts robust features, providing both full file integrity checking and simple detection of common transmission errors.
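The integrity checking mentioned above rests on two features of the file layout: a fixed 8-byte signature at the start of every PNG file, and a CRC-32 checksum stored at the end of every chunk. The sketch below builds a tiny 1x1 greyscale image in memory and then verifies each chunk. It is a minimal illustration, not a full PNG reader, and the helper names `make_chunk`, `tiny_png` and `verify_chunks` are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed first 8 bytes of every PNG file


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Chunk layout: 4-byte length, 4-byte type, data, 4-byte CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """Build a 1x1, 8-bit greyscale image entirely in memory."""
    # width, height, bit depth, colour type (0 = greyscale), compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    scanline = b"\x00\x80"  # filter type 0 followed by one mid-grey pixel
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", zlib.compress(scanline))
        + make_chunk(b"IEND", b"")
    )


def verify_chunks(data: bytes):
    """Return (chunk type, CRC matches) for every chunk in the file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    results, pos = [], 8
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        actual_crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
        results.append((chunk_type.decode("ascii"), actual_crc == stored_crc))
        pos += 12 + length
    return results


png = tiny_png()
intact = verify_chunks(png)

damaged = bytearray(png)
damaged[16] ^= 0xFF  # corrupt one byte inside the IHDR chunk data
corrupted = verify_chunks(bytes(damaged))
```

A single flipped byte is enough to make the stored and recomputed checksums disagree, so the damaged chunk can be identified without decoding the image.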

The X Window System
The graphical user interface (GUI) that is now common on desktop computers uses a graphical window metaphor as the basic user interface. This window system GUI enables different programs to run simultaneously in their own individual windows, and these windows can be opened, closed and resized. The windowing systems found on platforms like Microsoft Windows and Mac OS X are proprietary ones. On the other hand, UNIX and UNIX-like operating systems (e.g. GNU/Linux, FreeBSD) make use of an open window system - the X Window System.

The X Window System, or X, is an open windowing system standard led by the X.Org Foundation. X provides a framework for the display and management of graphical information and on top of this a GUI may be built. X uses a client-server model. The X client is usually the application that sends graphical output for display on the X server. The X server interacts with the user using primarily the keyboard and mouse as input devices and the input is transmitted to the client to act upon.

The X client and server may be running on the same machine or they may be on different physical devices connected together over a network. This intrinsic client-server property of X constitutes the main difference between it and other well-known window systems like Microsoft Windows, which simply display graphical applications on the same device on which the application is running.

Being an open standard, besides UNIX and UNIX-like systems, X has been implemented on a variety of hardware and operating systems, including the various generations of Macintoshes, PCs running MS-DOS and Microsoft Windows as well as OpenVMS from Hewlett-Packard (formerly Digital Equipment), etc.

Ogg Vorbis
Ogg Vorbis is a general-purpose compressed audio format for storing and playing digital music. It is comparable in quality to other formats such as the popular MP3. However, unlike MP3, it is an open format and it claims to be free from patents. The format originated from the Xiph.Org Foundation, a non-profit organization dedicated to producing free and open protocols, formats and software for multimedia.

Vorbis is the name of the audio compression scheme, and it is contained in Ogg, Xiph.Org's container format for audio, video, and meta-data - hence the name Ogg Vorbis. Vorbis is a lossy codec, i.e., it uses a compression algorithm that discards data in order to increase the compression possible. Ogg is also a container for other formats, including FLAC (lossless audio), Speex (speech) and Theora (video). The specifications for Ogg, Vorbis and these other formats are in the public domain and are completely free for commercial or non-commercial use.

The number of software programs and hardware devices that support Ogg Vorbis is steadily increasing, and a list of them may be found on the Vorbis wiki at Xiph.Org.

Office Documents
Office applications are some of the most widely used applications for personal computers in a modern-day office. These applications include word-processing, spreadsheet and presentation software. Several office application suites are available on the market, e.g. Microsoft Office, WordPerfect Office, OpenOffice.org and Applixware Office. In the past, each of these invariably used a different format for storing its files.

As a result, it was difficult to convert from one file format to another and for one application to read or write a file created by another. It was thus a real step forward in terms of office interoperability and productivity when OASIS announced that it was recommending the Open Document Format for Office Applications as a standard file format for use in office applications.

OpenDocument
OpenDocument is a file format developed by OASIS for storing office documents created by applications such as spreadsheets, word processors, charts and presentations software. It makes use of a royalty-free, open and vendor-independent XML-based format. The format is based on the file format of OpenOffice.org, which was submitted to OASIS to form the basis for the standard. OpenDocument provides a single XML schema for text processing, spreadsheet, presentation, drawing, charting, and mathematical documents. The OpenDocument format has since become an ISO/IEC international standard (ISO/IEC 26300).

Office suites that have announced support for the OpenDocument format as their primary/native format include OpenOffice.org, StarOffice and KOffice.

Open Standards Usage
Table 1 summarizes the usage and penetration levels of the open standards described above in their respective domains. As can be seen, open standards are widely deployed on the Internet and in running Internet-related/derived services and applications. However, for the graphics, multimedia and office applications areas they are still very limited in acceptance. The limited penetration in these domains is a result of the fact that they are dominated by proprietary products like those from Apple and Microsoft that make use of their own proprietary formats and specifications (see the next section on "Comparison of File Formats"). The incentive for these vendors to support open standards or at least make their specifications more open is not strong due to their dominant positions.