ETD Guide/Technical Issues/DTDs for ETDs

(This section is taken from an article on ETDs by P. Potter, P. Strabala, D. Dobratz, and M. Schulz that is due to appear in "The Internet and Higher Education" 4/2001.)

XML Authoring Systems

Because the currently available authoring systems for XML have still not won wide recognition, different universities have adopted different strategies for producing XML documents. Most of these projects were started between 1995 and 1997, at a time when XML was alive but tools and standardized DTDs were barely available. Viewed from today’s perspective, those projects show the need to rethink and redesign these approaches in order to arrive at a common standard.

DTDs

All of the DTDs presented here are built on similar principles. A classical dissertation (which can be regarded as a monograph) consists of three main components: an extensible title page with abstracts, declarations, etc.; the dissertation corpus, which includes text, pictures, audio, video, tables, and so on; and appendices, which contain data sheets, bibliographies, acknowledgements, and other material.
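
As a minimal sketch of this three-part structure, the following Python snippet builds a skeleton document with the standard library. The element names (dissertation, front, body, back, and their children) are hypothetical and are not taken from any of the DTDs listed below; each real DTD uses its own vocabulary.

```python
import xml.etree.ElementTree as ET


def build_skeleton(title):
    """Build a hypothetical three-part dissertation skeleton."""
    root = ET.Element("dissertation")

    # 1) Title page with abstracts, declarations, etc.
    front = ET.SubElement(root, "front")
    ET.SubElement(front, "title").text = title
    ET.SubElement(front, "abstract")

    # 2) Dissertation corpus: text, pictures, tables, and so on.
    ET.SubElement(root, "body")

    # 3) Appendices: data sheets, bibliographies, acknowledgements.
    back = ET.SubElement(root, "back")
    ET.SubElement(back, "bibliography")
    return root


skeleton = build_skeleton("An Example Thesis")
print(ET.tostring(skeleton, encoding="unicode"))
```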

The following DTDs are currently in use at different institutions:
 * ETD-ML.DTD: Virginia Polytechnic Institute and State University (Virginia Tech)
 * DiML.DTD: German Dissertationen Online project
 * UIowa2K.DTD: University of Iowa
 * HutPubl.DTD: Technical University Helsinki
 * TEI-Light.DTD: Ann Arbor and Lyon
 * ISOBook.DTD: University of Oslo
 * TEI-based DTD with extensions for natural sciences: Swedish University of Agricultural Sciences, Uppsala

Author-DTDs

All these Document Type Definitions are so-called author-DTDs: they are used primarily to support the authoring and conversion process, and do not primarily address document archiving and preservation. One may ask why so many different DTDs have persisted. The main reason is that the scientific orientation of the universities mentioned varies considerably. Lyon, Oslo and Michigan, which use TEI-Light.DTD, mainly serve students in the arts and humanities. Problems with TEI.DTD or DocBook.DTD are reported at universities that support a strong natural science community, such as Berlin, Helsinki or Uppsala. Often a dissertation is a cumulative work, e.g., in Lyon or Helsinki.

Université Laval, in collaboration with the Université de Montréal, is working during 2001-2002 on modeling a new DTD for ETDs. The DTD and its documentation will be posted at http://www.theses.umontreal.ca.

DTDs for multimedia content

"Structured data," such as mathematical or chemical formulas, spreadsheets, address books, configuration parameters, financial transactions, technical drawings, etc., are usually published on the Web in layout formats such as PostScript or PDF, or by converting them into graphic formats like GIF, JPEG, PNG, VRML, and so on. Programs that produce such data often also store it on disk, using either a binary or a text format. With a program-specific format, anyone who wants to look at the data usually needs the program that produced it. With XML, the data can be stored in a text format that the user can read without having the original program. XML can be thought of as a set of rules, guidelines, or conventions for designing text formats for data in a way that produces files that are easy to generate and read (by computer). In addition to the older standard SGML, there are several emerging standards that use XML encoding to overcome the disadvantages common to Web publishing in HTML. The following sections give an overview of standards that have been established during the last few years, or that are still works in progress but are widely recognized.
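
To illustrate the point about reading data without the original program, here is a minimal sketch in Python using only the standard library. The measurements vocabulary and the sample values are invented for this example; they do not come from any standard discussed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical measurement data stored as plain-text XML.
# A person can read this text directly; so can any XML parser.
XML_TEXT = """<measurements unit="mm">
  <sample id="a">12.5</sample>
  <sample id="b">13.1</sample>
</measurements>"""


def read_samples(xml_text):
    """Return a mapping of sample id to numeric value."""
    root = ET.fromstring(xml_text)
    return {s.get("id"): float(s.text) for s in root.iter("sample")}


samples = read_samples(XML_TEXT)
print(samples)
```

The program that wrote the file is not needed here: the element names and the text content carry enough structure for a generic parser to recover the values.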

XML DTDs and Schemas

For standardized knowledge management, this variety of XML DTDs and Schemas may seem confusing. A closer look, though, gives another perspective: every scientific subject defines and uses its own standards. The following document type definitions can roughly be classified into:
 * 1) Schemata that use semantic tags to mark real content items, e.g., MathML or CML.
 * 2) Schemata that are used for visualisation and layout purposes and to control synchronization in the browser, e.g., HTML, SVG (Scalable Vector Graphics), SMIL (Synchronized Multimedia Integration Language).
 * 3) Schemata that are principally designed for exchanging data with huge databases, e.g., cXML (commercial XML).

Electronic Publishing

Within the field of "Electronic Publishing," these developments have led to new opportunities to structure scientific information, not just text-based information but also so-called active content and multimedia elements. This brings the whole field to a new level of information processing or knowledge management. The different approaches to electronic publishing at universities create a very heterogeneous environment. The following tables show how difficult it might be to subsume all those different models under one concept in order to achieve valuable and searchable information systems based on XML. Crosswalks between all those DTDs have to be defined in order to build a distributed retrieval engine capable of searching within internal document structures "throughout the world." Not only are different DTDs used, but so are different strategies for converting documents from common text formatting systems into highly structured SGML or XML.
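
The idea of a crosswalk between DTDs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source vocabularies (dtd_a, dtd_b), their element names, and the shared target names are all hypothetical, and a real crosswalk would also have to handle attributes, nesting differences, and elements with no counterpart.

```python
import xml.etree.ElementTree as ET

# Hypothetical crosswalk: local element name -> shared element name.
CROSSWALK = {
    "dtd_a": {"thesis": "document", "ttl": "title", "auth": "author"},
    "dtd_b": {"dissertation": "document", "titel": "title", "autor": "author"},
}


def to_common(xml_text, source):
    """Rename the elements of a document using the crosswalk for `source`."""
    mapping = CROSSWALK[source]
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Elements without a mapping keep their original name.
        element.tag = mapping.get(element.tag, element.tag)
    return root


doc_a = to_common(
    "<thesis><ttl>Soil Chemistry</ttl><auth>A. Author</auth></thesis>", "dtd_a"
)
doc_b = to_common(
    "<dissertation><titel>Bodenchemie</titel><autor>B. Autor</autor></dissertation>",
    "dtd_b",
)

# After mapping, one query works across both collections.
titles = [doc.findtext("title") for doc in (doc_a, doc_b)]
print(titles)
```

Once both documents use the shared names, a retrieval engine can search the title element in either collection without knowing which DTD the document was originally written against.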

Next Section: Berlin DTD workshop