W3C, SGML, XML, ODA, HTML, DSSSL and CSS:
A Guide to the Alphabet Soup of the Online Document

Henry S. Thompson
Language Technology Group,
Human Communication Research Centre
University of Edinburgh

Outline draft, 10/12/96


Introduction

The ordinary online document and the web document are on a collision course, and two working groups of the World Wide Web Consortium (W3C) are in the thick of it, trying to balance the needs of document producers and consumers while trying to keep Netscape and Microsoft at the same table. Two standards, one family of semi-standards, one standard manqué and two proto-standards are involved, and sorting out their inter-relations, both technical and political, is a tricky business. The outcome of this struggle will determine the nature of online publishing for some time to come. In this, the first of a series of articles on this complex topic, we introduce the acronyms, the players and the issues.

The Acronyms

HTML
HyperText Markup Language
SGML
Standard Generalized Markup Language (ISO 8879)
XML
Extensible Markup Language
CSS
Cascading Style Sheets
DSSSL
Document Style Semantics and Specification Language (ISO 10179)
ODA
Open Document Architecture (ISO 8613)

The Players

W3C
World Wide Web Consortium
ISO
International Organization for Standardization
W3C SGML Working Group
The W3C working group responsible for XML.
W3C HTML Working Group
The W3C working group responsible for CSS.
WG8
ISO Working Group 8 [ISO/IEC JTC1/SC18/WG8]: Document Description and Processing Languages. The ISO committee directly responsible for SGML, DSSSL and related standards.
Microsoft and Netscape
Locked in combat for the Web browser market, a combat that is potentially mortal, for Netscape at least.
The Document Industry
Producers and consumers of prodigious volumes of structured text, desperately searching for a technology which will protect their data and their investments. Increasingly looking to intranets (private corporate networks), if not the Internet, for document distribution.

The Issues

Interoperability
The key issue for the software providers. A big ego thinks it can define its own standards and that everyone else will fall into line. That is less plausible when more than one key player swings a lot of commercial weight. Recognition of this fact is all that W3C has to keep everyone paying at least lip-service to W3C standards.
Structure
Are documents flat, or (tree) structured? Are tags state-change signals to a formatter, or boundary markers in the tree structure? Is format separable from structure, and if so how?
Ownership
Who owns the data? The advantage of international standards to the users is that they know they're safe from the perils of single-sourcing (just ask anyone who has had to convert large amounts of corporate documentation from an obsolete word-processor format).
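
The two readings of tags raised under Structure can be made concrete with a small sketch. It is only an illustration: the toy tag syntax and the function names (tokens, as_state_changes, as_tree) are assumptions made here, not part of HTML, SGML or any of the standards discussed. The first reading treats each tag as a switch that turns a formatting property on or off; the second treats tags as element boundaries that must nest properly.

```python
import re

# Toy syntax (an assumption for this sketch): <name>, </name>, and plain text.
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def tokens(markup):
    """Yield ('open', name), ('close', name) or ('text', string) tuples."""
    for match in TOKEN.finditer(markup):
        slash, name, text = match.groups()
        if text is not None:
            yield ("text", text)
        else:
            yield ("close" if slash else "open", name)


def as_state_changes(markup):
    """Flat reading: each tag switches a formatting flag on or off."""
    active = set()
    runs = []
    for kind, value in tokens(markup):
        if kind == "open":
            active.add(value)
        elif kind == "close":
            active.discard(value)
        else:
            # Record the text together with the flags in force at this point.
            runs.append((value, sorted(active)))
    return runs


def as_tree(markup):
    """Tree reading: tags mark element boundaries and must nest properly."""
    root = ("doc", [])
    stack = [root]
    for kind, value in tokens(markup):
        if kind == "open":
            node = (value, [])
            stack[-1][1].append(node)
            stack.append(node)
        elif kind == "close":
            # A close tag must match the innermost open element.
            if len(stack) == 1 or stack[-1][0] != value:
                raise ValueError(f"</{value}> does not close the open element")
            stack.pop()
        else:
            stack[-1][1].append(value)
    if len(stack) != 1:
        raise ValueError(f"unclosed element <{stack[-1][0]}>")
    return root
```

For properly nested input such as "<b>bold <i>both</i></b>" the two readings agree on what is bold and what is italic. They part company on overlapping input such as "<b>one <i>two</b> three</i>": the flat reading formats it without complaint, while the tree reading rejects it because </b> arrives while <i> is still open.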

Conclusion