The Roots of Markup Languages and XML
In the late 1960s electronic phototypesetting displaced conventional hot-lead typesetting, and publishing systems became computerized. However, typesetting and formatting commands were embedded directly in the text files, which posed a problem for publishers: they could not transfer work from one typesetting service to another, because most commands were proprietary and often added 100% or more overhead to the text files. Publishers began to demand that generic markup be applied to their files and that the markup be replaced with typesetting commands through computerized substitution upon output to an imagesetter.
However, even generic markup languages proved difficult to interchange. The computer syntax, notation, and descriptors were all different. Delimiters could include characters such as “<”, “[”, “*” and others, and many systems used the same delimiters for different functions.
In the mid-1970s the Graphic Computer Communications Association (later renamed the Graphic Communications Association, or GCA) formed a committee to study generic markup languages and to create a standard markup language. The committee based its standard on the Generalized Markup Language developed by Charles Goldfarb of IBM. Charlie was a lawyer rather than a programmer. In 1968 he was asked by IBM management to interview legal publishing firms to determine why they were unhappy with their legal publishing and information systems. Charlie determined that the systems’ functionality was good, but the basic problem was that each of IBM’s systems required a different text notation and markup.
The result of GCA’s work was Gencode®. Gencode® is not a fixed markup scheme or a set of predefined tags for publishers’ use; rather, the committee settled on a descriptive computer language that users could employ to generate their own markup schemes. In 1983 Gencode® became an ISO work item, which was approved in 1985 and published in 1986 as ISO 8879 under the name Standard Generalized Markup Language (SGML). Every SGML application has three components:
1. The declaration
2. The Document Type Definition (DTD)
3. The document instance
In SGML the declaration defines the syntax of the user’s markup application in a form that is machine-readable and can be processed. The declaration sets the character set to be used, names special data types such as graphic file formats, limits the name length of any element, sets the delimiters, and turns on or off a series of optional features that SGML standardized, such as the ability to omit an end tag when a subsequent start tag implies it by rule. Declarations, such as the one found in appendix A, can be rather complex.
The DTD is used to define document elements, attributes of elements, and entities, as well as their relationships to one another. A document instance is a text file in which the markup defined in a DTD is applied. A DTD can be used to parse an instance and verify its integrity. It can also be used within various types of systems to relate commands and queries to the appropriate elements within a document instance. The most important aspect of SGML is that formatting and fonts were completely divorced from the structure of the content. Early DTDs defined “headers,” “abstracts,” or “paragraphs” as elements, and attributes could include approval, ownership, or versioning, but adding formatting instructions was generally forbidden.
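The way a DTD constrains a document instance can be sketched in a few lines of Python. This is a toy stand-in, not a DTD parser: the CONTENT_MODEL table, the element and attribute names, and the check function are all hypothetical, and the instances use XML syntax because the Python standard library has no SGML parser. It shows only the idea that declared elements and attributes pass while an undeclared formatting attribute is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for a DTD:
# element name -> (allowed child elements, allowed attributes)
CONTENT_MODEL = {
    "report":    ({"header", "abstract", "paragraph"}, {"owner", "version"}),
    "header":    (set(), set()),
    "abstract":  (set(), {"approval"}),
    "paragraph": (set(), {"approval"}),
}

def check(element, errors=None, path=""):
    """Collect every place where the instance departs from the content model."""
    errors = [] if errors is None else errors
    here = f"{path}/{element.tag}"
    if element.tag not in CONTENT_MODEL:
        errors.append(f"{here}: undeclared element")
        return errors
    children, attributes = CONTENT_MODEL[element.tag]
    for name in element.attrib:
        if name not in attributes:
            errors.append(f"{here}: undeclared attribute '{name}'")
    for child in element:
        if child.tag not in children:
            errors.append(f"{here}: '{child.tag}' not allowed here")
        else:
            check(child, errors, here)
    return errors

# A conforming instance: structure and ownership metadata only.
good = ET.fromstring(
    '<report owner="eng" version="2">'
    '<header>Title</header><paragraph approval="yes">Text</paragraph>'
    '</report>'
)
# A non-conforming instance: a formatting attribute and an undeclared element.
bad = ET.fromstring(
    '<report><paragraph font="bold">Text</paragraph><footer/></report>'
)

print(check(good))
print(check(bad))
```

A real DTD also expresses ordering and repetition of children, which this table deliberately leaves out.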
The earliest adopters tended to be users whose information had a long life expectancy, such as airplane manufacturers or the Department of Defense, organizations whose equipment required supporting documentation to last for decades. The motivation was application independence, not just independence from formatting instructions. These same companies were pioneering early optical media (what CD-ROMs were called before ISO 9660), and as a result they made the first use of SGML documents for electronic publishing. In turn, the use of SGML in electronic media immediately led to the development of extensions to SGML that could accommodate media other than text and graphics. In 1992, ISO/IEC 10744:1992, Information Technology - Hypermedia/Time-based Structuring Language (HyTime), was published. “HyTime provides standardized means of expressing (1) intra- and extra-document locations, and arbitrary links between them, (2) the scheduling of multimedia objects in 'finite coordinate spaces,' and (3) rendering instructions for arbitrarily projecting such objects onto other finite coordinate spaces, and other constructs.” [i] This concept, developed in 1989-1991 by Steve Newcomb, a music professor at Florida State University in Tallahassee, led others to re-think SGML DTD designs.
Document constructs such as text and graphics are variable in length and can’t be easily managed with relational database technologies. Inevitably, SGML professionals began exploring object-oriented programming and database techniques. In quick succession beginning in 1990, several efforts moved in this direction. Diane Kennedy of Datalogics and the ATA DTD committee decided to publish DTD fragments, or “bricks,” from which users could build complete DTDs, rather than complete DTDs that were not good fits for airplane documentation applications. Pushpa Merchant, an XML consultant, developed the brick concept into one that dealt with self-contained “frames” of information. Finally, Jim Harvey of Volt Information Sciences and the Society of Automotive Engineers shed the document roots of SGML altogether and created the first truly object-oriented SGML application. Rather than defining the ultimate automotive service and diagnostics book structure, the group based their SGML on objects such as “Car,” “assembly,” “component,” and “diagnostics.”
“One interesting aspect of the J2008 recommended practice is that it encompasses only the information and the structure of information relative to itself. Although the Data Model is not specific to any data management technique, companies that provide support to OEMs, such as Volt and Datalogics, have begun to model J2008 data in an object-oriented environment that can facilitate these complex relationships.” [ii]
Around the same time another change was happening in the SGML community. In 1987, Bell Atlantic engineers introduced an online service that featured color graphic representations of office documents exchanged over the Internet. They had two options: employ a simplified generic SGML DTD as their exchange format or use the editorial-based IMI format. They picked the wrong option, perhaps one of the top five worst decisions ever made! Another product designed for optical media publishing, Guide from Owl, Ltd., introduced a simple four-tag SGML DTD that could be used to bring any document into its retrieval program. Although neither of these first simplified SGML applications survived, some researchers at CERN were paying attention. They wrote their own simplified tag set, the Hypertext Markup Language (HTML), developed a browser, and gave it away! It caught on and the World Wide Web was born.
HTML caught on in a big way, but soon users were dissatisfied with its limited capabilities and its dumb, document-oriented tag selection. This dissatisfaction led to the development of the Document Object Model (DOM), which picked up where the J2008 effort left off.
The Document Object Model (DOM) is a platform- and language-neutral interface that permits scripts to access and update the content, structure, and style of a document. The DOM includes a model for how a standard set of objects representing HTML and XML documents are combined, and an interface for accessing and manipulating them.
With the DOM, content authors can:
· Move one part of the document tree to another without destroying and re-creating the content.
· Create elements and attach them to any point in the document tree.
· Organize and manipulate new or existing tree branches in a document fragment before inserting the objects back into the tree. [iii]
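The three operations listed above can be sketched with Python's standard xml.dom.minidom module, which implements a subset of the W3C DOM interfaces. The sample document and its element names (manual, chapter, para, note) are made up for illustration; only the DOM calls themselves are standard.

```python
from xml.dom.minidom import parseString

# Hypothetical sample document, used only for illustration.
doc = parseString(
    "<manual><chapter id='a'><para>First</para></chapter>"
    "<chapter id='b'/></manual>"
)
root = doc.documentElement
chapter_a, chapter_b = root.getElementsByTagName("chapter")

# 1. Move part of the tree: appending a node that already has a parent
#    relocates it instead of copying or re-creating it.
para = chapter_a.getElementsByTagName("para")[0]
chapter_b.appendChild(para)

# 2. Create a new element and attach it at a chosen point in the tree.
note = doc.createElement("note")
note.appendChild(doc.createTextNode("Added via the DOM"))
chapter_a.appendChild(note)

# 3. Assemble a branch in a document fragment, then insert it in one step.
#    Appending a fragment inserts its children, not the fragment itself.
fragment = doc.createDocumentFragment()
for text in ("Second", "Third"):
    p = doc.createElement("para")
    p.appendChild(doc.createTextNode(text))
    fragment.appendChild(p)
chapter_b.appendChild(fragment)

print(root.toxml())
```

The same calls (appendChild, createElement, createDocumentFragment) are available to scripts in a browser, which is the setting the list above describes.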
In 1995 companies started using WWW technology to let employees share information and to let customers access their systems remotely. These types of systems are called intranets and extranets. Meanwhile, advertising had been introduced to the World Wide Web in 1993 and 1994, and in 1995 and 1996 the first consumer commerce applications took hold. It didn’t take long for companies to start adding business-to-business commerce applications to their WWW sites. However, HTML was inherently too simple to deal with the complexity of commerce. In 1998, a simplified version of SGML was introduced: Extensible Markup Language (XML) 1.0, W3C Recommendation Feb. 10, 1998, www.w3.org/TR/1998/REC-xml-19980210. XML did away with the declaration by standardizing SGML notation (Unicode), eliminating optional features such as Omittag, standardizing the well-known angle bracket delimiters, and so forth. XML also made DTDs optional by requiring proper nesting of elements and employing the concept of “well-formedness”: software applications must be able to deduce the structure of a document from its tagging alone. XML also added explicitly object-oriented features such as supertyping and subtyping, and schemas that employ object constructs.
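Well-formedness can be illustrated with a minimal sketch using Python's standard xml.etree.ElementTree parser, which checks well-formedness but does not validate against a DTD. The helper name is_well_formed and the sample strings are hypothetical. Properly nested markup parses with no DTD at all, while overlapping tags or an omitted end tag (which SGML's Omittag feature could allow) are rejected.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser can build a tree from the tagging alone."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents; no DTD is supplied for any of them.
samples = {
    "nested": "<order><item>bolt</item></order>",
    "overlapping": "<order><item>bolt</order></item>",
    "omitted end tag": "<order><item>bolt</order>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))

# The structure of the well-formed sample is recoverable without a DTD.
structure = [element.tag for element in ET.fromstring(samples["nested"]).iter()]
print(structure)
```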
[i] The XML Cover Pages, SGML/XML Bibliography Part 4, I – L, by Robin Cover, OASIS, http://www.oasis-open.org/cover/bib-il.html#iso10744, August 03, 1999.
[ii] SGML Applied to Automotive Service Information, by James E. Harvey, pg. 27-31, CALS Journal, Saratoga, NY, Fall 1993.
[iii] Document Object Model Overview, MSDN Online Workshop, Microsoft Corporation, http://msdn.microsoft.com/workshop/author/dom/domoverview.asp, June 2000.