Government Document Re-engineering and Standardization: A Case Study1

Edwin Buchinski, Treasury Board of Canada Secretariat2

ABSTRACT: International standards for electronic documents provide an essential guide for universal document and information service re-engineering. This case study summarizes the factors that were taken into consideration in using the international Standard Generalized Markup Language (SGML) to define the structure and content of government administrative manuals (i.e. to prepare a document type definition, or DTD). It describes the functional requirements, applicable SGML coding options, relevant industry experience and current commercial SGML software constraints that influenced the DTD design. It concludes with some observations on the service re-engineering and cultural change that will be required to realize the potential efficiencies associated with an electronic workplace.

RÉSUMÉ: Les normes internationales pour les documents électroniques fournissent un guide essentiel pour repenser l'ingénierie d'un service universel de documentation et d'information. Cette étude de cas donne un aperçu des divers facteurs dont on a tenu compte en utilisant le langage international Standard Generalized Markup Language (SGML) pour définir la structure et le contenu des manuels administratifs gouvernementaux (c.-à-d. pour préparer une définition du document-type, ou DDT). Cette étude de cas décrit les exigences fonctionnelles, les options de codage du SGML, l'expérience pertinente de l'industrie et les contraintes actuelles des logiciels de SGML qui ont exercé une influence sur la conception du DDT. Elle se termine par des remarques sur la réingénierie des services et les changements d'ordre culturel qui s'imposent pour actualiser l'efficacité potentielle liée à un milieu de travail électronique.

Document Standardization: Vision, Potential and Some Bare Essentials

The Blueprint for Renewing Government Service Delivery Using Information Technology,3 issued by the Treasury Board Secretariat, projects a vision of an information society. In this vision, users fulfill their educational, recreational and professional information needs through integrated innovative technologies that are linked by an information highway. An underlying principle in this scenario is that standards will provide the underpinnings for the requisite re-engineering of information services. Drawing on actual experience, this case study illustrates that information re-engineering is taking place within the government and offers insights that could be used in re-engineering other documents to achieve the services envisaged by the Blueprint.

International approval of SGML,4 in 1986, provided a mechanism to re-engineer traditional document preparation and publishing practices. SGML offers a consistent, neutral means of specifying document structure (e.g. chapter, section, paragraph, table, etc.) and contents (e.g. author, title, copyright notice, etc.). It stipulates a standard format for tags and associated rules that can be applied by user groups to define significant structural and content components in documents. Collectively, the set of tags and their relationships constitute a document type definition (DTD). A DTD is like an interchange format. It provides a consistent, system-independent means of identifying and defining data in documents.

The SGML tags enable application programs to access document contents and to process them as required. For example, an electronic publication application could locate table titles within a given document by using the applicable SGML tags. Once isolated, these titles might be used to generate a consolidated list of tables for an electronic or print publication. In addition, hyperlinks could be inserted from the individual entries in the generated list of tables to the applicable tables. Other software modules would use these SGML tags to control the font, style or size of displayed data, to manage user access to certain categories of data, and to generate contents for pop-up windows, etc.
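The table-title example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (manual, chapter, table, title) and the id attribute are hypothetical and are not taken from the administrative manual DTD, and the sample is fully tagged so that Python's XML parser can stand in for an SGML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, fully tagged sample (no tag minimization), so an XML parser
# can stand in for an SGML parser. Element names are illustrative only.
SAMPLE = """
<manual>
  <chapter>
    <title>Insurance</title>
    <table id="t1"><title>Premium rates</title></table>
    <para>Some text.</para>
    <table id="t2"><title>Coverage limits</title></table>
  </chapter>
</manual>
"""


def list_of_tables(source: str) -> list[tuple[str, str]]:
    """Return (table id, table title) pairs in document order."""
    root = ET.fromstring(source)
    entries = []
    for table in root.iter("table"):
        # Only the title that is a direct child of the table is used,
        # so chapter titles are not picked up by mistake.
        title = table.findtext("title", default="").strip()
        entries.append((table.get("id", ""), title))
    return entries


if __name__ == "__main__":
    for table_id, title in list_of_tables(SAMPLE):
        # The id could serve as a hyperlink target in an electronic edition.
        print(f"{title} -> #{table_id}")
```

Because the titles are located by tag rather than by their appearance on the page, the same function would work regardless of how the tables are eventually formatted.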

By design, SGML treats the presentation or layout of documents as a separate and distinct process. The physical appearance of a document is determined by formatting attributes such as page dimensions, font style, and type size that are stated in a separate application rather than being embedded in the document contents. Removal of format attributes simplifies document publishing operations and enhances document reusability by other applications. A single source document can provide the raw data for multiple automated applications and services. Some examples include:

In addition, workflow applications could be adapted to enhance document life cycle management beginning with manuscript preparation and proceeding through peer review, editorial revision, publication, dissemination and archiving.

To implement SGML, an organization must define, adapt or adopt DTDs for each class of documents that it intends to manage, and must keep those DTDs up to date. All applications that use the source documents as raw data must be SGML-enabled or rely on conversion software to insert the proprietary application coding. Since most publications are intended for external users, the DTD definition and document production process should respect the users' information processing needs: basically, to request, receive, index, retrieve, annotate, display and archive electronic documents. The following paragraphs describe how the Treasury Board Secretariat, assisted by external SGML experts and federal departmental representatives, developed and validated an SGML document type definition for administrative manuals. This process lasted approximately two years and included document analysis, DTD definition, pilot implementation, user evaluation, and DTD revision.

Document Analysis

To ensure that the DTD would support existing and prospective service needs, the Treasury Board Manual was selected as a representative example of an administrative manual and analyzed to identify the inherent structural and content components for this type of document. The analysis uncovered various inconsistencies among the thirty-odd volumes of the Treasury Board Manual. Since these inconsistencies impeded DTD definition, the Communications and Coordination Directorate (CCD) consulted with authors to resolve these variants and to devise a common structure and style which was integrated into the draft DTD. A major challenge during the analysis phase was to respond to the functional issues identified by the project participants. As noted in the project report,5 these requirements included:

Administrative Manual DTD: Draft Definition

As noted below, each functional requirement was addressed separately, taking into consideration the inherent ability of SGML to support each function, the availability of commercial SGML-compliant software, and user community implementation experience.

To cope with the wide-ranging interpretations of document management needs, the project team relied on the consensus concerning this requirement6 achieved by the Office Systems Standards Working Group (OSSWG). The OSSWG concluded that the document profile and associated data elements in ISO's Open Document Architecture7 would support government-wide electronic document management. As envisaged by the OSSWG, the document profile would contain enough information to uniquely identify and describe each electronic document. In addition to supporting corporate record management needs, this data would support preliminary identification of candidate documents for archival retention. While accepting the OSSWG recommendations, the draft DTD included only that subset of the OSSWG recommended profile elements that demonstrated apparent utility to the project participants.

SGML identifies document structures and contents using tags (i.e. generic identifiers and attributes) that have been defined, as required, by various groups. Each group has used terms and abbreviations readily understood by its community, and has thereby established precedents for naming document contents and structures. Since the generic identifiers and attributes refer to data that may be common to many types of documents (e.g. author, table, etc.), common names for such elements would facilitate user comprehension and document interchange. For the administrative manual DTD, the design team could have followed the naming conventions established by the publishing, military, automotive or pharmaceutical industries. Although none of these sector-specific DTDs was particularly applicable to administrative manuals, certain conventions that they established proved useful, as noted below. Other DTD conventions established in the late 1980s reflected the limited functionality of the computer technology of that time and were therefore avoided (e.g. SGML codes restricted to 8 characters, and tag minimization used to overcome the lack of automated tagging in primitive authoring tools). The eventual choice was to use longer, more informative SGML names and to avoid tag minimization.

The SGML standard specifies a means of using the standard 7-bit ASCII code to represent any character code set. An initial decision to use the extended Latin character set, a valid SGML option, was revised, since in the near term, most commercial SGML parsers did not handle alternate character sets.
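One common way of carrying accented characters in 7-bit ASCII is to replace each of them with a named entity reference. The sketch below is an assumption-laden illustration rather than the project's method: it borrows the entity names in Python's html.entities table (HTML's Latin-1 names, which are assumed here to match those of the ISO entity sets used with SGML) and falls back to numeric character references for anything without a name.

```python
import html
from html.entities import codepoint2name


def to_ascii_with_entities(text: str) -> str:
    """Rewrite text so that it contains only 7-bit ASCII characters."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "&":
            # Escape the delimiter itself so the result can be decoded again.
            out.append("&amp;")
        elif code < 128:
            out.append(ch)
        elif code in codepoint2name:
            # Named reference, e.g. U+00E9 becomes &eacute;
            out.append(f"&{codepoint2name[code]};")
        else:
            # No name available: fall back to a numeric character reference.
            out.append(f"&#{code};")
    return "".join(out)


if __name__ == "__main__":
    encoded = to_ascii_with_entities("définition du document-type")
    print(encoded)
    # html.unescape reverses the encoding for this illustration.
    print(html.unescape(encoded))
```

The encoded text can pass through software that handles only ASCII, at the cost of being harder to read in raw form.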

Although SGML can be used to define various styles and types of tables, the ability to change row and column dimensions or to define cell contents is quite restrictive. Since the overhead and complexities associated with table coding are substantial, the preferred option for table coding was to endorse the well established CALS8 conventions and strategies being promoted by the military sector. A contributing factor in this decision was that complex structures could be defined and commercial software was available to support table authoring and processing.
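To give a feel for what table coding involves, the sketch below reads a small table tagged with the element names generally associated with the CALS table model (tgroup, thead, tbody, row, entry). The sample data is invented, the markup is written as well-formed XML so that Python's standard parser can read it, and spanning cells and column specifications are deliberately ignored.

```python
import xml.etree.ElementTree as ET

# Invented sample using CALS-style element names; real tables may also
# carry colspec elements and spanning attributes, which are ignored here.
SAMPLE = """
<table>
  <title>Premium rates</title>
  <tgroup cols="2">
    <thead>
      <row><entry>Plan</entry><entry>Monthly premium</entry></row>
    </thead>
    <tbody>
      <row><entry>Basic</entry><entry>10.00</entry></row>
      <row><entry>Extended</entry><entry>25.00</entry></row>
    </tbody>
  </tgroup>
</table>
"""


def table_rows(source: str) -> tuple[list[list[str]], list[list[str]]]:
    """Return (header rows, body rows) as lists of cell texts."""
    root = ET.fromstring(source)
    tgroup = root.find("tgroup")
    if tgroup is None:
        raise ValueError("table has no tgroup element")
    cols = int(tgroup.get("cols", "0"))

    def rows(container: str) -> list[list[str]]:
        section = tgroup.find(container)
        if section is None:
            return []
        result = []
        for row in section.findall("row"):
            cells = [(entry.text or "").strip() for entry in row.findall("entry")]
            # Simple consistency check; it would reject tables with spans.
            if len(cells) != cols:
                raise ValueError(f"expected {cols} cells, found {len(cells)}")
            result.append(cells)
        return result

    return rows("thead"), rows("tbody")


if __name__ == "__main__":
    head, body = table_rows(SAMPLE)
    print(head)
    print(body)
```

Even this small case needs several levels of nesting for a three-row table, which hints at the coding overhead noted above.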

SGML accommodates non-SGML data such as graphics in various coding conventions. The range of de facto and de jure graphics standards presents a major DTD design challenge since it is impossible to select a single standard and to ensure that every potential user system will be able to process the encoded graphics. As graphics represented a relatively minor component in administrative manuals, such as the Treasury Board Manual, there was no great urgency to reach agreement on one or more encoding specifications and none were chosen for the draft DTD.

SGML imposes no restrictions on bilingual or multilingual DTD design, nor does it provide guidance for it. SGML coding can be optimized to support the creation, presentation or dissemination phases of bilingual publications. For example, close alignment of bilingual equivalent structures facilitates: a) translation and revision of the original document, when this process is supported by SGML-based authoring tools designed for this purpose; b) efficient generation of bilingual print publications that are formatted as parallel columns; and c) dissemination of bilingual electronic documents. Added overheads will be encountered in formatting lengthy documents for print publications and in splitting bilingual documents to provide users with either language version. Advice based on experience and application requirements was contributed by Statistics Canada, the Canada Communications Group, the National Research Council and several private-sector companies, but it failed to produce a single solution that satisfied everyone for every application. The chosen option enables a document to be coded as unilingual English or French language text or as alternating English and French text.
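Splitting a document coded as alternating English and French text into a unilingual version might look like the sketch below. The lang attribute and the section and para element names are assumptions made for illustration; the source does not describe how the DTD actually marks language.

```python
import xml.etree.ElementTree as ET

# Hypothetical bilingual sample with alternating English and French text.
SAMPLE = """
<section>
  <para lang="en">Employees are covered.</para>
  <para lang="fr">Les employés sont couverts.</para>
  <para lang="en">Claims must be filed.</para>
  <para lang="fr">Les demandes doivent être présentées.</para>
</section>
"""


def unilingual(source: str, lang: str) -> ET.Element:
    """Return a copy of the document keeping only one language version."""
    root = ET.fromstring(source)
    # Take a snapshot of the elements first so that removing children
    # does not interfere with the traversal.
    for parent in list(root.iter()):
        for child in list(parent):
            child_lang = child.get("lang")
            # Elements without a lang attribute are treated as shared.
            if child_lang is not None and child_lang != lang:
                parent.remove(child)
    return root


if __name__ == "__main__":
    french = unilingual(SAMPLE, "fr")
    for para in french.iter("para"):
        print(para.text)
```

The extra pass over the whole document is one small instance of the splitting overhead mentioned above.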

Legislation that justifies an administrative specification is frequently reproduced in administrative manuals, and SGML offers various solutions to identify and manage the referenced text. The formal identifier and public text option enables the referenced text to reside on a remote system. Public text is defined by ISO as text that is available and accessible to systems other than the one on which the text resides. Such text must be uniquely identified, if it is to be accessible, using one of five ISO 9070 naming conventions.9 If the referenced text is substantial, it can also be coded in SGML as an included document or subdocument (with an associated separate DTD). The subdocument option is not practical today since commercial SGML parsers have not implemented this feature. The formal identifier is not readily implementable either, since it requires the referenced documents to be available electronically, accessible remotely and registered formally. The remaining option was to identify the referenced document as an act and to postpone any more sophisticated intersystem linking and document tagging to a later date.

To support electronic review, SGML tags were defined: a) to designate reviewers' comments as generic suggestions or replacement text applicable to the entire document or to specific components; and b) to record the author's or editor's reactions to each review proposal. Rather than define a government-specific coding mechanism, the DTD adopted the CALS conventions, which were subsequently adopted for the ISO version of the publishing industry DTD.10

It was determined that administrative manual development had no particular multi-authoring requirements. Even though some volumes have more than one author, it was decided that workflow management software could be implemented to track document segments assigned to individual authors and to integrate the respective segments into a consolidated publication. Alternatively, SGML codes could have been defined to monitor contributions by individuals to a consolidated document.

Since most administrative manuals are amended to some extent, SGML coding was provided to identify the author, date and number of each amendment. The resulting SGML coding can be used by some commercial document viewers to display a document as of a given date or amendment.
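The idea of displaying a document as of a given date can be sketched as follows. The version element and its amendment, date and author attributes are hypothetical stand-ins for whatever the DTD defines, and the names and dates in the sample are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional

# Hypothetical sample: each version records its amendment number,
# effective date and author. All values are placeholders.
SAMPLE = """
<section id="s1">
  <version amendment="0" date="1993-01-15" author="A. Author">Original wording.</version>
  <version amendment="1" date="1993-09-01" author="B. Editor">First amended wording.</version>
  <version amendment="2" date="1994-03-10" author="A. Author">Second amended wording.</version>
</section>
"""


def text_as_of(source: str, as_of: date) -> Optional[str]:
    """Return the wording in force on the given date, or None if none applies."""
    root = ET.fromstring(source)
    current = None
    for version in root.iter("version"):
        effective = date.fromisoformat(version.get("date"))
        # Keep the latest version whose effective date is not after as_of.
        if effective <= as_of and (current is None or effective >= current[0]):
            current = (effective, version.text)
    return None if current is None else current[1]


if __name__ == "__main__":
    print(text_as_of(SAMPLE, date(1993, 12, 31)))
```

Selecting by amendment number instead of date would follow the same pattern, comparing the amendment attribute rather than the effective date.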

To interpret and apply administrative publications issued by central agencies, government agencies often need to qualify specific administrative directives. In addition, the annotating agency may wish to transfer these supplementary notes to subsequent editions of the administrative manual with minimal effort. The ability to annotate and manage annotations is typically provided by commercial SGML viewer software, so no additional features were included in the administrative manual DTD.

SGML documents can support electronic information retrieval of specific kinds of information by providing explicit content codes to identify that information. For example, warnings can be encoded to permit user retrieval of this type of explanatory text. Any type of textual data which must be readily accessible by information retrieval applications can be identified by a specific SGML tag. One example supporting information retrieval is the tag for titles of statutes (i.e. ).

Pilot Implementation

To support validation of the draft DTD and proposed administrative manual structure, the Communications and Coordination Directorate, in cooperation with the author, undertook minor edits to restructure the Insurance and Related Benefits volume of the Treasury Board Manual. The revised text, encoded in a proprietary word processing format, and the draft DTD were submitted to an SGML service bureau, which was to convert and tag the revised volume and verify that it complied with the draft DTD. Separate document display and layout specifications were prepared by the Communications and Coordination Directorate for use in conjunction with the draft DTD and the SGML document instance to automatically produce a bilingual print publication and two unilingual electronic versions.

The clean versions of the DTD, SGML-encoded volume, electronic publication and viewing software were distributed for evaluation to approximately 100 individuals in federal government departments, libraries, and private sector companies. An outside consultant was hired to elicit user reaction and to produce a consolidated assessment. This resulted in a number of recommendations that can be summarized as: a) SGML is rapidly being adopted; b) departments are not yet ready for it but preparations are under way; and c) the Treasury Board Secretariat should continue to provide leadership through SGML implementation for all of its publications.11

DTD Refinement

The pilot implementation allowed a number of draft DTD provisions to be examined and refined. The most significant refinement was the extension of the document management component to incorporate a wider range of management data (e.g. document security classification) and to support various related applications.

The DTD developed by the Text Encoding Initiative (TEI) 12 provided the definitive model for the document management information. The TEI header includes descriptive information and supports unique identification of each document in a form that is amenable to library and records management applications. It can describe individual publications or assembled collections, and it accommodates the archival appraisal and disposition information recommended by the OSSWG and the ISBN option for public identifiers sanctioned by ISO 9070. This data may be interchanged in conjunction with the electronic publications or separately as an SGML document in its own right. It could be used to advertise new or revised publications, to supplement Machine-Readable Cataloging services, and to facilitate seamless access to remotely held information as envisaged by the Blueprint.

Preparing for the Electronic Workplace

Having created and validated the DTD for administrative manuals, the Treasury Board Secretariat and the participating departments are in a stronger position to formulate their strategies for working electronically. These strategies must include staff training as well as systems facilities to create, receive, disseminate and manage SGML-encoded documents.

Various software options are available to support SGML-based document creation. Many of these packages run on existing PC hardware platforms. The more sophisticated packages effectively hide the SGML syntax, making it virtually transparent to document authors and editors. Nevertheless, the authoring process will be subjected to a culture change as traditional tasks such as document formatting are made redundant and new disciplines are added to enhance overall document quality and to support hyperlinking within and across documents held on local and remote systems. How individuals react to these types of changes will depend on the individual and on the care that is taken to explain and promote the restructuring associated with renewing service delivery through innovative use of technology.

From a user's perspective, a variety of options are available to accommodate every departmental environment. If the local expertise and resources permit, the department could choose to acquire the SGML document and associated DTD for processing and formatting in accordance with local system indexing, display and distribution capabilities. Alternatively, the source documents may be acquired in the preprocessed form supported by commercial viewer software. The preferred option will undoubtedly evolve as the proportion of electronic documents grows and commercial software becomes more sophisticated.

As publishers of electronic documents, government departments are being presented with new options for information dissemination. Possible media include online interactive access and document transfer using system-to-system connectivity or CD-ROMs and diskettes. Imminent use of these media may be closely linked to successful resolution of the copyright issue and technical assurance of document integrity. These issues were also addressed in the DTD pilot. For example, the Treasury Board Secretariat revised the Treasury Board Manual copyright statement to allow unrestricted copying and duplication by government employees. Encryption software, developed by the National Research Council, was integrated with an SGML DTD to demonstrate controlled access to electronic documents. This software allows free access to descriptive text but requires and facilitates payments to document owners if a user wants to access the substantive information.

The experience gained in the administrative manual project clearly demonstrates that a variety of expertise is required to support document re-engineering and standardization. SGML is more than a syntax for encoding documents in a standard way: it is the basis for re-engineering government publications and information delivery for all types of publications and information services. To support this view, a follow-up investigation was undertaken to define requirements for structured document registry and repository facilities 13 and thereby to define the required tools and infrastructure to support an electronic workplace.

