ABSTRACT: International standards for electronic documents provide an essential guide to re-engineering a universal documentation and information service. This case study outlines the various factors that were considered in using the international Standard Generalized Markup Language (SGML) to define the structure and content of government administrative manuals (i.e. to prepare a document type definition, or DTD). It describes the functional requirements, the SGML coding options, the relevant industry experience and the constraints of current SGML software that influenced the design of the DTD. It concludes with remarks on the service re-engineering and cultural changes required to realize the potential efficiencies of an electronic workplace.
International approval of SGML,4 in 1986, provided a mechanism to re-engineer traditional document preparation and publishing practices. SGML offers a consistent, neutral means of specifying document structure (e.g. chapter, section, paragraph, table) and contents (e.g. author, title, copyright notice). It stipulates a standard format for tags, and associated rules, that user groups can apply to define the significant structural and content components of their documents. Collectively, the set of tags and their relationships constitute a document type definition (DTD). A DTD thus acts as an interchange format: it provides a consistent, system-independent means of identifying and defining the data in documents.
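To make the idea of "tags and their relationships" concrete, the following is a minimal sketch of what a DTD's element declarations enforce. It assumes hypothetical element names (`manual`, `header`, `chapter`, `title`, `para`) that are not taken from the actual administrative manual DTD, writes the instance in XML syntax (a subset of SGML) so that Python's standard `xml.etree.ElementTree` can parse it, and models each content model as a regular expression over child element names. A real SGML parser also handles attributes, entities, mixed content and tag minimization, which this sketch ignores.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, loosely mirroring SGML element declarations:
#   <!ELEMENT manual  - - (header, chapter+)>
#   <!ELEMENT header  - - (title)>
#   <!ELEMENT chapter - - (title, para+)>
# Each model is a regular expression over the space-separated child tag names.
CONTENT_MODELS = {
    "manual": r"header chapter( chapter)*",
    "header": r"title",
    "chapter": r"title para( para)*",
}


def validate(element, errors=None):
    """Recursively check each element's children against its content model."""
    if errors is None:
        errors = []
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:  # elements without a model (title, para) are not checked
        children = " ".join(child.tag for child in element)
        if not re.fullmatch(model, children):
            errors.append(f"<{element.tag}> contains '{children}', expected {model!r}")
    for child in element:
        validate(child, errors)
    return errors


VALID = """<manual>
  <header><title>Insurance and Related Benefits</title></header>
  <chapter><title>Introduction</title><para>Text.</para></chapter>
</manual>"""
INVALID = "<manual><chapter><title>No header</title><para>Text.</para></chapter></manual>"

print(validate(ET.fromstring(VALID)))    # []
print(validate(ET.fromstring(INVALID)))  # one error: <manual> lacks its header
```

Because the rules live in the declarations rather than in any one program, every SGML-aware application can apply the same checks to the same document.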
SGML tags enable application programs to access document contents and to process them as required. For example, an electronic publishing application could locate the table titles within a given document by using the applicable SGML tags. Once isolated, these titles might be used to generate a consolidated list of tables for an electronic or print publication. Hyperlinks could also be inserted from each entry in the generated list to the corresponding table. Other software modules would use the same SGML tags to control the font, style or size of displayed data, to manage user access to certain categories of data, to generate contents for pop-up windows, and so on.
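The list-of-tables example can be sketched in a few lines. This is an illustration under stated assumptions, not the pilot's software: the element names (`chapter`, `table`, `title`, `para`), the `id` attribute and the sample titles are hypothetical, and the instance is written in XML syntax so that the standard library can parse it. The program finds each table title by its tag, in document order, and emits an HTML list whose entries link back to the tables.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical document instance; the element names (chapter, table, title,
# para), the id attribute and the titles are illustrative only.
SOURCE = """<manual>
  <chapter>
    <table id="t1"><title>Premium rates</title></table>
    <para>Text.</para>
    <table id="t2"><title>Eligibility periods</title></table>
  </chapter>
  <chapter>
    <table id="t3"><title>Benefit limits</title></table>
  </chapter>
</manual>"""


def list_of_tables(root):
    """Collect (title, link target) pairs for every table, in document order."""
    entries = []
    for table in root.iter("table"):  # iter() walks the tree in document order
        title = table.findtext("title", default="(untitled)")
        entries.append((title, "#" + table.get("id", "")))
    return entries


def render_html(entries):
    """Render the generated list of tables as HTML, hyperlinking each entry."""
    items = "".join(
        f'<li><a href="{html.escape(href)}">{html.escape(title)}</a></li>'
        for title, href in entries
    )
    return f"<ul>{items}</ul>"


entries = list_of_tables(ET.fromstring(SOURCE))
print(render_html(entries))
```

The same `list_of_tables` output could just as well feed a print layout; only `render_html` is specific to the electronic version.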
By design, SGML treats the presentation or layout of documents as a separate and distinct process. The physical appearance of a document is determined by formatting attributes such as page dimensions, font style and type size, which are specified in a separate application rather than embedded in the document contents. Removing format attributes simplifies document publishing operations and makes documents easier for other applications to reuse. A single source document can provide the raw data for multiple automated applications and services. Some examples include:
To implement SGML, an organization must define (or adapt or adopt) and maintain DTDs for each class of documents that it intends to manage. All applications that use the source documents as raw data must be SGML-enabled or rely on conversion software to insert their proprietary application coding. Since most publications are intended for external users, the DTD definition and the document production process should respect the users' information processing needs: basically, to request, receive, index, retrieve, annotate, display and archive electronic documents. The following paragraphs describe how the Treasury Board Secretariat, assisted by external SGML experts and federal departmental representatives, developed and validated an SGML document type definition for administrative manuals. This process lasted approximately two years and included document analysis, DTD definition, pilot implementation, user evaluation and DTD revision.
To cope with widely differing interpretations of document management needs, the project team relied on the consensus on this requirement6 reached by the Office Systems Standards Working Group (OSSWG). The OSSWG concluded that the document profile and its associated data elements, as defined in ISO's Open Document Architecture,7 would support government-wide electronic document management. As envisaged by the OSSWG, the document profile would contain enough information to uniquely identify and describe each electronic document. In addition to supporting corporate records management needs, this data would support the preliminary identification of candidate documents for archival retention. While accepting the OSSWG recommendations, the draft DTD included only the subset of OSSWG-recommended profile elements that appeared useful to the project participants.
SGML identifies document structures and contents using tags (i.e. generic identifiers and attributes) that various groups have defined to suit their needs. Each group has used terms and abbreviations readily understood by its own community, thereby establishing precedents for naming document contents and structures. Since generic identifiers and attributes refer to data that may be common to many types of documents (e.g. author, table), common names for such elements would facilitate user comprehension and document interchange. For the administrative manual DTD, the design team could have followed the naming conventions established by the publishing, military, automotive or pharmaceutical industries. Although none of these sector-specific DTDs was particularly applicable to administrative manuals, certain conventions that they established proved useful, as noted below. Other conventions established in the late 1980s reflected the limited functionality of the computer technology of that decade and were therefore avoided (e.g. SGML names restricted to 8 characters, and tag minimization used to compensate for the lack of automated tagging in primitive authoring tools). The eventual choice was to use longer, more informative SGML names and to avoid tag minimization.
The SGML standard specifies a means of using the standard 7-bit ASCII code to represent any character set. An initial decision to use the extended Latin character set, a valid SGML option, was later revised because, at the time, most commercial SGML parsers could not handle alternate character sets.
Although SGML can be used to define various styles and types of tables, it offers only limited means of changing row and column dimensions or of defining cell contents. Since the overhead and complexity of table coding are substantial, the preferred option was to endorse the well-established CALS8 table conventions and strategies promoted by the military sector. A contributing factor in this decision was that the CALS conventions can define complex structures and that commercial software was available to support table authoring and processing.
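As an illustration of what adopting the CALS conventions means for processing software, the sketch below reads a small table that uses CALS table-model element names (`table`, `title`, `tgroup`, `colspec`, `thead`, `tbody`, `row`, `entry`) and checks that every row matches the column count declared on its `tgroup`. It is a simplified sketch, not a CALS processor: the cell contents are placeholders, the table is written in XML syntax, and spanning attributes (`namest`/`nameend`, `morerows`) and `tfoot` are not handled.

```python
import xml.etree.ElementTree as ET

# A small table using CALS table-model element names, written in XML syntax.
# Cell contents are placeholders. Spanning attributes (namest/nameend,
# morerows) and tfoot are not handled by this sketch.
CALS_TABLE = """<table>
  <title>Sample table</title>
  <tgroup cols="2">
    <colspec colname="c1"/>
    <colspec colname="c2"/>
    <thead><row><entry>Plan</entry><entry>Coverage</entry></row></thead>
    <tbody>
      <row><entry>Plan A</entry><entry>Level 1</entry></row>
      <row><entry>Plan B</entry><entry>Level 2</entry></row>
    </tbody>
  </tgroup>
</table>"""


def read_cals(table):
    """Flatten every tgroup into header and body rows, checking row widths."""
    header, body = [], []
    for tgroup in table.findall("tgroup"):
        cols = int(tgroup.get("cols"))  # column count declared on the tgroup
        for section, rows in (("thead", header), ("tbody", body)):
            for row in tgroup.findall(section + "/row"):
                cells = [(entry.text or "").strip() for entry in row.findall("entry")]
                if len(cells) != cols:
                    raise ValueError(
                        f"row {cells} has {len(cells)} entries; tgroup declares {cols}"
                    )
                rows.append(cells)
    return header, body


header, body = read_cals(ET.fromstring(CALS_TABLE))
print(header)  # [['Plan', 'Coverage']]
print(body)    # [['Plan A', 'Level 1'], ['Plan B', 'Level 2']]
```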
SGML accommodates non-SGML data, such as graphics, in various coding conventions. The range of de facto and de jure graphics standards presents a major DTD design challenge, since no single standard can be selected with any assurance that every potential user system will be able to process the encoded graphics. As graphics represent a relatively minor component of administrative manuals such as the Treasury Board Manual, there was no great urgency to agree on one or more encoding specifications, and none was chosen for the draft DTD.
SGML imposes no restrictions on, and provides no guidance for, bilingual or multilingual DTD design. SGML coding can be optimized to support the creation, presentation or dissemination phases of bilingual publications. For example, close alignment of equivalent structures in the two languages facilitates: a) translation and revision of the original document, when this process is supported by SGML-based authoring tools designed for the purpose; b) efficient generation of bilingual print publications formatted as parallel columns; and c) dissemination of bilingual electronic documents. Additional overhead will be incurred in formatting lengthy documents for print publication and in splitting bilingual documents to give users either language version. Advice based on experience and application requirements, contributed by Statistics Canada, the Canada Communications Group, the National Research Council and several private-sector companies, failed to produce a single solution that satisfied everyone for every application. The chosen option enables a document to be coded as unilingual English or French text, or as alternating English and French text.
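Splitting an alternating bilingual document into one language version can be sketched as a tree filter. The sketch assumes a hypothetical coding in which language-specific elements carry a `lang` attribute (`en` or `fr`) and language-neutral elements carry none. The actual DTD's bilingual coding is not reproduced here. Elements in the other language are dropped from a copy of the source, so the bilingual original stays intact for print production.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical markup for alternating English and French text: language-specific
# elements carry a lang attribute, language-neutral elements carry none.
BILINGUAL = """<section>
  <title lang="en">Eligibility</title>
  <title lang="fr">Admissibilité</title>
  <para lang="en">This paragraph is in English.</para>
  <para lang="fr">Ce paragraphe est en français.</para>
  <para>Shared content.</para>
</section>"""


def extract_language(root, lang):
    """Return a copy of the tree keeping only neutral elements and those in `lang`."""
    result = copy.deepcopy(root)  # leave the bilingual source untouched

    def prune(parent):
        for child in list(parent):  # snapshot the children before removing any
            if child.get("lang", lang) != lang:
                parent.remove(child)  # note: the removed element's tail text goes too
            else:
                prune(child)

    prune(result)
    return result


source = ET.fromstring(BILINGUAL)
french = extract_language(source, "fr")
print([el.text for el in french])
```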
Legislation that justifies an administrative specification is frequently reproduced in administrative manuals, and SGML offers various ways to identify and manage the referenced text. The formal identifier and public text option enables the referenced text to reside on a remote system. Public text is defined by ISO as text that is available and accessible to systems other than the one on which it resides. To be accessible, such text must be uniquely identified using one of five ISO 9070 naming conventions.9 If the referenced text is substantial, it can also be coded in SGML as an included document, or subdocument (with its own separate DTD). The subdocument option is not practical today, since commercial SGML parsers have not implemented this feature. Nor is the formal identifier readily implementable, since it requires the referenced documents to be available electronically, accessible remotely and formally registered. The remaining option was to identify the referenced document as an act and to postpone more sophisticated intersystem linking and document tagging to a later date.
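For readers unfamiliar with public identifiers, the sketch below shows the general shape of an SGML formal public identifier: an owner identifier, a public text class and description, and a language, separated by `//`. SGML (ISO 8879) defines the syntax, and ISO 9070 governs the registration of owner prefixes. The parser is simplified and handles only the common forms. The unavailable-text indicator and CHARSET designating sequences are not supported. It only splits an identifier into parts and does not resolve one to a remote location, which is the harder problem described above. The second identifier in the example is hypothetical.

```python
from typing import NamedTuple, Optional

# Public text classes defined for SGML formal public identifiers.
TEXT_CLASSES = {
    "CAPACITY", "CHARSET", "DOCUMENT", "DTD", "ELEMENTS", "ENTITIES", "LPD",
    "NONSGML", "NOTATION", "SHORTREF", "SUBDOC", "SYNTAX", "TEXT",
}


class PublicId(NamedTuple):
    owner: str
    registered: Optional[bool]  # True for "+//", False for "-//", None otherwise
    text_class: str
    description: str
    language: str
    version: Optional[str]


def parse_fpi(fpi):
    """Split a formal public identifier into its main components (simplified)."""
    parts = fpi.split("//")
    registered = None
    if parts[0] in ("+", "-"):
        registered = parts[0] == "+"
        parts = parts[1:]
    if len(parts) not in (3, 4):
        raise ValueError(f"unexpected number of '//'-separated fields: {fpi!r}")
    owner, text_id, language = parts[0], parts[1], parts[2]
    text_class, _, description = text_id.partition(" ")
    if text_class not in TEXT_CLASSES:
        raise ValueError(f"unknown public text class {text_class!r}")
    version = parts[3] if len(parts) == 4 else None
    return PublicId(owner, registered, text_class, description, language, version)


# A widely used real identifier, and a hypothetical one for a statute.
print(parse_fpi("-//W3C//DTD HTML 4.01//EN"))
print(parse_fpi("-//Example Owner//TEXT Hypothetical Act//EN"))
```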
To support electronic review, SGML tags were defined to: a) designate reviewers' comments as general suggestions or as replacement text applicable to the entire document or to specific components; and b) record the author's or editor's reaction to each review proposal. Rather than define a government-specific coding mechanism, the DTD adopted the CALS conventions, which were subsequently adopted in the ISO version of the publishing industry DTD.10
It was determined that administrative manual development had no particular multi-authoring requirements. Even though some volumes have more than one author, it was decided that workflow management software could be used to track the document segments assigned to individual authors and to integrate those segments into a consolidated publication. Alternatively, SGML codes could have been defined to track individual contributions to a consolidated document.
Since most administrative manuals are amended to some extent, SGML coding was provided to identify the author, date and number of each amendment. Some commercial document viewers can use the resulting coding to display a document as it stood on a given date or after a given amendment.
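How a viewer might use such amendment coding can be sketched as follows. The coding scheme here is an assumption, not the DTD's: each amended element carries hypothetical `amendno`, `author`, `added` and `deleted` attributes, and the dates are sample values. Deleted text is retained but marked, so that the document can be reconstructed as it stood on any date.

```python
import copy
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical amendment coding: each amended element records the amendment
# number, its author and the (sample) date on which text was added or deleted.
AMENDED = """<section>
  <para>Original text, never amended.</para>
  <para amendno="2" author="Editor A" added="1993-04-01">Text inserted by amendment 2.</para>
  <para amendno="2" author="Editor A" deleted="1993-04-01">Text deleted by amendment 2.</para>
</section>"""


def in_force(element, when):
    """True if the element formed part of the document on the date `when`."""
    added, deleted = element.get("added"), element.get("deleted")
    if added and date.fromisoformat(added) > when:
        return False
    if deleted and date.fromisoformat(deleted) <= when:
        return False
    return True


def view_as_of(root, when):
    """Return a copy of the document as it stood on `when`."""
    result = copy.deepcopy(root)

    def prune(parent):
        for child in list(parent):
            if in_force(child, when):
                prune(child)
            else:
                parent.remove(child)

    prune(result)
    return result


source = ET.fromstring(AMENDED)
for day in (date(1993, 3, 31), date(1993, 4, 1)):
    print(day, [p.text for p in view_as_of(source, day)])
```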
To interpret and apply administrative publications issued by central agencies, government agencies often need to qualify specific administrative directives. The annotating agency may also wish to carry these supplementary notes forward to subsequent editions of the administrative manual with minimal effort. Since the ability to create and manage annotations is typically provided by commercial SGML viewer software, no additional features were included in the administrative manual DTD.
SGML documents can support electronic retrieval of specific kinds of information by providing explicit content codes to identify that information. For example, warnings can be encoded so that users can retrieve this type of explanatory text. Any type of textual data that must be readily accessible to information retrieval applications can be identified by a specific SGML tag. One example supporting information retrieval is the tag for titles of statutes (i.e.
Clean versions of the DTD, the SGML-encoded volume, and the electronic publication and viewing software were distributed for evaluation to approximately 100 individuals in federal government departments, libraries and private-sector companies. An outside consultant was hired to elicit user reactions and to produce a consolidated assessment. The resulting recommendations can be summarized as follows: a) SGML is rapidly being adopted; b) departments are not yet ready for it, but preparations are under way; and c) the Treasury Board Secretariat should continue to provide leadership by implementing SGML for all of its publications.11
The DTD developed by the Text Encoding Initiative (TEI)12 provided the definitive model for the document management information. The TEI header includes descriptive information and supports unique identification of each document in a form that is amenable to library and records management applications. It can describe individual publications or assembled collections, and it accommodates both the archival appraisal and disposition information recommended by the OSSWG and the ISBN option for public identifiers sanctioned by ISO 9070. This data may be interchanged together with the electronic publications or separately, as an SGML document in its own right. It could be used to advertise new or revised publications, to supplement Machine-Readable Cataloging (MARC) services, and to facilitate seamless access to remotely held information as envisaged by the Blueprint.
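To show how header data can feed library and records management applications, the sketch below flattens a pared-down header into a simple catalogue record. It uses TEI header element names (`teiHeader`, `fileDesc`, `titleStmt`, `publicationStmt`, `idno`) in XML syntax, without the namespace that current TEI (P5) documents use. The values are placeholders: the ISBN is deliberately not a real one. The record layout is an assumption, not a MARC mapping.

```python
import xml.etree.ElementTree as ET

# A pared-down header using TEI header element names, in XML syntax and
# without the TEI namespace. Values are placeholders; the ISBN is not real.
HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Treasury Board Manual: Insurance and Related Benefits</title>
      <author>Treasury Board Secretariat</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Treasury Board Secretariat</publisher>
      <idno type="ISBN">0-000-00000-0</idno>
    </publicationStmt>
  </fileDesc>
</teiHeader>"""


def catalogue_record(header):
    """Flatten selected TEI header fields into a simple catalogue record."""
    file_desc = header.find("fileDesc")
    record = {
        "title": file_desc.findtext("titleStmt/title"),
        "author": file_desc.findtext("titleStmt/author"),
        "publisher": file_desc.findtext("publicationStmt/publisher"),
    }
    # Each identifier is keyed by its lower-cased type attribute (e.g. "isbn").
    for idno in file_desc.findall("publicationStmt/idno"):
        record[idno.get("type", "idno").lower()] = idno.text
    return record


print(catalogue_record(ET.fromstring(HEADER)))
```

Because the header is itself SGML (here, XML), the same record could be produced whether the header travels with the publication or on its own.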
Various software options are available to support SGML-based document creation, and many of these packages run on existing PC hardware. The more sophisticated packages effectively hide the SGML syntax, making it virtually transparent to document authors and editors. Nevertheless, authoring will undergo a cultural change as traditional tasks such as document formatting become redundant and new disciplines are added to enhance overall document quality and to support hyperlinking within and across documents held on local and remote systems. How people react to these changes will depend on the individual and on the care taken to explain and promote the restructuring associated with renewing service delivery through innovative use of technology.
From a user's perspective, a variety of options are available to suit every departmental environment. If local expertise and resources permit, a department could choose to acquire the SGML document and its associated DTD and to process and format them according to local system indexing, display and distribution capabilities. Alternatively, the source documents may be acquired in the preprocessed form supported by commercial viewer software. The preferred option will undoubtedly evolve as the proportion of electronic documents grows and commercial software becomes more sophisticated.
As publishers of electronic documents, government departments are being presented with new options for information dissemination. Possible media include online interactive access, and document transfer via system-to-system connectivity or on CD-ROMs and diskettes. Near-term use of these media may depend closely on resolving the copyright issue and on technical assurance of document integrity. These issues were also addressed in the DTD pilot. For example, the Treasury Board Secretariat revised the Treasury Board Manual copyright statement to allow unrestricted copying and duplication by government employees. Encryption software developed by the National Research Council was integrated with an SGML DTD to demonstrate controlled access to electronic documents. This software allows free access to descriptive text but requires, and facilitates, payment to document owners if a user wants to access the substantive information.
The experience gained in the administrative manual project clearly demonstrates that a variety of expertise is required to support document re-engineering and standardization. SGML is more than a syntax for encoding documents in a standard way: it is the basis for re-engineering government publications and information delivery for all types of publications and information services. In support of this view, a follow-up investigation was undertaken to define requirements for structured document registry and repository facilities13 and thereby to define the tools and infrastructure required to support an electronic workplace.
Pilot Implementation
To support validation of the draft DTD and the proposed administrative manual structure, the Communications and Coordination Directorate, in cooperation with the author, made minor edits to restructure the Insurance and Related Benefits volume of the Treasury Board Manual. The revised text, encoded in a proprietary word processing format, and the draft DTD were submitted to an SGML service bureau, which converted and tagged the volume and verified that it complied with the draft DTD. The Communications and Coordination Directorate prepared separate document display and layout specifications for use with the draft DTD and the SGML document instance to automatically produce a bilingual print publication and two unilingual electronic versions.
DTD Refinement
The pilot implementation allowed a number of draft DTD provisions to be examined and refined. The most significant refinement was the extension of the document management component to incorporate a wider range of management data (e.g. document security classification) and to support various related applications.
Preparing for the Electronic Workplace
Having created and validated the DTD for administrative manuals, the Treasury Board Secretariat and the participating departments are in a stronger position to formulate their strategies for working electronically. These strategies must include staff training as well as system facilities to create, receive, disseminate and manage SGML-encoded documents.