Data Management in Canada: Issues for Global Change Research1

Paper presented at the Workshop on Climate System Data, Quebec, May 16-18, 1994

Don C. McKay and David Henderson2

ABSTRACT: Global change investigation poses new challenges for researchers, data managers and decision makers because of its multi-disciplinary perspective and its emphasis on data sharing and exchange. The Data and Information Systems Panel, sponsored by the Royal Society of Canada's Canadian Global Change Program, has responded to this challenge by investigating national and international data management policies to ascertain their suitability for global change research. Cost-recovery and copyright are but two of the issues tackled by the Panel in its forthcoming discussion paper. This article presents the conclusions of that analysis, which is scheduled to be released shortly.

RÉSUMÉ: À cause de leur perspective multi-disciplinaire et de l'accent qu'elles mettent sur le partage et les échanges de données, les études sur les changements globaux lancent de nouveaux défis aux chercheurs, aux gestionnaires des données et aux décideurs. Le Comité sur les données et l'information, parrainé par le Programme de changements à l'échelle du globe de la Société royale du Canada, a relevé ce défi en menant des recherches sur les politiques nationales et internationales de gestion de données pour s'assurer qu'elles convenaient aux recherches sur les changements à l'échelle du globe. La rentabilité et le droit d'auteur ne sont que deux des dossiers que le Panel aborde dans le document de discussion qui doit paraître prochainement. Cet article présente les conclusions de cette analyse, qui doit bientôt être publiée.

Introduction

     Good science needs good data.  This may be an obvious
     statement but it is fundamental to understanding the current
     debates in global environmental research regarding data
     management and the dynamics of changing requirements and
     expectations for data.  As the complexity and diversity of
     scientific inquiry increases, so there is an attendant
     requirement for increasing sophistication and rationalization
     of data collection and management.3

The requirements of global change research have made a major impact on the conduct of scientific research in Canada as elsewhere. No longer is it possible to satisfy the data requirements of scientific research by merely collecting new data. For the first time on a large scale, the requirements for comparative analysis, over space and over time, have made scientific research dependent on secondary analysis, a pattern which has predominated in the social sciences for over forty years. This basic change in the focus of scientific data collection has major implications for data management.

Both the United Kingdom4 and the United States5 have produced reports reviewing the legislative, policy, and pragmatic issues which influence the ability of these countries to participate in the global environmental research arena. Both reports have made a series of recommendations aimed at improving the preconditions for national and international global change research. It is with similar intent that the Data and Information Systems Panel (DISP) of the Canadian Global Change Program undertook a review of the Canadian situation, which appears to be somewhere between those of the United States and the United Kingdom.

The product of this review will be a discussion paper which will attempt to clarify Canadian issues of relevance to data management and data access, in the context of Canadian and international global change research, and to place the Canadian issues in the international context. Whereas the context of the discussion paper is specifically global environmental change, the conditions and issues discussed are common to all areas of research, not just those which relate directly to global change per se. Thus the Panel hopes that the review may instigate changes which will not only benefit global change research in Canada and internationally, but also improve the conditions for quantitative research in all disciplines.

This paper, a synopsis of the discussion paper, is presented to the participants of the Workshop on Data for the Climate System with the intent of laying the foundation for discussion related to data access, awareness and policy.

Characteristics of Canadian Data Management Strategies

Current Canadian data collection and management strategies can be characterized as follows: The Canadian "system" consists of scattered data management activities, without focus or even a common vehicle for communication, in the absence of any central data archives in any discipline. In the sciences, a few of the federal government departments and agencies which are major collectors of data, such as Environment Canada, the Department of Fisheries and Oceans, and the Canada Centre for Remote Sensing, have made internal arrangements for data management. In a few instances, ad hoc collections of scientific data are also managed in academic institutions. In the social sciences, some, but by no means all, data collected by the academic sector are managed in data archives at some ten Canadian academic institutions; the data relevant to the social sciences collected by government are basically not managed at all. In a few instances, data collected in the course of humanities research have found their way into the collections of the National Archives of Canada, or into the managerial auspices of academic data archives and data libraries, but again there is no cohesive focus for data management in this discipline.

Thus the system in Canada caters to the needs of a few individual disciplines or narrow groupings of disciplines. Any Canadian participation in the establishment of international rules for data collection and management has largely been without reference to the broader audience of Canadian data management activities, and indeed, most existing Canadian data archives remain in ignorance of other similar activities in Canada. Nor is there widespread knowledge outside the data archives, among data collectors and producers, of the existence of those data archiving activities that are in place.

Some examples illustrate this contention.

The collection of the census of population is a major undertaking by Statistics Canada every five years. Statistics Canada neither has an archival program of its own, nor deposits its collections in the National Archives of Canada. The management of the data collected in each population census is thus a project-driven activity, and once the project becomes inactive, the data are at risk. Some years ago, Statistics Canada discovered that the aggregate data products from the 1961 census had been lost; in the event, it was able to acquire a set of most of these products from the University of Alberta, which had initially acquired the files from Statistics Canada. Now, it has become known that the master records from the 1961 census of population have been lost by Statistics Canada; since these records were not deposited in the National Archives, nor, for reasons of confidentiality and privacy, acquired by any other institution in Canada, these records cannot be replaced. Granted, the original census questionnaires are available in the National Archives, but there would be enormous cost involved in the reconversion of these records to computer-readable form.

In the late 1960s, Gallup Canada Ltd., in an attempt to clear its storage of early polling data, offered a collection of its polls to the University of Toronto. The University, which did not at the time have a data archive, refused the collection, omitting, in the process, to refer Gallup to one of the Canadian universities which did have a data archive. The polls, which spanned the years of World War II and which would have been of immense research value to historians, were subsequently irretrievably lost.

Changes in Requirements and Perceptions

The Report of the Inter-Agency Committee on Global Environmental Change6 outlined very succinctly the reasons that a reappraisal of data management strategies is required. These are reproduced verbatim here:

     i) the emergence of scientific issues which are
     interdisciplinary in nature and global in scale has led to a
     requirement on a national level for global data sets.  Three
     major international science programmes (WCRP, IGBP, HDGEC)
     have recognized this and established internal structures to
     investigate the availability of appropriate data and the
     necessary data handling regimes.  This has arisen primarily
     from the global climate change research agenda.  This raises
     major questions about the relationship between the providers
     and users of data.

     ii) there has already been a tremendous increase in the
     quantity of global data sets collected.  Space-borne remotely
     sensed data have been in the vanguard of this drive for
     increased data, but international programmes such as the
     Tropical Ocean Global Atmosphere (TOGA) have also led to
     considerable increases in ground-based data collection and the
     associated data management regimes.

     iii) the cost of handling such large quantities of data
     (including acquisition, storage, processing and distribution)
     is high, and, combined with generally increased cost awareness
     in all of the developed countries, has led to concerns about
     who pays for such data regimes and the consequent relationship
     with the data user.  It has also led to more multi-national
     cooperative ventures, such as Committee on Earth Observation
     Satellites (CEOS), International Geosphere-Biosphere Program - Data
     and Information Systems (IGBP-DIS) and the proposed Global
     Climate Observing System (GCOS).

     iv) there is an increased conceptual recognition of the
     intrinsic value of data.  No longer are they simply a by-
     product of research or short term operational needs.  Data are
     re-usable.  They are therefore a commodity requiring the
     necessary infrastructure to facilitate their re-use.  It is
     now recognized that the intrinsic value of data arises from
     their re-use, and hence they need to be maintained in a manner
     conducive to their re-use and such re-use needs to be
     promoted, a point emphasized by a recent US report critical of
     NASA and NOAA in this respect.

     v) in the summer of 1991 CODATA and ICSTI conducted a
     joint study of barriers to access to scientific information.
     The following quote is illustrative of the access issues that
     are central to the question of data management for global
     change research, and hence central to this paper:

     "Of particular concern is the issue of fair and open access to
     information sources by all bona fide scientists.  Some
     scientists are fearful that monopolistic practices or
     governmental restrictions might impede their ability to gain
     access to the information and data which they need.  On the
     other hand, the scientific societies, publishers, and other
     information disseminators are concerned about unauthorized use
     of the intellectual property which they have assembled.  Above
     all, the economics of scientific information have been greatly
     perturbed, and the optimum policy for distributing the cost of
     these powerful new information services remains to be worked
     out."

     The above points are indicative of a new and still evolving
     climate for data management regimes.  The fundamental question
     is whether the traditional institutional policies and
     mechanisms are adequate to manage such change and take
     advantage of the new situation.  The ever increasing demand
     for global data sets for global change research necessitates a
     re-evaluation of those existing mechanisms.  The fundamental
     question is how to provide the necessary data sets, at the
     required temporal and spatial resolution, and how to fund the
     attendant cost of data acquisition, storage, processing and
     distribution.  The following questions are indicative of the
     questions which need to be addressed.

     Do central archives offer the optimum solution (both
     financially and scientifically) or is a distributed system of
     expert archives a more realistic model?

     Are there gaps or overlap in the current efforts of
     international data programmes?

     Where do the opportunities lie for cross-fertilisation of
     experience and knowledge?

     What are the optimum conditions for access to data, and can
     they be reconciled with the institutional mechanisms and
     policy prerogatives operating at a national and international
     level?

As stated earlier, it is an objective of this paper to examine these issues, and to make recommendations aimed at stimulating the fullest use of data resources for all research, especially global change research.

Issues

After reviewing the many and various national and international laws, policies and institutions relevant to data management in general and to global environmental research in particular, the Panel considered the actual implementation of these policies and the issues and problems (actual and potential) which arise therefrom. Included were a number of other issues which are not directly the product of legislation or policy, but are driven by the pace of technological change, by the characteristics of computer-readable media, and by international activities.

The issues are covered under six broad headings: those relating to infrastructure, to intellectual property, to documentation, to access, to archiving, and to standards.

Infrastructure

There are a number of infrastructure problems.

The DISP's review of national policy made apparent the need for a mechanism by which Canadian data management strategy can be developed.

Canadian information policy should not be set in a vacuum. At present, policy development on data collection and management issues occurs with little or no consultation with those groups currently responsible for the collection and management of data. Thus there is a need to plan and implement a forum in which those who collect data and manage data and those who set policy which impinges on data management can interact.

The DISP suggests the following objectives of a national data management strategy:

  1. To broaden the involvement in global research beyond those immediately associated with the gathering of the data;
  2. To combine data sources in imaginative ways to explore research questions previously uninvestigated and in ways unforeseen by those initially gathering the data;
  3. To share a resource with the wider research community in general, and, consequently, to receive a greater return on the investment of the original gathering of the data; and
  4. To save funding agencies the cost of gathering data when secondary sources may prove worthy in certain research.

Such a forum, which can provide informed input to and review of national policy that impinges on data collection and data management issues, is currently lacking. Those fora which currently exist are oriented along disciplinary lines, or are project oriented, and lack the interdisciplinarity essential to the formulation of national policy. In addition, the current fora lack the stature to effect change in national legislation and policy. Whereas in the United Kingdom there is the Inter-Agency Committee on Global Environmental Change (IACGEC) which can oversee implementation of recommendations, and in the United States there is NASA, in Canada there is no clear body responsible for the implementation of changes at the national policy levels. Neither the Royal Society of Canada's Canadian Global Change Program, nor the Canadian Climate Board, nor the Inter-Departmental Committee on Global Change (ICGC) has a mandate to fulfill this role.

In addition to the need for a forum to provide input to national policy, there is the need for a written data policy. Currently, the United States has a policy statement on data management for global change, but no other country does. It is not clear, at present, which body in Canada should be tasked with writing such a statement, but it would be entirely appropriate for this task to be within the mandate of a national forum dealing with policy issues.

As important as the need for written policy and institutional infrastructure to facilitate the development and review of national data policy is the need for the physical infrastructure to facilitate data access and dissemination. Current telecommunications infrastructure is inadequate to support optimum exploitation of collaborative data collection, data management, and data analysis projects. A strong infrastructure that supports collaborative projects at all levels of the research continuum is essential in a country as vast as Canada, where communication is constantly hampered by inordinate distance. Currently the meager resources allocated to data management in Canada are wasted on duplication of services and facilities in an attempt to compensate for the inadequacies of the communications network.

Intellectual Property

Central to this paper is the ownership of intellectual property. Ownership matters because the Copyright Act, and international agreements on copyright, give the owner of intellectual property the right to determine whether, and how, the property is to be "published", i.e. the conditions of access and/or dissemination. Ownership therefore has implications for the issues of access and dissemination discussed later. It also has implications for preservation since, especially when intellectual property is not "published", the only entity in a legal position to ensure preservation is the owner of the copyright. The Copyright Act, as we have determined, defines three potential owners of intellectual property: the author (or "principal investigator"), the employer, or the Crown.

If one accepts the contention that the bulk of data of relevance to global change research is collected by the federal government, then the bulk of these data is protected under Crown copyright.

Thus it is current government policies vis-a-vis exercising that right of ownership, and policies relating to access and dissemination which are of paramount importance.

Current Treasury Board policy statements make it clear that the government's only interest in intellectual property is focused on its potential for commercial exploitation. Federal government departments are enjoined to take a 'business-case' approach to the planning of outputs. As well, intellectual property developed with federal government contract funding is the property of the contractee, in the expectation that it will be commercially exploited. Under such policy, there is a great risk that:

Adding to the confusion is the situation in the United States in which all intellectual property developed by the federal government is de facto in the public domain. The assumption appears widely held that this is the case in Canada as well.

Even in academia, the question of ownership of intellectual property is not systematic. Research funding agencies make no claim to the intellectual property developed with their research funds. Yet academic institutions, to which most such intellectual property would belong, make no effort to claim it, unless there are prospects of commercial exploitation.

Thus the question of ownership of intellectual property in Canada is rather confusing.

The issue of recognition of ownership of intellectual property is not confined to Canada, but is universal. The problem is centered in the lack of recognized standards for acknowledging the contributions made by the work of others in research, publication etc. Granted, ISO is attempting to draft citation standards for electronic publications, but the standards promulgated thus far are entirely inadequate to the appropriate identification and acknowledgment of research data products. Suggested standards have been promulgated by other individuals and groups, but these have neither the stature nor universality of ISO standards. Until there are universally recognized standards for citation of electronic files, whatever their content, and until these standards are universally adopted by the publishing industry, nationally as well as internationally, this will remain an enduring issue.

Documentation

Much data collection happens at a project level. Because of the inter-disciplinary requirements of global change research, the cross-disciplinary comparability of data becomes essential. The project-oriented nature of data collection in Canada militates against comparability of data between or among projects. There are currently no standards of documentation, access and preservation, to which project managers are required to adhere. Nor is there a body which has the responsibility of setting such standards in Canada. Such standards as there are suffer from intra-disciplinarity, rather than having inter-disciplinary focus, and tend to be of a de facto nature.

In addition to adherence to national standards, there is also an urgent need for cooperative projects such as national inventories of data, to enable researchers to identify available data sources, to foster interdisciplinary research, and to identify lacunae in the available data sources as a focus for future data collection activities.

Further, there needs to be recognition of data management as a professional activity. Currently, there are no rewards built into either academic or other structures which recognize data management as an activity. An almost total absence of requirements even to acknowledge in a uniform way the use of data files as source materials in research results in widespread neglect to do so. This exacerbates the problem that reward structures for professional activity are almost exclusively centred on the production of publications, yet data files are not regarded as publications.

Access

The major barrier to access to data is cost. The increase in the cost of data acquisition to researchers over the past decade has been a direct response to Treasury Board's explicit directions to federal agencies, which require (a) that all data collection activities be justified on a business-case basis and (b) that agencies charge users for all aspects of data production, including collection and management, and which reflect (c) a basic philosophy that regards all secondary data usage as pandering to the proprietary interests of individuals without regard to the needs of the research infrastructure in Canada. Further exacerbating this is the lack of uniformity in application of Treasury Board policies across federal government agencies.

Ideally, access to all data should be free and available to all. Two major schools of thought, both of which recognize the 'public good' aspect of research activity, would argue for optimal data access under certain conditions. One school of thought, exemplified by the United Kingdom Granting Council, would argue that bona fide researchers should have privileged access to data; the other would argue that it is the objective of the access, i.e. usage for research which will be in the public domain, that should determine privileged access. Both points of view have advantages and disadvantages. In the international arena, Canada will not be able to determine unilaterally which of these competing philosophies to support. However, it is imperative that Canada participate in the processes that eventually determine which model will become the international de facto basis for data access and dissemination.

In order to answer the questions posed by global change, Canada will need access to global data bases. To secure that access, it is in our national interest to ensure that there is a strong data collection, data management and data preservation infrastructure in place to facilitate Canadian needs. This will require a commitment on the part of governments and organizations participating in global change programs to allocate scarce resources towards these ends.

There also must be recognition that Canadian needs for data collected by other countries will be matched by the needs of researchers in other countries for our data. In order to ensure access by our research sector to foreign data, it will be vital that Canada facilitate access to Canadian data not only for its own research communities but also for extra-national and international research activities.

Archiving

There is a total lack of focus for archival standards and processes in Canada. Despite the provisions of the National Archives of Canada Act, little if any activity occurs at the level of the National Archives in the area of archiving electronic data. The reasons for this are clear: the National Archives has neither the funds nor the human resources to fulfill its role in the archiving of data produced by the federal government. Nor does it have the divisional infrastructure to provide a focus for archiving activities at other levels in Canada. Some government departments have provisional archival activity, but this is generally uncoordinated, and suffers from a lack of standards. Provincial archives by and large do not collect computer-readable data. Archiving in the academic sector occurs in scattered university-based data archives and data libraries, which also suffer from lack of resources to perform their functions adequately. There is no known cohesive archival activity in the commercial sector.

Standards

In the global arena, Canada will be one of the major countries affected by global change, by virtue of its size and location. In order to prepare for this global change we will need not only to monitor and collect our own data but also to obtain international data. Therefore it is of paramount importance that Canada actively participate in international data collection and standards-setting initiatives and activities.

As well, because of Canada's strategic location with regard to global change, it will be host to a number of research programs related to global change, which is another reason for a high level of international participation.

Based on the review undertaken by the DISP, the following set of recommendations was developed for discussion:

    Infrastructure

  1. There needs to be a national data management strategy, which must include a mechanism to bring together the key players, provide for ongoing review and oversight, and ensure adequate resources, both fiscal and human, for the collection, preservation and accessibility of Canadian data of relevance to global change research.
  2. That Treasury Board undertake a review of federal information policy in collaboration with the Canadian research community, both public and private, with respect to access, dissemination and preservation of data. That such a review specifically consider current legislation and policies that restrict access to information and data collected by government agencies, with a view to reaffirming the principle of open access at affordable cost.
  3. That data management and preservation activities be recognized as an integral part of the research peer evaluation process and the Canadian research community.

    Archiving

  4. That the National Archives of Canada review its mandate and commitment vis-a-vis the management and preservation of machine-readable data, and re-establish the Machine Readable Archives Division as a focus for data management and archiving standards and procedures at all levels in Canada.
  5. That the National Archives of Canada take the lead in preparing a long-term strategy for the coordinated archiving of data in academic data archives, data libraries, and federal government data centres.
  6. That all agencies, whether government, academic or private sector, that fund data collection activities, require as a key item of data collection proposals a statement of intent vis-a-vis the management, archiving, and disposition of the data, ensuring that appropriate agencies have responsibility for stewardship of the data.

    Documentation

  7. That the national data management strategy include procedures and standards for quality data documentation at two levels: mechanisms for identifying relevant data, and the optimum documentation of individual data files.
  8. That the national data management strategy promote Canadian participation in relevant international data activities, including the development and promotion of standards for data interchange and data documentation.

    Access

  9. That the national information policy support the principle of reciprocal international data exchange agreements, to ensure Canadian access to international data as well as international access to Canadian data for global change research.
  10. That the national data management strategy endorse the use of appropriate international standards to the greatest possible extent, to facilitate the exchange of and access to data.
