ABSTRACT: Because of their multidisciplinary perspective and their emphasis on data sharing and exchange, global change studies pose new challenges for researchers, data managers and policy makers. The Data and Information Systems Panel, sponsored by the Canadian Global Change Programme of the Royal Society of Canada, has taken up this challenge by examining national and international data management policies to determine whether they are adequate for global change research. Cost-effectiveness and copyright are only two of the issues the Panel addresses in its forthcoming discussion paper. This article presents the conclusions of that analysis.
Good science needs good data. This may be an obvious statement but it is fundamental to understanding the current debates in global environmental research regarding data management and the dynamics of changing requirements and expectations for data. As the complexity and diversity of scientific inquiry increase, there is an attendant requirement for increasing sophistication and rationalization of data collection and management.3
The requirements of global change research have made a major impact on the conduct of scientific research in Canada as elsewhere. No longer is it possible to satisfy the data requirements of scientific research by merely collecting new data. For the first time on a large scale, the requirements for comparative analysis, over space and over time, have made scientific research dependent on secondary analysis, a pattern which has predominated in the social sciences for over forty years. This basic change in the focus of scientific data collection has major implications for data management.
Both the United Kingdom4 and the United States5 have produced reports reviewing the legislative, policy, and pragmatic issues which influence the ability of these respective countries to participate in the global environmental research arena. Both reports have made a series of recommendations aimed at improving the preconditions for national and international global change research. It is with similar intent that the Data and Information Systems Panel (DISP) of the Canadian Global Change Programme undertook a review of the Canadian situation, which appears to be somewhere between those of the United States and the United Kingdom.
The product of this review will be a discussion paper which will attempt to clarify Canadian issues of relevance to data management and data access, in the context of Canadian and international global change research, and to place the Canadian issues in the international context. Whereas the context of the discussion paper is specifically global environmental change, the conditions and issues discussed are common to all areas of research, not just those which relate directly to global change per se. Thus the Panel hopes that the review may instigate changes which will benefit not just global change research in Canada and internationally, but will also improve the conditions for quantitative research in all disciplines.
This paper, a synopsis of the discussion paper, is presented to the participants of the Workshop on Data for the Climate System with the intent of laying the foundation for discussion related to data access, awareness and policy.
Thus the system in Canada caters to the needs of a few individual disciplines or narrow groupings of disciplines. Any Canadian participation in the establishment of international rules for data collection and management has largely been without reference to the broader audience of Canadian data management activities, and indeed, most existing Canadian data archives remain in ignorance of other similar activities in Canada. Nor is there widespread knowledge outside the data archives, among data collectors and producers, of the existence of those data archiving activities that are in place.
Some examples illustrate this contention.
The collection of the census of population is a major undertaking by Statistics Canada every five years. Statistics Canada neither has an archival program of its own, nor deposits its collections in the National Archives of Canada. The management of the data collected in each population census is thus a project driven activity, and once the project becomes inactive, the data are at risk. Some years ago, Statistics Canada discovered that the aggregate data products from the 1961 census had been lost; in the event, they were able to acquire a set of most of these products from the University of Alberta, which had initially acquired the files from Statistics Canada. Now, it has become known that the master records from the 1961 census of population have been lost by Statistics Canada; since these records were not deposited in the National Archives, nor, for reasons of confidentiality and privacy, acquired by any other institution in Canada, these records cannot be replaced. Granted, the original census questionnaires are available in the National Archives, but there would be enormous cost involved in the reconversion of these records to computer-readable form.
In the late 1960s, Gallup Canada Ltd., in an attempt to clear its storage of early polling data, offered a collection of its polls to the University of Toronto. The University, which did not at the time have a data archive, refused the collection and, in the process, omitted to refer Gallup to one of the Canadian universities which did have a data archive. The polls, which spanned the years of World War II and which would have been of immense research value to historians, were subsequently irretrievably lost.
i) the emergence of scientific issues which are interdisciplinary in nature and global in scale has led to a requirement on a national level for global data sets. Three major international science programmes (WCRP, IGBP, HDGEC) have recognized this and established internal structures to investigate the availability of appropriate data and the necessary data handling regimes. This has arisen primarily from the global climate change research agenda. This raises major questions about the relationship between the providers and users of data.
ii) there has already been a tremendous increase in the quantity of global data sets collected. Space-borne remotely sensed data have been in the vanguard of this drive for increased data, but international programmes such as the Tropical Ocean Global Atmosphere (TOGA) have also led to considerable increases in ground-based data collection and the associated data management regimes.
iii) the cost of handling such large quantities of data (including acquisition, storage, processing and distribution) is high, and, combined with generally increased cost awareness in all of the developed countries, has led to concerns about who pays for such data regimes and the consequent relationship with the data user. It has also led to more multi-national cooperative ventures, such as the Committee on Earth Observation Satellites (CEOS), the International Geosphere-Biosphere Program - Data and Information Systems (IGBP-DIS) and the proposed Global Climate Observing System (GCOS).
iv) there is an increased conceptual recognition of the intrinsic value of data. No longer are they simply a by-product of research or short term operational needs. Data are re-usable. They are therefore a commodity requiring the necessary infrastructure to facilitate their re-use. It is now recognized that the intrinsic value of data arises from their re-use, and hence they need to be maintained in a manner conducive to their re-use and such re-use needs to be promoted, a point emphasized by a recent US report critical of NASA and NOAA in this respect.
v) in the summer of 1991 CODATA and ICSTI conducted a joint study of barriers to access to scientific information. The following quote is illustrative of the access issues that are central to the question of data management for global change research, and hence central to this paper: "Of particular concern is the issue of fair and open access to information sources by all bona fide scientists. Some scientists are fearful that monopolistic practices or governmental restrictions might impede their ability to gain access to the information and data which they need. On the other hand, the scientific societies, publishers, and other information disseminators are concerned about unauthorized use of the intellectual property which they have assembled. Above all, the economics of scientific information have been greatly perturbed, and the optimum policy for distributing the cost of these powerful new information services remains to be worked out."
The above points are indicative of a new and still evolving climate for data management regimes. The fundamental question is whether the traditional institutional policies and mechanisms are adequate to manage such change and take advantage of the new situation. The ever increasing demand for global data sets for global change research necessitates a re-evaluation of those existing mechanisms.
The fundamental question is how to provide the necessary data sets, at the required temporal and spatial resolution, and how to fund the attendant cost of data acquisition, storage, processing and distribution. The following questions are indicative of the questions which need to be addressed.
Do central archives offer the optimum solution (both financially and scientifically) or is a distributed system of expert archives a more realistic model? Are there gaps or overlap in the current efforts of international data programmes? Where do the opportunities lie for cross-fertilisation of experience and knowledge? What are the optimum conditions for access to data, and can they be reconciled with the institutional mechanisms and policy prerogatives operating at a national and international level?
As stated earlier, it is an objective of this paper to examine these issues, and to make recommendations aimed at stimulating the fullest use of data resources for all research, especially global change research.
The issues are covered under six broad headings: those relating to infrastructure, to intellectual property, to documentation, to access, to archiving, and to standards.
The DISP's review of national policy made apparent the need for a mechanism by which Canadian data management strategy can be developed.
Canadian information policy should not be set in a vacuum. At present, policy development on data collection and management issues occurs with little or no consultation with those groups currently responsible for the collection and management of data. Thus there is a need to plan and implement a forum in which those who collect data and manage data and those who set policy which impinges on data management can interact.
The DISP suggests the following objectives of a national data management strategy:
In addition to the need for a forum to provide input to national policy, there is a need for a written data policy. Currently, the United States has a policy statement on data management for global change research, but no other country does. It is not clear, at present, which body in Canada should be tasked with writing such a statement, but it would be entirely appropriate for this task to be within the mandate of a national forum dealing with policy issues.
As important as the need for written policy and for institutional infrastructure to facilitate the development and review of national data policy is the need for the physical infrastructure to facilitate data access and dissemination. Current telecommunications infrastructure is inadequate to support optimum exploitation of collaborative data collection, data management, and data analysis projects. A strong infrastructure that supports collaborative projects at all levels of the research continuum is essential in a country as vast as Canada, where communication is constantly hampered by inordinate distance. Currently the meager resources allocated to data management in Canada are wasted on duplication of services and facilities in an attempt to compensate for the inadequacies of the communications network.
If one accepts the contention that the bulk of data of relevance to global change research is collected by the federal government, then the bulk of these data is protected under Crown copyright.
Thus it is current government policies on exercising that right of ownership, and policies relating to access and dissemination, which are of paramount importance.
Current Treasury Board policy statements make it clear that the government's only interest in intellectual property is focused on its potential for commercial exploitation. Federal government departments are enjoined to take a 'business-case' approach to the planning of outputs. As well, intellectual property developed with federal government contract funding is the property of the contractee, in the expectation that it will be commercially exploited. Under such policy, there is a great risk that:
Even in academia, the question of ownership of intellectual property is not systematic. Research funding agencies make no claim to the intellectual property developed with their research funds. Yet academic institutions, to which most such intellectual property would belong, make no effort to claim it, unless there are prospects of commercial exploitation.
Thus the question of ownership of intellectual property in Canada is rather confusing.
The issue of recognition of ownership of intellectual property is not confined to Canada, but is universal. The problem is centered in the lack of recognized standards for acknowledging the contributions made by the work of others in research, publication etc. Granted, ISO is attempting to draft citation standards for electronic publications, but the standards promulgated thus far are entirely inadequate to the appropriate identification and acknowledgment of research data products. Suggested standards have been promulgated by other individuals and groups, but these have neither the stature nor universality of ISO standards. Until there are universally recognized standards for citation of electronic files, whatever their content, and until these standards are universally adopted by the publishing industry, nationally as well as internationally, this will remain an enduring issue.
In addition to adherence to national standards, there is also an urgent need for cooperative projects such as national inventories of data, to enable researchers to identify available data sources, to foster interdisciplinary research, and to identify lacunae in the available data sources as a focus for future data collection activities.
Further, there needs to be recognition of data management as a professional activity. Currently, there are no rewards built into either academic or other structures which recognize data management as an activity. An almost total absence of requirements to even acknowledge in a uniform way the use of data files as source materials in research results in widespread neglect to do so. This exacerbates the problem that reward structures for professional activity are almost exclusively centred on the production of publications, yet data files are not regarded as publications.
Ideally, access to all data should be free and available to all. Two major schools of thought, both of which recognize the 'public good' aspect of research activity, would argue for optimal data access under certain conditions. One school of thought, such as the United Kingdom Granting Council, would argue that bona-fide researchers should have privileged access to data; the other would argue that privilege should attach to the objective of the access, i.e. that usage for research whose results will be in the public domain should have privileged access. Both points of view have advantages and disadvantages. In the international arena, Canada will not be able to determine unilaterally which of these competing philosophies to support. However, it is imperative that Canada participate in the processes that eventually determine which model will become the international de facto basis for data access and dissemination.
In order to answer the questions posed by global change, Canada will need access to global data bases. To secure that access, it is in our national interest to ensure that a strong data collection, data management and data preservation infrastructure is in place to serve Canadian needs. This will require a commitment on the part of governments and organizations participating in global change programs to allocate scarce resources towards these ends.
There also must be recognition that Canadian needs for data collected by other countries will be matched by the needs of researchers in other countries for our data. In order to ensure access by our research sector to foreign data, it will be vital that Canada facilitate access to Canadian data not only for its own research communities but also for extra-national and international research activities.
As well, because of Canada's strategic location with regard to global change, it will be host to a number of research programs related to global change, which is another reason for a high level of international participation.
Based on the review undertaken by the DISP, the following set of recommendations was developed for discussion: