ABSTRACT: In an information-based society, it is absolutely essential to be able to produce a workforce capable of statistical analysis. To achieve this, we must abandon the idea that data files and databases are esoteric concepts that have no place in libraries. This paper outlines a proposal that would put Canadian data and databases back into the hands of managers in the research and education community through a collaborative effort among libraries, government and the Depository Services Program.
This initiative stems from a paper, "Liberating the Data: A Proposal for a Joint Venture between Statistics Canada and Canadian Universities,"3 which suggests a co-operative scenario that would see Statistics Canada's data files made more easily available to the Canadian research community. The paper's major premise is that United States and international data are being used by the Canadian research community because Canadian data, while available, are either unknown or unaffordable. In either case, they are effectively inaccessible.
An excellent argument in favour of a more liberated scenario is advanced by Professor Paul Bernard, Chair, Advisory Committee on Social Conditions (Statistics Canada) and member of the National Statistics Council. It merits repetition here.
He writes: "...the genuine exercise of democracy increasingly requires that citizens get access to complex information and have the skills required to understand it." While he realizes there are pressures on Statistics Canada to reduce costs and increase income, he feels the outcome has been the restriction of "...access to information only to groups that have the solid ability to pay."4 Bernard feels that this may "...hamper the participation in public debates of groups whose contribution is not backed up by much money" as well as "those who have no prospect of turning a profit or reaping some tangible and relatively immediate benefit from using it." This, he states, is "...likely to lead, in the long run, to suboptimal development and less than full-blown democracy."5
This quote provided the impetus for the Data Liberation Initiative.
For example, only nine questions were asked of every Canadian in the 1991 Census. One might think that only a very limited amount of information could be gleaned from so small a number of variables. On the contrary, the number of tables that can theoretically be produced is enormous: over 350,000. Statistics Canada published thirteen. Even if only a fraction of these tables make "sense", there is still a tremendous gap between what was produced and what might be of concern. Thus, without access to the data, the researcher is left with a product that answers only the questions the information provider thinks are important, rather than addressing the problem under investigation.
Nor is the sheer number of possible tables the only constraint. Decisions regarding what to produce are not made in a vacuum. Governments and other information providers are unlikely to produce information that would be critical of their own programs. Yet an informed policy debate requires that critical investigation be undertaken. Without access to data, it is unlikely that such a debate will be possible.
In order to train Canadians, it is crucial that they be able to access Canadian data. Indeed, a complaint was voiced at a recent meeting of the National Statistics Council that recent graduates, while trained in the use of data, have little or no knowledge of Canadian statistical databases. In addition, with the help of better access, we need to improve training in these basic skills. At present we are facing a shortage of those who are able to train the next generation of researchers.
Bernard brings this point home when he argues:
"Concerning such issues, the public must have appropriate knowledge and not only hypothetical access to the data. Paradoxically, indeed, contemporary societies offer a wealth of information, but workers and citizens can be totally mystified, surrounded as they are by data whose flow and codes they do not master."6
While lack of accessible data is just one of the barriers, it is essential that it be overcome before we can make progress on this front. The Data Liberation Initiative is an attempt to overcome this hurdle.
The working group, consisting of researchers, a representative from CARL, a data intermediary and members of the SSFC, held a series of meetings over the following months. Advice from both Statistics Canada and the Depository Services Program was invited, and found to be invaluable. When the group had formulated a working document, meetings were arranged with senior management from Statistics Canada and the Canada Communications Group (responsible for the DSP), seeking their approval for the project.
Once the document had been accepted in draft form by both parties, a series of information sessions was held with interested parties such as the National Library and Treasury Board. There was also a meeting of the larger group from whose ranks the working committee was formed. Finally, presentations were given to the Canadian Association of Public Data Users, the Canadian Library Association and the Association of Canadian Map Libraries and Archives.
In brief, this proposal suggests that:
It has been a long-standing policy of the federal government, under the aegis of the Depository Services Program, to make its information broadly available in a cost-effective manner. In 1991, after a review of the program, the Treasury Board itself directed the DSP to move beyond its print environment and to propose a method by which electronic products could be introduced into the program. This proposal is one response to that directive.
In today's fiscal environment it is critical that we seek creative solutions using partnerships, infrastructure and programs that are already in place. The full exploitation of electronic information requires the refinement, extension and modernization of these existing mechanisms. This will be accomplished by using the DSP network that has served Canada well for the past 65 years, and extending that concept to incorporate the new technologies, information products and information delivery systems.
Large, publicly-funded research organizations have often been at the leading edge in the development and use of these technologies. It is through the use of these tools for the on-going endeavours of teaching and research that knowledge is created and placed in the public domain. Unfortunately, because of current price structures, the data that are used for these purposes tend to be from the United States. For a marginal, additional contribution, the Canadian government can greatly increase the return on its existing investment in Canadian data by creating an informed citizenry.
This approach is consistent with the government's philosophy that costs be shared by all partners, and it takes advantage of the government's drive both to connect to the information highway and to link the country through it.
Members will be encouraged to share expertise and the costs of providing service with other member institutions that do not meet the full complement of requirements.
Three types of products will be included: public use microdata files, databases and geographic or spatial files. Public use microdata files are anonymized data from surveys and the Census of Population, which allow the researcher to explore relationships between variables at the level of the individual. Databases are aggregate or grouped data which have been organized for efficient computer-assisted access. Geographic files consist of digitized boundaries and networks.
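The difference between microdata and aggregate databases can be illustrated with a minimal sketch. The records, variable names and categories below are invented purely for illustration and do not come from any Statistics Canada file; the `crosstab` helper is likewise hypothetical. The point is only that, with individual-level records in hand, a researcher can build whatever table the research question calls for, rather than relying on the tables a provider chose to publish.

```python
from collections import Counter

# Hypothetical, made-up individual-level records (NOT real Statistics Canada data).
# Each dictionary stands for one anonymized respondent in a public use microdata file.
microdata = [
    {"province": "ON", "age_group": "25-44", "employed": True},
    {"province": "ON", "age_group": "25-44", "employed": False},
    {"province": "QC", "age_group": "25-44", "employed": True},
    {"province": "QC", "age_group": "45-64", "employed": True},
    {"province": "AB", "age_group": "45-64", "employed": False},
    {"province": "AB", "age_group": "15-24", "employed": False},
]


def crosstab(records, row_var, col_var):
    """Count the records falling into each (row value, column value) cell."""
    return Counter((r[row_var], r[col_var]) for r in records)


# The same microdata yield different aggregate tables, depending on the question asked.
by_age = crosstab(microdata, "age_group", "employed")
by_province = crosstab(microdata, "province", "employed")

for (row, col), count in sorted(by_age.items()):
    print(f"age_group={row} employed={col} count={count}")
```

A published aggregate table corresponds to just one such result (for example, `by_age`). Once only that table is available, the relationship between province and employment can no longer be recovered from it, which is why access to the underlying microdata matters.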
All products are electronic, off-line, publicly available and affordable to the program. In addition, none of these products exists in paper format, as they were created for statistical manipulation using a computer.
Libraries of all stripes also stand to benefit from participating in the initiative. Increasingly, paper products are migrating to electronic format and falling out of the DSP as a result. The initiative will provide a means of taking a leading role in the information economy and the current explosion of information and technology. It will allow institutions with limited resources and expertise in the area of data to form partnerships with those that have an established infrastructure, and to provide services not now available on-site. Finally, through the use of the electronic highway, it will enable the development of centres of specialization that can be linked across the country.
As Alan MacDonald, Director of Information Services at the University of Calgary, said in an address to CAPDU members about expanding the role of libraries:
"We who labour in the information services and who believe in the idea of library, now stand at the edge of a new era. We must go to where the hornets live. We must move from the rhythms and pleasures of cross-country directly to the fears and excitements of heli-skiing. We are not alone, but we do not have the choice not to act. If we fail to act wisely, the utility of the academy to the new century will be seriously undermined."9
We must move away from the notion that data files and databases are arcane concepts. While it may not be easy to explain the notion of data and why they are important, it is critical that someone with this understanding have a voice in the recommendations to Treasury Board regarding the restructuring of the Depository Services Program. Without such an understanding, data will continue to be treated as less worthy of inclusion in the program and electronic products will remain overwhelmingly bibliographic in nature.
That said, there seems to be a genuine desire to make this proposal a reality. The only unpredictable element is government finances. Library support will be essential in convincing government that, despite fiscal constraints, this project is integral to Canadians' productivity in an information society.