The Data Liberation Initiative: A New Cooperative Model[1]

Wendy Watkins, Carleton University Library[2]

ABSTRACT: Producing a numerate workforce is critical to productivity in an information society. If this is to be realized, we must move away from the notion that data files and databases are arcane concepts that are foreign to libraries. This paper outlines a proposal to put Canadian data and databases in the hands of the research and education community through a co-operative effort among libraries, government and the Depository Services Program, and brings readers up to date on the progress of the initiative.

RÉSUMÉ: In an information-based society, it is absolutely essential to be able to produce a workforce that knows how to carry out statistical analysis. If we want to achieve this, we must abandon the idea that data files and databases are esoteric concepts that have no place in libraries. This paper outlines a proposal that would put Canadian data and databases in the hands of managers in the research and education community through a collaborative effort among libraries, government and the Depository Services Program.

Introduction

To be successful in a world that is becoming more and more dependent on information and knowledge, countries need to develop strategies to fully exploit the resources available to them. By the nature of their activities, governments are important sources of data and information, but the ability to exploit them resides not only in government. Private, academic and public sectors all require access to this wealth of information and data in order to realize their true potential. The Data Liberation Initiative is a proposal to provide access to these resources.

This initiative stems from a paper, "Liberating the Data: A Proposal for a Joint Venture between Statistics Canada and Canadian Universities,"[3] which suggests a co-operative scenario that would make Statistics Canada's data files more easily available to the Canadian research community. The paper's central premise is that the Canadian research community uses United States and international data because Canadian data, while available, are either unknown or unaffordable. In either case, they are inaccessible.

An excellent argument in favour of a more liberated scenario is advanced by Professor Paul Bernard, Chair, Advisory Committee on Social Conditions (Statistics Canada) and member of the National Statistics Council. It merits repetition here.

He writes: "...the genuine exercise of democracy increasingly requires that citizens get access to complex information and have the skills required to understand it." While he acknowledges the pressures on Statistics Canada to reduce costs and increase income, he feels the outcome has been the restriction of "...access to information only to groups that have the solid ability to pay."[4] Bernard feels that this may "...hamper the participation in public debates of groups whose contribution is not backed up by much money" as well as "those who have no prospect of turning a profit or reaping some tangible and relatively immediate benefit from using it." This, he states, is "...likely to lead, in the long run, to suboptimal development and less than full-blown democracy."[5]

This quote provided the impetus for the Data Liberation Initiative.

The Power of Data

Data are unlike other tools of the research endeavour. They provide the raw material from which information and knowledge can be created. By their nature, data allow for exploration of topics of interest to the researcher. Unlike printed tables which, like a postcard, provide a picture of one view of a larger phenomenon, data can act as a camera, allowing the researcher to manipulate the background, change the foreground and more fully investigate the object under study.

For example, only nine questions were asked of every Canadian in the 1991 Census. One might think that very little information could be gleaned from so small a number of variables. On the contrary, the number of tables that could theoretically be produced is enormous: over 350,000. Statistics Canada published thirteen. Even if only a fraction of the possible tables make sense, there remains a tremendous gap between what was published and what might be of concern to researchers. Without access to the data, the researcher is left with a product that answers only the questions the information provider considers important, rather than addressing the problem under investigation.

Nor is the sheer number of possible tables the only constraint. Decisions regarding what to produce are not made in a vacuum. Governments and other information providers are unlikely to produce information that would be critical of their own programs. Yet an informed policy debate requires that critical investigation be undertaken. Without access to data, such a debate is unlikely to be possible.

Producing a Numerate Workforce

As the "information society" evolves, it becomes increasingly important that countries produce a citizenry that is not only literate, but also numerate and able to think critically. We increasingly face complex issues surrounded by facts and figures, and we have become more and more dependent on analysts to clarify them. But these analysts are often untrained in statistical analysis and rely on the information provider's interpretation for their stories. We are thus left with a view which, while it may be objective, is not necessarily balanced.

In order to train Canadians, it is crucial that they be able to access Canadian data. Indeed, a complaint was voiced at a recent meeting of the National Statistics Council that recent graduates, while trained in the use of data, have little or no knowledge of Canadian statistical databases. Better access will also help us improve training in these basic skills, which is urgent because we currently face a shortage of people able to train the next generation of researchers.

Bernard brings this point home when he argues:

"Concerning such issues, the public must have appropriate knowledge and not only hypothetical access to the data. Paradoxically, indeed, contemporary societies offer a wealth of information, but workers and citizens can be totally mystified, surrounded as they are by data whose flow and codes they do not master."[6]

While lack of accessible data is just one of the barriers, it is essential that it be overcome before we can make progress on this front. The Data Liberation Initiative is an attempt to overcome this hurdle.

History

In April 1993, after reviewing a copy of the "Liberation Paper", the Social Science Federation of Canada (SSFC) hosted a meeting of representatives from the Social Sciences and Humanities Research Council (SSHRC), the Association of Universities and Colleges of Canada (AUCC), the Canadian Association of Research Libraries (CARL), the Canadian Association of Public Data Users (CAPDU) and other interested parties to devise a strategy to make Canadian data more readily available to the education and research communities. The meeting led to the striking of a smaller working group, under the aegis of the SSFC, to devise a plan that would be acceptable to all parties. Statistics Canada and the Depository Services Program (DSP) played advisory roles in this process. Although government has been involved in this advisory capacity, the initiative is unique in that it was conceived and developed by members of the Canadian research community.

The working group, consisting of researchers, a representative from CARL, a data intermediary and members of the SSFC, held a series of meetings over the following months. Advice from both Statistics Canada and the Depository Services Program was sought and proved invaluable. When the group had formulated a working document, meetings were arranged with senior management from Statistics Canada and the Canada Communications Group (responsible for the DSP) to seek their approval for the project.

Once the document had been accepted in draft form by both parties, a series of information sessions was held with interested parties such as the National Library and the Treasury Board. There was also a meeting of the larger group from whose ranks the working group had been formed. Finally, presentations were given to the Canadian Association of Public Data Users, the Canadian Library Association and the Association of Canadian Map Libraries and Archives.

The Proposal[7]

The proposal suggests a five-year pilot agreement among research libraries, Statistics Canada and the DSP, modelled on previous agreements between Statistics Canada and CARL. While all government departments should eventually be included, the proposal begins with Statistics Canada because it is the principal producer of products of this sort. And while all libraries should ultimately be able to take advantage of at least some of these products and delivery systems, the pilot focuses on publicly funded research libraries.

In brief, this proposal suggests that:

Rationale

This proposal is aimed at increasing the flow of information from government to the public, in particular to the research and education community, and indirectly to all types of organizations in Canada. By teaching students to make use of Canadian data, it will widen the base of those who can access this information and use it to enhance our nation's competitive edge. It represents an investment in training, in strengthening and encouraging research and development, in informing Canadians of their cultural and social identities, and in generally furthering the concept of electronic democracy.

It has long been the policy of the federal government, under the aegis of the Depository Services Program, to make its information broadly available in a cost-effective manner. In 1991, after a review of the program, the Treasury Board itself directed the DSP to move beyond its print environment and to propose a method by which electronic products could be introduced into the program. This proposal is one response to that directive.

In today's fiscal environment it is critical that we seek creative solutions using partnerships, infrastructure and programs that are already in place. The full exploitation of electronic information requires refining, extending and modernizing these existing mechanisms. This can be accomplished by using the DSP network that has served Canada well for the past 65 years and extending that concept to incorporate the new technologies, information products and information delivery systems.

Large, publicly funded research organizations have often been at the leading edge in the development and use of these technologies. It is through the use of these tools in the ongoing endeavours of teaching and research that knowledge is created and placed in the public domain. Unfortunately, because of current price structures, the data used for these purposes tend to be from the United States. For a small additional contribution, the Canadian government can greatly increase the return on its existing investment in Canadian data by creating an informed citizenry.

This approach is consistent with the government's philosophy that costs be shared by all partners, and it takes advantage of the government's drive to link into, and link the country via, the information highway.

Criteria for Inclusion: Institutions and Products

To be eligible to participate, institutions must be part of the depository program[8] and have the technical capabilities and expertise to support the use of the products. Because many of the files are large and complex, participants must have access to sophisticated hardware and software and to technically well-versed personnel. They must also have access to a data and information professional.

Members will be encouraged to share expertise and the costs of providing service with other institutions that do not meet the full complement of requirements.

Three types of products will be included: public use microdata files, databases and geographic or spatial files. Public use microdata files are anonymized data from surveys and the Census of Population, which allow the researcher to explore relationships between variables at the level of the individual. Databases are aggregate or grouped data which have been organized for efficient computer-assisted access. Geographic files consist of digitized boundaries and networks.

All products are electronic, off-line, publicly available and affordable to the program. In addition, none of these products exists in paper format, as they were created for statistical manipulation by computer.

Current Situation

While this began as a researcher-driven project, it has recently become part of a Treasury Board initiative to redevelop the DSP. At the time of writing, a committee of stakeholders is being formed to review the current DSP and make recommendations to the Treasury Board regarding the program's future. Although a member of the Social Science Federation has been invited to participate, the end users of these kinds of electronic products, whom the Treasury Board has referred to as the "new clientele", will not be directly involved. We have, however, been assured that the Data Liberation Initiative will form part of this review.

Conclusions

This initiative has been welcomed by the research and education community, data intermediaries and government stakeholders. The broader library community has been less enthusiastic, even though, in the wider context, its members are becoming more aware of their role as purveyors of information in all forms.

Yet libraries of all stripes stand to benefit from participating in the initiative. Increasingly, paper products are migrating to electronic format and, as a result, falling out of the DSP. The initiative will give libraries a means of taking a leading role in the information economy and in the current explosion of information and technology. It will allow institutions with limited resources and expertise in the area of data to form partnerships with those that have an established infrastructure, and to provide services not now available on-site. Finally, through the use of the electronic highway, it will enable the development of centres of specialization that can be linked across the country.

As Alan MacDonald, Director of Information Services at The University of Calgary, said about expanding the role of libraries in an address to CAPDU members:

"We who labour in the information services and who believe in the idea of library, now stand at the edge of a new era. We must go to where the hornets live. We must move from the rhythms and pleasures of cross-country directly to the fears and excitements of heli-skiing. We are not alone, but we do not have the choice not to act. If we fail to act wisely, the utility of the academy to the new century will be seriously undermined."[9]

We must move away from the notion that data files and databases are arcane concepts. While it may not be easy to explain what data are and why they are important, it is critical that someone with this understanding have a voice in the recommendations to the Treasury Board regarding the restructuring of the Depository Services Program. Without such an understanding, data will continue to be treated as less worthy of inclusion in the program, and electronic products will remain overwhelmingly bibliographic in nature.

That said, there seems to be a genuine desire to make this proposal a reality. The only unpredictable element is government finances. Library support will be essential in convincing government that, despite fiscal constraints, this project is integral to Canadians' productivity in an information society.

