Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier

Author : Christine L. Borgman

As universities recognize the inherent value in the data they collect and hold, they encounter unforeseen challenges in stewarding those data in ways that balance accountability, transparency, and protection of privacy, academic freedom, and intellectual property.

Two parallel developments in academic data collection are converging: (1) open access requirements, whereby researchers must provide access to their data as a condition of obtaining grant funding or publishing results in journals; and (2) the vast accumulation of ‘grey data’ about individuals in their daily activities of research, teaching, learning, services, and administration.

The boundaries between research and grey data are blurring, making it more difficult to assess the risks and responsibilities associated with any data collection. Many sets of data, both research and grey, fall outside privacy regulations such as HIPAA and FERPA, and outside definitions of personally identifiable information (PII).

Universities are exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters. Commercial entities are besieging universities with requests for access to data or for partnerships to mine them.

The privacy frontier facing research universities spans open access practices, uses and misuses of data, public records requests, cyber risk, and curating data for privacy protection.

This paper explores the competing values inherent in data stewardship and makes recommendations for practice, drawing on the pioneering work of the University of California in privacy and information security, data governance, and cyber risk.


Developing indicators on Open Access by combining evidence from diverse data sources

Authors : Thed van Leeuwen, Ingeborg Meijer, Alfredo Yegros-Yegros, Rodrigo Costas

In the last couple of years, the role of Open Access (OA) publishing has become central in science management and research policy. In the UK and the Netherlands, national OA mandates require the scientific community to seriously consider publishing research outputs in OA forms.

At the same time, other elements of Open Science are also becoming part of the debate, which now covers not only the publishing of research outputs but also related aspects of the chain of scientific knowledge production, such as open peer review and open data.

From a research management point of view, it is important to keep track of the progress made in the OA publishing debate. Until now, this has been quite problematic, because OA is hard to capture with bibliometric methods: most databases supporting bibliometric data lack exhaustive and accurate open access labelling of scientific publications.

In this study, we present a methodology that systematically creates OA labels for large sets of publications processed in the Web of Science database. The methodology is based on the combination of diverse data sources that provide evidence of publications being OA.
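The idea of combining evidence from several sources can be illustrated with a minimal sketch. The abstract does not name the sources or the decision rule, so everything below is an assumption made for illustration: the source names are hypothetical, each source is reduced to a set of DOIs it reports as OA, and a publication is labelled OA when at least one source reports it.

```python
def normalize_doi(doi):
    """Lower-case and trim a DOI so the same DOI matches across sources."""
    return doi.strip().lower()


def label_open_access(publication_dois, sources):
    """Label each publication as OA or not, keeping the supporting evidence.

    publication_dois: iterable of DOI strings for the publication set.
    sources: dict mapping a (hypothetical) source name to an iterable of
             DOIs that the source reports as open access.
    Assumed rule: a publication is OA if at least one source reports it.
    """
    normalized_sources = {
        name: {normalize_doi(d) for d in dois} for name, dois in sources.items()
    }
    labels = {}
    for doi in publication_dois:
        key = normalize_doi(doi)
        # Record every source that provides evidence, not just the first one.
        supporting = sorted(
            name for name, dois in normalized_sources.items() if key in dois
        )
        labels[key] = {"is_oa": bool(supporting), "evidence": supporting}
    return labels


if __name__ == "__main__":
    # Hypothetical evidence sources and DOIs, for illustration only.
    sources = {
        "oa_journal_list": ["10.1000/A1"],
        "repository_index": ["10.1000/a1", "10.1000/b2"],
    }
    result = label_open_access(["10.1000/a1", "10.1000/b2", "10.1000/c3"], sources)
    for doi, label in result.items():
        print(doi, label)
```

Keeping the list of supporting sources alongside the label makes it possible to apply a stricter rule later, for example requiring agreement between two sources, without recomputing the matches.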


Open Data Protection : Study on legal barriers to open data sharing – Data Protection and PSI

Authors : Andreas Wiebe, Nils Dietrich

This study analyses legal barriers to data sharing in the context of the Open Research Data Pilot, which the European Commission is running within its research framework programme Horizon 2020.

In the first part of the study, data protection issues are analysed. After a brief overview of the international basis for data protection, the European legal framework is described in detail.

The main focus is thus on the Data Protection Directive (95/46/EC), which has been in force since 1995. Not only is the Data Protection Directive itself described, but also its implementation in selected EU Member States.

Additionally, the upcoming General Data Protection Regulation (2016/679/EU) and relevant changes are described. Special focus is placed on leading data protection principles. Next, the study describes the use of research data in the Open Research Data Pilot and how data protection principles influence such use.

The experiences of the European Commission in running the Open Research Data Pilot so far, as well as basic examples of repository use forms, are considered. The second part of the study analyses the extent to which legislation on public sector information (PSI) influences access to and re-use of research data.

The Public Sector Information Directive (2003/98/EC) and the impact of its revision in 2013 (2013/37/EU) are described. There is a special focus on the application of PSI legislation to public libraries, including university and research libraries, and its practical implications.

In the final part of the study the results are critically evaluated and core recommendations are made to improve the legal situation in relation to research data.


Knowledge processes and information quality in open data context: conceptual considerations and empirical findings

Author : Matti Keränen

In this thesis, the knowledge processes of firms using open weather data and information from the Finnish Meteorological Institute are studied. The goal is to describe and understand the knowledge processes and factors contributing to open data use and, at the same time, to describe how information quality is intertwined with these processes.

The theoretical framework builds on the knowledge management concept of absorptive capacity describing knowledge processes in firms. Explicit and tacit knowledge as well as practical knowledge and their different epistemological premises are noted in the framework.

As a third theoretical component, information quality is defined both as a technical property of artifacts and as a constructive concept of shared meaning between the data provider and user.

The research process included semi-structured interviews with five firms using open data and an abductive analysis of the empirical material. The outcome is a knowledge-management-based interpretation of the firms’ knowledge processes, contributing factors, and information quality in the open data context.

Firms select different roles and thereby different knowledge domains when exploiting open data. The exploitation process is multidimensional, including elements absorbed from the technical domain, weather information, and local context.

The technical quality of information is defined dynamically in different phases of exploitation, while quality as a constructive concept is defined in the exploitation process where different knowledge domains intersect.


Between open access and open data: how open should data be for information on museum collections? (Entre libre accès et open data : quelle ouverture des données pour l’information sur les collections muséales ?)

Author : Laure-Hélène Kerrio

The current literature on scientific information about museum collections reveals heterogeneous types of information and media, as well as a complex and restrictive legal framework governing its communication and dissemination. These elements shape the missions of the information and documentation professionals who manage it.

The management of this information is now part of the knowledge commons movement and the approaches that have emerged from it, open access and open data. In this context, French museums appear to have done little to open up their data.

A survey of seven professionals working in Toulouse museums shows where they stand on this issue. Although broadly in favour of opening up data, these professionals point out the difficulties and limits of such approaches, while describing the consequences of their implementation for their professional identity.


Towards Open Data for the Citation Content Analysis

Authors : Jose Manuel Barrueco, Thomas Krichel, Sergey Parinov, Victor Lyapunov, Oxana Medvedeva, Varvara Sergeeva

The paper presents the first results of the CitEcCyr project funded by RANEPA. The project aims to create a source of open citation data for research papers written in Russian.

Compared to existing sources of citation data, CitEcCyr aims to add value in four ways: a) a transparent and distributed architecture for the technology that generates the citation data; b) openness of all software built or used and of all citation data created; c) an extended set of citation data sufficient for citation content analysis; d) services for public oversight of the quality of the citation data and of researchers’ citing activity.


From open data to open science: a reflexive look back at the methods and practices of a research project on geographic data (De l’open data à l’open science : retour réflexif sur les méthodes et pratiques d’une recherche sur les données géographiques)

Authors : Nathalie Pinède, Matthieu Noucher, Françoise Gourmelon, Karel Soumagnac-Colin

We draw here on the experience of an ongoing research project to analyse how new fields of experimentation on the web are changing the conditions of scientific practice, from objects to methods, from open data to open science.

The mass availability of geographic data on the web is reshaping research dynamics along three axes of transformation: research objects, methods, and practices. First, we highlight how the power issues surrounding cartography have shifted with the advent of the web and open data.

We then discuss the impacts on research methodology in the context of an interdisciplinary approach. Finally, we show how this research project forms part of an open science approach.