Practices, Challenges, and Prospects of Big Data Curation: a Case Study in Geoscience

Authors : Suzhen Chen, Bin Chen

Open and persistent access to past, present, and future scientific data is fundamental for transparent and reproducible data-driven research. The scientific community now faces both challenges and opportunities arising from increasingly complex disciplinary data systems.

Concerted efforts from domain experts, information professionals, and Internet technology experts are essential to ensure the accessibility and interoperability of big data.

Here we review current practices in building and managing big data within the context of large data infrastructure, using geoscience cyberinfrastructure such as the Interdisciplinary Earth Data Alliance (IEDA) and EarthCube as a case study.

Geoscience is a data-rich discipline with rapidly expanding, sophisticated, and diverse digital data sets. Having started to embrace the digital age, the community has applied big data and data mining tools to this new type of research.

We also identify current challenges, key elements, and prospects for constructing a more robust and future-proof big data infrastructure for research and publication, as well as the roles, qualifications, and opportunities for librarians and information professionals in the data era.

DOI : https://doi.org/10.2218/ijdc.v14i1.669

Formalizing Privacy Laws for License Generation and Data Repository Decision Automation

Authors : Micah Altman, Stephen Chong, Alexandra Wood

In this paper, we summarize work-in-progress on expert system support to automate some data deposit and release decisions within a data repository, and to generate custom license agreements for those data transfers.

Our approach formalizes via a logic programming language the privacy-relevant aspects of laws, regulations, and best practices, supported by legal analysis documented in legal memoranda.

This formalization enables automated reasoning about the conditions under which a repository can transfer data, through interrogation of users, and the application of formal rules to the facts obtained from users.

The proposed system takes the specific conditions for a given data release and produces a custom data use agreement that accurately captures the relevant restrictions on data use.
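The idea of applying formal rules to user-supplied facts and deriving both a decision and agreement clauses can be illustrated with a minimal sketch. This is not the authors' system: they use a logic programming language, whereas the sketch below uses plain Python with a simple rule table. The fact names, rule names, and clause texts are all hypothetical and carry no legal meaning.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """Outcome of evaluating the rules against one set of facts."""
    may_release: bool
    clauses: list = field(default_factory=list)    # text for the agreement
    rationale: list = field(default_factory=list)  # names of rules that fired


# Hypothetical rules: (name, condition on facts, clause text, blocks release?)
RULES = [
    ("identifiable_without_consent",
     lambda f: f["contains_identifiers"] and not f["consent_obtained"],
     None, True),
    ("identifiable_with_consent",
     lambda f: f["contains_identifiers"] and f["consent_obtained"],
     "Recipient must not attempt to contact data subjects.", False),
    ("education_records",
     lambda f: f["education_records"],
     "Recipient must destroy the data when the study ends.", False),
    ("baseline",
     lambda f: True,
     "Recipient must not redistribute the data.", False),
]


def evaluate(facts):
    """Apply every rule to the facts and collect the decision and clauses."""
    decision = Decision(may_release=True)
    for name, condition, clause, blocks in RULES:
        if condition(facts):
            decision.rationale.append(name)  # keep a trace for reviewers
            if blocks:
                decision.may_release = False
            elif clause:
                decision.clauses.append(clause)
    if not decision.may_release:
        decision.clauses.clear()  # no agreement is generated for a refusal
    return decision


# Facts as they might be gathered from a depositor questionnaire (hypothetical).
facts = {
    "contains_identifiers": True,
    "consent_obtained": True,
    "education_records": False,
}
result = evaluate(facts)
print(result.may_release, result.rationale)
for clause in result.clauses:
    print("-", clause)
```

Recording the names of the rules that fired is a simple stand-in for the transparency goal: a reviewer can see which rules led to a given decision.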

This enables appropriate decisions and accurate licenses, while removing the bottleneck of lawyer effort per data transfer.

The operation of the system aims to be transparent, in the sense that administrators, lawyers, institutional review boards, and other interested parties can evaluate the legal reasoning and interpretation embodied in the formalization, and the specific rationale for a decision to accept or release a particular dataset.

URL : https://arxiv.org/abs/1910.10096

Ouverture des données de la recherche : de la vision politique aux pratiques des chercheurs

Author : Violaine Rebouillat

This thesis examines research data in a context of growing incentives to open them. Research data are information collected by scientists with a view to being used as evidence for a scientific theory.

The notion is difficult to define because it depends on context. Since the 2000s, open access to data has occupied an increasingly strategic place in research policies. These issues have been taken up by intermediary professions, which have developed dedicated services to support researchers in applying data management and openness recommendations.

The thesis examines the link between the ideology of openness and research practices. What forms of data management and sharing exist in research communities, and what motivates them? What place do researchers give to the services that stem from data management and openness policies?

To address these questions, 57 interviews were conducted with researchers from various disciplines at the Université de Strasbourg. The survey reveals a very wide variety of data management and sharing practices. One of the points highlighted is that, in scientific logic, data sharing meets a need.

It is an integral part of the researcher's strategy, whose aim is above all to protect his or her professional interests. Data are thus part of a cycle of credibility, which gives them both a use value (for producing new publications) and an exchange value (as a bargaining chip in collaborations with partners).

The survey also shows that the services developed in the context of open data correspond only to a small extent to those that researchers actually use.

One hypothesis put forward is that the service offering comes too early to meet researchers' needs. Because the evaluation and recognition of scientific work are based mainly on the publication of articles and books, researchers do not regard data management and openness as priorities.

The second hypothesis is that open data services are offered by actors who are relatively distant from research communities. Researchers are more influenced by networks specific to their fields of research (journals, infrastructures, etc.).

These results ultimately invite us to reconsider the question of mediation in the opening of scientific data.

URL : https://tel.archives-ouvertes.fr/tel-02447653

Identifying and Implementing Relevant Research Data Management Services for the Library at the University of Dodoma, Tanzania

Authors : Gilbert Exaud Mushi, Heila Pienaar, Martie van Deventer

Research Data Management (RDM) services are increasingly becoming a subject of interest for academic and research libraries globally – this is also the case in developing countries.

The interest is motivated by a need to support research activities through data sharing and collaboration, both locally and internationally. Many institutions, especially in developed countries, have implemented RDM services to accelerate research and innovation through e-Research, but extensive RDM is not so common in developing countries.

In reality, many African universities and research institutions have yet to implement even the most basic data management services. We believe that the absence of political will and of national government mandates on data management often holds back the development and implementation of RDM services. Similarly, research funding agencies are not yet applying sufficient pressure to ensure that Africa complies with the requirement to deposit research data in trusted repositories.

While acknowledging this context, the University of Dodoma library staff realized that it was urgent to prepare for the inevitable: the time when RDM will be a requirement for research funding support.

This paper presents the results of research conducted at the University of Dodoma, Tanzania. The purpose of the research was to identify and report on relevant RDM services that need to be implemented so that researchers and university management could collaborate and make our research data accessible to the international community.

This paper presents findings on important issues for consideration when planning to develop and implement RDM services at a developing country academic institution. The paper also mentions the requirements for the sustainability of these initiatives.

DOI : http://doi.org/10.5334/dsj-2020-001

Publishing computational research — A review of infrastructures for reproducible and transparent scholarly communication

Authors : Markus Konkol, Daniel Nüst, Laura Goulier

Funding agencies increasingly ask applicants to include data and software management plans in proposals. In addition, the author guidelines of scientific journals and conferences more often include a statement on data availability, and some reviewers reject unreproducible submissions.

This trend towards open science increases the pressure on authors to provide access to the source code and data underlying the computational results in their scientific papers.

Still, publishing reproducible articles is a demanding task and is not achieved simply by providing access to code scripts and data files. Consequently, several projects are developing solutions to support the publication of executable analyses alongside articles, considering the needs of the aforementioned stakeholders.

The key contribution of this paper is a review of applications addressing the issue of publishing executable computational research results. We compare the approaches across properties relevant for the involved stakeholders, e.g., provided features and deployment options, and also critically discuss trends and limitations.

The review can help publishers decide which system to integrate into their submission process, editors recommend tools to researchers, and authors of scientific papers adhere to reproducibility principles.

URL : https://arxiv.org/abs/2001.00484

Research Data Management in a Cultural Heritage Organisation

Author : Tom Drysdale

Research is a core function of cultural heritage organisations. Inevitably, the undertaking of research by galleries, libraries, archives and museums (the GLAM sector) leads to the creation of vast quantities of research data.

Yet despite growing recognition that research data must be managed if it is to be exploited effectively, and in spite of increasing understanding of research data management practices and needs, particularly in the higher education sector, knowledge of research data management in cultural heritage organisations remains extremely limited.

This paper represents an attempt to address the limited awareness of research data management in the cultural heritage sector. It presents the results of a data management audit conducted at Historic Royal Palaces (HRP) in 2018.

The study reveals that research data management at HRP is underdeveloped, while highlighting some causes for optimism.

The results of the study are compared to the results of similar studies conducted in UK higher education institutions (HEIs), highlighting the many discrepancies in the ways that research data is managed at HRP and in the HE sector.

Recognition of these differences and similarities, it is argued, is necessary for the development of better research data management practices and tools for the heritage sector.

DOI : https://doi.org/10.2218/ijdc.v14i1.647

Les données scientifiques face aux enjeux de la recherche en Sciences, Technologie et Médecine : enquête exploratoire à l’Université de Strasbourg

Author : Violaine Rebouillat

We study the place of scientific data in research practices through the analysis of six projects in the field of Science, Technology and Medicine.

The aim is to examine how research strategies influence data management and openness. We describe the role played by the quest for peer recognition in basic and applied research.

We show that basic research projects tend to follow a logic in which the publication of articles dictates priorities, whereas applied research projects pay greater attention to data because of the underlying economic stakes.

URL : https://hal-cnam.archives-ouvertes.fr/hal-02321077