A Data-Driven Approach to Appraisal and Selection at a Domain Data Repository

Authors : Amy M Pienta, Dharma Akmon, Justin Noble, Lynette Hoelter, Susan Jekielek

Social scientists are producing an ever-expanding volume of data, leading to questions about appraisal and selection of content given finite resources to process data for reuse. We analyze users’ search activity in an established social science data repository to better understand demand for data and more effectively guide collection development.

By applying a data-driven approach, we aim to ensure curation resources are applied to make the most valuable data findable, understandable, accessible, and usable. We analyze data from a domain repository for the social sciences that includes over 500,000 annual searches in 2014 and 2015 to better understand trends in user search behavior.

Using a newly created search-to-study ratio technique, we identified gaps in the domain data repository’s holdings and leveraged this analysis to inform our collection and curation practices and policies.
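As a rough illustration of the idea, the sketch below computes one possible reading of a search-to-study ratio. The abstract does not define the ratio, so the definition used here (searches for a topic divided by studies held on that topic) is an assumption, as are the function names `search_to_study_ratio` and `rank_gaps` and the sample topics.

```python
from collections import Counter


def search_to_study_ratio(search_terms, study_subjects):
    """Return {topic: searches / studies} for every searched topic.

    Assumed definition, not taken from the paper: a high ratio suggests
    demand that outstrips holdings. Topics that were searched for but have
    no matching study get float('inf').
    """
    searches = Counter(t.strip().lower() for t in search_terms)
    studies = Counter(s.strip().lower() for s in study_subjects)
    ratios = {}
    for topic, n_searches in searches.items():
        n_studies = studies.get(topic, 0)
        ratios[topic] = n_searches / n_studies if n_studies else float("inf")
    return ratios


def rank_gaps(ratios):
    """List topics from the largest ratio (biggest apparent gap) to the smallest."""
    return sorted(ratios, key=lambda topic: ratios[topic], reverse=True)


# Hypothetical search log and catalogue subject terms.
example_searches = ["health", "health", "health", "health", "crime", "crime", "housing"]
example_studies = ["health", "health", "crime", "crime", "crime", "crime"]
example_ratios = search_to_study_ratio(example_searches, example_studies)
example_ranking = rank_gaps(example_ratios)
```

In this toy data, a topic that is searched for but not held at all ranks first, which is the kind of gap a collection development policy might prioritise.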

The evaluative technique we propose in this paper will serve as a baseline for future studies of trends in user demand over time at the domain data repository being studied, with broader implications for other data repositories.

A Research Graph dataset for connecting research data repositories using RD-Switchboard

Authors : Amir Aryani, Marta Poblet, Kathryn Unsworth, Jingbo Wang, Ben Evans, Anusuriya Devaraju, Brigitte Hausstein, Claus-Peter Klas, Benjamin Zapilko, Samuele Kaplun

This paper describes the open access graph dataset that connects Dryad, CERN, ANDS and other international data repositories to publications and grants across multiple research data infrastructures.

The graph dataset was created using the Research Graph data model and the Research Data Switchboard (RD-Switchboard), a collaborative project by the Research Data Alliance DDRI Working Group (DDRI WG) that aims to discover and connect related research datasets based on publication co-authorship or jointly funded grants.
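The linking rule described above can be sketched in a few lines. This is a simplified illustration, not the RD-Switchboard implementation: it assumes each dataset record already carries a set of researcher identifiers and a set of grant identifiers, and the record layout, the function name `link_datasets` and the edge labels are all hypothetical.

```python
from itertools import combinations


def link_datasets(records):
    """Return a set of (dataset_a, dataset_b, reason) edges.

    `records` maps a dataset id to {"authors": set, "grants": set}.
    Two datasets are linked when they share at least one researcher
    or at least one grant (assumed simplification of the real model).
    """
    edges = set()
    for a, b in combinations(sorted(records), 2):
        if records[a]["authors"] & records[b]["authors"]:
            edges.add((a, b, "co-authorship"))
        if records[a]["grants"] & records[b]["grants"]:
            edges.add((a, b, "shared grant"))
    return edges


# Hypothetical records from three repositories.
example_records = {
    "ds1": {"authors": {"r1", "r2"}, "grants": {"g1"}},
    "ds2": {"authors": {"r2"}, "grants": set()},
    "ds3": {"authors": {"r3"}, "grants": {"g1"}},
}
example_edges = link_datasets(example_records)
```

Comparing every pair of records is quadratic in the number of datasets, which is fine for a sketch but would need an index on authors and grants at repository scale.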

The graph dataset allows researchers to trace and follow the paths to understanding a body of work. By mapping the links between research datasets and related resources, the graph dataset improves both their discovery and visibility, while avoiding duplicate efforts in data creation.

Ultimately, the linked datasets may spur novel ideas, facilitate reproducibility and re-use in new applications, stimulate combinatorial creativity, and foster collaborations across institutions.

Curating Humanities Research Data: Managing Workflows for Adjusting a Repository Framework

Author : Hagen Peukert

Handling heterogeneous data, subject to minimal costs, can be perceived as a classic management problem. The approach at hand applies established managerial theorizing to the field of data curation.

It is argued, however, that data curation cannot merely be treated as a standard case of applying management theory in a traditional sense. Rather, in the practice of curating humanities research data, the specifications and adjustments of the model suggested here reveal an intertwined process in which knowledge of both strategic management and information technology has to be considered.

Thus, two contributions are put forward: suggestions on the strategic positioning of research data, which can be used as an analytical tool to understand the proposed workflow mechanisms, and the definition of workflow modules, which can be flexibly used in designing new standard workflows to configure research data repositories.

Are the FAIR Data Principles Fair?

Authors : Alastair Dunning, Madeleine de Smaele, Jasmin Böhmer

This practice paper describes an ongoing research project to test the effectiveness and relevance of the FAIR Data Principles. Simultaneously, it will analyse how easy it is for data archives to adhere to the principles. The research took place from November 2016 to January 2017, and will be underpinned with feedback from the repositories.

The FAIR Data Principles feature 15 facets corresponding to the four letters of FAIR – Findable, Accessible, Interoperable, Reusable. These principles have already gained traction within the research world.

The European Commission has recently expanded its demand for research to produce open data. The relevant guidelines are explicitly written in the context of the FAIR Data Principles. Given that an increasing number of researchers will have exposure to the guidelines, understanding their viability and suggesting where there may be room for modification and adjustment is of vital importance.

This practice paper is connected to a dataset (Dunning et al., 2017) containing the original overview of the sample group statistics and graphs in an Excel spreadsheet. Over the course of two months, the web interfaces, help pages and metadata records of over 40 data repositories were examined in order to score each repository against the FAIR principles and facets.

The traffic-light rating system enables colour-coding according to compliance and vagueness. The statistical analysis provides overall results, categorised results, and results focusing on the individual principles and on the individual facets.

The analysis includes the statistical and descriptive evaluation, followed by elaborations on elements of the FAIR Data Principles, the subject-specific or repository-specific differences, and subsequently what repositories can do to improve their information architecture.

Integration of an Active Research Data System with a Data Repository to Streamline the Research Data Lifecycle: Pure-NOMAD Case Study

Authors : Simone Ivan Conte, Federica Fina, Michalis Psalios, Shyam Ryal, Tomas Lebl, Anna Clements

Research funders have introduced requirements that expect researchers to properly manage and publicly share their research data, and expect institutions to put in place services to support researchers in meeting these requirements.

So far the general focus of these services and systems has been on addressing the final stages of the research data lifecycle (archive, share and re-use), rather than stages related to the active phase of the cycle (collect/create and analyse).

As a result, full integration of active data management systems with data repositories is not yet the norm, making the streamlined transition of data from an active to a published and archived status an important challenge.

In this paper we present the integration between an active data management system developed in-house (NOMAD) and Elsevier’s Pure data repository used at our institution, with the aim of offering a simple workflow to facilitate and promote the data deposit process.

The integration results in a new data management and publication workflow that helps researchers to save time, minimize human errors related to manually handling files, and further promote data deposit together with collaboration across the institution.

Librarians’ Perspectives on the Factors Influencing Research Data Management Programs

Authors : Ixchel M. Faniel, Lynn Silipigni Connaway

This qualitative research study examines librarians’ research data management (RDM) experiences, specifically the factors that influence their ability to support researchers’ needs.

Findings from interviews with 36 academic library professionals in the United States identify 5 factors of influence: 1) technical resources; 2) human resources; 3) researchers’ perceptions about the library; 4) leadership support; and 5) communication, coordination, and collaboration. Findings show different aspects of these factors facilitate or constrain RDM activity. The implications of these factors on librarians’ continued work in RDM are considered.

Creating a Community of Data Champions

Authors : Rosie Higman, Marta Teperek, Danny Kingsley

Research Data Management (RDM) presents an unusual challenge for service providers in Higher Education. There is increased awareness of the need for training in this area, but the discipline-specific nature of the practices involved makes it difficult to provide training across a multi-disciplinary organisation.

Whilst most UK universities now have a research data team of some description, they are often small and rarely have the resources necessary to provide targeted training to the different disciplines and research career stages that they are increasingly expected to support.

This practice paper describes the approach taken at the University of Cambridge to address this problem by creating a community of Data Champions. This collaborative initiative, working with researchers to provide training and advocacy for good RDM practice, allows more discipline-specific training to be given, allows researchers to be credited for their expertise, and creates an opportunity for those interested in RDM to exchange knowledge with others.

The ‘community of practice’ model has been used in many sectors, including Higher Education, to facilitate collaboration across organisational units and this initiative will adopt some of the same principles to improve communication across a decentralised institution.

The Data Champions initiative at Cambridge was launched in September 2016 and this paper reports on the early months, plans for building the community in the future and the possible risks associated with this approach to providing RDM services.

