The Australian Research Data Commons

Authors : Michelle Barker, Ross Wilkinson, Andrew Treloar

A research data commons can provide researchers with the data and resources necessary to conduct world class research. More than this, a research data commons can be transformational in facilitating change in the way research is conducted, in terms of both research culture and the availability of research data and analytical tools.

This paper describes frameworks needed to build a transformational data commons, through examination of the development of the Australian Research Data Commons (ARDC). ARDC was formed in 2018 as part of a 20-year vision to transform Australia’s research culture by enabling access to the digital data and eResearch platforms that can significantly enhance research capacity.

ARDC is located within both national and international eResearch ecosystems, and its unique positioning must be understood, alongside the achievements of its three predecessor organisations, to understand the niche from which ARDC aims to provide maximum value and impact.

Consideration is given to the challenges inherent in both the current Australian ecosystem and beyond, to articulate ARDC’s focus going forward. The paper concludes with consideration of the international dimension, drawing on discussions around the development of a global data commons.

URL : The Australian Research Data Commons

DOI : http://doi.org/10.5334/dsj-2019-044

Workflows Allowing Creation of Journal Article Supporting Information and Findable, Accessible, Interoperable, and Reusable (FAIR)-Enabled Publication of Spectroscopic Data

Authors : Agustin Barba, Santiago Dominguez, Carlos Cobas, David P. Martinsen, Charles Romain, Henry S. Rzepa, Felipe Seoane

There is an increasing focus on the part of academic institutions, funding agencies, and publishers, if not researchers themselves, on preservation and sharing of research data. Motivations for sharing include research integrity, replicability, and reuse.

One of the barriers to publishing data is the extra work involved in preparing data for publication once a journal article and its supporting information have been completed.

In this work, a method is described to generate both human and machine-readable supporting information directly from the primary instrumental data files and to generate the metadata to ensure it is published in accordance with findable, accessible, interoperable, and reusable (FAIR) guidelines.

Using this approach, both the human readable supporting information and the primary (raw) data can be submitted simultaneously with little extra effort.

Although traditionally the data package would be sent to a journal publisher for publication alongside the article, the data package could also be published independently in an institutional FAIR data repository.

Workflows are described that store the data packages and generate metadata appropriate for such a repository. The methods both to generate and to publish the data packages have been implemented for NMR data, but the concept is extensible to other types of spectroscopic data as well.

URL : Workflows Allowing Creation of Journal Article Supporting Information and Findable, Accessible, Interoperable, and Reusable (FAIR)-Enabled Publication of Spectroscopic Data

DOI : https://doi.org/10.1021/acsomega.8b03005

Public Views on Models for Accessing Genomic and Health Data for Research: Mixed Methods Study

Authors : Kerina H Jones, Helen Daniels, Emma Squires, David V Ford

Background

The literature abounds with increasing numbers of research studies using genomic data in combination with health data (eg, health records and phenotypic and lifestyle data), with great potential for large-scale research and precision medicine.

However, concerns have been raised about social acceptability and risks posed for individuals and their kin. Although there has been public engagement on various aspects of this topic, there is a lack of information about public views on data access models.

Objective

This study aimed to address the lack of information on the social acceptability of access models for reusing genomic data collected for research in conjunction with health data.

Models considered were open web-based access, external release to researchers, and access within a data safe haven.

Methods

Views were ascertained using a series of 8 public workshops (N=116). The workshops included an explanation of benefits and risks in using genomic data with health data, a facilitated discussion, and an exit questionnaire.

The resulting quantitative data were analyzed using descriptive and inferential statistics, and the qualitative data were analyzed for emerging themes.

Results

Respondents placed a high value on the reuse of genomic data but raised concerns including data misuse, information governance, and discrimination. They showed a preference for giving consent and use of data within a safe haven over external release or open access.

Perceived risks included data being used by unscrupulous parties under open access, data security under external release, and the need for robust safeguards within safe havens.

Conclusions

This is the first known study exploring public views of access models for reusing anonymized genomic and health data in research.

It indicated that people are generally amenable but prefer data safe havens because of perceived sensitivities. We recommend that public views be incorporated into guidance on models for the reuse of genomic and health data.

URL : Public Views on Models for Accessing Genomic and Health Data for Research: Mixed Methods Study

DOI : https://doi.org/10.2196/14384

Dataset search: a survey

Authors : Adriane Chapman, Elena Simperl, Laura Koesten, George Konstantinidis, Luis-Daniel Ibáñez, Emilia Kacprzak, Paul Groth

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts to data marketplaces, open data portals and data communities.

Google recently beta-released a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user data need against a collection of datasets.

Here, we survey the state of the art of research and commercial systems and discuss what makes dataset search a field in its own right, with unique challenges and open questions.

We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to tackle these questions as well as immediate next steps that will take the field forward.

URL : Dataset search: a survey

DOI : https://doi.org/10.1007/s00778-019-00564-x

On a Quest for Cultural Change – Surveying Research Data Management Practices at Delft University of Technology

Authors : Heather Andrews Mancilla, Marta Teperek, Jasper van Dijck, Kees den Heijer, Robbert Eggermont, Esther Plomp, Yasemin Turkyilmaz-van der Velden, Shalini Kurapati

The Data Stewardship project is a new initiative from the Delft University of Technology (TU Delft) in the Netherlands. Its aim is to create mature working practices and policies regarding research data management across all TU Delft faculties.

The novelty of this project lies in having a dedicated person, the so-called ‘Data Steward’, embedded in each faculty to approach research data management from a more discipline-specific perspective. It is within this framework that a research data management survey was carried out at the faculties that had a Data Steward in place by July 2018.

The goal was to get an overview of the general data management practices, and use its results as a benchmark for the project. The total response rate was 11 to 37% depending on the faculty.

Overall, the results show similar trends in all faculties, and indicate lack of awareness regarding different data management topics such as automatic data backups, data ownership, relevance of data management plans, awareness of FAIR data principles and usage of research data repositories.

The results also show great interest in data management, as roughly 80% or more of the respondents in each faculty claimed to be interested in data management training and wished to see the summary of survey results.

Thus, the survey helped identify the topics the Data Stewardship project is currently focusing on, by carrying out awareness campaigns and providing training at both university and faculty levels.

URL : On a Quest for Cultural Change – Surveying Research Data Management Practices at Delft University of Technology

Skills, Standards, and Sapp Nelson’s Matrix: Evaluating Research Data Management Workshop Offerings

Authors : Philip Espinola Coombs, Christine Malinowski, Amy Nurnberger

Objective

To evaluate library workshops on their coverage of data management topics.

Methods

We used a modified version of Sapp Nelson’s Competency Matrix for Data Management Skills, a matrix of learning goals organized by data management competency and complexity level, against which we compared our educational materials: slide decks and worksheets.

We examined each of the educational materials against the 333 learning objectives in our modified version of the Matrix to determine which of the learning objectives applied.

Conclusions

We found it necessary to change certain elements of the Matrix’s structure to increase its clarity and functionality: reinterpreting the “behaviors,” shifting the organization from the three domains of Bloom’s taxonomy to increasing complexity solely within the cognitive domain, as well as creating a comprehensive identifier schema.

We appreciated the Matrix for its specificity of learning objectives, its organizational structure, the comprehensive range of competencies included, and its ease of use. On the whole, the Matrix is a useful instrument for the assessment of data management programming.

URL : Skills, Standards, and Sapp Nelson’s Matrix: Evaluating Research Data Management Workshop Offerings

Alternative location : https://escholarship.umassmed.edu/jeslib/vol8/iss1/6/

The Definition of Reuse

Authors : Stephanie van de Sandt, Sünje Dallmeier-Tiessen, Artemis Lavasa, Vivien Petras

The ability to reuse research data is now considered a key benefit for the wider research community. Researchers of all disciplines are confronted with the pressure to share their research data so that it can be reused.

The demand for data use and reuse has implications for how we document, publish and share research in the first place, and, perhaps most importantly, it affects how we measure the impact of research, which is commonly a measurement of its use and reuse.

It is surprising that research communities, policy makers, etc. have not yet clearly defined what use and reuse are.

We postulate that a clear definition of use and reuse is needed to establish better metrics for a comprehensive scholarly record of individuals, institutions, organizations, etc.

Hence, this article presents a first definition of reuse of research data. Characteristics of reuse are identified by examining the etymology of the term and the analysis of the current discourse, leading to a range of reuse scenarios that show the complexity of today’s research landscape, which has been moving towards a data-driven approach.

The analysis underlines that there is no reason to distinguish use and reuse. We discuss what that means for possible new metrics that attempt to cover Open Science practices more comprehensively.

We hope that the resulting definition will enable a better and more refined strategy for Open Science.

URL : The Definition of Reuse

DOI : http://doi.org/10.5334/dsj-2019-022