Attitudes and norms affecting scientists’ data reuse

Authors : Renata Gonçalves Curty, Kevin Crowston, Alison Specht, Bruce W. Grant, Elizabeth D. Dalton

The value of sharing scientific research data is widely appreciated, but factors that hinder or prompt the reuse of data remain poorly understood. Using the Theory of Reasoned Action, we test the relationship between the beliefs and attitudes of scientists towards data reuse, and their self-reported data reuse behaviour.

To do so, we used existing responses to selected questions from a worldwide survey of scientists developed and administered by the DataONE Usability and Assessment Working Group (thus practicing data reuse ourselves).

Results show that the perceived efficacy and efficiency of data reuse are strong predictors of reuse behaviour, and that the perceived importance of data reuse corresponds to greater reuse. Expressed lack of trust in existing data and perceived norms against data reuse were not found to be major impediments for reuse contrary to our expectations.

We found that reported use of models and remotely-sensed data was associated with greater reuse. The results suggest that data reuse would be encouraged and normalized by demonstration of its value.

We offer some theoretical and practical suggestions that could help to legitimize investment and policies in favor of data sharing.

URL : Attitudes and norms affecting scientists’ data reuse

DOI : https://doi.org/10.1371/journal.pone.0189288

Data-Sprinting: a Public Approach to Digital Research

Authors : Tommaso Venturini, Anders Munk, Axel Meunier

This chapter is about the politics of interdisciplinarity. Not in the sense of the research politics fostering collaboration across disciplines, but in the stronger sense of transcending disciplinary boundaries to make significant political contributions.

In short: it is about making research public. To address this question, this chapter introduces (through a concrete example in climate debate research) an original research format, that we call data-sprinting.

URL : https://hal.archives-ouvertes.fr/hal-01672288

Evaluating the Effectiveness of Data Management Training: DataONE’s Survey Instrument

Authors : Chung-Yi Hou, Heather Soyka, Vivian Hutchison, Isis Sema, Chris Allen, Amber Budden

Effective management is a key component for preparing data to be retained for future long term access, use, and reuse by a broader community. Developing the skills to plan and perform data management tasks is important for individuals and institutions.

Teaching data literacy skills may also help to mitigate the impact of data deluge and other effects of being overexposed to and overwhelmed by data.

The process of learning how to manage data effectively for the entire research data lifecycle can be complex. There are often multiple stages involved within a lifecycle for managing data, and each stage may require specific knowledge, expertise, and resources.

Additionally, although a range of organizations offers data management education and training resources, it can often be difficult to assess how effective the resources are for educating users to meet their data management requirements.

In the case of Data Observation Network for Earth (DataONE), DataONE’s extensive collaboration with individuals and organizations has informed the development of multiple educational resources. Through these interactions, DataONE understands that the process of creating and maintaining educational materials that remain responsive to community needs is reliant on careful evaluations.

Therefore, the impetus for a comprehensive, customizable Education EVAluation instrument (EEVA) is grounded in the need for tools to assess and improve current and future training and educational resources for research data management.

In this paper, the authors outline and provide context for the background and motivations that led to creating EEVA for evaluating the effectiveness of data management educational resources. The paper details the process and results of the current version of EEVA.

Finally, the paper highlights the key features, potential uses, and the next steps in order to improve future extensions and revisions of EEVA.

URL : Evaluating the Effectiveness of Data Management Training: DataONE’s Survey Instrument

DOI : https://doi.org/10.2218/ijdc.v12i2.508

Data Sharing: Convert Challenges into Opportunities

Author : Ana Sofia Figueiredo

Initiatives for sharing research data are opportunities to increase the pace of knowledge discovery and scientific progress. The reuse of research data has the potential to avoid the duplication of data sets and to bring new views from multiple analysis of the same data set.

For example, the study of genomic variations associated with cancer profits from the universal collection of such data and helps in selecting the most appropriate therapy for a specific patient. However, data sharing poses challenges to the scientific community.

These challenges are of ethical, cultural, legal, financial, or technical nature. This article reviews the impact that data sharing has in science and society and presents guidelines to improve the efficient sharing of research data.

URL : Data Sharing: Convert Challenges into Opportunities

DOI : https://doi.org/10.3389/fpubh.2017.00327

Balancing the local and the universal in maintaining ethical access to a genomics biobank

Authors : Catherine Heeney, Shona M. Kerr

Background

Issues of balancing data accessibility with ethical considerations and governance of a genomics research biobank, Generation Scotland, are explored within the evolving policy landscape of the past ten years. During this time data sharing and open data access have become increasingly important topics in biomedical research.

Decisions around data access are influenced by local arrangements for governance and practices such as linkage to health records, and the global through policies for biobanking and the sharing of data with large-scale biomedical research data resources and consortia.

Methods

We use a literature review of policy relevant documents which apply to the conduct of biobanks in two areas: support for open access and the protection of data subjects and researchers managing a bioresource.

We present examples of decision making within a biobank based upon observations of the Generation Scotland Access Committee. We reflect upon how the drive towards open access raises ethical dilemmas for established biorepositories containing data and samples from human subjects.

Results

Despite much discussion in science policy literature about standardisation, the contextual aspects of biobanking are often overlooked. Using our engagement with GS we demonstrate the importance of local arrangements in the creation of a responsive ethical approach to biorepository governance.

We argue that governance decisions regarding access to the biobank are intertwined with considerations about maintenance and viability at the local level. We show that in addition to the focus upon ever more universal and standardised practices, the local expertise gained in the management of such repositories must be supported.

Conclusions

A commitment to open access in genomics research has found almost universal backing in science and health policy circles, but repositories of data and samples from human subjects may have to operate under managed access, to protect privacy, align with participant consent and ensure that the resource can be managed in a sustainable way.

Data access committees need to be reflexive and flexible, to cope with changing technology and opportunities and threats from the wider data sharing environment. To understand these interactions also involves nurturing what is particular about the biobank in its local context.

URL : Balancing the local and the universal in maintaining ethical access to a genomics biobank

DOI : https://doi.org/10.1186/s12910-017-0240-7

Assessing Research Data Deposits and Usage Statistics within IDEALS

Author : Christie A. Wiley

Objectives

This study follows up on previous work that began examining data deposited in an institutional repository. The work here extends the earlier study by answering the following lines of research questions: (1) What is the file composition of datasets ingested into the University of Illinois at Urbana-Champaign (UIUC) campus repository? Are datasets more likely to be single-file or multiple-file items? (2) What is the usage data associated with these datasets? Which items are most popular?

Methods

The dataset records collected in this study were identified by filtering item types categorized as “data” or “dataset” using the advanced search function in IDEALS. Returned search results were collected in an Excel spreadsheet to include data such as the Handle identifier, date ingested, file formats, composition code, and the download count from the item’s statistics report.

The Handle identifier represents the dataset record’s persistent identifier. Composition represents codes that categorize items as single or multiple file deposits. Date available represents the date the dataset record was published in the campus repository.

Download statistics were collected via a website link for each dataset record and indicates the number of times the dataset record has been downloaded. Once the data was collected, it was used to evaluate datasets deposited into IDEALS.

Results

A total of 522 datasets were identified for analysis covering the period between January 2007 and August 2016. This study revealed two influxes occurring during the period of 2008-2009 and in 2014. During the first timeframe a large number of PDFs were deposited by the Illinois Department of Agriculture.

Whereas, Microsoft Excel files were deposited in 2014 by the Rare Books and Manuscript Library. Single-file datasets clearly dominate the deposits in the campus repository. The total download count for all datasets was 139,663 and the average downloads per month per file across all datasets averaged 3.2.

Conclusion

Academic librarians, repository managers, and research data services staff can use the results presented here to anticipate the nature of research data that may be deposited within institutional repositories.

With increased awareness, content recruitment, and improvements, IRs can provide a viable cyberinfrastructure for researchers to deposit data, but much can be learned from the data already deposited.

Awareness of trends can help librarians facilitate discussions with researchers about research data deposits as well as better tailor their services to address short-term and long-term research needs.

URL : Assessing Research Data Deposits and Usage Statistics within IDEALS

DOI : https://doi.org/10.7191/jeslib.2017.1112

Documentation and Visualisation of Workflows for Effective Communication, Collaboration and Publication @ Source

Authors : Cerys Willoughby, Jeremy G. Frey

Workflows processing data from research activities and driving in silico experiments are becoming an increasingly important method for conducting scientific research. Workflows have the advantage that not only can they be automated and used to process data repeatedly, but they can also be reused – in part or whole – enabling them to be evolved for use in new experiments.

A number of studies have investigated strategies for storing and sharing workflows for the benefit of reuse. These have revealed that simply storing workflows in repositories without additional context does not enable workflows to be successfully reused.

These studies have investigated what additional resources are needed to facilitate users of workflows and in particular to add provenance traces and to make workflows and their resources machine-readable.

These additions also include adding metadata for curation, annotations for comprehension, and including data sets to provide additional context to the workflow. Ultimately though, these mechanisms still rely on researchers having access to the software to view and run the workflows.

We argue that there are situations where researchers may want to understand a workflow that goes beyond what provenance traces provide and without having to run the workflow directly; there are many situations in which it can be difficult or impossible to run the original workflow.

To that end, we have investigated the creation of an interactive workflow visualization that captures the flow chart element of the workflow with additional context including annotations, descriptions, parameters, metadata and input, intermediate, and results data that can be added to the record of a workflow experiment to enhance both curation and add value to enable reuse.

We have created interactive workflow visualisations for the popular workflow creation tool KNIME, which does not provide users with an in-built function to extract provenance information that can otherwise only be viewed through the tool itself.

Making use of the strengths of KNIME for adding documentation and user-defined metadata we can extract and create a visualisation and curation package that encourages and enhances curation@source, facilitating effective communication, collaboration, and reuse of workflows.

URL : Documentation and Visualisation of Workflows for Effective Communication, Collaboration and Publication @ Source

DOI : https://doi.org/10.2218/ijdc.v12i1.532