Seeing oneself as a data reuser: How subjectification activates the drivers of data reuse in science

Authors : Marcel LaFlamme, Marion Poetz, Daniel Spichtinger

Considerable resources are being invested in strategies to facilitate the sharing of data across domains, with the aim of addressing inefficiencies and biases in scientific research and unlocking potential for science-based innovation.

Still, we know too little about what determines whether scientific researchers actually make use of the unprecedented volume of data being shared. This study characterizes the factors influencing researcher data reuse in terms of their relationship to a specific research project, and introduces subjectification as the mechanism by which these influencing factors are activated.

Based on our analysis of semi-structured interviews with a purposive sample of 24 data reusers and intermediaries, we find that while both project-independent and project-dependent factors may have a direct effect on a single instance of data reuse, they have an indirect effect on recurring data reuse as mediated by subjectification.

We integrate our findings into a model of recurring data reuse behavior that presents subjectification as the mechanism by which influencing factors are activated in a propensity to engage in data reuse.

Our findings hold scientific implications for the theorization of researcher data reuse, as well as practical implications around the role of settings for subjectification in bringing about and sustaining changes in researcher behavior.

URL : Seeing oneself as a data reuser: How subjectification activates the drivers of data reuse in science

DOI : https://doi.org/10.1371/journal.pone.0272153

Open research data: A case study into institutional and infrastructural arrangements to stimulate open research data sharing and reuse

Authors : Thijmen van Gend, Anneke Zuiderwijk

This study investigates which combination of institutional and infrastructural arrangements positively impacts research data sharing and reuse in a specific case. We conducted a qualitative case study of the institutional and infrastructural arrangements implemented at Delft University of Technology in the Netherlands.

In the examined case, it was fundamental to change the mindset of researchers and to make them aware of the benefits of sharing data. Therefore, arrangements should be designed bottom-up and used as a “carrot” rather than as a “stick.” Moreover, support offered to researchers should cover at least legal, financial, administrative, and practical issues of research data management and should be informal in nature.

Previous research describes generic institutional and infrastructural instruments that can stimulate open research data sharing and reuse. This study is among the first to analyze which infrastructural and institutional arrangements work in a particular context, and how. It provides the basis for other scholars to study such arrangements in different contexts.

Open data policymakers, universities, and open data infrastructure providers can use our findings to stimulate data sharing and reuse in practice, adapted to the contextual situation. Our study focused on a single case and a particular part of the university.

We recommend repeating this research in other contexts, that is, at other universities, faculties, and involving other research data infrastructure providers.

URL : Open research data: A case study into institutional and infrastructural arrangements to stimulate open research data sharing and reuse

DOI : https://doi.org/10.1177/09610006221101200

The Role of Data in an Emerging Research Community : Environmental Health Research as an Exemplar

Authors : Danielle Pollock, An Yan, Michelle Parker, Suzie Allard

Open science data benefit society by facilitating convergence across domains that are examining the same scientific problem. While cross-disciplinary data sharing and reuse is essential to the research done by convergent communities, so far little is known about the role data play in how these communities interact.

An understanding of the role of data in these collaborations can help us identify and meet the needs of emerging research communities which may predict the next challenges faced by science. This paper represents an exploratory study of one emerging community, the environmental health community, examining how environmental health research groups form, collaborate, and share data.

Five key insights about the role of data in emerging research communities are identified and suggestions are made for further research.

URL : The Role of Data in an Emerging Research Community : Environmental Health Research as an Exemplar

DOI : https://doi.org/10.2218/ijdc.v16i1.653

Integrative data reuse at scientifically significant sites: Case studies at Yellowstone National Park and the La Brea Tar Pits

Author : Andrea K. Thomer

Scientifically significant sites are the source of, and long-term repository for, considerable amounts of data—particularly in the natural sciences. However, the unique data practices of the researchers and resource managers at these sites have been relatively understudied.

Through case studies of two scientifically significant sites (the hot springs at Yellowstone National Park and the fossil deposits at the La Brea Tar Pits), I developed rich descriptions of site-based research and data curation, and high-level data models of information classes needed to support integrative data reuse.

Each framework treats the geospatial site and its changing natural characteristics as a distinct class of information; more commonly considered information classes such as observational and sampling data, and project metadata, are defined in relation to the site itself.

This work contributes (a) case studies of the values and data needs for researchers and resource managers at scientifically significant sites, (b) an information framework to support integrative reuse at these sites, and (c) a discussion of data practices at scientifically significant sites.

URL : Integrative data reuse at scientifically significant sites: Case studies at Yellowstone National Park and the La Brea Tar Pits

DOI : https://doi.org/10.1002/asi.24620

An overview of biomedical platforms for managing research data

Authors : Vivek Navale, Denis von Kaeppler, Matthew McAuliffe

Biomedical platforms provide the hardware and software to securely ingest, process, validate, curate, store, and share data. Many large-scale biomedical platforms use secure cloud computing technology for analyzing, integrating, and storing phenotypic, clinical, and genomic data. Several web-based platforms are available for researchers to access services and tools for biomedical research.

The use of bio-containers can facilitate the integration of bioinformatics software with various data analysis pipelines. Adoption of Common Data Models, Common Data Elements, and Ontologies can increase the likelihood of data reuse. Managing biomedical Big Data will require the development of strategies that can efficiently leverage public cloud computing resources.

The use of standards for data collection developed by the research community can foster the development of machine learning methods for data processing and analysis. Increasingly, platforms will need to support the integration of data from research across multiple disease areas.

URL : An overview of biomedical platforms for managing research data

DOI : https://doi.org/10.1007/s42488-020-00040-0

Evaluation of Data Sharing After Implementation of the International Committee of Medical Journal Editors Data Sharing Statement Requirement

Authors : Valentin Danchev, Yan Min, John Borghi, Mike Baiocchi, John P. A. Ioannidis

Importance

The benefits of responsible sharing of individual-participant data (IPD) from clinical studies are well recognized, but stakeholders often disagree on how to align those benefits with privacy risks, costs, and incentives for clinical trialists and sponsors.

The International Committee of Medical Journal Editors (ICMJE) required a data sharing statement (DSS) from submissions reporting clinical trials effective July 1, 2018. The required DSSs provide a window into current data sharing rates, practices, and norms among trialists and sponsors.

Objective

To evaluate the implementation of the ICMJE DSS requirement in 3 leading medical journals: JAMA, Lancet, and New England Journal of Medicine (NEJM).

Design, Setting, and Participants

This is a cross-sectional study of clinical trial reports published as articles in JAMA, Lancet, and NEJM between July 1, 2018, and April 4, 2020. Articles not eligible for DSS, including observational studies and letters or correspondence, were excluded.

A MEDLINE/PubMed search identified 487 eligible clinical trials in JAMA (112 trials), Lancet (147 trials), and NEJM (228 trials). Two reviewers evaluated each of the 487 articles independently.

Exposure

Publication of clinical trial reports in an ICMJE medical journal requiring a DSS.

Main Outcomes and Measures

The primary outcomes of the study were declared data availability and actual data availability in repositories. Other captured outcomes were data type, access, and conditions and reasons for data availability or unavailability. Associations with funding sources were examined.

Results

A total of 334 of 487 articles (68.6%; 95% CI, 64%-73%) declared data sharing, with nonindustry NIH-funded trials exhibiting the highest rates of declared data sharing (89%; 95% CI, 80%-98%) and industry-funded trials the lowest (61%; 95% CI, 54%-68%).

However, only 2 IPD sets (0.6%; 95% CI, 0.0%-1.5%) were actually deidentified and publicly available as of April 10, 2020. The remainder were supposedly accessible via request to authors (143 of 334 articles [42.8%]), repository (89 of 334 articles [26.6%]), and company (78 of 334 articles [23.4%]).

Among the 89 articles declaring that IPD would be stored in repositories, only 17 (19.1%) deposited data, mostly because of embargo and regulatory approval. Embargo was set in 47.3% of data-sharing articles (158 of 334), and in half of them the period exceeded 1 year or was unspecified.
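
The percentages reported in the Results above can be rechecked from the stated counts. The following is a minimal sketch, not the authors' analysis: the abstract does not say how its confidence intervals were computed, so the normal-approximation (Wald) interval used here is an assumption, and its bounds may differ slightly from the published ones. The function and variable names are illustrative only.

```python
import math


def proportion_with_wald_ci(count, total, z=1.96):
    """Return (proportion, lower, upper) for count/total.

    Uses a normal-approximation (Wald) interval, clipped to [0, 1].
    Assumption: the source abstract does not name its interval method.
    """
    p = count / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Counts exactly as reported in the Results paragraphs.
reported = {
    "declared data sharing": (334, 487),
    "publicly available IPD sets": (2, 334),
    "access via request to authors": (143, 334),
    "access via repository": (89, 334),
    "access via company": (78, 334),
    "repository declarations with deposited data": (17, 89),
    "data-sharing articles with an embargo": (158, 334),
}


def summarize(counts):
    """Map each label to (percent, lower percent, upper percent), 1 decimal."""
    rows = {}
    for label, (count, total) in counts.items():
        p, lower, upper = proportion_with_wald_ci(count, total)
        rows[label] = (
            round(100 * p, 1),
            round(100 * lower, 1),
            round(100 * upper, 1),
        )
    return rows


summary = summarize(reported)

if __name__ == "__main__":
    for label, (pct, lower, upper) in summary.items():
        print(f"{label}: {pct}% (Wald 95% CI {lower}%-{upper}%)")
```

The point estimates depend only on the counts, so they are the part that can be compared directly with the abstract; the interval bounds depend on the assumed method.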

Conclusions and Relevance

Most trials published in JAMA, Lancet, and NEJM after the implementation of the ICMJE policy declared their intent to make clinical data available. However, a wide gap between declared and actual data sharing exists.

To improve transparency and data reuse, journals should promote the use of unique pointers to data set location and standardized choices for embargo periods and access requirements.

URL : Evaluation of Data Sharing After Implementation of the International Committee of Medical Journal Editors Data Sharing Statement Requirement

DOI : https://doi.org/10.1001/jamanetworkopen.2020.33972

Improving Opportunities for New Value of Open Data: Assessing and Certifying Research Data Repositories

Author : Robert R. Downs

Investments in research that produce scientific and scholarly data can be leveraged by enabling the resulting research data products and services to be used by broader communities and for new purposes, extending reuse beyond the initial users and purposes for which the data were originally collected.

Submitting research data to a data repository offers opportunities for the data to be used in the future, providing ways for new benefits to be realized from data reuse. Improvements to data repositories that facilitate new uses of data increase the potential for data reuse and for gains in the value of open data products and services that are associated with such reuse.

Assessing and certifying the capabilities and services offered by data repositories provides opportunities for improving the repositories and for realizing the value to be attained from new uses of data.

The evolution of data repository certification instruments is described and discussed in terms of the implications for the curation and continuing use of research data.

URL : Improving Opportunities for New Value of Open Data: Assessing and Certifying Research Data Repositories

DOI : http://doi.org/10.5334/dsj-2021-001