Reproducible and Attributable Materials Science Curation Practices: A Case Study

Authors : Ye Li, Sarah Laura Wilson, Micah Altman

While small labs produce much of the fundamental experimental research in Materials Science and Engineering (MSE), little is known about their data management and sharing practices and the extent to which they promote trust in, and transparency of, the published research.

In this research, we conduct a case study of a leading MSE research lab to characterize the limits of current data management and sharing practices concerning reproducibility and attribution. We systematically reconstruct the workflows underpinning four research projects by combining interviews, document review, and digital forensics. We then apply information graph analysis and computer-assisted retrospective auditing to identify where critical research information is unavailable or at risk.

We find that while data management and sharing practices in this leading lab protect against computer and disk failure, they are insufficient to ensure reproducibility or correct attribution of work, especially when a group member withdraws before project completion.

We conclude with recommendations for adjustments to MSE data management and sharing practices to promote trustworthiness and transparency by adding lightweight automated file-level auditing and automated data transfer processes.

DOI : https://doi.org/10.2218/ijdc.v18i1.940
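
The abstract above recommends lightweight automated file-level auditing but does not describe an implementation. As a minimal sketch of what such an audit could look like, and not the authors' tooling, the Python below records a SHA-256 checksum for every file under a project directory and later reports which files are missing, added, or changed. The function names `sha256_of`, `build_manifest`, and `audit` are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, recorded: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current state of root against a recorded manifest."""
    current = build_manifest(root)
    shared = set(recorded) & set(current)
    return {
        "missing": sorted(set(recorded) - set(current)),
        "added": sorted(set(current) - set(recorded)),
        "changed": sorted(n for n in shared if recorded[n] != current[n]),
    }
```

In this sketch, a manifest would be built when data is first deposited and stored separately from the data; rerunning `audit` on a schedule would then flag files that have silently disappeared or been altered.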

Data Science at the Singularity

Author : David Donoho

Something fundamental to computation-based research has really changed in the last ten years. In certain fields, progress is simply dramatically more rapid than previously. Researchers in affected fields are living through a period of profound transformation, as the fields undergo a transition to frictionless reproducibility (FR).

This transition markedly changes the rate of spread of ideas and practices, affects scientific mindsets and the goals of science, and erases memories of much that came before. The emergence of FR flows from three data science principles that matured together after decades of work by many technologists and numerous research communities.

The mature principles involve data sharing, code sharing, and competitive challenges, but implemented in the particularly strong form of frictionless open services. Empirical Machine Learning is today’s leading adherent field; its hidden superpower is adherence to frictionless reproducibility practices, and these practices are responsible for the striking and surprising progress in AI that we see everywhere. They can be learned and adhered to by researchers in whatever research field, automatically increasing the rate of progress in each adherent field.

DOI : https://doi.org/10.1162/99608f92.b91339ef

Emerging roles and responsibilities of libraries in support of reproducible research

Authors : Birgit Schmidt, Andrea Chiarelli, Lucia Loffreda, Jeroen Sondervan

Ensuring the reproducibility of research is a multi-stakeholder effort that comes with challenges and opportunities for individual researchers and research communities, librarians, publishers, funders and service providers. These emerge at various steps of the research process, and, in particular, at the publication stage.

Previous work by Knowledge Exchange highlighted that, while there is growing awareness among researchers, reproducible publication practices have been slow to change. Importantly, research reproducibility has not yet reached institutional agendas: this work seeks to highlight the rationale for libraries to initiate and/or step up their engagement with this topic, which we argue is well aligned with their core values and strategic priorities.

We draw on secondary analysis of data gathered by Knowledge Exchange, focusing on the literature identified as well as interviews held with librarians. We extend this through further investigation of the literature and by integrating the findings of discussions held at the 2022 LIBER conference, to provide an updated picture of how libraries engage with research reproducibility.

Libraries have a significant role in promoting responsible research practices, including transparency and reproducibility, by leveraging their connections to academic communities and collaborating with stakeholders like research funders and publishers. Our recommendations for libraries include: i) partnering with researchers to promote a research culture that values transparency and reproducibility; ii) enhancing existing research infrastructure and support; and iii) investing in raising awareness and developing skills and capacities related to these principles.

DOI : https://doi.org/10.53377/lq.14947

Reproducibility in Management Science

Authors : Miloš Fišar, Ben Greiner, Christoph Huber, Elena Katok, Ali I. Ozkes

With the help of more than 700 reviewers, we assess the reproducibility of nearly 500 articles published in the journal Management Science before and after the introduction of a new Data and Code Disclosure policy in 2019.

When considering only articles for which data accessibility and hardware and software requirements were not an obstacle for reviewers, the results of more than 95% of articles under the new disclosure policy could be fully or largely computationally reproduced. However, for 29% of articles, at least part of the data set was not accessible to the reviewer. Considering all articles in our sample reduces the share of reproduced articles to 68%.

These figures represent a significant increase compared with the period before the introduction of the disclosure policy, when only 12% of articles voluntarily provided replication materials, of which 55% could be (largely) reproduced. Substantial heterogeneity in reproducibility rates across different fields is mainly driven by differences in data set accessibility.

Other reasons for unsuccessful reproduction attempts include missing code, unresolvable code errors, weak or missing documentation, software and hardware requirements, and code complexity. Our findings highlight the importance of journal code and data disclosure policies and suggest potential avenues for enhancing their effectiveness.

DOI : https://doi.org/10.1287/mnsc.2023.03556

Reproducible research practices and transparency across linguistics

Authors : Agata Bochynska, Liam Keeble, Caitlin Halfacre, Joseph V. Casillas, Irys-Amélie Champagne, Kaidi Chen, Melanie Röthlisberger, Erin M. Buchanan, Timo B. Roettger

Scientific studies of language span across many disciplines and provide evidence for social, cultural, cognitive, technological, and biomedical studies of human nature and behavior. As it becomes increasingly empirical and quantitative, linguistics has been facing challenges and limitations of the scientific practices that pose barriers to reproducibility and replicability.

One of the proposed solutions to the widely acknowledged reproducibility and replicability crisis has been the implementation of transparency practices, e.g., open access publishing, preregistrations, sharing study materials, data, and analyses, performing study replications, and declaring conflicts of interest.

Here, we have assessed the prevalence of these practices in 600 randomly sampled journal articles from linguistics across two time points. In line with similar studies in other disciplines, we found that 35% of the articles were published open access and the rates of sharing materials, data, and protocols were below 10%. None of the articles reported preregistrations, 1% reported replications, and 10% had conflict of interest statements.

These rates have not increased noticeably between 2008/2009 and 2018/2019, pointing to remaining barriers and the slow adoption of open and reproducible research practices in linguistics.

To facilitate adoption of these practices, we provide a range of recommendations and solutions for implementing transparency and improving reproducibility of research in linguistics.

DOI : https://doi.org/10.5070/G6011239

Analytical code sharing practices in biomedical research

Authors : Nitesh Kumar Sharma, Ram Ayyala, Dhrithi Deshpande et al.

Data-driven computational analysis is becoming increasingly important in biomedical research, as the amount of data being generated continues to grow. However, the lack of practices of sharing research outputs, such as data, source code and methods, affects transparency and reproducibility of studies, which are critical to the advancement of science. Many published studies are not reproducible due to insufficient documentation, code, and data being shared.

We conducted a comprehensive analysis of 453 manuscripts published between 2016 and 2021 and found that 50.1% of them fail to share the analytical code. Even among those that did disclose their code, a vast majority failed to offer additional research outputs, such as data. Furthermore, only one in ten papers organized their code in a structured and reproducible manner. We discovered a significant association between the presence of code availability statements and increased code availability (p = 2.71 × 10^-9).

Additionally, a greater proportion of studies conducting secondary analyses were inclined to share their code compared to those conducting primary analyses (p = 1.15 × 10^-7). In light of our findings, we propose raising awareness of code sharing practices and taking immediate steps to enhance code availability to improve reproducibility in biomedical research.

By increasing transparency and reproducibility, we can promote scientific rigor, encourage collaboration, and accelerate scientific discoveries. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.

DOI : https://doi.org/10.1101/2023.07.31.551384

Caching and Reproducibility: Making Data Science Experiments Faster and FAIRer

Authors : Moritz Schubotz, Ankit Satpute, André Greiner-Petter, Akiko Aizawa, Bela Gipp

Small to medium-scale data science experiments often rely on research software developed ad-hoc by individual scientists or small teams. Often there is no time to make the research software fast, reusable, and open access.

The consequence is twofold. First, subsequent researchers must spend significant work hours building upon the proposed hypotheses or experimental framework. In the worst case, others cannot reproduce the experiment and reuse the findings for subsequent research.

Second, suppose the ad-hoc research software fails during often long-running, computationally expensive experiments. In that case, the overall effort to iteratively improve the software and rerun the experiments creates significant time pressure on the researchers. We suggest making caching an integral part of the research software development process, even before the first line of code is written.

This article outlines caching recommendations for developing research software in data science projects. Our recommendations provide a perspective to circumvent common problems such as proprietary dependence and speed. At the same time, caching contributes to the reproducibility of experiments in the open science workflow.

Concerning the four guiding principles, i.e., Findability, Accessibility, Interoperability, and Reusability (FAIR), we foresee that including the proposed recommendations in research software development will make the data related to that software FAIRer for both machines and humans.

We exhibit the usefulness of some of the proposed recommendations on our recently completed research software project in mathematical information retrieval.

DOI : https://doi.org/10.3389/frma.2022.861944
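
The abstract above argues for building caching into research software from the start, without specifying a mechanism. One way to illustrate the idea, as a sketch under assumptions rather than the authors' recommended design, is a small Python decorator that stores each result of an expensive step on disk, keyed by the step name and its parameters, so that a rerun after a crash skips work already done. The name `disk_cached` is hypothetical, and the sketch assumes parameters are passed as JSON-serializable keyword arguments and results can be pickled.

```python
import functools
import hashlib
import json
import pickle
from pathlib import Path


def disk_cached(cache_dir: Path):
    """Cache a function's results on disk, keyed by its name and keyword arguments."""
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(**params):
            # Assumption: params are JSON-serializable, so the key is stable across runs.
            key_source = json.dumps(
                {"step": func.__name__, "params": params}, sort_keys=True
            )
            key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
            path = cache_dir / f"{key}.pkl"
            if path.exists():
                # Reuse the stored result instead of recomputing.
                with path.open("rb") as handle:
                    return pickle.load(handle)
            result = func(**params)
            with path.open("wb") as handle:
                pickle.dump(result, handle)
            return result

        return wrapper

    return decorator
```

A long-running experiment step decorated with `@disk_cached(Path("cache"))` and called as, for example, `run_step(dataset="train", seed=1)` would compute once and load from disk on later runs with the same arguments. The cache files also serve as a record of intermediate results, which is one way caching could support reproducibility.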