Repository Approaches to Improving the Quality of Shared Data and Code

Authors : Ana Trisovic, Katherine Mika, Ceilyn Boyd, Sebastian Feger, Mercè Crosas

Sharing data and code for reuse has become increasingly important in scientific work over the past decade. However, in practice, shared data and code may be unusable, or published results obtained from them may be irreproducible.

Data repository features and services contribute significantly to the quality, longevity, and reusability of datasets.

This paper presents a combination of original and secondary data analysis studies focusing on computational reproducibility, data curation, and gamified design elements that can be employed to indicate and improve the quality of shared data and code.

The findings of these studies are grouped into three approaches that can be valuable to data repositories, archives, and other research dissemination platforms.

URL : Repository Approaches to Improving the Quality of Shared Data and Code

DOI : https://doi.org/10.3390/data6020015

Publishers’ Responsibilities in Promoting Data Quality and Reproducibility

Author : Iain Hrynaszkiewicz

Scholarly publishers can help to improve data quality and support reproducible research by promoting transparency and openness.

Increasing transparency can be achieved by publishers in six key areas:

1. understanding researchers’ problems and motivations, by conducting and responding to the findings of surveys;
2. raising awareness of issues and encouraging behavioural and cultural change, by introducing consistent journal policies on sharing research data, code and materials;
3. improving the quality and objectivity of the peer-review process, by implementing reporting guidelines and checklists and using technology to identify misconduct;
4. improving scholarly communication infrastructure, with journals that publish all scientifically sound research, promoting study registration, partnering with data repositories and providing services that improve data sharing and data curation;
5. increasing incentives for practising open research, with data journals and software journals and by implementing data citation and badges for transparency;
6. making research communication more open and accessible, with open-access publishing options, permitting text and data mining, sharing publisher data and metadata and through industry and community collaboration.

This chapter describes practical approaches being taken by publishers in these six areas, their progress and effectiveness, and the implications for researchers publishing their work.

URL : Publishers’ Responsibilities in Promoting Data Quality and Reproducibility

Alternative location : https://link.springer.com/chapter/10.1007%2F164_2019_290

Recommended versus Certified Repositories: Mind the Gap

Authors : Sean Edward Husen, Zoë G. de Wilde, Anita de Waard, Helena Cousijn

Researchers are increasingly required to make research data publicly available in data repositories. Although several organisations propose criteria to recommend and evaluate the quality of data repositories, there is no consensus on what constitutes a good data repository.

In this paper, we investigate, first, which data repositories are recommended by various stakeholders (publishers, funders, and community organisations) and, second, which repositories are certified by a number of organisations.

We then compare these two lists of repositories, and the criteria for recommendation and certification. We find that criteria used by organisations recommending and certifying repositories are similar, although the certification criteria are generally more detailed.

We distil the lists of criteria into seven main categories: “Mission”, “Community/Recognition”, “Legal and Contractual Compliance”, “Access/Accessibility”, “Technical Structure/Interface”, “Retrievability” and “Preservation”.

Although the criteria are similar, the lists of repositories that are recommended by the various agencies are very different. Out of all of the recommended repositories, less than 6% obtained certification.
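The comparison behind a figure like this reduces to a set intersection over the two lists of repository names. Below is a minimal sketch of that calculation; the repository names are hypothetical placeholders, not the lists analysed in the paper.

```python
# Minimal sketch: the share of recommended repositories that are also
# certified, computed as a set intersection. The repository names are
# hypothetical placeholders, not the lists compared in the study.

recommended = {"Repo A", "Repo B", "Repo C", "Repo D", "Repo E"}
certified = {"Repo B", "Repo F"}

overlap = recommended & certified
share = len(overlap) / len(recommended)
print(f"{share:.0%} of recommended repositories are certified")  # prints "20% ..."
```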

As certification becomes more important, steps should be taken to close this gap between recommended and certified repositories, and to ensure that certification standards become applicable, and applied, to the repositories that researchers are currently using.

URL : Recommended versus Certified Repositories: Mind the Gap

DOI : https://doi.org/10.5334/dsj-2017-042

What do data curators care about? Data quality, user trust, and the data reuse plan

Author : Frank Andreas Sposito

Data curation is often defined as the practice of maintaining, preserving, and enhancing research data for long-term value and reusability. The role of data reuse in the data curation lifecycle is critical: increased reuse is the core justification for the often sizable expenditures necessary to build data management infrastructures and user services.

Yet recent studies have shown that data are being shared and reused through open data repositories at much lower levels than expected. These studies underscore a fundamental and often overlooked challenge in research data management that invites deeper examination of the roles and responsibilities of data curators.

This presentation will identify two key barriers to data reuse, namely data quality and user trust, and propose a framework for implementing reuser-centric strategies to increase data reuse.

Using the concept of a “data reuse plan”, it will highlight repository-based approaches to improving data quality and user trust, and address critical areas for innovation for data curators working in the absence of repository support.

URL : What do data curators care about? Data quality, user trust, and the data reuse plan

Alternative location : http://library.ifla.org/id/eprint/1797

Towards certified open data in digital service ecosystems

Authors : Anne Immonen, Eila Ovaska, Tuomas Paaso

The opportunities of open data have recently been recognized by companies in different domains. Digital service providers have become increasingly interested in innovating new ideas and services around open data.

Digital service ecosystems provide several advantages for service developers, enabling service co-innovation and co-creation among ecosystem members that utilize and share common assets and knowledge.

The utilization of open data in digital services requires new innovation practices, service development models, and a collaboration environment. These can be provided by the ecosystem. However, since open data can be almost anything and originate from different kinds of data sources, the quality of data becomes the key issue.

The new challenge for service providers is how to guarantee the quality of open data, since uncertain data quality poses major challenges within such ecosystems. The main contribution of this paper is the concept of the Evolvable Open Data based digital service Ecosystem (EODE), which defines the kinds of knowledge and services that are required for validating open data in digital service ecosystems.

Thus, the EODE provides business potential for open data and digital service providers, as well as for other actors around open data. The paper describes the ecosystem capability model, the knowledge management models, and the taxonomy of services that support open data quality certification.

Data quality certification confirms that the open data is trustworthy and that its quality is good enough for use by the ecosystem’s services. The paper also describes the five-phase open data quality certification process, in which open data is brought into the ecosystem and certified for use by digital service ecosystem members, drawing on the ecosystem’s knowledge models and support services.

The initial experiences of the still ongoing validation steps are summarized, and the concept limitations and future development targets are identified.

URL : Towards certified open data in digital service ecosystems

DOI : https://doi.org/10.1007/s11219-017-9378-2

Global Data Quality Assessment and the Situated Nature of “Best” Research Practices in Biology

Author : Sabina Leonelli

This paper reflects on the relation between international debates around data quality assessment and the diversity characterising research practices, goals and environments within the life sciences.

Since the emergence of molecular approaches, many biologists have focused their research, and related methods and instruments for data production, on the study of genes and genomes.

While this trend is now shifting, prominent institutions and companies with stakes in molecular biology continue to set standards for what counts as ‘good science’ worldwide, resulting in the use of specific data production technologies as a proxy for assessing data quality.

This is problematic considering (1) the variability in research cultures, goals and the very characteristics of biological systems, which can give rise to countless different approaches to knowledge production; and (2) the existence of research environments that produce high-quality, significant datasets despite not availing themselves of the latest technologies.

Ethnographic research carried out in such environments evidences a widespread fear among researchers that providing extensive information about their experimental set-up will affect the perceived quality of their data, making their findings vulnerable to criticisms by better-resourced peers.

These fears can make scientists resistant to sharing data or describing their provenance. To counter this, debates around Open Data need to include critical reflection on how data quality is evaluated, and the extent to which that evaluation requires a localised assessment of the needs, means and goals of each research environment.

URL : Global Data Quality Assessment and the Situated Nature of “Best” Research Practices in Biology

DOI : https://doi.org/10.5334/dsj-2017-032

Academic Libraries as Data Quality Hubs

Author : Michael J. Giarlo

Academic libraries have a critical role to play as data quality hubs on campus. There is an increasing need to ensure data quality within ‘e-science’, and given their curation and preservation expertise, academic libraries are well suited to support the data quality process.

Data quality measurements are discussed, including the fundamental elements of trust, authenticity, understandability, usability and integrity. These measures are applied to the Digital Curation Lifecycle model to demonstrate how they can be used to understand and evaluate data quality within the curatorial process. Opportunities for improvement and challenges are identified as fruitful areas for future research and exploration.
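As a purely illustrative aside, not drawn from the article, the five elements named above could be operationalized as a per-dataset checklist during curation; the class, field names, and example criteria below are hypothetical.

```python
from dataclasses import dataclass, fields

# Hypothetical per-dataset checklist for the five quality elements named
# above. Illustrative only; the article does not propose this schema.
@dataclass
class DataQualityChecklist:
    trust: bool = False              # e.g. provenance is documented
    authenticity: bool = False       # e.g. checksums verified against originals
    understandability: bool = False  # e.g. codebook or README is present
    usability: bool = False          # e.g. files use open, supported formats
    integrity: bool = False          # e.g. files are complete and uncorrupted

    def score(self) -> float:
        """Fraction of the five elements currently satisfied."""
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)

checklist = DataQualityChecklist(trust=True, authenticity=True, integrity=True)
print(f"Quality elements satisfied: {checklist.score():.0%}")  # prints "60%"
```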

URL : http://jlsc-pub.org/jlsc/vol1/iss3/5/