Implementing the FAIR Data Principles in precision oncology: review of supporting initiatives

Authors : Charles Vesteghem, Rasmus Froberg Brøndum, Mads Sønderkær, Mia Sommer, Alexander Schmitz, Julie Støve Bødker, Karen Dybkær, Tarec Christoffer El-Galaly, Martin Bøgsted

Compelling research has recently shown that cancer is so heterogeneous that single research centres cannot produce enough data to fit prognostic and predictive models of sufficient accuracy. Data sharing in precision oncology is therefore of utmost importance.

The Findable, Accessible, Interoperable and Reusable (FAIR) Data Principles have been developed to define good practices in data sharing. Motivated by the ambition of applying the FAIR Data Principles to our own clinical precision oncology implementations and research, we have performed a systematic literature review of potentially relevant initiatives.

For clinical data, we suggest using the Genomic Data Commons model as a reference, as it provides a field-tested and well-documented solution. For the classification of diagnoses, of morphology and topography, and of drugs, we chose to follow the World Health Organization standards, i.e. the ICD-10, ICD-O-3 and Anatomical Therapeutic Chemical (ATC) classifications, respectively.
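Because Anatomical Therapeutic Chemical (ATC) codes are hierarchical, a drug recorded at the substance level can be rolled up to broader groups when harmonizing treatment data across centres. The following is a minimal sketch, assuming full 7-character codes (letter, two digits, two letters, two digits). The names `split_atc` and `AtcLevels` are hypothetical and are not part of any WHO tooling.

```python
import re
from typing import NamedTuple

# A full ATC code has 7 characters: letter, 2 digits, 2 letters, 2 digits.
ATC_PATTERN = re.compile(r"^[A-Z][0-9]{2}[A-Z]{2}[0-9]{2}$")


class AtcLevels(NamedTuple):
    anatomical_main_group: str     # level 1: first character, e.g. "L"
    therapeutic_subgroup: str      # level 2: first 3 characters, e.g. "L01"
    pharmacological_subgroup: str  # level 3: first 4 characters
    chemical_subgroup: str         # level 4: first 5 characters
    chemical_substance: str        # level 5: the full 7-character code


def split_atc(code: str) -> AtcLevels:
    """Split a full ATC code into its five hierarchical prefixes."""
    code = code.strip().upper()
    if not ATC_PATTERN.match(code):
        raise ValueError(f"not a 7-character ATC code: {code!r}")
    return AtcLevels(code[:1], code[:3], code[:4], code[:5], code)
```

For example, `split_atc("L01EA01").therapeutic_subgroup` returns `"L01"`, the antineoplastic agents subgroup. Grouping on one of these prefixes lets records coded at different levels of detail be compared.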

For the bioinformatics pipeline, the Genome Analysis ToolKit Best Practices using Docker containers offer a coherent solution and have therefore been selected. Regarding the naming of variants, we follow the Human Genome Variation Society’s standard.
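HGVS names describe a variant relative to a versioned reference sequence. For example, the BRAF V600E variant is commonly reported at the coding-DNA level as `NM_004333.4:c.1799T>A`. The following is a minimal sketch covering only single-nucleotide substitutions at positive coding positions, assuming a versioned RefSeq transcript accession. The function name is hypothetical, and the sketch is not a substitute for a full HGVS validator.

```python
import re

NUCLEOTIDES = {"A", "C", "G", "T"}


def hgvs_c_substitution(transcript: str, position: int, ref: str, alt: str) -> str:
    """Format a coding-DNA substitution in HGVS style, e.g. NM_004333.4:c.1799T>A.

    Only plain substitutions inside the coding sequence are handled; intronic
    offsets, UTR positions, deletions, insertions etc. are out of scope.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in NUCLEOTIDES or alt not in NUCLEOTIDES:
        raise ValueError("ref and alt must each be a single A, C, G or T")
    if ref == alt:
        raise ValueError("ref and alt must differ for a substitution")
    if position < 1:
        # In c. numbering, c.1 is the A of the ATG start codon.
        raise ValueError("coding positions start at 1")
    if not re.fullmatch(r"[A-Z]{2}_\d+\.\d+", transcript):
        raise ValueError("expected a versioned RefSeq accession such as NM_004333.4")
    return f"{transcript}:c.{position}{ref}>{alt}"
```

Requiring the accession version reflects the fact that HGVS coordinates are only meaningful against one specific version of the reference sequence.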

For the IT infrastructure, we have built a centralized solution to participate in data sharing through federated solutions such as the Beacon Networks.

URL : Implementing the FAIR Data Principles in precision oncology: review of supporting initiatives

DOI : https://doi.org/10.1093/bib/bbz044

The citation advantage of linking publications to research data

Authors : Giovanni Colavizza, Iain Hrynaszkiewicz, Isla Staden, Kirstie Whitaker, Barbara McGillivray

Efforts to make research results open and reproducible are increasingly reflected by journal policies encouraging or mandating authors to provide data availability statements.

As a consequence, there has been a strong uptake of data availability statements in recent literature. Nevertheless, it is still unclear what proportion of these statements actually contain well-formed links to data, for example via a URL or permanent identifier, and whether providing such links adds value.

We consider 531,889 journal articles published by PLOS and BMC which are part of the PubMed Open Access collection, categorize their data availability statements according to their content and analyze the citation advantage of different statement categories via regression.

We find that, following mandated publisher policies, data availability statements are now common, yet statements containing a link to a repository are still only a fraction of the total.

We also find that articles with these statements, in particular, can have up to 25.36% higher citation impact on average: an encouraging result for all publishers and authors who make the effort of sharing their data. All our data and code are made available in order to reproduce and extend our results.
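Citation studies often regress the logarithm of (1 + citations) on article characteristics. In that case, a coefficient β for a statement category translates into a percentage difference of 100(e^β − 1), holding the other covariates fixed. The abstract does not state the exact specification. Assuming such a log-linear model, the reported 25.36% would correspond to a coefficient of roughly 0.226:

```latex
\Delta\% = 100\,\bigl(e^{\beta} - 1\bigr) = 25.36
\;\Longrightarrow\;
\beta = \ln(1.2536) \approx 0.226
```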

URL : https://arxiv.org/abs/1907.02565

A model for initiating research data management services at academic libraries

Authors : Kevin B. Read, Jessica Koos, Rebekah S. Miller, Cathryn F. Miller, Gesina A. Phillips, Laurel Scheinfeld, Alisa Surkis

Background

Librarians developed a pilot program to provide training, resources, strategies, and support for medical libraries seeking to establish research data management (RDM) services. Participants were required to complete eight educational modules to provide the necessary background in RDM.

Each participating institution was then required to use two of the following three elements: (1) a template and strategies for data interviews, (2) a teaching tool kit to teach an introductory RDM class, or (3) strategies for hosting a data class series.

Case Presentation

Six libraries participated in the pilot, with between two and eight librarians participating from each institution. Librarians from each institution completed the online training modules.

Each institution conducted between six and fifteen data interviews, which helped build connections with researchers, and taught between one and five introductory RDM classes.

All classes received very positive evaluations from attendees. Two libraries conducted a data series, with one bringing in instructors from outside the library.

Conclusion

The pilot program proved successful in helping participating librarians learn about and engage with their research communities, jump-start their teaching of RDM, and develop institutional partnerships around RDM services.

The practical, hands-on approach of this pilot helped libraries in different environments establish RDM services.

The success of this pilot provides a proven path forward for libraries that are developing data services at their own institutions.

URL : A model for initiating research data management services at academic libraries

Alternative location : http://jmla.pitt.edu/ojs/jmla/article/view/545

Developing a research data policy framework for all journals and publishers

Authors : Iain Hrynaszkiewicz, Natasha Simons, Azhar Hussain, Simon Goudie

More journals and publishers – and funding agencies and institutions – are introducing research data policies. But as the prevalence of policies increases, there is potential to confuse researchers and support staff with numerous or conflicting policy requirements.

We define and describe 14 features of journal research data policies and arrange these into a set of six standard policy types or tiers, which can be adopted by journals and publishers to promote data sharing in a way that encourages good practice and is appropriate for their audience’s perceived needs.

Policy features include coverage of topics such as data citation, data repositories, data availability statements, data standards and formats, and peer review of research data.

These policy features and types have been created by reviewing the policies of multiple scholarly publishers, which collectively publish more than 10,000 journals, and through discussions and consensus building with multiple stakeholders in research data policy via the Data Policy Standardisation and Implementation Interest Group of the Research Data Alliance.

Implementation guidelines for the standard research data policies for journals and publishers are also provided, along with template policy texts which can be implemented by journals in their Information for Authors and publishing workflows.

We conclude with a call for collaboration across the scholarly publishing and wider research community to drive further implementation and adoption of consistent research data policies.

URL : Developing a research data policy framework for all journals and publishers

Alternative location : https://figshare.com/articles/Developing_a_research_data_policy_framework_for_all_journals_and_publishers/8223365/1

Is Open Access to Research Data a Strategic Priority of Czech Universities?

Author : Jakub Novotný

Open access to research data is one of the key themes of current concepts of science development and of the relevant R&D strategies, at least in Europe. A systemic change in the modus operandi of science and research should lead to so-called Open Science.

The paper examines the extent to which the Open Science concept is reflected in the strategies of Czech universities. It first describes the basic idea of Open Access to Research Data, including the "FAIR data" principles as one of its key prerequisites.

After a brief characterization of the Czech university sector, the results of the empirical analysis of the inclusion of the Open Access to Research Data concept in the current strategic plans of the Czech universities are presented.

The paper concludes with an evaluation of the results, which reveal that the Open Science concept is underestimated in the current strategic plans of the Czech universities.

URL : Is Open Access to Research Data a Strategic Priority of Czech Universities?

DOI : https://doi.org/10.2478/ijicte-2018-0008

A Generic Research Data Infrastructure for Long Tail Research Data Management

Authors : Atif Latif, Fidan Limani, Klaus Tochtermann

The advent of data-intensive science has fueled the generation of digital scientific data. Digital research data undoubtedly play a pivotal role in the transparency and reproducibility of scientific results, as well as in steering innovation in the research process.

However, the main challenge for science policy and infrastructure projects is to develop research data management practices and solutions that, in compliance with good scientific standards, make research data discoverable, citable and accessible for potential reuse by society.

GeRDI (the Generic Research Data (RD) Infrastructure) is one such research data management initiative. It targets long-tail content from research communities with different domains and research practices.

It provides generic, open software that connects the research data infrastructures of different communities to enable the investigation of multidisciplinary research questions.

URL : A Generic Research Data Infrastructure for Long Tail Research Data Management

DOI : http://doi.org/10.5334/dsj-2019-017

Are Research Datasets FAIR in the Long Run?

Authors : Dennis Wehrle, Klaus Rechert

Currently, initiatives in Germany are developing infrastructure to accept and preserve dissertation data together with the dissertation texts (bwDATA Diss at the state level, eDissPlus at the federal level).

In contrast to specialized data repositories, these services will accept data from all kinds of research disciplines. To uphold the FAIR data principles (Wilkinson et al., 2016), preservation plans are required, because ensuring accessibility, interoperability and reusability even for a minimum ten-year data retention period can become a major challenge.

File formats matter for both longevity and reusability. To ensure access to data, the data's encoding, i.e. its technical and structural representation in the form of file formats, needs to be understood. Hence, given fast technical lifecycles, interoperability, reuse and in some cases even accessibility depend on the data's format and on our future ability to parse or render it.

This leads to several practical questions regarding quality assurance, potential access options and necessary future preservation steps. In this paper, we analyze datasets from public repositories and apply a file format based long-term preservation risk model to support workflows and services for non-domain specific data repositories.
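A file-format-based risk model has two basic steps. First, it identifies each file's format. Then it maps that format to a preservation risk. The following is a minimal sketch of those two steps, assuming identification by leading "magic" bytes. The byte signatures are well-known values. The risk levels, the default of "high" for unidentified formats, and all function names are hypothetical placeholders, not the authors' model. A real workflow would more likely rely on a format registry and a dedicated identification tool.

```python
from collections import Counter
from pathlib import Path

# Leading byte signatures for a few common formats.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip-container"),  # also matches ZIP-based DOCX/XLSX/ODF files
]

# Hypothetical risk levels, for illustration only.
RISK = {"pdf": "low", "png": "low", "jpeg": "low", "zip-container": "medium"}


def identify_format(head: bytes) -> str:
    """Return a format label for the first bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return "unknown"


def preservation_risk(path: Path) -> tuple[str, str]:
    """Identify a file's format and look up its (hypothetical) risk level."""
    with path.open("rb") as fh:
        head = fh.read(16)
    fmt = identify_format(head)
    return fmt, RISK.get(fmt, "high")  # unidentified formats count as high risk


def risk_profile(dataset_dir: Path) -> Counter:
    """Count files per risk level across a dataset directory, recursively."""
    return Counter(
        preservation_risk(p)[1] for p in dataset_dir.rglob("*") if p.is_file()
    )
```

Running `risk_profile(Path("some_dataset"))` on a deposited dataset gives a quick overview of how many files may need migration or closer inspection before ingest.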

URL : Are Research Datasets FAIR in the Long Run?

DOI : https://doi.org/10.2218/ijdc.v13i1.659