OpenTrials: towards a collaborative open database of all available information on all clinical trials

OpenTrials is a collaborative and open database for all available structured data and documents on all clinical trials, threaded together by individual trial.

With a versatile and expandable data schema, it is initially designed to host and match the following documents and data for each trial: registry entries; links, abstracts, or texts of academic journal papers; portions of regulatory documents describing individual trials; structured data on methods and results extracted by systematic reviewers or other researchers; clinical study reports; and additional documents such as blank consent forms, blank case report forms, and protocols.
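
As a rough illustration of how records could be threaded together by individual trial, here is a minimal sketch in Python; the class and field names are hypothetical, not the project's actual schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TrialDocument:
    # Kind of document threaded onto the trial, e.g. "registry_entry",
    # "journal_paper", "clinical_study_report", "protocol", "consent_form"
    kind: str
    source: str                 # e.g. "ClinicalTrials.gov" or a publisher name
    url: Optional[str] = None   # location of the document, when one exists

@dataclass
class Trial:
    trial_id: str                                          # internal threading key
    registry_ids: List[str] = field(default_factory=list)  # e.g. ["NCT00000000"]
    title: str = ""
    documents: List[TrialDocument] = field(default_factory=list)
    structured_results: dict = field(default_factory=dict) # extracted methods/results
```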

The intention is to create an open, freely re-usable index of all such information and to increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data, and drive up standards around open data in evidence-based medicine.

The project has phase I funding. This will allow us to create a practical data schema and populate the database initially through web-scraping, basic record linkage techniques, crowd-sourced curation around selected drug areas, and import of existing sources of structured data and documents.
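
A minimal sketch of what such basic record linkage could look like, matching on ClinicalTrials.gov-style registry identifiers first and falling back to fuzzy title comparison; the fields and threshold are illustrative assumptions, not the project's actual pipeline:

```python
import re
from difflib import SequenceMatcher

# ClinicalTrials.gov identifiers: "NCT" followed by eight digits.
NCT_RE = re.compile(r"NCT\d{8}")

def registry_ids(text: str) -> set:
    """Extract registry identifiers from scraped text."""
    return set(NCT_RE.findall(text.upper()))

def same_trial(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Decide whether two scraped records describe the same trial:
    exact registry-ID overlap first, fuzzy title matching as a fallback."""
    if registry_ids(a.get("text", "")) & registry_ids(b.get("text", "")):
        return True
    title_a, title_b = a.get("title", "").lower(), b.get("title", "").lower()
    if not title_a or not title_b:
        return False
    return SequenceMatcher(None, title_a, title_b).ratio() >= threshold
```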

It will also allow us to create user-friendly web interfaces onto the data and conduct user engagement workshops to optimise the database and interface designs.

Where other projects have set out to manually and perfectly curate a narrow range of information on a smaller number of trials, we aim to use a broader range of techniques and attempt to match a very large quantity of information on all trials. We are currently seeking feedback and additional sources of structured data.

Alternative location : http://trialsjournal.biomedcentral.com/articles/10.1186/s13063-016-1290-8

Achieving human and machine accessibility of cited data in scholarly publications

Reproducibility and reusability of research results are important concerns in scientific communication and science policy. A foundational element of reproducibility and reusability is the open and persistently available presentation of research data.

However, many common approaches for primary data publication in use today do not achieve sufficient long-term robustness, openness, accessibility or uniformity. Nor do they permit comprehensive exploitation by modern Web technologies.

This has led to several authoritative studies recommending uniform direct citation of data archived in persistent repositories. Data are to be considered as first-class scholarly objects, and treated similarly in many ways to cited and archived scientific and scholarly literature.

Here we briefly review the most current and widely agreed set of principle-based recommendations for scholarly data citation, the Joint Declaration of Data Citation Principles (JDDCP).

We then present a framework for operationalizing the JDDCP, together with a set of initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data.
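
One concrete form such machine actionability can take is HTTP content negotiation against a DOI resolver, where a client requests structured metadata rather than the human-readable landing page. A brief sketch, using this article's own DOI as a convenient example:

```python
import requests

# Content negotiation against the DOI resolver: ask for structured,
# machine-readable metadata (CSL JSON) instead of the landing page.
DOI = "10.7717/peerj-cs.1"

response = requests.get(
    f"https://doi.org/{DOI}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=30,
)
response.raise_for_status()
metadata = response.json()
print(metadata["title"])   # the article title, as structured data
```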

The main target audience for the common implementation guidelines in this article consists of publishers, scholarly organizations, and persistent data repositories, including technical staff members in these organizations.

But ordinary researchers can also benefit from these recommendations. The guidance provided here is intended to help achieve widespread, uniform human and machine accessibility of deposited data, in support of significantly improved verification, validation, reproducibility and re-use of scholarly/scientific data.

DOI : https://doi.org/10.7717/peerj-cs.1

The challenges of the preservation and reuse of qualitative research data in the humanities and social sciences

Research archives are inherently fascinating, since they make it possible to understand how discoveries are made and how science evolves from day to day. The advent of digital technology has opened up new possibilities, notably for disseminating these data, but also new challenges, in terms of archiving among other things.

The archiving, sharing, and reuse of qualitative data in the humanities and social sciences raise many questions, and the various stakeholders concerned, information professionals and researchers, may hold diverging opinions. Understanding each side's point of view and determining to what extent these views can be reconciled are the aims of this master's thesis.

Alternative location : http://www.enssib.fr/bibliotheque-numerique/notices/66007-les-enjeux-de-la-patrimonialisation-et-de-la-reutilisation-des-donnees-qualitatives-de-la-recherche-en-sciences-humaines-et-sociales

DataCite in the service of scientific data: identify to add value

Research data, in the form of very diverse digital objects, are finding their place in scientific and technical information (STI) services, mainly, though not exclusively, as complements to the publications that build on these data.

The integration of different types of digital resources is advancing, and it must be supported by interoperability standards, common metadata formats, and ways of linking these contents together and citing them in a standardized manner.

The international DataCite consortium, in which Inist-CNRS represents France, has set itself the goal of supporting and accelerating this evolution. It operates in particular as a registration agency for DOIs (Digital Object Identifiers), regarding these DOIs, already well established in the publishing world, as an effective tool for identifying data persistently, thereby facilitating their discovery and access, and ultimately their citation.

DataCite has developed its own metadata schema and has put in place specific features that promote the sharing and reuse of data. Such valorisation is part of an approach that seeks to benefit fully from the potential of open data.
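
For orientation, the mandatory properties of the DataCite metadata schema (identifier, creator, title, publisher, publication year, resource type) can be sketched as follows; the values are placeholders and the serialization is deliberately simplified:

```python
# Mandatory properties of the DataCite metadata schema, sketched as a plain
# dictionary. The values are placeholders; the actual record deposited with
# DataCite is serialized as XML or JSON against the official schema.
datacite_record = {
    "identifier": {"identifierType": "DOI", "identifier": "10.XXXX/example"},
    "creators": [{"creatorName": "Doe, Jane"}],
    "titles": [{"title": "Example data set"}],
    "publisher": "Example Data Repository",
    "publicationYear": "2016",
    "resourceType": {"resourceTypeGeneral": "Dataset"},
}
```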

This valorisation also makes an essential contribution to better recognition of the scientific work of producing, managing, and providing access to data, and notably to its inclusion in evaluation criteria.

It is encouraging, moreover, to see these criteria opening up to alternative metrics, including those concerning data. The specific topic of data citation has recently been the subject of several international initiatives aiming to harmonize practices and issue recommendations.

Through the Data Citation Synthesis Group, these initiatives have converged on a set of principles that are now becoming widely recognized and accepted. In this context, publishers must adapt and clearly define their policies on linking data and publications. A strong trend towards agreements between publishers and data repositories can indeed be observed.

DataCite's actions and services fit into other international structures and initiatives built around research data and persistent identifiers: the Research Data Alliance, WDS-ICSU, CODATA, EPIC, the Data Citation Index, and so on.

A case in point is the European ODIN project, in which DataCite and ORCID, the initiative for creating author identifiers, are seeking to connect these different types of identifiers.

URL : http://eprints.rclis.org/28188/

Data publication with the structural biology data grid supports live analysis

Access to experimental X-ray diffraction image data is fundamental for validation and reproduction of macromolecular models and indispensable for development of structural biology processing methods. Here, we established a diffraction data publication and dissemination system, Structural Biology Data Grid (SBDG; data.sbgrid.org), to preserve primary experimental data sets that support scientific publications.

Data sets are accessible to researchers through a community-driven data grid, which facilitates global data access. Our analysis of a pilot collection of crystallographic data sets demonstrates that the information archived by SBDG is sufficient to reprocess data to statistics that meet or exceed the quality of the original published structures.

SBDG has extended its services to the entire community and is used to develop support for other types of biomedical data sets. It is anticipated that access to the experimental data sets will enhance the paradigm shift in the community towards a much more dynamic body of continuously improving data analysis.

DOI : https://doi.org/10.1038/ncomms10882

Wikidata as a semantic framework for the Gene Wiki initiative

Open biological data are distributed over many resources, making them challenging to integrate, to update, and to disseminate quickly. Wikidata is a growing, open community database which can serve this purpose and also provides tight integration with Wikipedia.

In order to improve the state of biological data and to facilitate their management and dissemination, we imported all human and mouse genes, and all human and mouse proteins, into Wikidata.

In total, 59 721 human genes and 73 355 mouse genes have been imported from NCBI and 27 306 human proteins and 16 728 mouse proteins have been imported from the Swissprot subset of UniProt. As Wikidata is open and can be edited by anybody, our corpus of imported data serves as the starting point for integration of further data by scientists, the Wikidata community and citizen scientists alike.

The first use case for these data is to populate Wikipedia Gene Wiki infoboxes directly from Wikidata with the data integrated above. This enables immediate updates of the Gene Wiki infoboxes as soon as the data in Wikidata are modified.

Although Gene Wiki pages are currently only on the English language version of Wikipedia, the multilingual nature of Wikidata allows for usage of the data we imported in all 280 different language Wikipedias.

Apart from the Gene Wiki infobox use case, a SPARQL endpoint and exporting functionality to several standard formats (e.g. JSON, XML) enable use of the data by scientists.
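
As an illustration of that SPARQL endpoint, a short query can retrieve a few of the imported gene items; the Wikidata properties used (P351, Entrez Gene ID; P703, found in taxon) are real, while the query itself is merely an example:

```python
import requests

# Illustrative query against the public Wikidata SPARQL endpoint: a few
# items that have an Entrez Gene ID (P351) and are found in taxon
# Homo sapiens (Q15978631).
QUERY = """
SELECT ?gene ?geneLabel ?entrez WHERE {
  ?gene wdt:P351 ?entrez ;
        wdt:P703 wd:Q15978631 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    timeout=60,
)
response.raise_for_status()
for row in response.json()["results"]["bindings"]:
    print(row["entrez"]["value"], row["geneLabel"]["value"])
```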

In summary, we created a fully open and extensible data resource for human and mouse molecular biology and biochemistry data. This resource enriches all the Wikipedias with structured information and serves as a new linking hub for the biological semantic web.

DOI : https://doi.org/10.1093/database/baw015

The FAIR Guiding Principles for scientific data management and stewardship

There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders—representing academia, industry, funding agencies, and scholarly publishers—have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles.

The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals.

This Comment is the first formal publication of the FAIR Principles; it includes the rationale behind them and some exemplar implementations in the community.

Alternative location : http://www.nature.com/articles/sdata201618