Author : Costantino Thanos
High-throughput scientific instruments are generating massive amounts of data. Today, one of the main challenges faced by researchers is to make the best use of the world’s growing wealth of data. Data (re)usability is becoming a distinct characteristic of modern scientific practice.
By data (re)usability, we mean the ease with which data produced by one research community (the producer community) can be used for legitimate scientific research by one or more other communities (consumer communities).
Data (re)usability allows the reanalysis of evidence, reproduction and verification of results, minimizing duplication of effort, and building on the work of others. It has four main dimensions: policy, legal, economic and technological. The paper addresses the technological dimension of data reusability.
The conceptual foundations of data reuse as well as the barriers that hamper data reuse are presented and discussed. The data publication process is proposed as a bridge between the data author and user and the relevant technologies enabling this process are presented.
Research Data Reusability: Conceptual Foundations, Barriers and Enabling Technologies
DOI : http://dx.doi.org/10.3390/publications5010002
Access to experimental X-ray diffraction image data is fundamental for validation and reproduction of macromolecular models and indispensable for development of structural biology processing methods. Here, we established a diffraction data publication and dissemination system, Structural Biology Data Grid (SBDG; data.sbgrid.org), to preserve primary experimental data sets that support scientific publications.
Data sets are accessible to researchers through a community-driven data grid, which facilitates global data access. Our analysis of a pilot collection of crystallographic data sets demonstrates that the information archived by SBDG is sufficient to reprocess data to statistics that meet or exceed the quality of the original published structures.
SBDG has extended its services to the entire community and is being used to develop support for other types of biomedical data sets. It is anticipated that access to the experimental data sets will support a paradigm shift in the community towards a much more dynamic body of continuously improving data analysis.
Data publication with the structural biology data grid supports live analysis
DOI : http://dx.doi.org/10.1038/ncomms10882
« Making available and archiving scientific results is for the most part still considered the task of classical publishing companies, despite the fact that classical forms of publishing centered around printed narrative articles no longer seem well-suited in the digital age. In particular, there exist currently no efficient, reliable, and agreed-upon methods for publishing scientific datasets, which have become increasingly important for science. Here we propose to design scientific data publishing as a Web-based bottom-up process, without top-down control of central authorities such as publishing companies. We present a protocol and a server network to decentrally store and archive data in the form of nanopublications, an RDF-based format to represent scientific data with formal semantics. We show how this approach allows researchers to produce, publish, retrieve, address, verify, and recombine datasets and their individual nanopublications in a reliable and trustworthy manner, and we argue that this architecture could be used for the Semantic Web in general. Our evaluation of the current small network shows that this system is efficient and reliable, and we discuss how it could grow to handle the large amounts of structured data that modern science is producing and consuming. »
URL : https://arxiv.org/abs/1411.2749
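The nanopublication model described in the abstract above packages each unit of data as three named RDF graphs (assertion, provenance, publication info) tied together by a head graph. As a minimal illustrative sketch, the structure can be emitted as TriG with plain string templating; all URIs, prefixes beyond the nanopub schema, and names below are hypothetical examples, not taken from the paper:

```python
# Illustrative sketch of a minimal nanopublication serialized as TriG.
# A nanopublication groups one assertion with its provenance and
# publication info in named graphs, linked from a head graph.
# The ex: namespace and all identifiers are made-up examples.

PREFIXES = """\
@prefix np: <http://www.nanopub.org/nschema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""

def make_nanopub(pub_id, subj, pred, obj, author, date):
    """Return a TriG serialization of one minimal nanopublication."""
    this = f"ex:{pub_id}"
    return PREFIXES + f"""
{this}-head {{
    {this} a np:Nanopublication ;
        np:hasAssertion {this}-assertion ;
        np:hasProvenance {this}-provenance ;
        np:hasPublicationInfo {this}-pubinfo .
}}

{this}-assertion {{
    {subj} {pred} {obj} .
}}

{this}-provenance {{
    {this}-assertion prov:wasAttributedTo {author} .
}}

{this}-pubinfo {{
    {this} prov:generatedAtTime "{date}"^^xsd:dateTime .
}}
"""

trig = make_nanopub("np1", "ex:GeneX", "ex:isAssociatedWith",
                    "ex:DiseaseY", "ex:researcher42", "2015-01-01T00:00:00")
print(trig)
```

Because assertion, provenance, and publication info live in separate named graphs, each part can be addressed, verified, and recombined on its own, which is what enables the decentralized retrieval the authors describe.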
« This article discusses the drivers behind the formation of the Research Data Alliance (RDA), its current state, the lessons learned from its first full year of operation, and its anticipated impact on data publishing and sharing. One of the pressing challenges in data infrastructure (taken here to include issues relating to hardware, software and content format, as well as human actors) is how best to enable data interoperability across boundaries. This is particularly critical as the world deals with bigger and more complex problems that require data and insights from a range of disciplines. The RDA has been set up to enable more data to be shared across barriers to address these challenges. It does this through focused Working Groups and Interest Groups, formed of experts from around the world, and drawing from the academic, industry, and government sectors. »
URL : http://dx.doi.org/10.1087/20140503
Peer review of publications is at the core of science and is primarily seen as an instrument for ensuring research quality. It is far less common, however, to independently assess the quality of the underlying data as well.
In the light of the ‘data deluge’ it makes sense to extend peer review to the data itself and this way evaluate the degree to which the data are fit for re-use. This paper describes a pilot study at EASY – the electronic archive for (open) research data at our institution.
In EASY, researchers can archive their data and add metadata themselves. As an archive devoted to open access and data sharing, we are interested in further enriching these metadata with peer reviews.
As a pilot, we established a workflow where researchers who have downloaded data sets from the archive were asked to review the downloaded data set. This paper describes the details of the pilot including the findings, both quantitative and qualitative.
Finally, we discuss issues that need to be solved when such a pilot is turned into a structural peer review functionality for the archiving system.
URL : http://www.ijdc.net/index.php/ijdc/article/view/231
Citation and Peer Review of Data: Moving Towards Formal Data Publication
« This paper discusses many of the issues associated with formally publishing data in academia, focusing primarily on the structures that need to be put in place for peer review and formal citation of datasets. Data publication is becoming increasingly important to the scientific community, as it will provide a mechanism for those who create data to receive academic credit for their work and will allow the conclusions arising from an analysis to be more readily verifiable, thus promoting transparency in the scientific process. Peer review of data will also provide a mechanism for ensuring the quality of datasets, and we provide suggestions on the types of activities one expects to see in the peer review of data. A simple taxonomy of data publication methodologies is presented and evaluated, and the paper concludes with a discussion of dataset granularity, transience and semantics, along with a recommended human-readable citation syntax. »
URL : http://www.ijdc.net/index.php/ijdc/article/view/181
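The paper above closes with a recommended human-readable citation syntax for datasets. That exact syntax is not reproduced here; purely as a rough DataCite-style illustration, a dataset citation can be assembled from the usual fields (every field name and value in this sketch is an assumption, not the paper's recommendation):

```python
# Illustrative only: a DataCite-style human-readable dataset citation.
# The syntax recommended by Lawrence et al. may differ; all names,
# titles, and the DOI below are made-up example values.

def cite_dataset(creators, year, title, version, publisher, doi):
    """Format 'Creators (Year): Title. Version. Publisher. DOI-link.'"""
    names = "; ".join(creators)
    return (f"{names} ({year}): {title}. Version {version}. "
            f"{publisher}. https://doi.org/{doi}")

citation = cite_dataset(
    ["Doe, J.", "Roe, R."], 2011,
    "Example Climate Model Output", "1.0",
    "Example Data Centre", "10.1234/example-doi")
print(citation)
```

Including a version number and a resolvable DOI in the citation string is what ties the credit mechanism to a specific, retrievable granularity of the dataset, the issue the paper discusses under granularity and transience.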
DataStaR: A Data Sharing and Publication Infrastructure to Support Research
« DataStaR, a Data Staging Repository (http://datastar.mannlib.cornell.edu/) in development at Cornell University’s Albert R. Mann Library (Ithaca, New York, USA), is intended to support collaboration and data sharing among researchers during the research process, and to promote publishing or archiving data and high-quality metadata to discipline-specific data centers and/or institutional repositories. Researchers may store and share data with selected colleagues, select a repository for data publication, create high quality metadata in the formats required by external repositories and Cornell’s institutional repository, and obtain help from data librarians with any of these tasks. To facilitate cross-domain interoperability and flexibility in metadata management, we employ semantic web technologies as part of DataStaR’s metadata infrastructure. This paper describes the overall design of the system, the work to date with Cornell researchers and their data sets, and possibilities for extending DataStaR for use in international agriculture research. »
URL : http://journals.sfu.ca/iaald/index.php/aginfo/article/view/199