Who Shares? Who Doesn’t? Factors Associated with Openly Archiving Raw Research Data :

“Many initiatives encourage investigators to share their raw datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp of who openly shares raw research data, who doesn’t, and which initiatives are correlated with high rates of data sharing. In this analysis I use bibliometric methods to identify patterns in the frequency with which investigators openly archive their raw gene expression microarray datasets after study publication.

Automated methods identified 11,603 articles published between 2000 and 2009 that describe the creation of gene expression microarray data. Associated datasets in best-practice repositories were found for 25% of these articles, increasing from less than 5% in 2001 to 30%–35% in 2007–2009. Accounting for sensitivity of the automated methods, approximately 45% of recent gene expression studies made their data publicly available.

First-order factor analysis on 124 diverse bibliometric attributes of the data creation articles revealed 15 factors describing authorship, funding, institution, publication, and domain environments. In multivariate regression, authors were most likely to share data if they had prior experience sharing or reusing data, if their study was published in an open access journal or a journal with a relatively strong data sharing policy, or if the study was funded by a large number of NIH grants. Authors of studies on cancer and human subjects were least likely to make their datasets available.

These results suggest research data sharing levels are still low and increasing only slowly, and data is least available in areas where it could make the biggest impact. Let’s learn from those with high rates of sharing to embrace the full potential of our research output.”

URL : http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0018657
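The "multivariate regression" step the abstract describes can be illustrated with a minimal, self-contained sketch: a plain logistic regression fit on synthetic bibliometric features. The feature names and data below are invented for illustration and are not taken from the paper.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Gradient-descent logistic regression; returns weights and bias."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))     # predicted P(shared)
            err = p - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gwj / n for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Synthetic "bibliometric factors" per article:
# [prior sharing experience, open-access venue, NIH grant count (scaled)]
random.seed(0)
X, y = [], []
for _ in range(200):
    prior = random.random() < 0.4
    oa = random.random() < 0.3
    grants = random.randint(0, 5)
    # Sharing is made more likely by prior experience, OA venue, more grants.
    logit = -2.0 + 2.5 * prior + 1.0 * oa + 0.4 * grants
    X.append([float(prior), float(oa), grants / 5.0])
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-logit)) else 0)

w, b = fit_logistic(X, y)
p_shared_before = predict(w, b, [1.0, 1.0, 0.8])  # experienced sharer
p_never_shared = predict(w, b, [0.0, 0.0, 0.2])   # no prior sharing
```

The fitted model recovers the pattern the study reports: an author with prior sharing experience, an open-access venue, and more grants gets a higher predicted probability of archiving than one without.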

The Dataverse Network®: An Open-Source Application for Sharing, Discovering and Preserving Data :

“The Dataverse Network is an open-source application for publishing, referencing, extracting and analyzing research data. The main goal of the Dataverse Network is to solve the problems of data sharing through building technologies that enable institutions to reduce the burden for researchers and data publishers, and incentivize them to share their data. By installing Dataverse Network software, an institution is able to host multiple individual virtual archives, called “dataverses” for scholars, research groups, or journals, providing a data publication framework that supports author recognition, persistent citation, data discovery and preservation. Dataverses require no hardware or software costs, nor maintenance or backups by the data owner, but still enable all web visibility and credit to devolve to the data owner.”

URL : http://www.dlib.org/dlib/january11/crosas/01crosas.html

Quality of Research Data, an Operational Approach :

“This article reports on a study, commissioned by SURFfoundation, investigating the operational aspects of the concept of quality for the various phases in the life cycle of research data: production, management, and use/re-use.

Potential recommendations for quality improvement were derived from interviews and a study of the literature. These recommendations were tested via a national academic survey of three disciplinary domains as designated by the European Science Foundation: Physical Sciences and Engineering, Social Sciences and Humanities, and Life Sciences.

The “popularity” of each recommendation was determined by comparing its perceived importance against the objections to it. On this basis, it was possible to draw up generic and discipline-specific recommendations for both the dos and the don’ts.”

URL : http://www.dlib.org/dlib/january11/waaijers/01waaijers.html

Developing Infrastructure for Research Data Management at the University of Oxford :

“James A. J. Wilson, Michael A. Fraser, Luis Martinez-Uribe, Paul Jeffreys, Meriel Patrick, Asif Akram and Tahir Mansoori describe the approaches taken, findings, and issues encountered while developing research data management services and infrastructure at the University of Oxford.”

URL : http://www.ariadne.ac.uk/issue65/wilson-et-al/

IISH Guidelines for preserving research data: a framework for preserving collaborative data collections for future research :

“Our guidelines highlight the iterative process of data collection, data processing, data analysis and publication of (interim) research results. The iterative process is best analyzed and illustrated by following the dynamics of data collection in online collaboratories. The production of data sets in such large-scale data collection projects typically takes a long time, while research may already be performed on data sub-sets in the meantime. If this leads to a publication, a proper citation is required. Publishers and readers need to know exactly at what stage of the data collection process specific conclusions on these data were drawn. During this iterative process, research data need to be maintained, managed and disseminated in different forms and versions during the successive stages of the work carried out, in order to validate the outcomes and research results. These practices drive the requirements for data archiving and show that data archiving is not a one-off data transfer transaction or even a linear process. Therefore, from the perspective of the research process, we recommend interconnecting and interfacing data collection and data archiving, in order to ensure the most effective and lossless preservation of the research data.”

URL : http://www.surffoundation.nl/nl/themas/openonderzoek/cris/Documents/SURFshare_Collectioneren_Guidelines_IISH_DEF.pdf

The Future of the Journal? Integrating research data with scientific discourse :

“To advance the pace of scientific discovery we propose a conceptual format that forms the basis of a truly new way of publishing science. In our proposal, all scientific communication objects (including experimental workflows, direct results, email conversations, and all drafted and published information artifacts) are labeled and stored in a great, big, distributed data store (or many distributed data stores that are all connected).

Each item has a set of metadata attached to it, which includes (at least) the person and time it was created, the type of object it is, and the status of the object including intellectual property rights and ownership. Every researcher can (and must) deposit every knowledge item that is produced in the lab into this repository.

With this deposition goes an essential metadata component that states who has the rights to see, use, distribute, buy or sell this item. Into this grand (and system-wise distributed, cloud-based) architecture, all items produced by a single lab, or several labs, are stored, labeled and connected.”
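The per-object metadata record the proposal describes can be sketched as a small data structure. The field names below are assumptions chosen for illustration, not part of the proposal itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class KnowledgeItem:
    """Hypothetical record for one object in the proposed data store."""
    creator: str        # person who produced the item
    object_type: str    # e.g. "workflow", "result", "email", "draft"
    rights: str         # who may see, use, distribute, buy or sell it
    owner: str          # intellectual-property owner
    created: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    links: list = field(default_factory=list)  # connections to other items

# Example deposit: a single lab workflow with lab-internal rights.
item = KnowledgeItem(
    creator="a.researcher",
    object_type="workflow",
    rights="lab-internal",
    owner="Example Lab",
)
```

The `links` list is where the "stored, labeled and connected" aspect would live: each entry could reference another item's identifier in the same (or a federated) store.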

URL : http://precedings.nature.com/documents/4742/version/1

Research Data: Who will share what, with whom, when, and why? :

“The deluge of scientific research data has excited the general public, as well as the scientific community, with the possibilities for better understanding of scientific problems, from climate to culture. For data to be available, researchers must be willing and able to share them. The policies of governments, funding agencies, journals, and university tenure and promotion committees also influence how, when, and whether research data are shared. Data are complex objects. Their purposes and the methods by which they are produced vary widely across scientific fields, as do the criteria for sharing them. To address these challenges, it is necessary to examine the arguments for sharing data and how those arguments match the motivations and interests of the scientific community and the public. Four arguments are examined: to make the results of publicly funded data available to the public, to enable others to ask new questions of extant data, to advance the state of science, and to reproduce research. Libraries need to consider their role in the face of each of these arguments, and what expertise and systems they require for data curation.”

URL : http://works.bepress.com/borgman/238/