Open Science in Practice: Researcher Perspectives and Participation

We report on an exploratory study consisting of brief case studies in selected disciplines, examining what motivates researchers to work (or want to work) in an open manner with regard to their data, results and protocols, and whether working in this way delivers advantages. We review the policy background to open science and the literature on the benefits attributed to open data, considering how these relate to curation and to questions of who participates in science.

The case studies investigate the perceived benefits to researchers, research institutions and funding bodies of utilising open scientific methods, the disincentives and barriers, and the degree to which there is evidence to support these perceptions. Six case study groups were selected in astronomy, bioinformatics, chemistry, epidemiology, language technology and neuroimaging.

The studies identify relevant examples and issues through qualitative analysis of interview transcripts. We provide a typology of degrees of open working across the research lifecycle, and conclude that two things are needed: better support for open working, through guidelines that help research groups identify the value and costs of working more openly, and further research to assess the risks, incentives and shifts in responsibility entailed by opening up the research process.

URL : http://www.ijdc.net/index.php/ijdc/article/view/173

Linking to Scientific Data: Identity Problems of Unruly and Poorly Bounded Digital Objects :

“Within information systems, a significant aspect of search and retrieval across information objects, such as datasets, journal articles, or images, relies on the identity construction of the objects. This paper uses identity to refer to the qualities or characteristics of an information object that make it definable and recognizable, and can be used to distinguish it from other objects. Identity, in this context, can be seen as the foundation from which citations, metadata and identifiers are constructed.

In recent years the idea of including datasets within the scientific record has gained significant momentum, with publishers, granting agencies and libraries engaging with the challenge. However, the task has been fraught with questions of best practice for establishing this infrastructure, especially in regard to how citations, metadata and identifiers should be constructed. These questions suggest a problem with how dataset identities are formed, such that an engagement with the definition of datasets as conceptual objects is warranted.

This paper explores some of the ways in which scientific data is an unruly and poorly bounded object, and goes on to propose that in order for datasets to fulfill the roles expected for them, the following identity functions are essential for scholarly publications: (i) the dataset is constructed as a semantically and logically concrete object, (ii) the identity of the dataset is embedded, inherent and/or inseparable, (iii) the identity embodies a framework of authorship, rights and limitations, and (iv) the identity translates into an actionable mechanism for retrieval or reference.”

URL : http://www.ijdc.net/index.php/ijdc/article/view/174

Supporting Science through the Interoperability of Data and Articles :

“Whereas it is established practice to publish relevant findings of a research project in a scientific article, there are no standards yet as to whether and how to make the underlying research data publicly accessible. According to the EU’s recent PARSE.Insight study, over 84% of scientists think it is useful to link underlying digital research data to peer-reviewed literature. This trend is reinforced by funding bodies, who increasingly require grantees to deposit their raw datasets in freely accessible repositories. The publishing industry, too, believes that raw datasets should be made freely accessible. This article presents an overview of how Elsevier, a scientific publisher with over 2,000 journals, gives context to articles available on its full-text platform SciVerse ScienceDirect by linking out to externally hosted data at the article level, at the entity level, and in a deeply integrated way. With this overview, Elsevier invites dataset repositories to collaborate with publishers to create optimal interoperability between the formal scientific literature and the associated research data, improving the scientific workflow and ultimately supporting science.”

Developing Infrastructure for Research Data Management at the University of Oxford :

“James A. J. Wilson, Michael A. Fraser, Luis Martinez-Uribe, Paul Jeffreys, Meriel Patrick, Asif Akram and Tahir Mansoori describe the approaches taken, findings, and issues encountered while developing research data management services and infrastructure at the University of Oxford.”

URL : http://www.ariadne.ac.uk/issue65/wilson-et-al/

Riding the wave – How Europe can gain from the rising tide of scientific data – Final report of the High Level Expert Group on Scientific Data :

“The report describes long term scenarios and associated challenges regarding scientific data access, curation and preservation as well as the strategy and actions necessary to realise the vision. The High-Level Group is composed of twelve top-level European experts in different fields of science and is chaired by Prof John Wood, also chair of ERAB.”

URL : http://ec.europa.eu/information_society/newsroom/cf/itemlongdetail.cfm?item_id=6204

Why Linked Data is Not Enough for Scientists :

“Scientific data stands to represent a significant portion of the linked open data cloud, and science itself stands to benefit from the data fusion capability that this will afford. However, simply publishing linked data into the cloud does not necessarily meet the requirements of reuse. Publishing has requirements of provenance, quality, credit, attribution and methods in order to provide the reproducibility that allows validation of results. In this paper we make the case for a scientific data publication model on top of linked data and introduce the notion of Research Objects as first-class citizens for sharing and publishing.”

URL : http://eprints.ecs.soton.ac.uk/21587/

Data Sharing, Latency Variables and the Science Commons :

“Over the past decade, the rapidly decreasing cost of computer storage and the increasing prevalence of high-speed Internet connections have fundamentally altered the way in which scientific research is conducted. Led by scientists in disciplines such as genomics, the rapid sharing of data sets and cross-institutional collaboration promise to increase scientific efficiency and output dramatically. As a result, an increasing number of public “commons” of scientific data are being created: aggregations intended to be used and accessed by researchers worldwide. Yet, the sharing of scientific data presents legal, ethical and practical challenges that must be overcome before such science commons can be deployed and utilized to their greatest potential. These challenges include determining the appropriate level of intellectual property protection for data within the commons, balancing the publication priority interests of data generators and data users, ensuring a viable economic model for publishers and other intermediaries and achieving the public benefits sought by funding agencies.
In this paper, I analyze scientific data sharing within the framework offered by organizational theory, expanding existing analytical approaches with a new tool termed “latency analysis.” I place latency analysis within the larger Institutional Analysis and Development (IAD) framework, as well as more recent variations of that framework. Latency analysis exploits two key variables that characterize all information commons: the rate at which information enters the commons (its knowledge latency) and the rate at which the knowledge in the commons becomes freely utilizable (its rights latency). With these two variables in mind, one proceeds to a three-step analytical methodology that consists of (1) determining the stakeholder communities relevant to the information commons, (2) determining the policy objectives that are relevant to each stakeholder group, and (3) mediating among the differing positions of the stakeholder groups through adjustments to the latency variables of the commons.
I apply latency analysis to two well-known narratives of commons formation in the sciences: the field of genomics, which developed unique modes of rapid data sharing during the Human Genome Project and continues to influence data sharing practices in the biological sciences today; and the more generalized case of open access publishing requirements imposed on publishers by the U.S. National Institutes of Health and various research universities. In each of these cases, policy designers have used timing mechanisms to achieve policy outcomes. That is, by regulating the speed at which information is released into a commons, and then by imposing time-based restrictions on its use, policy designers have addressed the concerns of multiple stakeholders and established information commons that operate effectively and equitably. I conclude that the use of latency variables in commons policy design can, in general, reduce negotiation transaction costs, achieve efficient and equitable results for all stakeholders, and thereby facilitate the formation of socially-valuable commons of scientific information.”
URL : http://works.bepress.com/jorge_contreras/3
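
The two latency variables described in this last abstract lend themselves to a small illustrative sketch. Below is a toy model in Python (all names are hypothetical, not taken from the paper) contrasting a rapid-deposit commons with a use embargo against a slow-deposit commons with no embargo:

```python
from dataclasses import dataclass

@dataclass
class CommonsPolicy:
    """Toy model of an information commons characterized by the two
    latency variables from the abstract (field names hypothetical)."""
    knowledge_latency_days: int  # delay before data enters the commons
    rights_latency_days: int     # embargo before deposited data is freely usable

    def time_to_free_use(self) -> int:
        """Total delay from data generation to unrestricted reuse."""
        return self.knowledge_latency_days + self.rights_latency_days

# Rapid deposit (e.g. daily release) paired with a 12-month use embargo,
# versus slow deposit (release only after a year) with no embargo.
rapid_deposit = CommonsPolicy(knowledge_latency_days=1, rights_latency_days=365)
slow_deposit = CommonsPolicy(knowledge_latency_days=365, rights_latency_days=0)

print(rapid_deposit.time_to_free_use())  # 366
print(slow_deposit.time_to_free_use())   # 365
```

The point of the sketch is only that policy designers can trade the two variables off against each other: both policies reach unrestricted reuse at roughly the same time, but the first makes data visible to the community almost immediately, which is the kind of timing-based mediation among stakeholders the abstract describes.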