If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology

Research on practices to share and reuse data will inform the design of infrastructure to support data collection, management, and discovery in the long tail of science and technology. These are research domains in which data tend to be local in character, minimally structured, and minimally documented. We report on a ten-year study of the Center for Embedded Network Sensing (CENS), a National Science Foundation Science and Technology Center.

We found that CENS researchers are willing to share their data, but few are asked to do so, and in only a few domain areas do their funders or journals require them to deposit data. Few repositories exist to accept data in CENS research areas. Data sharing tends to occur only through interpersonal exchanges. CENS researchers obtain data from repositories, and occasionally from registries and individuals, to provide context, calibration, or other forms of background for their studies. Neither CENS researchers nor those who request access to CENS data appear to use external data for primary research questions or for replication of studies.

CENS researchers are willing to share data if they receive credit and retain first rights to publish their results. Practices of releasing, sharing, and reusing of data in CENS reaffirm the gift culture of scholarship, in which goods are bartered between trusted colleagues rather than treated as commodities.

DOI : 10.1371/journal.pone.0067332

Data Sharing by Scientists: Practices and Perceptions

Background

Scientific research in the 21st century is more data intensive and collaborative than in the past. It is important to study researchers’ data practices – data accessibility, discovery, reuse, preservation, and, particularly, data sharing. Data sharing is a valuable part of the scientific method, allowing for verification of results and for extending research from prior results.

Methodology/Principal Findings

A total of 1329 scientists participated in this survey exploring current data sharing practices and perceptions of the barriers and enablers of data sharing. Scientists do not make their data electronically available to others for various reasons, including insufficient time and lack of funding. Most respondents are satisfied with their current processes for the initial and short-term parts of the data or research lifecycle (collecting their research data; searching for, describing or cataloging, analyzing, and short-term storage of their data) but are not satisfied with long-term data preservation.

Many organizations do not provide support to their researchers for data management in either the short or the long term. If certain conditions are met (such as formal citation and sharing reprints), respondents agree they are willing to share their data. There are also significant differences in data management practices and approaches based on primary funding agency, subject discipline, age, work focus, and world region.

Conclusions/Significance

Barriers to effective data sharing and preservation are deeply rooted in the practices and culture of the research process as well as the researchers themselves. New mandates for data management plans from NSF and other federal agencies and world-wide attention to the need to share and preserve data could lead to changes. Large scale programs, such as the NSF-sponsored DataNET (including projects like DataONE) will both bring attention and resources to the issue and make it easier for scientists to apply sound data management principles.

URL : http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0021101

Common Errors in Ecological Data Sharing

Common Errors in Ecological Data Sharing :

Objectives: (1) to identify common errors in data organization and metadata completeness that would preclude a “reader” from being able to interpret and re-use the data for a new purpose; and (2) to develop a set of best practices derived from these common errors that would guide researchers in creating more usable data products that could be readily shared, interpreted, and used.
Methods: We used directed qualitative content analysis to assess and categorize data and metadata errors identified by peer reviewers of data papers published in the Ecological Society of America’s (ESA) Ecological Archives. Descriptive statistics provided the relative frequency of the errors identified during the peer review process.
Results: There were seven overarching error categories: Collection & Organization, Assure, Description, Preserve, Discover, Integrate, and Analyze/Visualize. These categories represent errors researchers regularly make at each stage of the Data Life Cycle. Collection & Organization and Description errors were some of the most common errors, both of which occurred in over 90% of the papers.
Conclusions: Publishing data for sharing and reuse is error prone, and each stage of the Data Life Cycle presents opportunities for mistakes. The most common errors occurred when the researcher did not provide adequate metadata to enable others to interpret and potentially re-use the data. Fortunately, these mistakes can be minimized by carefully recording all details about study context, data collection, QA/QC, and analytical procedures from the beginning of a research project, and then including this descriptive information in the metadata.
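The descriptive statistics in the Methods (relative frequency of error categories across reviewed papers) can be sketched as follows. All paper names and flagged categories below are hypothetical; this is an illustration of the tally, not the authors' actual analysis code.

```python
from collections import Counter

# Hypothetical peer-review records: each data paper maps to the set of
# Data Life Cycle error categories its reviewers flagged.
reviews = {
    "paper_1": {"Collection & Organization", "Description"},
    "paper_2": {"Description", "Assure"},
    "paper_3": {"Collection & Organization", "Description", "Preserve"},
    "paper_4": {"Collection & Organization"},
}

def error_frequencies(reviews):
    """Return the share of papers exhibiting each error category."""
    counts = Counter()
    for categories in reviews.values():
        counts.update(categories)          # count each category once per paper
    n = len(reviews)
    return {category: count / n for category, count in counts.items()}

freqs = error_frequencies(reviews)
print(freqs["Description"])                # 3 of 4 papers -> 0.75
print(freqs["Collection & Organization"])  # 3 of 4 papers -> 0.75
```

With real review data, the same tally would reproduce the paper's finding that Collection & Organization and Description errors occur in the large majority of submissions.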

URL : http://escholarship.umassmed.edu/jeslib/vol2/iss2/1/

What should be the data sharing policy of cognitive science?

There is a growing chorus of voices in the scientific community calling for greater openness in the sharing of raw data that leads to a publication. In this commentary, we discuss the merits of sharing, common concerns that are raised, and practical issues that arise in developing a sharing policy. We suggest that the cognitive science community discuss the topic and establish a data sharing policy.

URL : http://lpl.psy.ohio-state.edu/documents/PT.pdf

Science 3.0: Corrections to the “Science 2.0” paradigm

The concept of “Science 2.0” was introduced almost a decade ago to describe a new generation of online tools that allow researchers easier data sharing, collaboration, and publishing.

Although technically sound, the concept still does not work as expected. Here we provide a systematic line of arguments to modify the concept of Science 2.0, making it more consistent with the spirit and traditions of science and the Internet.

Our first correction to the Science 2.0 paradigm concerns the open-access publication models charging fees to the authors. As discussed elsewhere, we reiterate that the monopoly of such publishing models increases biases and inequalities in the representation of scientific ideas based on the author’s income.

Our second correction concerns post-publication comments online, which are all essentially non-anonymous in the current Science 2.0 paradigm.

We conclude that scientific post-publication discussions require special anonymization systems.

We further analyze the reasons for the failure of current post-publication peer-review models and suggest what needs to change in “Science 3.0” to turn the Internet into a large “journal club”.

URL : http://arxiv.org/abs/1301.2522

The case for open computer programs

The case for open computer programs :

“Scientific communication relies on evidence that cannot be entirely included in publications, but the rise of computational science has added a new layer of inaccessibility. Although it is now accepted that data should be made available on request, the current regulations regarding the availability of software are inconsistent. We argue that, with some exceptions, anything less than the release of source programs is intolerable for results that depend on computation. The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, but withholding code increases the chances that efforts to reproduce results will fail.”

URL : http://www.nature.com/nature/journal/v482/n7386/full/nature10836.html

Willingness to Share Research Data Is Related to…

Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results :

Background : The widespread reluctance to share published research data is often hypothesized to be due to the authors’ fear that reanalysis may expose errors in their work or may produce conclusions that contradict their own. However, these hypotheses have not previously been studied systematically.

Methods and Findings : We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance.

Conclusions : Our findings, based on these psychology papers, suggest that statistical results are particularly hard to verify when reanalysis is more likely to lead to contrasting conclusions. This highlights the importance of establishing mandatory data-archiving policies.
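The reported association (reluctance to share relates to weaker evidence, i.e., reported p-values closer to the 0.05 threshold) can be illustrated with a toy comparison. All p-values and group labels below are hypothetical; the actual study analyzed 1148 results from 49 papers with a proper statistical model.

```python
from statistics import mean

# Hypothetical reported p-values, grouped by whether the authors were
# willing to share their data for reanalysis.
p_values = {
    "shared":     [0.001, 0.004, 0.010, 0.020, 0.003],
    "not_shared": [0.030, 0.045, 0.048, 0.041, 0.025],
}

def mean_p(group):
    """Mean reported p-value for one sharing group (toy summary statistic)."""
    return mean(p_values[group])

# Weaker evidence against the null = larger p-values, nearer 0.05.
print(mean_p("shared"))
print(mean_p("not_shared"))
```

In this toy data, the non-sharing group's mean p-value is about five times larger, mirroring the direction of the paper's finding; a real analysis would also check reporting errors and their bearing on significance.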

URL : http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0026828
DOI : 10.1371/journal.pone.0026828