Knowledge Infrastructures in Science: Data, Diversity, and Digital Libraries

Digital libraries can be deployed at many points throughout the life cycles of scientific research projects, from their inception through data collection, analysis, documentation, publication, curation, preservation, and stewardship. Requirements for digital libraries to manage research data vary along many dimensions, including life cycle, scale, research domain, and types and degrees of openness.

This article addresses the role of digital libraries in knowledge infrastructures for science, presenting evidence from long-term studies of four research sites. Findings are based on interviews (n=208), ethnographic fieldwork, document analysis, and historical archival research about scientific data practices, conducted over the course of more than a decade.

The project “The Transformation of Knowledge, Culture, and Practice in Data-Driven Science: A Knowledge Infrastructures Perspective” uses a 2×2 design along two dimensions, project scale and temporal stage of life cycle. It compares two “big science” astronomy sites with two “little science” sites that span the physical sciences, life sciences, and engineering.

The two astronomy sites invested in digital libraries for data management as part of their initial research design, whereas the smaller sites made smaller investments at later stages. Role specialization varies along the same lines: the larger projects invest in information professionals, while smaller teams carry out these activities internally. Sites making the largest investments in digital libraries appear to view their datasets as their primary scientific legacy, while other sites stake their legacy elsewhere. Those investing in digital libraries are more concerned with the release and reuse of data, and types and degrees of openness vary accordingly.

The need for expertise in digital libraries, data science, and data stewardship is apparent at all four sites. The article presents examples of the challenges of designing digital libraries and knowledge infrastructures to manage and steward research data.

URL : http://works.bepress.com/borgman/371/

Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study

Objective

This study informs efforts to improve the discoverability of and access to biomedical datasets by providing a preliminary estimate of the number and type of datasets generated annually by research funded by the U.S. National Institutes of Health (NIH). It focuses on those datasets that are “invisible” or not deposited in a known repository.

Methods

We analyzed NIH-funded journal articles that were published in 2011, cited in PubMed, and deposited in PubMed Central (PMC) to identify those indicating that data had been submitted to a known repository. After excluding those articles, we analyzed a random sample of the remaining articles to estimate how many invisible datasets were used in each article and of what types.

Results

About 12% of the articles explicitly mention deposition of datasets in recognized repositories, leaving 88% with invisible datasets. Among articles with invisible datasets, we found an average of 2.9 to 3.4 datasets per article, suggesting that approximately 200,000 to 235,000 invisible datasets were generated from NIH-funded research published in 2011. Approximately 87% of the invisible datasets consist of data newly collected for the research reported, and 13% reflect reuse of existing data. More than 50% of the datasets were derived from live human or non-human animal subjects.
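The dataset range appears to combine a count of articles with a per-article average. As a rough consistency check, assuming the range is simply the number of articles with invisible datasets, N, multiplied by the mean number of datasets per article, d̄, both bounds imply nearly the same article count. N is back-calculated here and is not reported in the abstract:

```latex
\[
D \approx N \times \bar{d}, \qquad \bar{d} \in [2.9,\ 3.4]
\]
\[
N \approx \frac{200{,}000}{2.9} \approx 68{,}966,
\qquad
N \approx \frac{235{,}000}{3.4} \approx 69{,}118
\]
```

Both bounds therefore point to roughly 69,000 articles with invisible datasets. This agreement is consistent with the range coming from the per-article average rather than from uncertainty in the article count.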

Conclusion

In addition to providing a rough estimate of the total number of datasets produced per year by NIH-funded researchers, this study identifies additional issues that must be addressed to improve the discoverability of and access to biomedical research data: the definition of a “dataset,” determination of which (if any) data are valuable for archiving and preservation, and better methods for estimating the number of datasets of interest. Lack of consensus amongst annotators about the number of datasets in a given article reinforces the need for a principled way of thinking about how to identify and characterize biomedical datasets.

URL : Sizing the Problem of Improving Discovery and Access to NIH-Funded Data: A Preliminary Study

DOI : 10.1371/journal.pone.0132735

Research Data Practices in Veterinary Medicine: A Case Study

Objective

To determine trends in research data output, reuse, and sharing among faculty members of the college of veterinary medicine at a large academic research institution.

Methods

This bibliographic study examined original research articles for indications of the types of data produced, as well as evidence that the authors reused data or made provision for sharing their own data. Findings were recorded in the categories of research type, data type, data reuse, data sharing, author collaboration, and grants/funding, and were analyzed to determine trends.

Results

A variety of data types was encountered in this study, even within a single article, arising primarily from clinical and laboratory animal studies. All of the articles resulted from author collaboration, both within the University of Illinois at Urbana-Champaign and with researchers outside the institution. There was little indication that data were reused, except in some instances where the authors acknowledged that data had been obtained directly from a colleague. There was even less indication that the research data were shared, either as a supplementary file on the publisher’s website or by submission to a repository, except in the case of genetic data.

Conclusions

Veterinary researchers are prolific producers and users of a wide variety of data. Despite the large amount of collaborative research occurring in veterinary medicine, this study provided little evidence that veterinary researchers are reusing or sharing their data, except in an informal manner. Wider adoption of data management plans may serve to improve researchers’ data management practices.

Managing Research Data in Academic Institutions: Role of Libraries

“One of the global emerging trends in academic libraries is to facilitate the management of research data for the benefit of researchers and institutions. The purpose of this paper is to explore the role of a library in offering such research data management services. The paper discusses the importance of research data, its preservation, organization, dissemination and critical role in the scholarly research life cycle. The authors attempt to provide a vivid description of Research Data Management (RDM) as a service and in the process review the existing literature on the topic, in addition to indicating the tools and technologies that could be adopted in successful RDM service implementation. The paper is also an attempt to share the experience of creating the Vikram Sarabhai Library’s research data repository, which was developed by adopting the open source software CKAN.”

URL : http://eprints.rclis.org/24911/

Research data sharing: Developing a stakeholder-driven model for journal policies

“Conclusions of research articles depend on bodies of data that cannot be included in articles themselves. Sharing these data is important for reasons of both transparency and reuse. Science, Technology, and Medicine journals have a role in facilitating sharing, but by what mechanism is not yet clear. The Journal Research Data (JoRD) Project was a JISC (Joint Information Systems Committee)-funded feasibility study on the potential for a central service on journal research data policies. The objectives of the study included identifying the current state of journal data sharing policies and investigating stakeholders’ views and practices. The project confirmed that a large percentage of journals have no data sharing policy and that there are inconsistencies between those that are traceable. This state leaves authors unsure of whether they should share article-related data and where and how to deposit those data. In the absence of a consolidated infrastructure to share data easily, a model journal data sharing policy was developed by comparing quantitative information from analyzing existing journal data policies with qualitative data collected from stakeholders. This article summarizes and outlines the process by which the model was developed and presents the model journal data sharing policy.”

URL : http://eprints.nottingham.ac.uk/3185/

Research Data Explored II: the Anatomy and Reception of figshare

This is the second paper in a series of bibliometric studies of research data. In this paper, we present an analysis of figshare, one of the largest multidisciplinary repositories for research materials to date.

We analysed the structure of items archived in figshare, their usage, and their reception in two altmetrics sources (PlumX and ImpactStory). We found that figshare acts as a platform for newly published research materials, and as an archive for PLOS.

Depending on the function, we found different bibliometric characteristics. Items archived from PLOS tend to come from the natural sciences and are often neither viewed nor downloaded. Self-archived items, however, come from a variety of disciplines and exhibit some patterns of higher usage.

In the altmetrics analysis, we found that Twitter was the social media service where research data gained most attention; generally, research data published in 2014 were most popular across social media services.

PlumX detects considerably more items in social media and also finds higher altmetric scores than ImpactStory.

URL : http://arxiv.org/abs/1503.01298

A systematic review of barriers to data sharing in public health

Background : In the current information age, the use of data has become essential for decision making in public health at the local, national, and global level. Despite a global commitment to the use and sharing of public health data, such sharing can be challenging in reality. No systematic framework or global operational guidelines have been created for data sharing in public health. Barriers at different levels have limited data sharing but have been discussed only anecdotally or in the context of specific case studies. Incomplete systematic evidence on the scope and variety of these barriers has limited opportunities to maximize the value and use of public health data for science and policy.

Methods : We conducted a systematic literature review of potential barriers to public health data sharing. Documents that described barriers to sharing of routinely collected public health data were eligible for inclusion and reviewed independently by a team of experts. We grouped identified barriers in a taxonomy for a focused international dialogue on solutions.

Results : Twenty potential barriers were identified and classified in six categories: technical, motivational, economic, political, legal and ethical. The first three categories are deeply rooted in well-known challenges of health information systems for which structural solutions have yet to be found; the last three have solutions that lie in an international dialogue aimed at generating consensus on policies and instruments for data sharing.

Conclusions : The simultaneous effect of multiple interacting barriers ranging from technical to intangible issues has greatly complicated advances in public health data sharing. A systematic framework of barriers to data sharing in public health will be essential to accelerate the use of valuable information for the global good.

URL : A systematic review of barriers to data sharing in public health

Alternative URL : http://www.biomedcentral.com/1471-2458/14/1144