Knowledge Infrastructures in Science: Data, Diversity, and Digital Libraries

Digital libraries can be deployed at many points throughout the life cycles of scientific research projects, from their inception through data collection, analysis, documentation, publication, curation, preservation, and stewardship. Requirements for digital libraries to manage research data vary along many dimensions, including life cycle, scale, research domain, and types and degrees of openness.

This article addresses the role of digital libraries in knowledge infrastructures for science, presenting evidence from long-term studies of four research sites. Findings are based on interviews (n=208), ethnographic fieldwork, document analysis, and historical archival research about scientific data practices, conducted over the course of more than a decade.

The Transformation of Knowledge, Culture, and Practice in Data-Driven Science: A Knowledge Infrastructures Perspective project is based on a 2×2 design that compares two “big science” astronomy sites with two “little science” sites spanning the physical sciences, life sciences, and engineering, along the dimensions of project scale and temporal stage of the data life cycle.

The two astronomy sites invested in digital libraries for data management as part of their initial research design, whereas the smaller sites made smaller investments at later stages. Role specialization varies along the same lines, with the larger projects investing in information professionals, and smaller teams carrying out their own activities internally. Sites making the largest investments in digital libraries appear to view their datasets as their primary scientific legacy, while other sites stake their legacy elsewhere. Those investing in digital libraries are more concerned with the release and reuse of data; types and degrees of openness vary accordingly.

The need for expertise in digital libraries, data science, and data stewardship is apparent throughout all four sites. Examples are presented of the challenges in designing digital libraries and knowledge infrastructures to manage and steward research data.

URL : http://works.bepress.com/borgman/371/

Review times in peer review: quantitative analysis of editorial workflows

We examine selected aspects of peer review and suggest possible improvements. To this end, we analyse a dataset containing information about 300 papers submitted to the Biochemistry and Biotechnology section of the Journal of the Serbian Chemical Society. After separating the peer review process into stages that each submission has to go through, we use a weighted directed graph to describe it in a probabilistic manner and test the impact of some modifications to the editorial policy on the efficiency of the whole process.
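
The weighted-graph description above lends itself to an absorbing Markov chain reading: each editorial stage is a node, transition probabilities are edge weights, and acceptance or rejection absorbs the manuscript. The Python sketch below is a minimal illustration of that idea, using hypothetical stage names, transition probabilities, and per-stage handling times rather than the journal's actual data; the paper's own model and parameters may well differ.

```python
import numpy as np

# Hypothetical transient stages of an editorial workflow. These names and
# all numbers below are illustrative assumptions, not the journal's data.
stages = ["editor_screening", "under_review", "minor_revision", "major_revision"]

# Q[i, j]: probability that a manuscript in stage i moves next to stage j.
# The probability mass missing from each row flows to the absorbing states
# (accept or reject), which are not listed here.
Q = np.array([
    [0.00, 0.70, 0.00, 0.00],  # screening -> review (30% desk rejection)
    [0.00, 0.00, 0.35, 0.30],  # review -> minor/major revision (35% final decision)
    [0.00, 0.15, 0.00, 0.00],  # minor revision -> occasional extra review round
    [0.00, 0.60, 0.00, 0.00],  # major revision -> another review round
])

# Assumed mean dwell time (in weeks) per visit to each stage.
t = np.array([1.0, 6.0, 3.0, 8.0])

# Fundamental matrix: N[i, j] is the expected number of visits to stage j
# for a manuscript that starts in stage i.
N = np.linalg.inv(np.eye(len(stages)) - Q)

# Expected total handling time for a new submission (stage 0).
print(f"Expected time to final decision: {N[0] @ t:.1f} weeks")

# Testing an editorial-policy change (e.g. stricter desk screening) amounts
# to editing Q or t and recomputing the same quantity.
```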

URL : http://arxiv.org/abs/1508.01134

Content Volatility of Scientific Topics in Wikipedia: A Cautionary Tale

Wikipedia has quickly become one of the most frequently accessed encyclopedic references, despite the ease with which content can be changed and the potential for ‘edit wars’ surrounding controversial topics. Little is known about how this potential for controversy affects the accuracy and stability of information on scientific topics, especially those with associated political controversy. Here we present an analysis of the Wikipedia edit histories for seven scientific articles and show that topics we consider politically but not scientifically “controversial” (such as evolution and global warming) experience more frequent edits with more words changed per day than pages we consider “noncontroversial” (such as the standard model in physics or heliocentrism).

For example, over the period we analyzed, the global warming page was edited on average (geometric mean ±SD) 1.9±2.7 times per day, resulting in 110.9±10.3 words changed per day, while the standard model in physics page was edited only 0.2±1.4 times per day, resulting in 9.4±5.0 words changed per day. The high rate of change observed in these pages makes it difficult for experts to monitor accuracy and contribute time-consuming corrections, to the possible detriment of scientific accuracy. As our society turns to Wikipedia as a primary source of scientific information, it is vital that we read it critically and with the understanding that its content is dynamic and vulnerable to vandalism and other shenanigans.
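
As an illustration of the kind of summary statistic quoted above, the sketch below computes a geometric mean and a geometric standard deviation over per-day edit counts. The daily counts are made-up placeholders (real input would come from an article's revision history via the MediaWiki API), and days with zero edits would need special handling, for example exclusion or a small offset, since the logarithm is undefined at zero; the paper's exact aggregation convention may differ.

```python
import numpy as np

def geometric_mean_sd(values):
    """Geometric mean and multiplicative (geometric) SD of positive values."""
    logs = np.log(np.asarray(values, dtype=float))
    return float(np.exp(logs.mean())), float(np.exp(logs.std(ddof=1)))

# Placeholder daily counts for one article; not taken from the study's data.
edits_per_day = [3, 1, 2, 5, 1, 2, 4]
words_changed_per_day = [120, 40, 95, 210, 30, 80, 150]

gm_edits, gsd_edits = geometric_mean_sd(edits_per_day)
gm_words, gsd_words = geometric_mean_sd(words_changed_per_day)
print(f"edits/day: geometric mean {gm_edits:.1f}, geometric SD {gsd_edits:.1f}")
print(f"words/day: geometric mean {gm_words:.1f}, geometric SD {gsd_words:.1f}")
```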

URL : Content Volatility of Scientific Topics in Wikipedia: A Cautionary Tale

DOI : 10.1371/journal.pone.0134454

Open Access Indicators and Scholarly Communications in Latin America

This book is the result of a joint research and development project supported and undertaken in 2013 by UNESCO in partnership with the Public Knowledge Project (PKP), the Scientific Electronic Library Online (SciELO), the Network of Scientific Journals of Latin America, the Caribbean, Spain and Portugal (RedALyC), Africa Journals Online (AJOL), the Latin America Social Sciences School, Brazil (FLACSO-Brazil), and the Latin American Council of Social Sciences (CLACSO). The book aims to contribute to the understanding of scholarly production, use, and reach through measures that are open and inclusive. It is divided into two sections.

The first section presents a narrative summary of Open Access in Latin America, including a description of the major regional initiatives that are collecting and systematizing data related to Open Access scholarship, and of available data that can be used to understand the (i) growth, (ii) reach, and (iii) impact of Open Access in developing regions. The first section ends with recommendations for future activities. The second section includes in-depth case studies describing the indicators and methodologies of the peer-reviewed journal portals SciELO and Redalyc, and a case study of the subject digital repository maintained by CLACSO.

URL : https://microblogging.infodocs.eu/wp-content/uploads/2015/08/alperin2014.pdf

Alternative location : http://hdl.handle.net/10760/25122

From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics

Motivation

Reproducing the results from a scientific paper can be challenging due to the absence of data and the computational tools required for their analysis. In addition, details relating to the procedures used to obtain the published results can be difficult to discern due to the use of natural language when reporting how experiments have been performed.

The Investigation/Study/Assay (ISA), Nanopublications (NP), and Research Objects (RO) models are conceptual data modelling frameworks that can structure such information from scientific papers. Computational workflow platforms can also be used to reproduce analyses of data in a principled manner. We assessed the extent to which the ISA, NP, and RO models, together with the Galaxy workflow system, can capture the experimental processes and reproduce the findings of a previously published paper reporting on the development of SOAPdenovo2, a de novo genome assembler.

Results

Executable workflows were developed using Galaxy, which reproduced results that were consistent with the published findings. A structured representation of the information in the SOAPdenovo2 paper was produced by combining the use of ISA, NP, and RO models. By structuring the information in the published paper using these data and scientific workflow modelling frameworks, it was possible to explicitly declare elements of experimental design, variables, and findings.

The models served as guides in the curation of scientific information, and this led to the identification of inconsistencies in the original published paper, thereby allowing its authors to publish corrections in the form of an erratum.
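
As a rough illustration of what structuring a paper's experimental information can look like, the sketch below renders the Investigation/Study/Assay hierarchy as plain Python dataclasses. The field names and example values are simplified assumptions made for this illustration; the actual ISA-Tab/ISA-JSON specifications, and the nanopublication and Research Object models, define far richer schemas than this.

```python
from dataclasses import dataclass, field
from typing import List

# A minimal, illustrative rendering of the Investigation/Study/Assay (ISA)
# hierarchy as plain Python objects. Field names are simplified assumptions.

@dataclass
class Assay:
    measurement_type: str                 # e.g. "genome assembly"
    technology_platform: str              # e.g. "SOAPdenovo2"
    data_files: List[str] = field(default_factory=list)

@dataclass
class Study:
    title: str
    design_descriptors: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)

@dataclass
class Investigation:
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)

# Placeholder layout for a reproduction of the SOAPdenovo2 analyses;
# the concrete values are invented for illustration, not taken from the paper.
investigation = Investigation(
    identifier="INV-soapdenovo2-repro",
    title="Reproducing the SOAPdenovo2 assembly results",
    studies=[
        Study(
            title="De novo assembly of a benchmark dataset",
            design_descriptors=["software evaluation"],
            assays=[Assay("genome assembly", "SOAPdenovo2",
                          data_files=["contigs.fa", "scaffolds.fa"])],
        )
    ],
)
print(investigation.title, "-", len(investigation.studies), "study")
```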

URL : From Peer-Reviewed to Peer-Reproduced in Scholarly Publishing: The Complementary Roles of Data Models and Workflows in Bioinformatics

DOI : 10.1371/journal.pone.0127612

Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science

Authors : Misha Teplitskiy, Grace Lu, Eamon Duede

With the rise of Wikipedia as a first-stop source for scientific knowledge, it is important to compare its representation of that knowledge to that of the academic literature. This article approaches such a comparison through academic references made within the world’s 50 largest Wikipedias.

Previous studies have raised concerns that Wikipedia editors may simply use the most easily accessible academic sources rather than sources of the highest academic status. We test this claim by identifying the 250 most heavily used journals in each of 26 research fields (4,721 journals, 19.4M articles in total) indexed by the Scopus database, and modeling whether topic, academic status, and accessibility make articles from these journals more or less likely to be referenced on Wikipedia.

We find that, controlling for field and impact factor, the odds that an open access journal is referenced on the English Wikipedia are 47% higher compared to closed access journals. Moreover, in most of the world’s Wikipedias, a journal’s high status (impact factor) and accessibility (open access policy) both greatly increase the probability of referencing.
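
The modelling behind figures like the 47% odds difference is essentially a logistic regression with an odds-ratio interpretation. The Python sketch below shows the shape of such an analysis on synthetic placeholder data; the study's real inputs came from Scopus and Wikipedia reference data, and its exact model specification may differ.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic placeholder data: one row per journal, with open-access status,
# impact factor, field, and whether it is referenced on a given Wikipedia.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "open_access": rng.integers(0, 2, n),
    "impact_factor": rng.gamma(2.0, 1.5, n),
    "field": rng.choice(["chemistry", "biology", "physics"], n),
})
linpred = -1.0 + 0.4 * df["open_access"] + 0.5 * np.log1p(df["impact_factor"])
df["referenced"] = rng.binomial(1, 1.0 / (1.0 + np.exp(-linpred)))

# Logistic regression controlling for field and impact factor. The
# exponentiated open_access coefficient is the odds ratio analogous to the
# "47% higher odds" figure reported in the abstract.
model = smf.logit(
    "referenced ~ open_access + np.log1p(impact_factor) + C(field)", data=df
).fit(disp=0)
print("odds ratio for open access:", np.exp(model.params["open_access"]))
```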

Among the implications of this study is that the chief effect of open access policies may be to significantly amplify the diffusion of science, through an intermediary like Wikipedia, to a broad public audience.

URL : Amplifying the Impact of Open Access: Wikipedia and the Diffusion of Science

Alternative location : http://onlinelibrary.wiley.com/doi/10.1002/asi.23687/full

Data journals: A survey

Data occupy a key role in our information society. However, although the amount of published data continues to grow and terms such as data deluge and big data today characterize numerous (research) initiatives, much work is still needed to publish data in ways that make them effectively discoverable, available, and reusable by others.

Several barriers hinder data publishing, from lack of attribution and rewards, vague citation practices, and quality issues to a rather general lack of a data-sharing culture.

Lately, data journals have overcome some of these barriers. In this study of more than 100 currently existing data journals, we describe the approaches they promote for data set description, availability, citation, quality, and open access. We close by identifying ways to expand and strengthen the data journal approach as a means to promote data set access and exploitation.

URL : http://www.niso.org/apps/group_public/download.php/14938/DataJournalsSurvey%20%281%29.pdf