DataMed – an open source discovery index for finding biomedical datasets

Authors : Xiaoling Chen, Anupama E Gururaj, Burak Ozyurt, Ruiling Liu, Ergin Soysal, Trevor Cohen, Firat Tiryaki, Yueling Li, Nansu Zong, Min Jiang, Deevakar Rogith, Mandana Salimi, Hyeon-eui Kim, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Claudiu Farcas, Todd Johnson, Ron Margolis, George Alter, Susanna-Assunta Sansone, Ian M Fore, Lucila Ohno-Machado, Jeffrey S Grethe, Hua Xu


Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain.

Materials and Methods

DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, was developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium.

It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries.

In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine.

Results and Conclusion

Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and that core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the fraction of relevant results among the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services.
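As an illustration of the P@10 metric reported above, the following is a minimal sketch, not the evaluation code used by the DataMed authors. The function name `precision_at_k` and the relevance judgments are hypothetical; the benchmark value of 0.6022 would be this per-query score averaged over all benchmark queries.

```python
def precision_at_k(relevance, k=10):
    """Return the fraction of the top-k ranked results judged relevant.

    `relevance` is a list of booleans ordered by rank (best first).
    If fewer than k results were returned, the missing ranks count
    as non-relevant, so the denominator is always k.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = relevance[:k]
    return sum(1 for judged_relevant in top_k if judged_relevant) / k


# Hypothetical relevance judgments for one query's ranked results.
judged = [True, True, False, True, False, True, True, False, True, False, True]

# Only the first 10 entries are considered; 6 of them are relevant.
print(precision_at_k(judged))  # 0.6
```

Inferred average precision is a different measure, designed for benchmarks where only a sample of results has been judged, and is not covered by this sketch.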

We have made the DataMed system publicly available as an open source package for the biomedical community.



Open Access and the Theological Imagination

Authors : Talea Anderson, David Squires

The past twenty years have witnessed a mounting crisis in academic publishing. Companies such as Reed-Elsevier, Wiley-Blackwell, and Taylor and Francis have earned unprecedented profits by controlling more and more scholarly output while increasing subscription rates to academic journals.

Thus publishers have consolidated their influence despite widespread hopes that digital platforms would disperse control over knowledge production. Open access initiatives dating back to the mid-1990s evidence a religious zeal for overcoming corporate interests in academic publishing, with key advocates branding their efforts as archivangelism.

Little attention has been given to the legacy or implications of religious rhetoric in open access debates despite its increasing pitch in recent years. This essay shows how the Protestant imaginary reconciles–rather than opposes–open access initiatives with market economics by tracing the rhetoric of openness to free-market liberalism.

Working against the tendency to accept the Reformation as an analogy for the relationship between knowledge production, publishers, and academics, we read Protestantism as a counterproductive element of the archivangelist inheritance.


Open Access Determinants and the Effect on Article Performance

Author : Sumiko Asai

Although open access has steadily developed alongside the continuous increase in subscription journal prices, the effect of open access on article citations remains a controversial issue. The present study empirically examines the factors determining authors’ choice to provide open access and the effects of open access on downloads and citations in hybrid journals.

This study estimates authors’ choice of open access using a probit model, and the results show that the cost of open access is an important factor in the decision. After a test for endogeneity of open access choice, the equation for downloads is estimated with the variables representing characteristics of articles and authors.
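For readers unfamiliar with probit models, the sketch below shows only the response function: the probability of choosing open access is the standard normal CDF applied to a linear combination of explanatory variables. It does not perform estimation and is not the author's specification. The coefficient, the intercept, and the single cost variable are hypothetical values chosen for illustration.

```python
import math


def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(features, coefficients, intercept=0.0):
    """P(choice = 1 | features) under a probit model."""
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have equal length")
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return standard_normal_cdf(z)


# Hypothetical model: one feature, the open access fee in thousands
# of dollars, with a negative coefficient (higher cost, lower uptake).
coefficients = [-0.8]
low_cost = probit_probability([1.0], coefficients, intercept=0.5)
high_cost = probit_probability([3.0], coefficients, intercept=0.5)

print(low_cost > high_cost)  # True
```

In practice the coefficients would be estimated by maximum likelihood from observed author choices, typically with a statistics package rather than by hand.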

The results of estimating downloads by ordinary least squares show that open access increases the number of downloads in hybrid journals. On the other hand, citation estimations using a negative binomial model show that the effect of open access on the number of citations differs among hybrid journals.

When deciding whether to provide open access, authors should weigh article processing charges against the benefits that open access will bring.


DOI : 10.11648/j.ijber.20170606.11

Open access levels: a quantitative exploration using Web of Science and oaDOI data

Authors : Jeroen Bosman, Bianca Kramer

Across the world there is growing interest in open access publishing among researchers, institutions, funders and publishers alike. It is assumed that open access levels are growing, but hitherto the exact levels and patterns of open access have been hard to determine and detailed quantitative studies are scarce.

Using newly available open access status data from oaDOI in Web of Science, we are now able to explore year-on-year open access levels across research fields, languages, countries, institutions, funders and topics, and to relate the resulting patterns to disciplinary, national and institutional contexts.

With data from the oaDOI API we also look at the detailed breakdown of open access by types of gold open access (pure gold, hybrid and bronze), using universities in the Netherlands as an example.

There is huge diversity in open access levels on all dimensions, with unexpected levels for, e.g., Portuguese as a language, Astronomy & Astrophysics as a research field, countries like Tanzania, Peru and Latvia, and Zika as a topic.

We explore methodological issues and offer suggestions to improve conditions for tracking open access status of research output. Finally, we suggest potential future applications for research and policy development. We have shared all data and code openly.



Social Media Attention Increases Article Visits: An Investigation on Article-Level Referral Data of PeerJ

Authors : Xianwen Wang, Yunxue Cui, Qingchun Li, Xinhui Guo

To better understand the effect of social media on the dissemination of scholarly articles, we analyze the relationship between social media attention and article visitors directed by social media, using daily updated referral data for 110 PeerJ articles collected over a period of 345 days.

Our results show that the social media presence of PeerJ articles is high: 68.18% of the papers receive at least one tweet from Twitter accounts other than @PeerJ, the official account of the journal.

Social media attention increases the dissemination of scholarly articles. Altmetrics could not only complement traditional citation measures but also play an important role in increasing article downloads and promoting the impact of scholarly articles. There is also a significant correlation among the online attention from different social media platforms.

Articles with more Facebook shares tend to get more tweets. The temporal trends show that social attention arrives immediately after publication but does not last long, and the same holds for article views directed by social media.


Attitudes and norms affecting scientists’ data reuse

Authors : Renata Gonçalves Curty, Kevin Crowston, Alison Specht, Bruce W. Grant, Elizabeth D. Dalton

The value of sharing scientific research data is widely appreciated, but factors that hinder or prompt the reuse of data remain poorly understood. Using the Theory of Reasoned Action, we test the relationship between the beliefs and attitudes of scientists towards data reuse, and their self-reported data reuse behaviour.

To do so, we used existing responses to selected questions from a worldwide survey of scientists developed and administered by the DataONE Usability and Assessment Working Group (thus practicing data reuse ourselves).

Results show that the perceived efficacy and efficiency of data reuse are strong predictors of reuse behaviour, and that the perceived importance of data reuse corresponds to greater reuse. Contrary to our expectations, expressed lack of trust in existing data and perceived norms against data reuse were not found to be major impediments to reuse.

We found that reported use of models and remotely-sensed data was associated with greater reuse. The results suggest that data reuse would be encouraged and normalized by demonstration of its value.

We offer some theoretical and practical suggestions that could help to legitimize investment and policies in favor of data sharing.
