DataMed – an open source discovery index for finding biomedical datasets

Authors : Xiaoling Chen, Anupama E Gururaj, Burak Ozyurt, Ruiling Liu, Ergin Soysal, Trevor Cohen, Firat Tiryaki, Yueling Li, Nansu Zong, Min Jiang, Deevakar Rogith, Mandana Salimi, Hyeon-eui Kim, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Claudiu Farcas, Todd Johnson, Ron Margolis, George Alter, Susanna-Assunta Sansone, Ian M Fore, Lucila Ohno-Machado, Jeffrey S Grethe, Hua Xu

Objective

Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain.

Materials and Methods

DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health–funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium.

It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries.

In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine.

Results and Conclusion

Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and that core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine, which implements advanced natural language processing and terminology services, achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the fraction of the top 10 search results that are relevant) of 0.6022.
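To make the P@10 figure concrete, here is a minimal sketch of how precision at k is conventionally computed from a ranked list of relevance judgments. The function name `precision_at_k` and the sample judgments are hypothetical illustrations; they are not taken from the DataMed code base or its benchmark.

```python
def precision_at_k(relevance, k=10):
    """Return the fraction of the top-k ranked results that are relevant.

    `relevance` is a list of booleans ordered by rank (best first).
    This is the textbook definition, not DataMed's own evaluation code.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = relevance[:k]
    # Divide by k rather than len(top_k): ranks with no returned
    # result are treated as non-relevant.
    return sum(1 for is_relevant in top_k if is_relevant) / k


# Hypothetical judgments for one query: 6 of the top 10 results are relevant.
judgments = [True, True, False, True, False, True, True, False, False, True]
print(precision_at_k(judgments))
```

A system-level P@10 such as the one reported above would typically be the mean of this per-query value over all benchmark queries.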

We have made the DataMed system publicly available as an open source package for the biomedical community.

DOI : https://doi.org/10.1093/jamia/ocx121

 

Open Access and the Theological Imagination

Authors : Talea Anderson, David Squires

The past twenty years have witnessed a mounting crisis in academic publishing. Companies such as Reed-Elsevier, Wiley-Blackwell, and Taylor and Francis have earned unprecedented profits by controlling more and more scholarly output while increasing subscription rates to academic journals.

Thus publishers have consolidated their influence despite widespread hopes that digital platforms would disperse control over knowledge production. Open access initiatives dating back to the mid-1990s evidence a religious zeal for overcoming corporate interests in academic publishing, with key advocates branding their efforts as archivangelism.

Little attention has been given to the legacy or implications of religious rhetoric in open access debates despite its increasing pitch in recent years. By tracing the rhetoric of openness to free-market liberalism, this essay shows how the Protestant imaginary reconciles open access initiatives with market economics rather than opposing the two.

Working against the tendency to accept the Reformation as an analogy for the relationship between knowledge production, publishers, and academics, we read Protestantism as a counterproductive element of the archivangelist inheritance.

URL : http://www.digitalhumanities.org/dhq/vol/11/4/000340/000340.html

Autorité scientifique et épistémique à l’épreuve de la mesure des citations

Author : Evelyne Broudoux

This article is based on a paper given on 18 March 2016 at the conference « Médiations informatisées de l’autorité : nouvelles écritures, nouvelles pratiques de la reconnaissance ? », organized at the ISCC by Gripic/Celsa.

It reviews how scientific authority is constructed in scholarly communication. With the aim of identifying epistemic and scientific authorities, it examines the social field of scientific disciplines, studies on citation analysis and its measures, the growing use of the social web and self-reference, and the semantic annotation of citations.

The goal is to examine the current stakes of scholarly communication.

URL : https://archivesic.ccsd.cnrs.fr/sic_01664792

Open Access Determinants and the Effect on Article Performance

Author : Sumiko Asai

Although open access has developed steadily as subscription journal prices have continued to rise, the effect of open access on citations remains controversial. The present study empirically examines the factors determining authors’ choice to provide open access and the effects of open access on downloads and citations in hybrid journals.

This study estimates authors’ choice of open access using a probit model, and the results show that the cost of open access is an important factor in the decision. After a test for endogeneity of the open access choice, the equation for downloads is estimated with variables representing the characteristics of articles and authors.

The results of estimating downloads by ordinary least squares show that open access increases the number of downloads in hybrid journals. On the other hand, citation estimations using a negative binomial model show that the effect of open access on the number of citations differs among hybrid journals.

When deciding whether to provide open access, authors should weigh article processing charges against the benefits that open access will bring.

DOI : https://doi.org/10.11648/j.ijber.20170606.11

Open access levels: a quantitative exploration using Web of Science and oaDOI data

Authors : Jeroen Bosman, Bianca Kramer

Across the world there is growing interest in open access publishing among researchers, institutions, funders and publishers alike. It is assumed that open access levels are growing, but hitherto the exact levels and patterns of open access have been hard to determine and detailed quantitative studies are scarce.

Using newly available open access status data from oaDOI in Web of Science we are now able to explore year-on-year open access levels across research fields, languages, countries, institutions, funders and topics, and try to relate the resulting patterns to disciplinary, national and institutional contexts.

With data from the oaDOI API we also look at the detailed breakdown of open access by types of gold open access (pure gold, hybrid and bronze), using universities in the Netherlands as an example.

Open access levels vary widely on all dimensions, with unexpected levels for, e.g., Portuguese as a language, Astronomy & Astrophysics as a research field, countries like Tanzania, Peru and Latvia, and Zika as a topic.

We explore methodological issues and offer suggestions to improve conditions for tracking open access status of research output. Finally, we suggest potential future applications for research and policy development. We have shared all data and code openly.

DOI : https://doi.org/10.7287/peerj.preprints.3520v1

Edition et publication des contenus : regard transversal sur la transformation des modèles

Author : Ghislaine Chartron

The Internet has profoundly transformed the models for publishing and releasing content that had, until then, belonged to clearly identified sectors: the press, literature, music, children’s publishing, research publishing…

Digital technology has blurred boundaries, destabilized established models, and partly transformed user practices. Many innovative projects have emerged, with varying degrees of popularity and of durability, the latter depending notably on their business model once the period of subsidies has passed.

We first recall the main characteristics of the context in which these offerings evolve: the economics of the Internet, and the evolving nature of the associated technologies and of socio-digital practices.

Taking a cross-sector view, we identify new value propositions tied to digital technology. We look at how revenue is collected and how revenue sources are hybridized, as well as the strategy deployed by the web giants in this sector and, alongside it, the renewal of regulation.

Finally, in light of this cross-sector comparison, the contribution proposes a revised typology of publication models.

URL : https://halshs.archives-ouvertes.fr/halshs-01522295