Interroger le texte scientifique (Interrogating Scientific Text)

Author: Guillaume Cabanac

Text documents are familiar and indispensable carriers of information in our information society. With the rise of digital platforms and social media, text now takes the form of web pages, blog posts, comments, tweets, and tags, among others. Once passive consumers, readers are in turn becoming content producers.

The result is a set of interpersonal exchanges that weave digital social networks extending far beyond our own circles of acquaintance. In this context, the nature and format of texts, the intentions of their authors (to inform, repost, criticize, complement, correct, etc.), the spatio-temporal context, and the variable veracity and freshness of the information are all subtleties that information retrieval models need to take into account.

The first part of this thesis presents a synthesis of information retrieval results that aim to model these factors in order to improve the relevance of searches over textual corpora, particularly those drawn from social media.

The research programme I am developing also aims to « interrogate the text » in order to reveal information about its content, its authors, and its readers. Scientific text was chosen as the target for the richness of its content and metadata. Accordingly, the second part of the thesis synthesizes results in scientometrics, a term denoting the quantitative study of science and innovation.

This involved questioning scientific texts and their underlying networks (vocabulary, references, authors, institutions, etc.) in order to bring out high-value knowledge and shed light on how scientific knowledge is created and disseminated.

The two strands brought together in this thesis combine to define an interdisciplinary research programme at the crossroads of computer science, scientometrics, and the sociology of science.

Its ambition is to interrogate scientific text in order to improve access to it (via information retrieval) while helping to elicit the drivers behind the genesis and evolution of social worlds and knowledge in science (via scientometrics).

Alternative location : https://tel.archives-ouvertes.fr/tel-01413878/

A Review of Theory and Practice in Scientometrics

« Scientometrics is the study of the quantitative aspects of the process of science as a communication system. It is centrally, but not only, concerned with the analysis of citations in the academic literature. In recent years it has come to play a major role in the measurement and evaluation of research performance. In this review we consider: the historical development of scientometrics, sources of citation data, citation metrics and the « laws » of scientometrics, normalisation, journal impact factors and other journal metrics, visualising and mapping science, evaluation and policy, and future developments. »

URL : http://arxiv.org/abs/1501.05462

Open Access: pour une meilleure visibilité de la production scientifique médicale au Maroc (Open Access: Toward Better Visibility of Medical Research Output in Morocco)

« The international databases of the Institute for Scientific Information (ISI) are essential but incomplete tools for evaluating research performance and providing statistical indicators on the volume of a country’s scientific output. In this context, we present the results of a bibliometric study of the scientific output of the Faculté de Médecine et de Pharmacie-Casablanca (Faculty of Medicine and Pharmacy of Casablanca). We emphasize the possibilities offered by open access (the green road and the gold road) to increase the visibility of local output. »

URL : http://eprints.rclis.org/24187/

Scientometric Mapping of Remote Sensing Research Output: A Global Perspective

« This paper presents a quantitative analysis of remote sensing in terms of worldwide research output from 1975 to February 2010. During that period, 1188 papers were published, with 30654 cited references. The average number of publications per year was 38.07. The highest number of papers (119) was published in 2009. The USA topped the list with 473 (39.8%) publications, followed by the UK with 128 (10.8%) and India with 93 (7.8%). The most productive authors are Kaufman YJ with 13 (1.1%) publications, followed by Wagner W with 10 (0.8%). There were 1082 institutions involved in the research; NASA topped the list with 112 (9.4%) publications, followed by NOAA with 48 (4%). The most preferred journal is IEEE Transactions on Geoscience and Remote Sensing with 103 papers, followed by International Journal of Remote Sensing with 95 papers and Acta Astronautica with 64 papers. The language most preferred by scientists is English, with 1170 (98.5%) publications. »
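The percentages quoted in this abstract are shares of the 1188 papers, and they can be rechecked with a few lines of Python. This is only a sketch: the counts are copied from the abstract, while the names `TOTAL_PAPERS`, `reported_counts`, and `share_percent` are illustrative and do not come from the paper.

```python
# Counts copied from the abstract; all names here are hypothetical.
TOTAL_PAPERS = 1188

reported_counts = {
    "USA": 473,
    "UK": 128,
    "India": 93,
    "NASA": 112,
    "NOAA": 48,
    "English": 1170,
}


def share_percent(count, total=TOTAL_PAPERS):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {name: share_percent(n) for name, n in reported_counts.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Under these assumptions the recomputed shares agree with the rounded figures given in the abstract (for example 39.8% for the USA and 98.5% for English).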

URL : http://digitalcommons.unl.edu/libphilprac/801/

Tracing scientists’ research trends realtimely

« In this research, we propose a method to trace scientists’ research trends in real time. By monitoring downloads of scientific articles from the journal Scientometrics for 744 hours (one month), we investigate the download statistics.

We then aggregate the keywords of the downloaded papers and analyze the trends in article downloads and keyword downloads. Furthermore, taking both keyword downloads and article publications into consideration, we design a method to detect emerging research trends.

We find that, in the field of scientometrics, social media, new indices for quantifying scientific productivity (such as the g-index), webometrics, semantics, text mining, and open access are emerging topics that information scientists are focusing on. »

URL : http://arxiv.org/abs/1208.1349

Le classement de Leiden: environnement scientifique et configuration (The Leiden Ranking: Scientific Environment and Configuration)

« The Leiden Ranking has established itself as a relevant and valid alternative to the Shanghai Ranking. Many of its indicators take into account the characteristics of individual disciplinary fields (for instance, field citation scores) and rely on calculations based on data distributions. It is designed by the CWTS centre at Leiden University in the Netherlands. »

URL : http://archivesic.ccsd.cnrs.fr/sic_00696098

Can Tweets Predict Citations? Metrics of Social Impact Based on Twitter and Correlation with Traditional Metrics of Scientific Impact

Background

Citations in peer-reviewed articles and the impact factor are generally accepted measures of scientific impact. Web 2.0 tools such as Twitter, blogs or social bookmarking tools provide the possibility to construct innovative article-level or journal-level metrics to gauge impact and influence. However, the relationship of these new metrics to traditional metrics such as citations is not known.

Objective

(1) To explore the feasibility of measuring social impact of and public attention to scholarly articles by analyzing buzz in social media, (2) to explore the dynamics, content, and timing of tweets relative to the publication of a scholarly article, and (3) to explore whether these metrics are sensitive and specific enough to predict highly cited articles.

Methods

Between July 2008 and November 2011, all tweets containing links to articles in the Journal of Medical Internet Research (JMIR) were mined. For a subset of 1573 tweets about 55 articles published between issues 3/2009 and 2/2010, different metrics of social media impact were calculated and compared against subsequent citation data from Scopus and Google Scholar 17 to 29 months later. A heuristic to predict the top-cited articles in each issue through tweet metrics was validated.

Results

A total of 4208 tweets cited 286 distinct JMIR articles. The distribution of tweets over the first 30 days after article publication followed a power law (Zipf, Bradford, or Pareto distribution), with most tweets sent on the day when an article was published (1458/3318, 43.94% of all tweets in a 60-day period) or on the following day (528/3318, 15.9%), followed by a rapid decay. The Pearson correlations between tweetations and citations were moderate and statistically significant, with correlation coefficients ranging from .42 to .72 for the log-transformed Google Scholar citations, but were less clear for Scopus citations and rank correlations. A linear multivariate model with time and tweets as significant predictors (P < .001) could explain 27% of the variation of citations. Highly tweeted articles were 11 times more likely to be highly cited than less-tweeted articles (9/12 or 75% of highly tweeted articles were highly cited, while only 3/43 or 7% of less-tweeted articles were highly cited; rate ratio 0.75/0.07 = 10.75, 95% confidence interval, 3.4–33.6). Top-cited articles can be predicted from top-tweeted articles with 93% specificity and 75% sensitivity.
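The rate ratio, sensitivity, and specificity reported above all follow from a single 2×2 table of the 55 articles, so the arithmetic can be retraced with a short Python sketch. The cell counts are inferred from the fractions in the abstract (9 of 12 highly tweeted articles and 3 of 43 less-tweeted articles were highly cited). The function name `tweet_citation_metrics` and its parameter names are illustrative assumptions, not part of the original study.

```python
def tweet_citation_metrics(tp, fp, fn, tn):
    """Summarize a 2x2 table where 'highly tweeted' predicts 'highly cited'.

    tp: highly tweeted and highly cited
    fp: highly tweeted, not highly cited
    fn: less tweeted but highly cited
    tn: less tweeted and not highly cited
    """
    rate_highly_tweeted = tp / (tp + fp)  # share of highly tweeted that are highly cited
    rate_less_tweeted = fn / (fn + tn)    # share of less tweeted that are highly cited
    return {
        "rate_ratio": rate_highly_tweeted / rate_less_tweeted,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Cell counts inferred from the abstract: 9/12 and 3/43.
metrics = tweet_citation_metrics(tp=9, fp=3, fn=3, tn=40)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts the sketch gives a rate ratio of 10.75, a sensitivity of 0.75, and a specificity of about 0.93, matching the abstract. The ratio of 10.75 comes from the unrounded fraction 3/43 rather than the rounded 0.07 shown in the text. The confidence interval is not recomputed here.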

Conclusions

Tweets can predict highly cited articles within the first 3 days of article publication. Social media activity either increases citations or reflects the underlying qualities of the article that also predict citations, but the true use of these metrics is to measure the distinct concept of social impact. Social impact measures based on tweets are proposed to complement traditional citation metrics. The proposed twimpact factor may be a useful and timely metric to measure uptake of research findings and to filter research findings resonating with the public in real time.

URL : http://www.jmir.org/2011/4/e123/