Across academia and industry, text mining has become a popular strategy for keeping up with the rapid growth of the scientific literature. Text mining of the scientific literature has mostly been carried out on collections of abstracts, due to their availability. Here we present an analysis of 15 million English scientific full-text articles published during the period 1823-2016. We describe the development in article length and publication sub-topics during these nearly 200 years. We showcase the potential of text mining by extracting published protein-protein, disease-gene, and protein subcellular associations using a named entity recognition system, and quantitatively report on their accuracy using gold standard benchmark data sets. We subsequently compare the findings to corresponding results obtained on 16.5 million abstracts included in MEDLINE and show that text mining of full-text articles consistently outperforms using abstracts only.
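As a rough illustration of the kind of pipeline this abstract describes, the sketch below finds entity mentions, counts how often two entities share a sentence, and scores the extracted pairs against a gold standard. It is not the authors' system: plain dictionary matching stands in for the named entity recognition step, sentence-level co-occurrence is an assumed association criterion, and all function names and identifiers are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def find_entities(sentence, dictionary):
    """Return the IDs of dictionary names that occur as whole words in the sentence.

    `dictionary` maps a surface name to an entity ID (a hypothetical stand-in
    for a real named entity recognition system).
    """
    lowered = sentence.lower()
    found = set()
    for name, entity_id in dictionary.items():
        if re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered):
            found.add(entity_id)
    return found


def cooccurrences(documents, dictionary):
    """Count, for each unordered entity pair, the sentences that mention both."""
    counts = Counter()
    for doc in documents:
        for sentence in split_sentences(doc):
            ids = sorted(find_entities(sentence, dictionary))
            for pair in combinations(ids, 2):
                counts[pair] += 1
    return counts


def precision_recall(predicted, gold):
    """Precision and recall of predicted pairs against gold standard pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Running the same functions once on abstracts and once on full texts, and comparing the two precision and recall values, mirrors the comparison the abstract reports, though a real benchmark would also rank pairs by a score rather than treat every co-occurrence as a hit.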
Scientists are required to communicate science and research not only to other experts in the field, but also to scientists and experts from other fields, as well as to the public and policymakers. One fundamental suggestion when communicating with non-experts is to avoid professional jargon. However, because scientists are trained to speak in highly specialized language, avoiding jargon is difficult for them, and there is no standard to guide them in adjusting their messages. In this research project, we present the development and validation of the data produced by an up-to-date, scientist-friendly program for identifying jargon in popular written texts, based on a corpus of over 90 million words published on the BBC site during the years 2012-2015. Validation of the results produced by the jargon identifier, the De-jargonizer, involved three mini studies: (1) comparison and correlation with existing frequency word lists in the literature; (2) a comparison with previous research on spoken language jargon use in TED transcripts of non-science lectures, TED transcripts of science lectures and transcripts of academic science lectures; and (3) a test of 5,000 pairs of published research abstracts and lay reader summaries describing the same article from the journals PLOS Computational Biology and PLOS Genetics. Validation procedures showed that the data classification of the De-jargonizer significantly correlates with existing frequency word lists, replicates similar jargon differences in previous studies on scientific versus general lectures, and identifies significant differences in jargon use between abstracts and lay summaries. As expected, more jargon was found in the academic abstracts than in the lay summaries; however, the percentage of jargon in the lay summaries exceeded the level recommended for the public to understand the text. Thus, the De-jargonizer can help scientists identify problematic jargon when communicating science to non-experts, and be implemented by science communication instructors when evaluating the effectiveness and jargon use of participants in science communication workshops and programs.
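The frequency-based idea behind such a jargon identifier can be sketched as follows. This is a minimal illustration, not the De-jargonizer's actual implementation: the function names, the three category labels, and the two count thresholds are assumptions made for the example, and a real reference corpus would be far larger than a toy string.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def build_frequencies(corpus_text):
    """Count how often each word occurs in a general-language reference corpus."""
    return Counter(tokenize(corpus_text))


def classify_word(word, freqs, common_min=3, rare_min=1):
    """Label a word by its reference-corpus count.

    The thresholds are hypothetical: at least `common_min` occurrences means
    "common", at least `rare_min` means "mid-frequency", otherwise "jargon".
    """
    count = freqs.get(word, 0)
    if count >= common_min:
        return "common"
    if count >= rare_min:
        return "mid-frequency"
    return "jargon"


def jargon_share(text, freqs, **thresholds):
    """Fraction of the text's tokens that are classified as jargon."""
    words = tokenize(text)
    if not words:
        return 0.0
    jargon = sum(1 for w in words if classify_word(w, freqs, **thresholds) == "jargon")
    return jargon / len(words)
```

Comparing `jargon_share` for an abstract and for its lay summary against the same reference frequencies gives the kind of paired comparison used in the third validation study, under the stated assumptions.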
In a 2005 paper that has been accessed more than a million times, John Ioannidis explained why most published research findings were false. Here he revisits the topic, this time to address how to improve matters. Please see later in the article for the Editors' Summary.
In February 2014, after not having spoken to him for more than a decade, I got an e-mail from my friend Paul Kalanithi asking for writing advice. A chief neurosurgical resident at Stanford who, at 36, had already conducted award-winning basic science research, Paul was trying to piece together a job as a physician-scientist and writer. But there was a hitch: he had just been diagnosed with terminal lung cancer. “I no longer know what the hell I’m doing,” he wrote. Though he’d had job offers, they’d arrived when he was too sick to take them. “It’s only in the . . .
- Proceedings of the National Academy of Sciences of the United States of America
- Published over 2 years ago
Uncertainty monitoring is a core property of metacognition, allowing individuals to adapt their decision-making strategies depending on the state of their knowledge. Although it has been argued that other animals share these metacognitive abilities, only humans seem to possess the ability to explicitly communicate their own uncertainty to others. It remains unknown whether this capacity is present early in development, or whether it emerges later with the ability to verbally report one’s own mental states. Here, using a nonverbal memory-monitoring paradigm, we show that 20-month-olds can monitor and report their own uncertainty. Infants had to remember the location of a hidden toy before pointing to indicate where they wanted to recover it. In an experimental group, infants were given the possibility to ask for help through nonverbal communication when they had forgotten the toy location. Compared with a control group in which infants had no other option but to decide by themselves, infants given the opportunity to ask for help used this option strategically to improve their performance. Asking for help was used selectively to avoid making errors and to decline difficult choices. These results demonstrate that infants are able to successfully monitor their own uncertainty and share this information with others to fulfill their goals.
In a new Essay in the Research Integrity Series, Daniele Fanelli examines the evidence and possible reasons for the rising number of retractions. Please see later in the article for the Editors' Summary.
We examine gender differences among the six PhD student cohorts 2004-2009 at the California Institute of Technology using a new dataset that includes information on trainees and their advisors and enables us to construct detailed measures of teams at the advisor level. We focus on the relationship between graduate student publications and (1) their gender; (2) the gender of the advisor; (3) the gender pairing between the advisor and the student; and (4) the gender composition of the team. We find that female graduate students co-author on average 8.5% fewer papers than men, and that students writing with female advisors publish 7.7% more. Of particular note is that gender pairing matters: male students working with female advisors publish 10.0% more than male students working with male advisors; female students working with male advisors publish 8.5% less. There is no difference between the publishing patterns of male students working with male advisors and female students working with female advisors. The results persist and are magnified when we focus on the quality of the published articles, as measured by average Impact Factor, instead of number of articles. We find no evidence that the number of publications relates to the gender composition of the team. Although the gender effects are reasonably modest, past research on processes of positive feedback and cumulative advantage suggests that the difference will grow, not shrink, over the careers of these recent cohorts.
- British journal of psychology (London, England : 1953)
- Published almost 3 years ago
While much previous research has suggested that decreased transcription fluency has a detrimental effect on writing, there is recent evidence that decreased fluency can actually benefit cognitive processing. Across a series of experiments, we manipulated the transcription fluency of ostensibly skilled typists by asking them to type essays in two conditions: two-handed and one-handed typing. We used the Coh-Metrix text analyser to investigate the effects of decreased transcription fluency on various aspects of essay writing, such as lexical sophistication, sentence complexity, and cohesion of essays (important indicators of successful writing). We demonstrate that decreased fluency can benefit certain aspects of writing and discuss potential mechanisms underlying disfluency effects in essay writing.
Pencil traces drawn on print papers are shown to function as strain gauges and chemiresistors. Regular graphite/clay pencils can leave traces composed of percolated networks of fine graphite powders, which exhibit reversible resistance changes upon compressive or tensile deflections. Flexible toy pencils can leave traces that are essentially thin films of graphite/polymer composites, which show reversible changes in resistance upon exposure to volatile organic compounds due to absorption/desorption induced swelling/recovery of the polymer binders. Pencil-on-paper devices are low-cost, extremely simple and rapid to fabricate. They are light, flexible, portable, disposable, and do not generate potentially negative environmental impact during processing and device fabrication. One can envision many other types of pencil drawn paper electronic devices that can take on a great variety of form factors. Hand drawn devices could be useful in resource-limited or emergency situations. They could also lead to new applications integrating art and electronics.
Throughout Antiquity magical amulets written on papyri, lead and silver were used for apotropaic reasons. While papyri often can be unrolled and deciphered, metal scrolls, usually very thin and tightly rolled up, cannot easily be unrolled without damaging the metal. This leaves us with unreadable results due to the damage done or with the decision not to unroll the scroll. The texts vary greatly and tell us about the cultural environment and local as well as individual practices at a variety of locations across the Mediterranean. Here we present the methodology and the results of the digital unfolding of a silver sheet from Jerash in Jordan from the mid-8th century CE. The scroll was inscribed with 17 lines in presumed pseudo-Arabic as well as some magical signs. The successful unfolding shows that it is possible to digitally unfold complexly folded scrolls, but that it requires a combination of the know-how of the software and linguistic knowledge.