Concept: Ranking


BACKGROUND: Mycobacterium tuberculosis encodes 11 putative serine/threonine protein kinases (STPKs), which regulate transcription, cell development and interaction with host cells. Of these 11 STPKs, three kinases, namely PknA, PknB and PknG, have been linked to mycobacterial growth. Previous studies have shown that PknB is essential for mycobacterial growth, is expressed during the log phase of growth, and phosphorylates substrates involved in peptidoglycan biosynthesis. In recent years, many high-affinity inhibitors of PknB have been reported. Data fusion has previously been shown to enrich active compounds effectively in both structure-based and ligand-based approaches. In this study we applied three data fusion ranking algorithms to the PknB dataset: sum rank, sum score and reciprocal rank. We found that the reciprocal rank algorithm is able to select active compounds early in a virtual screening process. We also screened the Asinex database with the reciprocal rank algorithm to identify possible inhibitors of PknB. RESULTS: In our work we used both structure-based and ligand-based approaches for virtual screening, and combined their results using a variety of data fusion methods. We found that data fusion increases the chance of actives being ranked highly. Specifically, fusing the rankings from pharmacophore search, ROCS and Glide XP with the reciprocal rank algorithm not only outperforms the individual structure-based and ligand-based approaches but also ranks actives better than the other two data fusion methods, as measured by the BEDROC, robust initial enhancement (RIE) and AUC metrics. These fused results were used to identify 45 candidate compounds for further experimental validation. CONCLUSION: We show that very different structure-based and ligand-based methods for predicting drug-target interactions can be combined effectively using data fusion, outperforming any single method in the ranking of actives.
Such fused results show promise for a coherent selection of candidates for biological screening.

Concepts: Adenosine triphosphate, Enzyme, Tuberculosis, Non-parametric statistics, Mycobacterium, Mycobacterium tuberculosis, Ranking, RANK
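
The three fusion rules named in the PknB abstract can be illustrated on toy data. This is a minimal sketch assuming common textbook definitions (sum rank: add each compound's rank across methods, lower is better; sum score: add min-max normalised scores; reciprocal rank: add 1/rank), which may differ in detail from those used in the study. The compound names and scores are made up, and docking scores such as Glide XP are assumed to have been negated so that a higher value always means a better score.

```python
def ranks_from_scores(scores):
    """Map compound -> rank (1 = best), assuming a higher score is better."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {cpd: i + 1 for i, cpd in enumerate(ordered)}


def minmax(scores):
    """Rescale one method's scores to [0, 1] so that methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # guard against a method giving identical scores
    return {cpd: (s - lo) / span for cpd, s in scores.items()}


def fuse(methods, rule):
    """Fuse per-method score dicts (compound -> score) into one ordering, best first."""
    compounds = list(methods[0])
    if rule == "sum_rank":
        ranks = [ranks_from_scores(m) for m in methods]
        fused = {c: sum(r[c] for r in ranks) for c in compounds}
        return sorted(compounds, key=fused.get)  # smallest rank sum first
    if rule == "sum_score":
        norm = [minmax(m) for m in methods]
        fused = {c: sum(n[c] for n in norm) for c in compounds}
        return sorted(compounds, key=fused.get, reverse=True)
    if rule == "reciprocal_rank":
        ranks = [ranks_from_scores(m) for m in methods]
        fused = {c: sum(1.0 / r[c] for r in ranks) for c in compounds}
        return sorted(compounds, key=fused.get, reverse=True)
    raise ValueError(f"unknown fusion rule: {rule}")


# Hypothetical scores for four compounds from three screening methods.
pharmacophore = {"A": 0.9, "B": 0.5, "C": 0.7, "D": 0.1}
rocs = {"A": 1.2, "B": 1.5, "C": 0.4, "D": 0.3}
glide_xp_negated = {"A": 9.0, "B": 6.0, "C": 8.5, "D": 5.0}
methods = [pharmacophore, rocs, glide_xp_negated]
for rule in ("sum_rank", "sum_score", "reciprocal_rank"):
    print(rule, fuse(methods, rule))
```

In this toy case B and C tie on sum rank (7 each), while the reciprocal rule separates them (about 1.67 versus 1.33) because B is ranked first by one method. This shows how 1/rank rewards early placements, which matters for early enrichment.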


Exposure to news, opinion and civic information increasingly occurs through social media. How do these online networks influence exposure to perspectives that cut across ideological lines? Using de-identified data, we examined how 10.1 million U.S. Facebook users interact with socially shared news. We directly measured ideological homophily in friend networks and examined the extent to which heterogeneous friends could potentially expose individuals to cross-cutting content. We then quantified the extent to which individuals encounter comparatively more or less diverse content while interacting via Facebook’s algorithmically ranked News Feed, and further studied users' choices to click through to ideologically discordant content. Compared with algorithmic ranking, individuals' choices about what to consume had a stronger effect in limiting exposure to cross-cutting content.

Concepts: Algorithm, Sociology, Science, Ranking, Marxism, Social, Facebook, Ideology


There is a new generation of emoticons, called emojis, that are increasingly used in mobile communications and social media. In the past two years, over ten billion emojis were used on Twitter. Emojis are Unicode graphic symbols, used as a shorthand to express concepts and ideas. In contrast to the small number of well-known emoticons that carry clear emotional contents, there are hundreds of emojis. But what are their emotional contents? We provide the first emoji sentiment lexicon, called the Emoji Sentiment Ranking, and draw a sentiment map of the 751 most frequently used emojis. The sentiment of the emojis is computed from the sentiment of the tweets in which they occur. We engaged 83 human annotators to label over 1.6 million tweets in 13 European languages by sentiment polarity (negative, neutral, or positive). About 4% of the annotated tweets contain emojis. The sentiment analysis of the emojis allows us to draw several interesting conclusions. It turns out that most of the emojis are positive, especially the most popular ones. The sentiment distributions of tweets with and without emojis are significantly different. The inter-annotator agreement on the tweets with emojis is higher. Emojis tend to occur at the end of the tweets, and their sentiment polarity increases with the distance. We observe no significant differences in the emoji rankings between the 13 languages and the Emoji Sentiment Ranking. Consequently, we propose our Emoji Sentiment Ranking as a European language-independent resource for automated sentiment analysis. Finally, the paper provides a formalization of sentiment and a novel visualization in the form of a sentiment bar.

Concepts: Linguistics, Ranking, Text mining, Twitter, Internet slang, Sentiment analysis, Emoticon, Emoji
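
The emoji abstract states that each emoji's sentiment is computed from the sentiment of the tweets in which it occurs. A minimal sketch of one way to do this is shown below. It assumes polarity values of -1, 0 and +1 for negative, neutral and positive labels, and add-one (Laplace) smoothing of the class proportions. The smoothing choice, the function names and the toy labels are illustrative assumptions, not the paper's exact formalization.

```python
from collections import Counter

CLASSES = ("negative", "neutral", "positive")


def emoji_sentiment(labels, smoothing=1.0):
    """Mean polarity (-1, 0, +1) of the tweets containing one emoji.

    labels: sentiment labels of those tweets; smoothing: pseudo-count per class.
    """
    counts = Counter(labels)
    total = sum(counts[c] for c in CLASSES) + smoothing * len(CLASSES)
    p = {c: (counts[c] + smoothing) / total for c in CLASSES}
    # Expected polarity: (-1) * p_neg + 0 * p_neu + (+1) * p_pos
    return p["positive"] - p["negative"]


def sentiment_ranking(tweets_by_emoji, smoothing=1.0):
    """Rank emojis from most positive to most negative mean polarity."""
    scores = {e: emoji_sentiment(lbls, smoothing) for e, lbls in tweets_by_emoji.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Toy annotations, made up for illustration.
tweets_by_emoji = {
    "\U0001F602": ["positive"] * 6 + ["neutral"] * 3 + ["negative"],
    "\U0001F62D": ["negative"] * 4 + ["neutral"] * 2 + ["positive"] * 2,
}
order, scores = sentiment_ranking(tweets_by_emoji)
print(order, scores)
```

Smoothing pulls the scores of rarely seen emojis toward neutral. With `smoothing=0.0` the first toy emoji scores exactly 0.6 - 0.1 = 0.5.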


The journal impact factor (JIF), and how best to rate the performance of a journal and the articles it contains, are areas of great debate. The aim of this paper was to assess various methods of ranking journal quality for mental health nursing journals, and to list the top 10 articles that have received the most citations to date. Seven mental health nursing journals were chosen for analysis of the citations they received in 2010, as well as their current impact factors from two sources and other data for ranking purposes. There was very little difference among the top four mental health nursing journals in their overall rankings when various bibliometric indicators were combined. That said, the International Journal of Mental Health Nursing is currently the highest-ranked mental health nursing journal based on JIF, but it publishes fewer articles per year than the other journals. Overall, very few articles received 50 or more citations. This study shows that researchers need to consider more than one ranking method when deciding where to submit or publish their research.

Concepts: Academic publishing, Science, Ranking, Nature, Impact factor, Bibliometrics, PageRank
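
The journal-ranking abstract notes that a single indicator such as the JIF can order journals differently from a combination of indicators. A mean-rank combination makes this concrete. This is a minimal sketch assuming each indicator is ranked separately (a higher value is better) and the ranks are averaged. The journal names, indicator names and values are invented for illustration and are not the study's data.

```python
def combined_ranking(indicators):
    """indicators: indicator name -> {journal: value}, where higher is better.

    Returns journals ordered by mean rank (best first) and the mean ranks.
    """
    journals = list(next(iter(indicators.values())))
    rank_sum = {j: 0 for j in journals}
    for values in indicators.values():
        ordered = sorted(journals, key=values.get, reverse=True)
        for rank, journal in enumerate(ordered, start=1):
            rank_sum[journal] += rank
    mean_rank = {j: rank_sum[j] / len(indicators) for j in journals}
    return sorted(journals, key=mean_rank.get), mean_rank


# Hypothetical indicators for three placeholder journals.
indicators = {
    "jif_source_a": {"J1": 2.0, "J2": 1.5, "J3": 1.0},
    "jif_source_b": {"J1": 1.8, "J2": 1.9, "J3": 0.9},
    "citations_2010": {"J1": 300, "J2": 400, "J3": 100},
}
order, mean_rank = combined_ranking(indicators)
print(order, mean_rank)
```

J1 leads on the first JIF source alone, yet J2 comes first once all three indicators are combined. Ties within an indicator are not given averaged ranks in this simplified version.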


With thousands of pesticides registered by the United States Environmental Protection Agency, it is not feasible to sample for all pesticides applied in agricultural communities. Hazard-ranking pesticides based on use, toxicity, and exposure potential can help prioritize community-specific pesticide hazards. This study applied hazard-ranking schemes for cancer, endocrine disruption, and reproductive/developmental toxicity in Yuma County, Arizona. An existing cancer hazard-ranking scheme was modified, and novel schemes for endocrine disruption and reproductive/developmental toxicity were developed to rank pesticide hazards. The hazard-ranking schemes accounted for pesticide use, toxicity, and exposure potential based on the chemical properties of each pesticide. Pesticides were ranked as hazards with respect to each health effect, as well as overall chronic health effects. The highest hazard-ranked pesticides for overall chronic health effects were maneb, metam-sodium, trifluralin, pronamide, and bifenthrin. The relative pesticide rankings were unique for each health effect. The highest hazard-ranked pesticides differed from those most heavily applied, as well as from those detected in Yuma homes over a decade ago. The most hazardous pesticides for cancer in Yuma County, Arizona were also different from those identified by a previous hazard-ranking applied in California. Hazard-ranking schemes that take into account pesticide use, toxicity, and exposure potential can help prioritize the pesticides of greatest health risk in agricultural communities. This study is the first to provide pesticide hazard-rankings for endocrine disruption and reproductive/developmental toxicity based on use, toxicity, and exposure potential. These hazard-ranking schemes can be applied to other agricultural communities to prioritize community-specific pesticide hazards and target reductions in health risk.

Concepts: Pesticide, United States Environmental Protection Agency, Ranking, DDT, Arizona, Health effector, Yuma, Arizona, Yuma County, Arizona
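
The pesticide abstract says each scheme combines pesticide use, toxicity and exposure potential, but it does not give the formula. The sketch below assumes, purely for illustration, a multiplicative score (use times effect-specific toxicity score times exposure score) for each health effect. It also assumes an overall chronic score that sums the three effect-specific scores. The scoring rule, the scales and the pesticide_A/B/C data are all hypothetical.

```python
EFFECTS = ("cancer", "endocrine", "reproductive")


def hazard_scores(pesticides, effect):
    """Hypothetical hazard score for one health effect: use * toxicity * exposure."""
    return {
        name: p["use"] * p["toxicity"][effect] * p["exposure"]
        for name, p in pesticides.items()
    }


def overall_scores(pesticides):
    """Hypothetical overall chronic score: sum of the effect-specific scores."""
    per_effect = [hazard_scores(pesticides, e) for e in EFFECTS]
    return {name: sum(s[name] for s in per_effect) for name in pesticides}


def rank(scores):
    """Order pesticides by score, highest hazard first."""
    return sorted(scores, key=scores.get, reverse=True)


pesticides = {
    # use: amount applied (arbitrary units); toxicity and exposure: 0-3 scales
    "pesticide_A": {"use": 1000, "toxicity": {"cancer": 1, "endocrine": 0, "reproductive": 1}, "exposure": 1},
    "pesticide_B": {"use": 200, "toxicity": {"cancer": 3, "endocrine": 2, "reproductive": 1}, "exposure": 2},
    "pesticide_C": {"use": 500, "toxicity": {"cancer": 0, "endocrine": 3, "reproductive": 0}, "exposure": 1},
}
print("overall:", rank(overall_scores(pesticides)))
for effect in EFFECTS:
    print(effect, rank(hazard_scores(pesticides, effect)))
```

In this toy case the most heavily applied pesticide (pesticide_A) is not the top overall hazard, and each health effect yields a different order. This mirrors the qualitative pattern the abstract reports.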


Pharmacogenomics (PGx) studies aim to identify genetic variants that may affect drug efficacy and toxicity. Machine-understandable knowledge of drug-gene relationships is important for many computational PGx studies and for personalised medicine. A comprehensive and accurate PGx-specific gene lexicon is important for automatically extracting drug-gene relationships from the scientific literature, a rich knowledge source for PGx studies. In this study, we present a bootstrapping learning technique to rank 33,310 human genes with respect to their relevance to drug response. The algorithm uses only one seed PGx gene to iteratively extract and rank co-occurring genes from 20 million MEDLINE abstracts. Our ranking method is able to accurately rank PGx-specific genes highly among all human genes. Compared with randomly ranked genes (precision: 0.032, recall: 0.013, F1: 0.018), the algorithm achieved significantly better performance (precision: 0.861, recall: 0.548, F1: 0.662) in ranking the top 2.5% of genes.

Concepts: Genetics, Evolution, Human genome, Genomics, Species, Science, Accuracy and precision, Ranking
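
The bootstrapping idea in the PGx abstract can be sketched together with the top-k evaluation. The idea is to start from one seed gene and iteratively grow a ranked list from co-occurrences. This is a simplified illustration, not the authors' algorithm. It assumes each abstract has already been reduced to the set of gene symbols it mentions. It promotes one gene per iteration by raw co-occurrence count with the current seed set, and it uses hypothetical gene names.

```python
from collections import Counter


def bootstrap_rank(abstracts, seed, iterations):
    """Grow a ranked gene list from a single seed gene.

    abstracts: list of sets of gene symbols co-mentioned in one abstract.
    Each iteration promotes the non-seed gene that co-occurs most often with
    the current seed set (ties broken alphabetically).
    """
    seeds, ranked = {seed}, [seed]
    for _ in range(iterations):
        counts = Counter()
        for genes in abstracts:
            if genes & seeds:
                counts.update(g for g in genes if g not in seeds)
        if not counts:
            break  # nothing new co-occurs with the seed set
        best = min(counts, key=lambda g: (-counts[g], g))
        seeds.add(best)
        ranked.append(best)
    return ranked


def precision_recall_f1(predicted, relevant):
    """Set-based precision, recall and F1 (harmonic mean of the two)."""
    tp = len(set(predicted) & set(relevant))
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


abstracts = [{"SEED", "G1"}, {"SEED", "G1", "G2"}, {"G1", "G3"}, {"G4", "G5"}]
ranked = bootstrap_rank(abstracts, "SEED", iterations=10)
print(ranked)  # G4 and G5 never co-occur with the growing seed set
print(precision_recall_f1(ranked[1:], ["G1", "G2", "G9"]))
```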


Contaminated site remediation is generally difficult, time-consuming and expensive. As a result, ranking sites may aid in the efficient allocation of resources. In order to rank the priorities of contaminated sites, input parameters relevant to contaminant fate and transport and to exposure assessment should be as accurate as possible. Yet in most cases these parameters are vague or imprecise. Most current remediation priority ranking methodologies overlook the vagueness in parameter values or do not go beyond assigning a contaminated site to a risk class. The main objective of this study is to develop an alternative remedial priority ranking system (RPRS) for contaminated sites in which vagueness in parameter values is considered. RPRS aims to evaluate potential human health risks due to contamination using sufficiently comprehensive and readily available parameters to describe the fate and transport of contaminants in air, soil and groundwater. Vagueness in parameter values is handled by means of fuzzy set theory. A fuzzy expert system is proposed for the evaluation of contaminated sites, and a software tool (ConSiteRPRS) was developed on the Microsoft Office Excel 2007 platform. Rankings were produced for both hypothetical and real sites. Results show that RPRS successfully distinguishes between higher-risk and lower-risk cases.

Concepts: Microsoft Excel, Environmental remediation, Ranking, Parameter, Set, Microsoft, Fuzzy logic, Microsoft Office
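
A fuzzy expert system of the kind described in the RPRS abstract scores each site by firing rules whose antecedents are fuzzy sets over its input parameters. The sketch below is a deliberately small illustration, not ConSiteRPRS. It assumes two hypothetical inputs already normalised to [0, 1] (contaminant concentration and mobility) and triangular low/medium/high membership functions. It also assumes min as the fuzzy AND, a hypothetical rule table whose consequent grows with both inputs, and weighted-average defuzzification.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets on inputs normalised to [0, 1].
LEVELS = {"low": (-0.5, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.5)}
INDEX = {"low": 0, "medium": 1, "high": 2}


def site_risk(concentration, mobility):
    """Fire all 9 rules and defuzzify by a weighted average of rule outputs."""
    num = den = 0.0
    for conc_level, conc_set in LEVELS.items():
        for mob_level, mob_set in LEVELS.items():
            strength = min(tri(concentration, *conc_set), tri(mobility, *mob_set))  # fuzzy AND
            consequent = (INDEX[conc_level] + INDEX[mob_level]) / 4  # hypothetical output in [0, 1]
            num += strength * consequent
            den += strength
    return num / den  # den > 0 for any input in [0, 1]


def rank_sites(sites):
    """Order sites by defuzzified risk, highest remediation priority first."""
    risk = {name: site_risk(*inputs) for name, inputs in sites.items()}
    return sorted(risk, key=risk.get, reverse=True), risk


sites = {"site_1": (1.0, 1.0), "site_2": (0.25, 0.5), "site_3": (0.0, 0.0), "site_4": (0.5, 1.0)}
order, risk = rank_sites(sites)
print(order, risk)
```

Unlike a crisp risk class, the defuzzified score varies smoothly between classes. For example, site_2 falls halfway between the low and medium concentration sets and receives 0.375.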


The analysis of clinical trials aiming to show symptomatic benefits is often complicated by the ethical requirement for rescue medication when the disease state of patients worsens. In type 2 diabetes trials, patients receive glucose-lowering rescue medication continuously for the remaining trial duration if one of several markers of glycemic control exceeds pre-specified thresholds. This may mask differences in glycemic values between treatment groups, because rescue will occur more frequently in less effective treatment groups. Traditionally, the last value before rescue medication was carried forward and analyzed as the end-of-trial value. The deficits of such simplistic single imputation approaches are increasingly recognized by regulatory authorities and trialists. We discuss alternative approaches and evaluate them through a simulation study. When the estimand of interest is the effect attributable to the treatments initially assigned at randomization, our recommendation for estimation and hypothesis testing is to treat data after meeting rescue criteria as deterministically ‘missing’ at random, because initiation of rescue medication is determined by observed in-trial values. Appropriate imputation of values after meeting rescue criteria is then possible either directly through multiple imputation or implicitly with a repeated measures model. Crucially, one needs to jointly impute or model all markers of glycemic control that can lead to the initiation of rescue medication. For hypothesis testing only, an alternative is a rank test in which outcomes from patients ‘requiring rescue medication’ are ranked worst and non-rescued patients are ranked according to their final visit values. However, an appropriate ranking of unobserved values may be controversial. Copyright © 2015 John Wiley & Sons, Ltd.

Concepts: Diabetes, Ranking, Morality, Expectation-maximization algorithm, Imputation
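
The rank-test alternative in the rescue-medication abstract can be sketched as a Wilcoxon rank-sum (Mann-Whitney) test in which every rescued patient receives the worst possible value. Rescued patients therefore share a tied worst rank, and everyone else is ranked by the final-visit value. This minimal sketch assumes the outcome is a change in a glycemic marker where lower is better. It uses a normal approximation without tie correction and made-up data; in practice a validated statistics library would be preferable.

```python
import math


def midranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def rescue_rank_test(group_a, group_b):
    """Rank-sum test where None marks a rescued patient, who is ranked worst.

    Values are final-visit changes in a glycemic marker (lower is better).
    Returns (rank sum of group A, z statistic, two-sided p-value).
    """
    a = [math.inf if v is None else v for v in group_a]
    b = [math.inf if v is None else v for v in group_b]
    ranks = midranks(a + b)
    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12  # no tie correction, for brevity
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal approximation
    return w, z, p


arm_a = [-1.2, -0.8, -0.5, None]  # None = met rescue criteria
arm_b = [-0.3, 0.1, None, None]
print(rescue_rank_test(arm_a, arm_b))
```

The three rescued patients tie at ranks 6 to 8 and each receive rank 7. The test therefore never needs the unobserved post-rescue values, but it relies on the contestable assumption that rescue is worse than any observed outcome.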


Organizing sport video data for performance analysis can be challenging, especially when it involves multiple attributes and the criteria for sorting change frequently depending on the user’s task. In this work, we propose a visual analytic system that converts a user’s knowledge of rankings into sorting models to support this process. The system enables users to specify a sort requirement flexibly, without depending on specific knowledge of individual sort keys. We use regression techniques to train different analytical models for different types of sorting requirements. We use visualization to facilitate knowledge discovery at different stages of the visual analytic process. This includes visualizing the parameters of the ranking model, visualizing the results of a sort query for interactive exploration, and playing back sorted video clips. We demonstrate the system with a case study in rugby, finding key instances for analyzing team and player performance.

Concepts: Scientific method, Case study, Logic, Ranking, Music video, Sorting, Organizing, Analytic
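
The sport-video abstract describes using regression to turn a user's notion of a good ranking into a reusable sort key. One way to do this is to have the user score a few example clips, fit a linear model from clip attributes to those scores, and then sort all clips by predicted score. The sketch below assumes a linear model fitted by ordinary least squares, solved with the normal equations in plain Python. The attributes (metres gained, tackles broken) and the scores are hypothetical and not taken from the paper's rugby case study.

```python
def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear(X, y):
    """Least-squares weights [intercept, w1, w2, ...] via the normal equations."""
    Xa = [[1.0] + list(row) for row in X]
    k = len(Xa[0])
    XtX = [[sum(r[i] * r[j] for r in Xa) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(Xa, y)) for i in range(k)]
    return solve(XtX, Xty)


def sort_clips(clips, weights):
    """Sort clips (name -> attribute tuple) by predicted score, best first."""
    score = {
        name: weights[0] + sum(w * x for w, x in zip(weights[1:], feats))
        for name, feats in clips.items()
    }
    return sorted(score, key=score.get, reverse=True)


# Hypothetical training clips: (metres_gained, tackles_broken) -> user's score.
training_X = [(0, 0), (1, 0), (0, 1), (1, 1)]
user_scores = [1, 3, 4, 6]
weights = fit_linear(training_X, user_scores)
print(weights)  # approximately [1.0, 2.0, 3.0]
print(sort_clips({"clip_a": (2, 0), "clip_b": (0, 2), "clip_c": (1, 1)}, weights))
```

The fitted weights are the kind of "ranking model parameters" a visualization can expose. Here the user's scores imply that a broken tackle is worth more than a metre gained.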


This paper studies the estimation of Dirichlet process mixtures over discrete incomplete rankings. The generative model for each mixture component is the generalized Mallows (GM) model, an exponential family model for permutations which extends seamlessly to top-t rankings. While the GM is remarkably tractable in comparison with other permutation models, its conjugate prior is not. Our main contribution is to derive the theory and algorithms for sampling from the desired posterior distributions under this DPM. We introduce a family of partially collapsed Gibbs samplers, containing as one extreme point an exact algorithm based on slice-sampling, and at the other a fast approximate sampler with superior mixing that is still very accurate in all but the lowest ranks. We empirically demonstrate the effectiveness of the approximation in reducing mixing time, the benefits of the Dirichlet process approach over alternative clustering techniques, and the applicability of the approach to exploring large real-world ranking datasets.

Concepts: Mixture, Non-parametric statistics, Estimation, Ranking, Bayes' theorem, Mix, Conjugate prior, Exponential family
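
The last abstract builds on the generalized Mallows model. Its likelihood for a top-t ranking can be sketched under the stage-wise (insertion-code) parameterization commonly used for GM models. Let σ be a central ranking over n items. Then s_j counts how many of the items not yet placed are ranked by σ ahead of the j-th item of π. The model gives P(π | σ, θ) = ∏_{j=1..t} exp(-θ_j s_j) / ψ_j(θ_j), with ψ_j(θ_j) = Σ_{k=0..n-j} exp(-θ_j k). For a full ranking the codes sum to the Kendall distance from σ. This sketch covers only the per-component likelihood. The Dirichlet process mixture and the Gibbs samplers from the paper are not shown, and the function names are illustrative.

```python
import math
from itertools import permutations


def gm_codes(ranking, central):
    """Insertion codes s_1..s_t of a (possibly top-t) ranking w.r.t. the central ranking."""
    remaining = list(central)
    codes = []
    for item in ranking:
        s = remaining.index(item)  # unplaced items that the central ranking puts ahead
        codes.append(s)
        remaining.pop(s)
    return codes


def gm_log_prob(ranking, central, theta):
    """log P(ranking | central, theta) under the GM model; theta needs >= t entries."""
    n = len(central)
    logp = 0.0
    for j, s in enumerate(gm_codes(ranking, central)):
        # At 0-based stage j, the code s ranges over 0 .. n - j - 1.
        log_psi = math.log(sum(math.exp(-theta[j] * k) for k in range(n - j)))
        logp += -theta[j] * s - log_psi
    return logp


central = ["a", "b", "c", "d"]
theta = [1.0, 0.5, 0.25]
print(gm_codes(["c", "a"], central))  # top-2 ranking -> [2, 0]
print(math.exp(gm_log_prob(["a", "b"], central, theta)))
# Probabilities of all 12 top-2 rankings sum to 1, because the normalizer factorizes by stage.
print(sum(math.exp(gm_log_prob(list(p), central, theta)) for p in permutations(central, 2)))
```

This stage-wise factorization is what lets the model extend to top-t rankings: the unobserved lower positions simply contribute no factors.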