SciCombinator


Concept: Information retrieval

172

BACKGROUND: With the development of high-throughput methods of gene analysis, there is a growing need for mining tools to retrieve relevant articles in PubMed. As PubMed grows, literature searches become more complex and time-consuming. Automated search tools with good precision and recall are necessary. We developed GO2PUB to automatically enrich PubMed queries with gene names, symbols and synonyms annotated by a GO term of interest or one of its descendants. RESULTS: GO2PUB enriches PubMed queries based on selected GO terms and keywords. It processes the results and displays the PMID, title, authors, abstract and bibliographic references of the articles. Gene names, symbols and synonyms that have been generated as extra keywords from the GO terms are also highlighted. GO2PUB is based on a semantic expansion of PubMed queries using the semantic inheritance between terms through the GO graph. Two experts manually assessed the relevance of the results of GO2PUB, GoPubMed and PubMed on three queries about lipid metabolism. Experts' agreement was high (kappa=0.88). GO2PUB returned 69% of the relevant articles, GoPubMed 40% and PubMed 29%. GO2PUB and GoPubMed have 17% of their results in common, corresponding to 24% of the total number of relevant results. 70% of the articles returned by more than one tool were relevant. 36% of the relevant articles were returned only by GO2PUB, 17% only by GoPubMed and 14% only by PubMed. To determine whether these results can be generalized, we generated twenty queries based on random GO terms with a granularity similar to those of the first three queries and compared the proportions of GO2PUB and GoPubMed results. These were 77% and 40%, respectively, for the first queries, and 70% and 38% for the random queries. The two experts also assessed the relevance of seven of the twenty queries (the three related to lipid metabolism and four related to other domains). Expert agreement was high (0.93 and 0.8). GO2PUB and GoPubMed performances were similar to those of the first queries. CONCLUSIONS: We demonstrated that the use of genes annotated by either GO terms of interest or a descendant of these GO terms yields some relevant articles ignored by other tools. The comparison of GO2PUB, based on semantic expansion, with GoPubMed, based on text mining techniques, showed that the two tools are complementary. The analysis of the randomly generated queries suggests that the results obtained about lipid metabolism can be generalized to other biological processes. GO2PUB is available at http://go2pub.genouest.org.
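
The query enrichment described in this abstract (collect the genes annotated by a GO term or any of its descendants, then add them to the keyword query) can be sketched roughly as follows. This is an illustration only, not GO2PUB's code: the toy term identifiers, the gene names, the function names `descendants` and `expand_query`, and the boolean query layout are all hypothetical.

```python
from collections import deque

# Hypothetical toy data (not real GO identifiers or gene symbols).
GO_CHILDREN = {
    "GO:A": ["GO:B", "GO:C"],
    "GO:B": ["GO:D"],
    "GO:C": [],
    "GO:D": [],
}
GENE_ANNOTATIONS = {
    "GO:A": ["geneA"],
    "GO:B": ["geneB1", "geneB2"],
    "GO:D": ["geneD"],
}


def descendants(term, children=GO_CHILDREN):
    """Return the term itself followed by all of its descendants (breadth-first)."""
    seen, queue = [term], deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def expand_query(term, keywords, annotations=GENE_ANNOTATIONS):
    """Build a boolean query: (gene1 OR gene2 ...) AND keyword1 AND keyword2 ..."""
    genes = []
    for t in descendants(term):
        for gene in annotations.get(t, []):
            if gene not in genes:
                genes.append(gene)
    gene_clause = "(" + " OR ".join(f'"{g}"' for g in genes) + ")"
    return " AND ".join([gene_clause] + list(keywords))


print(expand_query("GO:B", ["lipid"]))
```

Querying from "GO:B" picks up the gene annotated to its descendant "GO:D" as well, which is the inheritance effect the abstract attributes to the GO graph.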

Concepts: Protein, Gene, Bioinformatics, Evolution, Organism, Mining, Information retrieval, Gene Ontology

74

The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as ‘outliers’. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the ‘uniqueness’ of the music of each country.
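
As a rough illustration of distance-based outlier detection on extracted audio features, the sketch below flags recordings that lie far from the corpus mean. It is a deliberately simple stand-in: the abstract does not specify the distance measure or threshold used, so the standardized Euclidean distance, the `threshold` parameter, and the function names here are assumptions.

```python
import math


def standardized_distances(features):
    """Distance of each row from the corpus mean, in per-feature standard deviations."""
    n = len(features)
    dims = len(features[0])
    means = [sum(row[d] for row in features) / n for d in range(dims)]
    # Fall back to 1.0 when a feature has zero variance, to avoid dividing by zero.
    stds = [
        math.sqrt(sum((row[d] - means[d]) ** 2 for row in features) / n) or 1.0
        for d in range(dims)
    ]
    return [
        math.sqrt(sum(((row[d] - means[d]) / stds[d]) ** 2 for d in range(dims)))
        for row in features
    ]


def find_outliers(features, threshold):
    """Indices of rows whose standardized distance exceeds the threshold."""
    distances = standardized_distances(features)
    return [i for i, dist in enumerate(distances) if dist > threshold]


# Hypothetical two-dimensional feature vectors; the last one is far from the rest.
recordings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
print(find_outliers(recordings, threshold=2.0))
```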

Concepts: Earth, Geography, Sound, Music, Digital signal processing, Information retrieval, Folk music, World music

62

When parents select similar sounding names for their children, do they set themselves up for more speech errors in the future? Questionnaire data from 334 respondents suggest that they do. Respondents whose names shared initial or final sounds with a sibling’s reported that their parents accidentally called them by the sibling’s name more often than those without such name overlap. Having a sibling of the same gender, similar appearance, or similar age was also associated with more frequent name substitutions. Almost all other name substitutions by parents involved other family members and over 5% of respondents reported a parent substituting the name of a pet, which suggests a strong role for social and situational cues in retrieving personal names for direct address. To the extent that retrieval cues are shared with other people or animals, other names become available and may substitute for the intended name, particularly when names sound similar.

Concepts: Family, Future, Parent, Sound, Substitute good, Sibling, Information retrieval

57

Similarity search (for example, identifying similar images in a database or similar documents on the web) is a fundamental computing problem faced by large-scale information retrieval systems. We discovered that the fruit fly olfactory circuit solves this problem with a variant of a computer science algorithm (called locality-sensitive hashing). The fly circuit assigns similar neural activity patterns to similar odors, so that behaviors learned from one odor can be applied when a similar odor is experienced. The fly algorithm, however, uses three computational strategies that depart from traditional approaches. These strategies can be translated to improve the performance of computational similarity searches. This perspective helps illuminate the logic supporting an important sensory function and provides a conceptually new algorithm for solving a fundamental computational problem.
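
For readers unfamiliar with locality-sensitive hashing, here is a minimal sketch of one conventional variant, random-hyperplane hashing, in which vectors pointing in similar directions tend to receive similar bit signatures. It shows the traditional approach only, not the fly circuit's variant, whose three strategies the abstract does not detail; the function names and parameters are illustrative assumptions.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian hyperplane normal per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


planes = make_hyperplanes(num_bits=16, dim=3)
base = lsh_signature([1.0, 2.0, 3.0], planes)
scaled = lsh_signature([2.0, 4.0, 6.0], planes)       # same direction
opposite = lsh_signature([-1.0, -2.0, -3.0], planes)  # opposite direction
print(hamming(base, scaled), hamming(base, opposite))
```

Items whose signatures differ in few bits are treated as candidate neighbours, so a lookup only needs to compare short signatures rather than full vectors.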

Concepts: Mathematics, Computer, Computation, Logic, Computer science, Computing, Computational complexity theory, Information retrieval

32

Finding relevant information from large document collections such as the World Wide Web is a common task in our daily lives. Estimation of a user’s interest or search intention is necessary to recommend and retrieve relevant information from these collections. We introduce a brain-information interface used for recommending information by relevance inferred directly from brain signals. In experiments, participants were asked to read Wikipedia documents about a selection of topics while their EEG was recorded. Based on the prediction of word relevance, the individual’s search intent was modeled and successfully used for retrieving new relevant documents from the whole English Wikipedia corpus. The results show that the users' interests toward digital content can be modeled from the brain signals evoked by reading. The introduced brain-relevance paradigm enables the recommendation of information without any explicit user interaction and may be applied across diverse information-intensive applications.

Concepts: Brain, Human brain, World Wide Web, Object-oriented programming, Information retrieval

28

It is estimated that between 2% and 5% of the population experience symptoms of compulsive hoarding. Recent investigation into hoarding has shown that it is a problem in its own right and is therefore being added to a diagnostic manual of mental disorders. This integrative literature review examines the impact that hoarding has on family members. The comprehensive literature review spans a period from database inception to November 2012. A search of the databases Cumulative Index to Nursing and Allied Health Literature, Medical Literature Analysis and Retrieval System Online, and PsycINFO, together with hand searches, was completed. Thematic analysis revealed three overriding themes: quality of life, shattered families and rallying around. These themes illuminate the negative impact that hoarding behaviour has on families and the inadequacy of available services. The relative lack of robust evidence about the impact of hoarding behaviour on families suggests that further research is needed in this emergent field.

Concepts: Medicine, Mental disorder, Compulsive hoarding, Information retrieval, Searching, Genre, Hoarding, Collecting

28

The current study examined memory specificity of autobiographical memories in individuals with and without a repressive coping style. It seems conceivable that reduced memory specificity may be a way to reduce accessibility of negative experiences, one of the hallmark features of a repressive coping style. It was therefore hypothesized that repressors would show reduced specificity when retrieving negative memories. In order to study memory specificity, participants (N = 103) performed the autobiographical memory test. Results showed that individuals with a repressive coping style were significantly less specific in retrieving negative experiences, relative to control groups of low anxious, high anxious, and defensive high anxious individuals. This result was restricted to negative memory retrieval, as participants did not differ in memory specificity for positive experiences. These results show that repressors retrieve negative autobiographical memories in an overgeneral way, possibly in order to avoid negative affect.

Concepts: Scientific method, Psychology, Affect, Memory, Episodic memory, Amnesia, Autobiographical memory, Information retrieval

27

OBJECTIVE: To determine whether the knowledge contained in a rich corpus of local terms mapped to LOINC (Logical Observation Identifiers Names and Codes) could be leveraged to help map local terms from other institutions. METHODS: We developed two models to test our hypothesis. The first, based on supervised machine learning, was created using Apache’s OpenNLP Maxent, and the second, based on information retrieval, was created using Apache’s Lucene. The models were validated by a random subsampling method that was repeated 20 times and that used 80/20 splits for training and testing, respectively. We also evaluated the performance of these models on all laboratory terms from three test institutions. RESULTS: For the 20 iterations used for validation of our 80/20 splits, Maxent and Lucene ranked the correct LOINC code first for between 70.5% and 71.4% and between 63.7% and 65.0% of local terms, respectively. For all laboratory terms from the three test institutions, Maxent ranked the correct LOINC code first for between 73.5% and 84.6% (mean 78.9%) of local terms, whereas Lucene’s performance was between 66.5% and 76.6% (mean 71.9%). Using a cut-off score of 0.46, Maxent always ranked the correct LOINC code first for over 57% of local terms. CONCLUSIONS: This study showed that a rich corpus of local terms mapped to LOINC contains collective knowledge that can help map terms from other institutions. Using freely available software tools, we developed a data-driven automated approach that operates on term descriptions from existing mappings in the corpus. Accurate and efficient automated mapping methods can help to accelerate adoption of vocabulary standards and promote widespread health information exchange.
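
The validation scheme in this abstract (repeated random 80/20 subsampling, scored by how often the correct code is ranked first) can be sketched as below. The ranker is a simple token-overlap stand-in, not Lucene or Maxent, and the term descriptions, the codes such as "CODE-1", and all function names are hypothetical.

```python
import random


def tokens(text):
    """Lowercased whitespace tokens of a term description."""
    return set(text.lower().split())


def rank_codes(query, training):
    """Rank codes by their best Jaccard token overlap with any training term."""
    best = {}
    q = tokens(query)
    for term, code in training:
        t = tokens(term)
        score = len(q & t) / len(q | t) if (q | t) else 0.0
        best[code] = max(best.get(code, 0.0), score)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(best, key=lambda c: (-best[c], c))


def top1_accuracy(training, testing):
    """Fraction of test terms whose correct code is ranked first."""
    hits = sum(1 for term, code in testing if rank_codes(term, training)[:1] == [code])
    return hits / len(testing)


def repeated_subsampling(pairs, iterations=20, train_fraction=0.8, seed=0):
    """Shuffle, split into train/test, score, and repeat; return all scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        shuffled = pairs[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        scores.append(top1_accuracy(shuffled[:cut], shuffled[cut:]))
    return scores
```

Reporting the minimum and maximum of the returned scores gives a range of the same kind as the "between 70.5% and 71.4%" figures quoted in the abstract.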

Concepts: Scientific method, Validation, Map, Standard, Standards, Information retrieval, Supervised learning, LOINC

25

When stimuli afford multiple tasks, switching among them involves promoting one of several task-sets in play into a most-active state. This process, often conceptualized as retrieving task parameters and stimulus-response (S-R) rules into procedural working memory, is a likely source of the reaction time (RT) cost of a task-switch, especially when no time is available for task preparation before the stimulus. We report 2 task-cuing experiments that asked whether the time consumed by task-set retrieval increases with the number of task-sets in play, while unconfounding the number of tasks with their frequency and recency of use. Participants were required to switch among 3 or 5 orthogonal classifications of perceptual attributes of an object (Experiment 1) or of phonological/semantic attributes of a word (Experiment 2), with a 100 or 1,300 ms cue-stimulus interval. For 2 tasks for which recency and frequency were matched in the 3- and 5-task conditions, there was no effect of number of tasks on the switch cost. For the other tasks, there was a greater switch cost in the 5-task condition with little time for preparation, attributable to effects of frequency/recency. Thus, retrieval time for active task-sets is not influenced by the number of alternatives per se (unlike several other kinds of memory retrieval) but is influenced by recency or frequency of use. (PsycINFO Database Record © 2014 APA, all rights reserved).

Concepts: Cognitive psychology, Economics, Experiment, Task, Object-oriented programming, All rights reserved, American Psychological Association, Information retrieval

22

Efficient storage and retrieval of digital data is the focus of much commercial and academic attention. With personal computers, there are two main ways to retrieve files: hierarchical navigation and query-based search. In navigation, users move down their virtual folder hierarchy until they reach the folder in which the target item is stored. When searching, users first generate a query specifying some property of the target file (e.g., a word it contains), and then select the relevant file when the search engine returns a set of results. Despite advances in search technology, users prefer retrieving files using virtual folder navigation, rather than the more flexible query-based search. Using fMRI, we provide an explanation for this phenomenon by demonstrating that folder navigation results in activation of the posterior limbic (including the retrosplenial cortex) and parahippocampal regions similar to that previously observed during real-world navigation in both animals and humans. In contrast, search activates the left inferior frontal gyrus, commonly observed in linguistic processing. We suggest that the preference for navigation may be due to the triggering of automatic object-finding routines and lower dependence on linguistic processing. We conclude with suggestions for future computer systems design.
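
The two retrieval styles contrasted in this abstract can be illustrated with a toy in-memory folder tree: navigation follows a known path one folder at a time, whereas search scans every file for a query word. The folder names, file contents, and function names are hypothetical.

```python
# Hypothetical folder tree: dicts are folders, strings are file contents.
FILES = {
    "work": {
        "reports": {"q1.txt": "quarterly revenue summary"},
        "notes.txt": "meeting agenda",
    },
    "personal": {"recipes.txt": "tomato soup recipe"},
}


def navigate(tree, path):
    """Hierarchical navigation: step down the tree one named level at a time."""
    node = tree
    for name in path:
        node = node[name]
    return node


def search(tree, word, prefix=()):
    """Query-based search: return paths of all files containing the word."""
    matches = []
    for name, node in tree.items():
        here = prefix + (name,)
        if isinstance(node, dict):
            matches.extend(search(node, word, here))
        elif word in node.split():
            matches.append("/".join(here))
    return matches


print(navigate(FILES, ["work", "reports", "q1.txt"]))
print(search(FILES, "soup"))
```

Navigation requires the user to remember where the file lives; search requires recalling a property of the file, such as a word it contains.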

Concepts: Hierarchy, Cerebrum, Computer, Search engine optimization, Inferior frontal gyrus, Personal computer, Information retrieval, File system