SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: Estimation

438

Women comprise a minority of the Science, Technology, Engineering, Mathematics, and Medicine (STEMM) workforce. Quantifying the gender gap may identify fields that will not reach parity without intervention, reveal underappreciated biases, and inform benchmarks for gender balance among conference speakers, editors, and hiring committees. Using the PubMed and arXiv databases, we estimated the gender of 36 million authors from >100 countries publishing in >6000 journals, covering most STEMM disciplines over the last 15 years, and made a web app allowing easy access to the data (https://lukeholman.github.io/genderGap/). Despite recent progress, the gender gap appears likely to persist for generations, particularly in surgery, computer science, physics, and maths. The gap is especially large in authorship positions associated with seniority, and prestigious journals have fewer women authors. Additionally, we estimate that men are invited by journals to submit papers at approximately double the rate of women. Wealthy countries, notably Japan, Germany, and Switzerland, had fewer women authors than poorer ones. We conclude that the STEMM gender gap will not close without further reforms in education, mentoring, and academic publishing.

Concepts: Statistics, Mathematics, Physics, Science, Approximation, Estimation, Computer science, Wealth

298

The number of quit attempts it takes a smoker to quit successfully is a commonly reported figure among smoking cessation programmes, but previous estimates have been based on lifetime recall in cross-sectional samples of successful quitters only. The purpose of this study is to improve the estimate of the number of quit attempts made before quitting successfully.

Concepts: Mathematics, Smoking, Nicotine, Smoking cessation, Estimation, Quitting

204

BACKGROUND: A number of studies have shown that bite and sip sizes influence the amount of food intake. Consuming with small sips instead of large sips means relatively more sips for the same amount of food; people may believe that their intake is higher, which leads to faster satiation. This effect may be disturbed when people are distracted. OBJECTIVE: The objective of the study is to assess the effects of sip size in a focused state and a distracted state on ad libitum intake and on the estimated amount consumed. DESIGN: In this 3×2 cross-over design, 53 healthy subjects consumed ad libitum soup with small sips (5 g, 60 g/min), large sips (15 g, 60 g/min), and free sips (where sip size was determined by subjects themselves), in both a distracted and focused state. Sips were administered via a pump. There were no visual cues toward consumption. Subjects then estimated how much they had consumed by filling soup bowls with soup. RESULTS: Intake in the small-sip condition was ∼30% lower than in both the large-sip and free-sip conditions (P<0.001). In addition, subjects underestimated how much they had consumed in the large-sip and free-sip conditions (P<0.03). Distraction led to a general increase in food intake (P = 0.003), independent of sip size. Distraction did not influence sip size or estimations. CONCLUSIONS: Consumption with large sips led to higher food intake, as expected. Large sips, whether fixed or chosen by the subjects themselves, led to underestimation of the amount consumed. This may be a risk factor for over-consumption. Reducing sip or bite sizes may successfully lower food intake, even in a distracted state.

Concepts: Nutrition, Food, Estimation, Restaurant, Distraction, Cost underestimation, Ad libitum

176

Billions of birds are estimated to be killed in window collisions every year, worldwide. A popular solution to this problem may lie in marking the glass with ultraviolet reflective or absorbing patterns, which the birds, but not humans, would see. Elegant as this remedy may seem at first glance, few of its proponents have taken into consideration how stark the contrasts between ultraviolet and human-visible light reflections or transmissions must be to be visible to a bird under natural conditions. Complicating matters is that diurnal birds differ strongly in how their photoreceptors absorb ultraviolet and, to a lesser degree, blue light. We have used a physiological model of avian colour vision to estimate the chromatic contrasts of ultraviolet markings against a natural scene reflected and transmitted by ordinary window glass. Ultraviolet markings may be clearly visible under a range of lighting conditions, but only to birds with a UVS type of ultraviolet vision, such as many passerines. To bird species with the common VS type of vision, ultraviolet markings should only be visible if they produce almost perfect ultraviolet contrasts and are viewed against a scene with low chromatic variation but high ultraviolet content.

Concepts: Human, Mathematics, Light, Bird, Color, Estimation, Visible spectrum, Color vision

173

Scoring goals in a soccer match can be interpreted as a stochastic process. In the simplest description of a soccer match one assumes that scoring goals follows from independent rate processes of both teams. This would imply simple Poissonian and Markovian behavior. Deviations from this behavior would imply that the previous course of the match has an impact on the present match behavior. Here a general framework for the identification of deviations from this behavior is presented. For this endeavor it is essential to formulate an a priori estimate of the expected number of goals per team in a specific match. This can be done based on our previous work on the estimation of team strengths. Furthermore, the well-known general increase in the number of goals over the course of a soccer match has to be removed by appropriate normalization. In general, three different types of deviations from a simple rate process can exist. First, the goal rate may depend on the exact time of the previous goals. Second, it may be influenced by the time passed since the previous goal and, third, it may reflect the present score. We show that the Poissonian scenario is fulfilled quite well for the German Bundesliga. However, a detailed analysis reveals significant deviations for the second and third aspects. Dramatic effects are observed if the away team leads by one or two goals in the final part of the match. This analysis allows one to identify generic features of soccer matches and to learn about the hidden complexities behind scoring goals. Among other things, the reason why the number of draws is larger than statistically expected can be identified.

Concepts: Time, Scientific method, Statistics, Mathematics, Estimator, Probability theory, Probability, Estimation
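The baseline picture in the soccer abstract above (independent rate processes for both teams, hence Poissonian goal counts) can be illustrated with a minimal sketch. The goal rates used here are hypothetical values chosen for illustration, not figures from the study, and the function names are assumptions of this example. The sketch only computes what the simple model predicts; the abstract's point is that real matches show more draws than such a baseline.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when goals arrive at a constant rate lam per match."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def outcome_probabilities(lam_home, lam_away, max_goals=25):
    """Home-win, draw and away-win probabilities under two independent Poisson counts.

    The double sum is truncated at max_goals per team; for realistic rates
    (a few goals per match) the neglected tail is negligible.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        p_h = poisson_pmf(h, lam_home)
        for a in range(max_goals + 1):
            p = p_h * poisson_pmf(a, lam_away)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


if __name__ == "__main__":
    # Hypothetical a priori goal expectations for one match (not from the study).
    home, draw, away = outcome_probabilities(1.6, 1.2)
    print(f"home win {home:.3f}, draw {draw:.3f}, away win {away:.3f}")
```

Comparing the draw probability from such a baseline with the observed draw frequency is one way to see the kind of deviation the abstract describes.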

173

We provide a novel method, DRISEE (duplicate read inferred sequencing error estimation), to assess sequencing quality (alternatively referred to as “noise” or “error”) within and/or between sequencing samples. DRISEE provides positional error estimates that can be used to inform read trimming within a sample. It also provides global (whole sample) error estimates that can be used to identify samples with high or varying levels of sequencing error that may confound downstream analyses, particularly in the case of studies that utilize data from multiple sequencing samples. For shotgun metagenomic data, we believe that DRISEE provides estimates of sequencing error that are more accurate and less constrained by technical limitations than existing methods that rely on reference genomes or the use of scores (e.g. Phred). Here, DRISEE is applied to (non amplicon) data sets from both the 454 and Illumina platforms. The DRISEE error estimate is obtained by analyzing sets of artifactual duplicate reads (ADRs), a known by-product of both sequencing platforms. We present DRISEE as an open-source, platform-independent method to assess sequencing error in shotgun metagenomic data, and utilize it to discover previously uncharacterized error in de novo sequence data from the 454 and Illumina sequencing platforms.

Concepts: Sample, Scientific method, Estimator, Full genome sequencing, Error, Sequence, Java, Estimation

167

Assessing oil pollution using traditional field-based methods over large areas is difficult and expensive. Remote sensing technologies with good spatial and temporal coverage might provide an alternative for monitoring oil pollution by recording the spectral signals of plants growing in polluted soils. Total petroleum hydrocarbon concentrations of soils and the hyperspectral canopy reflectance were measured in wetlands dominated by reeds (Phragmites australis) around oil wells that have been producing oil for approximately 10 years in the Yellow River Delta, eastern China to evaluate the potential of vegetation indices and red edge parameters to estimate soil oil pollution. The detrimental effect of oil pollution on reed communities was confirmed by the evidence that the aboveground biomass decreased from 1076.5 g m⁻² to 5.3 g m⁻² with increasing total petroleum hydrocarbon concentrations ranging from 9.45 mg kg⁻¹ to 652 mg kg⁻¹. The modified chlorophyll absorption ratio index (MCARI) best estimated soil TPH concentration among 20 vegetation indices. The linear model involving MCARI had the highest coefficient of determination (R² = 0.73) and accuracy of prediction (RMSE = 104.2 mg kg⁻¹). For other vegetation indices and red edge parameters, the R² and RMSE values ranged from 0.64 to 0.71 and from 120.2 mg kg⁻¹ to 106.8 mg kg⁻¹, respectively. The traditional broadband normalized difference vegetation index (NDVI), one of the broadband multispectral vegetation indices (BMVIs), produced a prediction (R² = 0.70 and RMSE = 110.1 mg kg⁻¹) similar to that of MCARI. These results corroborated the potential of remote sensing for assessing soil oil pollution in large areas. Traditional BMVIs are still of great value in monitoring soil oil pollution when hyperspectral data are unavailable.

Concepts: Petroleum, People's Republic of China, Approximation, Estimation, Remote sensing, Hyperspectral imaging, Yellow River, Landfill
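The oil-pollution abstract above compares linear models by their coefficient of determination and RMSE. As a rough illustration of how those two figures are obtained, the sketch below fits an ordinary least-squares line and scores it. The data points are synthetic placeholders, not measurements from the study, the function names are assumptions of this example, and the RMSE here is the plain root of the mean squared residual, which may differ from the convention the authors used.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ≈ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(y, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_pred):
    """Root of the mean squared residual, in the units of y."""
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred)) / len(y))


if __name__ == "__main__":
    # Synthetic (index value, soil concentration) pairs for illustration only.
    index_values = [0.10, 0.18, 0.25, 0.33, 0.41, 0.52]
    concentrations = [610.0, 480.0, 420.0, 260.0, 190.0, 40.0]
    slope, intercept = fit_line(index_values, concentrations)
    predicted = [slope * v + intercept for v in index_values]
    print(f"R^2 = {r_squared(concentrations, predicted):.2f}, "
          f"RMSE = {rmse(concentrations, predicted):.1f}")
```

A higher R² together with a lower RMSE is the sense in which one index "best estimated" the concentration in the abstract.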

164

Symmetry is a biologically relevant, mathematically involving, and aesthetically compelling visual phenomenon. Mirror symmetry detection is considered particularly rapid and efficient, based on experiments with random noise. Symmetry detection in natural settings, however, is often accomplished against structured backgrounds. To measure salience of symmetry in diverse contexts, we assembled mirror symmetric patterns from 101 natural textures. Temporal thresholds for detecting the symmetry axis ranged from 28 to 568 ms indicating a wide range of salience (1/Threshold). We built a model for estimating symmetry-energy by connecting pairs of mirror-symmetric filters that simulated cortical receptive fields. The model easily identified the axis of symmetry for all patterns. However, symmetry-energy quantified at this axis correlated weakly with salience. To examine context effects on symmetry detection, we used the same model to estimate approximate symmetry resulting from the underlying texture throughout the image. Magnitudes of approximate symmetry at flanking and orthogonal axes showed strong negative correlations with salience, revealing context interference with symmetry detection. A regression model that included the context-based measures explained the salience results, and revealed why perceptual symmetry can differ from mathematical characterizations. Using natural patterns thus produces new insights into symmetry perception and its possible neural circuits.

Concepts: Statistics, Mathematics, Symmetry, Estimation, Symmetry group, Rotational symmetry, Point groups in three dimensions, Reflection symmetry

150

The global burden of cholera is largely unknown because the majority of cases are not reported. The low reporting can be attributed to limited capacity of epidemiological surveillance and laboratories, as well as social, political, and economic disincentives for reporting. We previously estimated 2.8 million cases and 91,000 deaths annually due to cholera in 51 endemic countries. A major limitation in our previous estimate was that the endemic and non-endemic countries were defined based on the countries' reported cholera cases. We overcame the limitation with the use of a spatial modelling technique in defining endemic countries, and accordingly updated the estimates of the global burden of cholera.

Concepts: Epidemiology, Statistics, Mathematics, Definition, Report, Estimation, Cholera, Aristotle

149

The Northwest India Aquifer (NWIA) has been shown to have the highest groundwater depletion (GWD) rate globally, threatening crop production and sustainability of groundwater resources. Gravity Recovery and Climate Experiment (GRACE) satellites have been emerging as a powerful tool to evaluate GWD with ancillary data. Accurate GWD estimation is, however, challenging because of uncertainties in GRACE data processing. We evaluated GWD rates over the NWIA using a variety of approaches, including newly developed constrained forward modeling resulting in a GWD rate of 3.1 ± 0.1 cm/a (or 14 ± 0.4 km³/a) for Jan 2005-Dec 2010, consistent with the GWD rate (2.8 cm/a or 12.3 km³/a) from groundwater-level monitoring data. Published studies (e.g., 4 ± 1 cm/a or 18 ± 4.4 km³/a) may overestimate GWD over this region. This study highlights uncertainties in GWD estimates and the importance of incorporating a priori information to refine spatial patterns of GRACE signals that could be more useful in groundwater resource management and need to be paid more attention in future studies.

Concepts: Time, Statistics, Estimation, A priori, A priori and a posteriori, Gravity Recovery and Climate Experiment, Ancillary data
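The groundwater abstract above quotes each depletion rate twice, as a water-depth rate (cm/a) and as a volume rate (km³/a). The two are linked by the aquifer area, which the abstract does not state. The sketch below only back-calculates the area implied by each quoted pair as a consistency check; the function names are assumptions of this example, and because the quoted values are rounded the implied areas agree only approximately.

```python
CM_PER_KM = 100_000  # 1 km = 100,000 cm


def implied_area_km2(volume_km3_per_yr, depth_cm_per_yr):
    """Area (km^2) over which a depth rate must act to give the quoted volume rate."""
    depth_km_per_yr = depth_cm_per_yr / CM_PER_KM
    return volume_km3_per_yr / depth_km_per_yr


def volume_rate_km3(depth_cm_per_yr, area_km2):
    """Volume rate (km^3/a) from a depth rate (cm/a) acting over area_km2."""
    return depth_cm_per_yr / CM_PER_KM * area_km2


if __name__ == "__main__":
    # (depth rate in cm/a, volume rate in km^3/a) pairs as quoted in the abstract.
    quoted = {
        "constrained forward modeling": (3.1, 14.0),
        "groundwater-level monitoring": (2.8, 12.3),
        "published example": (4.0, 18.0),
    }
    for label, (depth, volume) in quoted.items():
        print(f"{label}: implied area {implied_area_km2(volume, depth):,.0f} km^2")
```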