SciCombinator

Discover the latest and most talked-about scientific content & concepts.

Concept: Estimation theory

172

We present a statistical framework for estimation and application of sample allele frequency spectra from Next-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency at a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases, including deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
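
As a rough illustration of the maximum-likelihood step described above, the sketch below estimates the allele frequency at a single site from per-individual genotype likelihoods under Hardy-Weinberg equilibrium. It is a minimal toy version, not the paper's dynamic-programming algorithm, and the genotype-likelihood inputs are hypothetical.

```python
# A minimal sketch (not the paper's algorithm): maximum-likelihood
# estimation of the allele frequency f at one site from per-individual
# genotype likelihoods, assuming Hardy-Weinberg equilibrium. The genotype
# likelihoods `gl` are hypothetical inputs (e.g. derived from NGS reads).
import numpy as np
from scipy.optimize import minimize_scalar

def site_log_likelihood(f, gl):
    """log L(f) = sum_i log sum_g P(g | f) * P(reads_i | g)."""
    # HWE genotype priors for g = 0, 1, 2 copies of the alternate allele
    priors = np.array([(1 - f) ** 2, 2 * f * (1 - f), f ** 2])
    return np.log(gl @ priors).sum()

def estimate_allele_frequency(gl):
    """Numerically maximize the likelihood over f in (0, 1)."""
    res = minimize_scalar(lambda f: -site_log_likelihood(f, gl),
                          bounds=(1e-6, 1 - 1e-6), method="bounded")
    return res.x

# Toy data: 4 individuals x 3 genotype likelihoods P(reads | g)
gl = np.array([[0.9, 0.1, 0.0],
               [0.2, 0.7, 0.1],
               [0.0, 0.3, 0.7],
               [0.8, 0.2, 0.0]])
print(estimate_allele_frequency(gl))
```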

Concepts: Statistics, Mathematics, Estimation theory, Maximum likelihood, Computer program, Allele frequency, Bayesian inference, Likelihood function

171

BACKGROUND: Lidar height data collected by the Geoscience Laser Altimeter System (GLAS) from 2002 to 2008 have the potential to form the basis of a globally consistent, sample-based inventory of forest biomass. GLAS lidar return data were collected globally in spatially discrete full-waveform “shots,” which have been shown to be strongly correlated with aboveground forest biomass. Relationships observed at spatially coincident field plots may be used to model biomass at all GLAS shots, and well-established methods of model-based inference may then be used to estimate biomass and its variance for specific spatial domains. However, the spatial pattern of GLAS acquisition is neither random across the surface of the earth nor identifiable with any particular systematic design. These undefined sample properties hinder the use of GLAS in global forest sampling. RESULTS: We propose a method of identifying a subset of the GLAS data which can justifiably be treated as a simple random sample in model-based biomass estimation. The relatively uniform spatial distribution and locally arbitrary positioning of the resulting sample is similar to the design used by the US national forest inventory (NFI). We demonstrated model-based estimation using a sample of GLAS data in the US state of California, where our estimate of biomass (211 Mg/hectare) was within the 1.4% standard error of the design-based estimate supplied by the US NFI. The standard error of the GLAS-based estimate was significantly higher than that of the NFI estimate, although the cost of the GLAS estimate (excluding costs for the satellite itself) was almost nothing, compared to at least US$10.5 million for the NFI estimate. CONCLUSIONS: Global application of model-based estimation using GLAS, while demanding significant consolidation of training data, would improve the inter-comparability of international biomass estimates by imposing consistent methods and a globally coherent sample frame. The methods presented here constitute a globally extensible approach for generating a simple random sample from the global GLAS dataset, enabling its use in forest inventory activities.
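
The model-based estimation step can be illustrated with a toy sketch: fit a biomass-on-height model at simulated "field plots", predict at a simple random sample of "shots", and form the sample mean with a naive simple-random-sample standard error. All data and the linear model are assumptions for illustration; the study's actual models and variance estimators are more involved.

```python
# A minimal sketch (hypothetical data, not the study's model): fit a
# biomass ~ lidar-height model at field plots, predict at a simple random
# sample of GLAS shots, and form the model-based mean with a naive SRS
# standard error. A full treatment would also propagate model variance.
import numpy as np

rng = np.random.default_rng(1)

# "Field plots": lidar height metric x, measured biomass y (Mg/ha)
x_plot = rng.uniform(5, 40, size=60)
y_plot = 8.0 * x_plot + rng.normal(0, 30, size=60)

# Ordinary least squares fit y = b0 + b1 * x
b1, b0 = np.polyfit(x_plot, y_plot, deg=1)

# "GLAS shots" retained as a simple random sample: heights only
x_shot = rng.uniform(5, 40, size=1000)
y_hat = b0 + b1 * x_shot  # model predictions at every sampled shot

mean = y_hat.mean()
se = y_hat.std(ddof=1) / np.sqrt(y_hat.size)
print(f"estimated mean biomass: {mean:.0f} Mg/ha (SE {se:.0f})")
```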

Concepts: Statistics, Variance, Mathematics, Simple random sample, Sample size, Estimation theory, Estimator, Sampling

164

Advances in the development of micro-electromechanical systems (MEMS) have made possible the fabrication of cheap, small accelerometers and gyroscopes, which are used in many applications where global positioning system (GPS) and inertial navigation system (INS) integration is carried out, e.g., identifying track defects, terrestrial and pedestrian navigation, unmanned aerial vehicles (UAVs), stabilization of many platforms, etc. Although these MEMS sensors are low-cost, they exhibit various errors that degrade the accuracy of navigation systems within a short period of time. Suitable modeling of these errors is therefore necessary in order to minimize them and, consequently, improve system performance. In this work, the techniques currently most used to analyze the stochastic errors that affect these sensors are presented and compared: we examine in detail the autocorrelation, Allan variance (AV) and power spectral density (PSD) techniques. Subsequently, an analysis and modeling of the inertial sensors that combines autoregressive (AR) filters and wavelet de-noising is also carried out. Since a low-cost INS (MEMS grade) presents error sources with short-term (high-frequency) and long-term (low-frequency) components, we introduce a method that compensates for these error terms through a complete Allan variance analysis, wavelet de-noising, and selection of the level of decomposition for a suitable combination of these techniques. Finally, in order to assess the stochastic models obtained with these techniques, the Extended Kalman Filter (EKF) of a loosely-coupled GPS/INS integration strategy is augmented with different states. Results compare the proposed method and the traditional sensor error models under GPS signal blockages, using real data collected on urban roadways.
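
A minimal sketch of one of the techniques compared in the paper, the Allan variance, is given below, applied to a synthetic gyro signal with white noise plus a random-walk drift; the noise levels and sample rate are illustrative assumptions.

```python
# A minimal sketch of the (non-overlapping) Allan variance applied to a
# synthetic gyro rate signal: white noise plus a slow random-walk drift.
# Units and noise magnitudes are illustrative, not from the paper.
import numpy as np

def allan_variance(y, fs, m):
    """Allan variance at cluster size m for rate samples y at rate fs."""
    tau = m / fs
    # Average the signal over consecutive clusters of m samples
    n_clusters = y.size // m
    means = y[:n_clusters * m].reshape(n_clusters, m).mean(axis=1)
    # AVAR(tau) = 1/2 * < (ybar_{k+1} - ybar_k)^2 >
    return 0.5 * np.mean(np.diff(means) ** 2), tau

fs = 100.0                      # sample rate, Hz
t = np.arange(0, 3600, 1 / fs)  # one hour of data
rng = np.random.default_rng(2)
gyro = rng.normal(0, 0.05, t.size) + np.cumsum(rng.normal(0, 1e-5, t.size))

for m in (1, 10, 100, 1000, 10000):
    avar, tau = allan_variance(gyro, fs, m)
    print(f"tau = {tau:8.2f} s   sigma = {np.sqrt(avar):.5f}")
```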

Concepts: Estimation theory, Signal processing, Inertial navigation system, Accelerometer, Global Positioning System, Dead reckoning, Autocorrelation, Unmanned aerial vehicle

163

The problem of determining the optimal geometric configuration of a sensor network that will maximize the range-related information available for multiple-target positioning is of key importance in a multitude of application scenarios. In this paper, a set of sensors that measures the distances between the targets and each of the receivers is considered, assuming that the range measurements are corrupted by white Gaussian noise, in order to search for the formation that maximizes the accuracy of the target estimates. Using tools from estimation theory and convex optimization, the problem is converted into that of maximizing, by proper choice of the sensor positions, a convex combination of the logarithms of the determinants of the Fisher Information Matrices corresponding to each of the targets; the resulting sensor configuration yields the minimum possible covariance of any unbiased target estimator. Analytical and numerical solutions are well defined, and it is shown that the optimal configuration of the sensors depends explicitly on the constraints imposed on the sensor configuration, the target positions, and the probabilistic distributions that define the prior uncertainty in each of the target positions. Simulation examples illustrate the key results.
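
The design criterion can be made concrete with a small sketch: for range-only measurements with white Gaussian noise, the Fisher Information Matrix for a single target is a sum of outer products of unit line-of-sight vectors, and log det FIM is the quantity being maximized. The configuration below (three sensors at 120° spacing) is used purely for illustration, not taken from the paper.

```python
# A minimal sketch of the design criterion: for range-only measurements
# with noise std sigma, the Fisher Information Matrix for a target at p is
# (1/sigma^2) * sum_i u_i u_i^T, with u_i the unit vector from the target
# to sensor i. Maximizing log det FIM (D-optimality) minimizes the volume
# of the Cramer-Rao bound ellipsoid. Positions here are illustrative.
import numpy as np

def range_fim(sensors, target, sigma=1.0):
    diffs = sensors - target              # (n, 2) sensor-target offsets
    units = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    return units.T @ units / sigma**2     # 2x2 Fisher Information Matrix

target = np.array([0.0, 0.0])
# Three sensors at 120-degree spacing on a unit circle around the target
angles = np.deg2rad([0, 120, 240])
sensors = np.column_stack([np.cos(angles), np.sin(angles)])

fim = range_fim(sensors, target)
print("log det FIM =", np.linalg.slogdet(fim)[1])
# CRLB: the covariance of any unbiased estimator satisfies cov >= inv(FIM)
print("CRLB trace  =", np.trace(np.linalg.inv(fim)))
```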

Concepts: Mathematics, Estimation theory, Estimator, Maximum likelihood, Signal processing, Optimization, Sensor, Wireless sensor network

140

When estimating the average effect of a binary treatment (or exposure) on an outcome, methods that incorporate propensity scores, the G-formula, or targeted maximum likelihood estimation (TMLE) are preferred over naïve regression approaches, which are biased under misspecification of a parametric outcome model. In contrast, propensity score methods require the correct specification of an exposure model. Double-robust methods require correct specification of only one of the two: the outcome model or the exposure model. Targeted maximum likelihood estimation is a semiparametric, double-robust method that improves the chances of correct model specification by allowing for flexible estimation using (nonparametric) machine-learning methods; it therefore requires weaker assumptions than its competitors. We provide a step-by-step guided implementation of TMLE and illustrate it in a realistic scenario based on cancer epidemiology where assumptions about correct model specification and positivity (violated when a study participant has zero probability of receiving the treatment) are nearly violated. This article provides a concise and reproducible educational introduction to TMLE for a binary outcome and exposure. The reader should gain sufficient understanding of TMLE from this introductory tutorial to be able to apply the method in practice. Extensive R code is provided in easy-to-read boxes throughout the article for replicability. Stata users will find a testing implementation of TMLE and additional material in Appendix S1 and at the following GitHub repository: https://github.com/migariane/SIM-TMLE-tutorial.
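
A compressed sketch of the usual TMLE steps for a binary outcome and exposure is shown below, using simple parametric regressions where the tutorial recommends flexible machine-learning fits (the article itself provides the authoritative R code); the simulated data and variable names here are assumptions.

```python
# A minimal sketch of TMLE for the average treatment effect (ATE) with
# binary A and Y: initial outcome regression, propensity score, fluctuation
# with the "clever covariate", targeted update. Simulated toy data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 5000
W = rng.normal(size=n)                                # single confounder
A = rng.binomial(1, 1 / (1 + np.exp(-0.5 * W)))       # treatment
Y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + A + W))))  # binary outcome

ones = np.ones(n)
X = np.column_stack([ones, A, W])
X1 = np.column_stack([ones, ones, W])                 # everyone treated
X0 = np.column_stack([ones, np.zeros(n), W])          # everyone untreated

# Step 1: initial outcome regression Q(A, W)
Qfit = sm.GLM(Y, X, family=sm.families.Binomial()).fit()
QAW, Q1, Q0 = Qfit.predict(X), Qfit.predict(X1), Qfit.predict(X0)

# Step 2: propensity score g(W) = P(A = 1 | W), bounded away from 0 and 1
gfit = sm.GLM(A, np.column_stack([ones, W]),
              family=sm.families.Binomial()).fit()
g = np.clip(gfit.predict(np.column_stack([ones, W])), 0.025, 0.975)

# Step 3: fluctuation step with the "clever covariate" H, offset logit(Q)
H = A / g - (1 - A) / (1 - g)
logit = lambda p: np.log(p / (1 - p))
eps = sm.GLM(Y, H.reshape(-1, 1), family=sm.families.Binomial(),
             offset=logit(QAW)).fit().params[0]

# Step 4: targeted update of the counterfactual predictions, then the ATE
expit = lambda x: 1 / (1 + np.exp(-x))
Q1s = expit(logit(Q1) + eps / g)
Q0s = expit(logit(Q0) - eps / (1 - g))
print("TMLE ATE estimate:", (Q1s - Q0s).mean())
```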

Concepts: Scientific method, Estimation theory, Maximum likelihood, Propensity score, Likelihood function, Propensity score matching, Fisher information, Method of moments

140

The Kalman filter has been widely applied in the field of dynamic navigation and positioning. However, its performance degrades in the presence of significant model errors and uncertain interferences. In the literature, the fading filter was proposed to control the influence of model errors, and the H-infinity filter can be adopted to address uncertainties by minimizing the estimation error in the worst case. In this paper, a new multiple fading factor, suitable for the Global Positioning System (GPS) and Inertial Navigation System (INS) integrated navigation system, is proposed based on the optimization of the filter, and a comprehensive filtering algorithm is constructed by integrating the advantages of the H-infinity filter and the proposed multiple fading filter. Measurement data of the GPS/INS integrated navigation system were collected under actual conditions. The stability and robustness of the proposed filtering algorithm are tested with various experiments, and contrastive analyses are performed with the measurement data. Results demonstrate that both filter divergence and the influence of outliers are effectively restrained with the proposed filtering algorithm, and the precision of the filtering results is improved simultaneously.
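
As a rough illustration of the fading idea, the sketch below runs a fading-memory Kalman filter on a toy constant-velocity model with a deliberate model error. A single scalar fading factor stands in for the paper's multiple fading factors, and the H-infinity component is omitted; all values are illustrative.

```python
# A minimal sketch of a fading-memory Kalman filter on a toy 1-D
# constant-velocity model: the predicted covariance is inflated by a
# fading factor lam > 1, which bounds the filter's memory and limits
# divergence under model error. All parameters are illustrative.
import numpy as np

dt, lam = 0.1, 1.05
F = np.array([[1, dt], [0, 1]])          # state transition (pos, vel)
H = np.array([[1.0, 0.0]])               # position-only measurement
Q = 1e-3 * np.eye(2)                     # process noise
R = np.array([[0.5]])                    # measurement noise

x = np.zeros(2)
P = np.eye(2)
rng = np.random.default_rng(4)
truth = np.zeros(2)
for _ in range(100):
    truth = F @ truth + [0, 0.02]        # un-modeled acceleration
    z = H @ truth + rng.normal(0, np.sqrt(R[0, 0]), 1)
    # Predict, inflating the covariance by the fading factor
    x = F @ x
    P = lam * (F @ P @ F.T) + Q
    # Update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
print("final state estimate:", x, " truth:", truth)
```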

Concepts: Estimation theory, Kalman filter, Global Positioning System, Dead reckoning, Automotive navigation system, GPS, Navigational equipment, Korean Air Lines Flight 007

32

Therapeutic substitution offers the potential to decrease pharmaceutical expenditures and to improve the efficiency of the health care system.

Concepts: Health care, Health economics, Medicine, Healthcare, Health, Estimation theory, Economics, Potential

28

In covariance structure analysis, two-stage least-squares (2SLS) estimation has been recommended over maximum likelihood estimation when model misspecification is suspected. However, 2SLS often fails to provide stable and accurate solutions, particularly for structural equation models with small samples. To address this issue, a regularized extension of 2SLS is proposed that integrates a ridge type of regularization into 2SLS, thereby enabling the method to effectively handle the small-sample-size problem. Results of a Monte Carlo study conducted to evaluate the performance of the proposed method, as compared to its nonregularized counterpart, are then reported. Finally, an application is presented that demonstrates the empirical usefulness of the proposed method.
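
A generic sketch of the idea, under our own assumptions rather than the paper's exact estimator, is to add a ridge penalty to the second-stage cross-product matrix of 2SLS:

```python
# A minimal sketch contrasting ordinary 2SLS with a ridge-regularized
# version: a penalty lam * I is added to the second-stage cross-product
# matrix, stabilizing the estimate in small samples. This is a generic
# ridge-2SLS formula, not the exact estimator proposed in the paper.
import numpy as np

def two_sls(y, X, Z, lam=0.0):
    """2SLS with an optional ridge penalty lam on the second stage."""
    PZ_X = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)   # project X onto Z
    A = X.T @ PZ_X + lam * np.eye(X.shape[1])      # penalized normal matrix
    return np.linalg.solve(A, PZ_X.T @ y)

rng = np.random.default_rng(5)
n = 40                                   # deliberately small sample
Z = rng.normal(size=(n, 3))              # instruments
X = Z @ np.array([[0.8], [0.5], [0.2]]) + rng.normal(size=(n, 1))
y = (X @ [1.5]).ravel() + rng.normal(size=n)

print("2SLS:      ", two_sls(y, X, Z))
print("ridge 2SLS:", two_sls(y, X, Z, lam=1.0))
```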

Concepts: Regression analysis, Sample size, Estimation theory, Econometrics, Maximum likelihood, Fisher information, Structural equation modeling, Generalized method of moments

28

Quantifying diversity is of central importance for the study of the structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty of comparing species richness estimates by applying Chao’s estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics (‘Hill diversities’), and construct lower and upper estimates of the diversity values consistent with the sample data. The theory generalizes Chao’s estimator, which we retrieve as the lower estimate of species richness. We show that the Shannon and Simpson diversities can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically sampled communities. Hence, we recommend the use of Shannon and Simpson diversity, rather than species richness, in efforts to quantify and compare microbial diversity. (The ISME Journal, advance online publication, 14 February 2013; doi:10.1038/ismej.2013.10.)
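
The estimators discussed above are easy to state concretely: the sketch below computes the bias-corrected Chao1 lower bound on richness together with the plug-in Shannon and Simpson diversities (Hill numbers of orders 1 and 2) for a toy abundance vector.

```python
# A minimal sketch of the standard estimators named in the abstract,
# applied to a toy species-abundance vector: the bias-corrected Chao1
# lower bound on richness, and the plug-in Shannon and Simpson
# diversities expressed as Hill numbers of orders 1 and 2.
import numpy as np

counts = np.array([120, 80, 40, 10, 5, 3, 2, 1, 1, 1])  # toy counts
p = counts / counts.sum()

S_obs = (counts > 0).sum()
f1 = (counts == 1).sum()        # singletons
f2 = (counts == 2).sum()        # doubletons
chao1 = S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))   # bias-corrected Chao1

shannon = np.exp(-(p * np.log(p)).sum())   # Hill number of order 1
simpson = 1 / (p ** 2).sum()               # Hill number of order 2

print(f"Chao1 = {chao1:.1f}, Shannon diversity = {shannon:.1f}, "
      f"Simpson diversity = {simpson:.1f}")
```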

Concepts: Statistics, Mathematics, Estimation theory, Estimator, Approximation, Estimation, Microorganism, Robust statistics

28

AIM: The aim of our study was to assess improvements in spatial resolution and noise control from the application of the Astonish resolution recovery algorithm for single photon emission computed tomography (SPECT) imaging. Secondary aims were to compare acquisitions made with low-energy general-purpose collimators with those obtained using low-energy high-resolution collimators in this context, and to evaluate the potential of a finer matrix to improve image quality further. MATERIALS AND METHODS: A (99m)Tc-filled Jaszczak phantom with hot spheres was used to assess contrast and noise. A National Electrical Manufacturers Association (NEMA) triple-line-source SPECT resolution phantom was used to measure spatial resolution. Acquisitions were made using both low-energy high-resolution and low-energy general-purpose collimators. RESULTS: Compared with standard ordered-subsets expectation-maximization reconstructions, the resolution recovery algorithm resulted in higher spatial resolution (8 vs. 14 mm full-width at half-maximum), leading to reduced partial volume effects in the smaller Jaszczak spheres. Higher image contrast was achieved alongside lower levels of noise. An edge-enhancement artefact was observed in the resolution-recovery-corrected images, and an overestimate of the target-to-background activity was also observed for the larger spheres. CONCLUSION: The use of such an algorithm results in images characterized by increased spatial resolution and reduced noise. However, small sources of the order of 2-3 cm can be significantly over-enhanced.
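
For context, spatial resolution figures like the 8 vs. 14 mm quoted above are typically obtained by fitting a Gaussian to a line-source profile and reporting FWHM = 2*sqrt(2 ln 2)*sigma (about 2.355*sigma). The sketch below does this for synthetic profile data; it is not the NEMA procedure itself.

```python
# A minimal sketch of how spatial resolution is commonly quantified from
# a line-source profile: fit a Gaussian and report FWHM = 2.355 * sigma.
# The profile data are synthetic, not from the study.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

x = np.arange(-30.0, 30.0, 1.0)                  # mm across the line source
rng = np.random.default_rng(6)
profile = gaussian(x, 100, 0, 14 / 2.355) + rng.normal(0, 2, x.size)

(amp, mu, sigma), _ = curve_fit(gaussian, x, profile, p0=[90, 0, 5])
print(f"FWHM = {2.355 * abs(sigma):.1f} mm")
```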

Concepts: Estimation theory, Optics, Medical imaging, Tomography, IMAGE, Single photon emission computed tomography, Reconstruction algorithm, National Electrical Manufacturers Association