Concept: Estimation theory
We present a statistical framework for estimation and application of sample allele frequency spectra from New-Generation Sequencing (NGS) data. In this method, we first estimate the allele frequency spectrum using maximum likelihood. In contrast to previous methods, the likelihood function is calculated using a dynamic programming algorithm and numerically optimized using analytical derivatives. We then use a Bayesian method for estimating the sample allele frequency in a single site, and show how the method can be used for genotype calling and SNP calling. We also show how the method can be extended to various other cases including cases with deviations from Hardy-Weinberg equilibrium. We evaluate the statistical properties of the methods using simulations and by application to a real data set.
BACKGROUND: Lidar height data collected by the Geosciences Laser Altimeter System (GLAS) from 2002 to 2008 has the potential to form the basis of a globally consistent sample-based inventory of forest biomass. GLAS lidar return data were collected globally in spatially discrete full waveform “shots,” which have been shown to be strongly correlated with aboveground forest biomass. Relationships observed at spatially coincident field plots may be used to model biomass at all GLAS shots, and well-established methods of model-based inference may then be used to estimate biomass and variance for specific spatial domains. However, the spatial pattern of GLAS acquisition is neither random across the surface of the earth nor is it identifiable with any particular systematic design. Undefined sample properties therefore hinder the use of GLAS in global forest sampling. RESULTS: We propose a method of identifying a subset of the GLAS data which can justifiably be treated as a simple random sample in model-based biomass estimation. The relatively uniform spatial distribution and locally arbitrary positioning of the resulting sample is similar to the design used by the US national forest inventory (NFI). We demonstrated model-based estimation using a sample of GLAS data in the US state of California, where our estimate of biomass (211 Mg/hectare) was within the 1.4% standard error of the design-based estimate supplied by the US NFI. The standard error of the GLAS-based estimate was significantly higher than the NFI estimate, although the cost of the GLAS estimate (excluding costs for the satellite itself) was almost nothing, compared to at least US$ 10.5 million for the NFI estimate. CONCLUSIONS: Global application of model-based estimation using GLAS, while demanding significant consolidation of training data, would improve inter-comparability of international biomass estimates by imposing consistent methods and a globally coherent sample frame. The methods presented here constitute a globally extensible approach for generating a simple random sample from the global GLAS dataset, enabling its use in forest inventory activities.
Advances in the development of micro-electromechanical systems (MEMS) have made possible the fabrication of cheap and small dimension accelerometers and gyroscopes, which are being used in many applications where the global positioning system (GPS) and the inertial navigation system (INS) integration is carried out, i.e., identifying track defects, terrestrial and pedestrian navigation, unmanned aerial vehicles (UAVs), stabilization of many platforms, etc. Although these MEMS sensors are low-cost, they present different errors, which degrade the accuracy of the navigation systems in a short period of time. Therefore, a suitable modeling of these errors is necessary in order to minimize them and, consequently, improve the system performance. In this work, the most used techniques currently to analyze the stochastic errors that affect these sensors are shown and compared: we examine in detail the autocorrelation, the Allan variance (AV) and the power spectral density (PSD) techniques. Subsequently, an analysis and modeling of the inertial sensors, which combines autoregressive (AR) filters and wavelet de-noising, is also achieved. Since a low-cost INS (MEMS grade) presents error sources with short-term (high-frequency) and long-term (low-frequency) components, we introduce a method that compensates for these error terms by doing a complete analysis of Allan variance, wavelet de-nosing and the selection of the level of decomposition for a suitable combination between these techniques. Eventually, in order to assess the stochastic models obtained with these techniques, the Extended Kalman Filter (EKF) of a loosely-coupled GPS/INS integration strategy is augmented with different states. Results show a comparison between the proposed method and the traditional sensor error models under GPS signal blockages using real data collected in urban roadways.
The problem of determining the optimal geometric configuration of a sensor network that will maximize the range-related information available for multiple target positioning is of key importance in a multitude of application scenarios. In this paper, a set of sensors that measures the distances between the targets and each of the receivers is considered, assuming that the range measurements are corrupted by white Gaussian noise, in order to search for the formation that maximizes the accuracy of the target estimates. Using tools from estimation theory and convex optimization, the problem is converted into that of maximizing, by proper choice of the sensor positions, a convex combination of the logarithms of the determinants of the Fisher Information Matrices corresponding to each of the targets in order to determine the sensor configuration that yields the minimum possible covariance of any unbiased target estimator. Analytical and numerical solutions are well defined and it is shown that the optimal configuration of the sensors depends explicitly on the constraints imposed on the sensor configuration, the target positions, and the probabilistic distributions that define the prior uncertainty in each of the target positions. Simulation examples illustrate the key results derived.
The Kalman filter has been widely applied in the field of dynamic navigation and positioning. However, its performance will be degraded in the presence of significant model errors and uncertain interferences. In the literature, the fading filter was proposed to control the influences of the model errors, and the H-infinity filter can be adopted to address the uncertainties by minimizing the estimation error in the worst case. In this paper, a new multiple fading factor, suitable for the Global Positioning System (GPS) and the Inertial Navigation System (INS) integrated navigation system, is proposed based on the optimization of the filter, and a comprehensive filtering algorithm is constructed by integrating the advantages of the H-infinity filter and the proposed multiple fading filter. Measurement data of the GPS/INS integrated navigation system are collected under actual conditions. Stability and robustness of the proposed filtering algorithm are tested with various experiments and contrastive analysis are performed with the measurement data. Results demonstrate that both the filter divergence and the influences of outliers are restrained effectively with the proposed filtering algorithm, and precision of the filtering results are improved simultaneously.
Therapeutic substitution offers potential to decrease pharmaceutical expenditures and potentially improve the efficiency of the health care system.
In covariance structure analysis, two-stage least-squares (2SLS) estimation has been recommended for use over maximum likelihood estimation when model misspecification is suspected. However, 2SLS often fails to provide stable and accurate solutions, particularly for structural equation models with small samples. To address this issue, a regularized extension of 2SLS is proposed that integrates a ridge type of regularization into 2SLS, thereby enabling the method to effectively handle the small-sample-size problem. Results are then reported of a Monte Carlo study conducted to evaluate the performance of the proposed method, as compared to its nonregularized counterpart. Finally, an application is presented that demonstrates the empirical usefulness of the proposed method.
Quantifying diversity is of central importance for the study of structure, function and evolution of microbial communities. The estimation of microbial diversity has received renewed attention with the advent of large-scale metagenomic studies. Here, we consider what the diversity observed in a sample tells us about the diversity of the community being sampled. First, we argue that one cannot reliably estimate the absolute and relative number of microbial species present in a community without making unsupported assumptions about species abundance distributions. The reason for this is that sample data do not contain information about the number of rare species in the tail of species abundance distributions. We illustrate the difficulty in comparing species richness estimates by applying Chao’s estimator of species richness to a set of in silico communities: they are ranked incorrectly in the presence of large numbers of rare species. Next, we extend our analysis to a general family of diversity metrics (‘Hill diversities’), and construct lower and upper estimates of diversity values consistent with the sample data. The theory generalizes Chao’s estimator, which we retrieve as the lower estimate of species richness. We show that Shannon and Simpson diversity can be robustly estimated for the in silico communities. We analyze nine metagenomic data sets from a wide range of environments, and show that our findings are relevant for empirically-sampled communities. Hence, we recommend the use of Shannon and Simpson diversity rather than species richness in efforts to quantify and compare microbial diversity.The ISME Journal advance online publication, 14 February 2013; doi:10.1038/ismej.2013.10.
Skew-t Fits to Mortality Data–Can a Gaussian-Related Distribution Replace the Gompertz-Makeham as the Basis for Mortality Studies?
- The journals of gerontology. Series A, Biological sciences and medical sciences
- Published almost 6 years ago
Gompertz-related distributions have dominated mortality studies for 187 years. However, nonrelated distributions also fit well to mortality data. These compete with the Gompertz and Gompertz-Makeham data when applied to data with varying extents of truncation, with no consensus as to preference. In contrast, Gaussian-related distributions are rarely applied, despite the fact that Lexis in 1879 suggested that the normal distribution itself fits well to the right of the mode. Study aims were therefore to compare skew-t fits to Human Mortality Database data, with Gompertz-nested distributions, by implementing maximum likelihood estimation functions (mle2, R package bbmle; coding given). Results showed skew-t fits obtained lower Bayesian information criterion values than Gompertz-nested distributions, applied to low-mortality country data, including 1711 and 1810 cohorts. As Gaussian-related distributions have now been found to have almost universal application to error theory, one conclusion could be that a Gaussian-related distribution might replace Gompertz-related distributions as the basis for mortality studies.
In this paper, we propose a class of multivariate random effects models allowing for the inclusion of study-level covariates to carry out meta-analyses. As existing algorithms for computing maximum likelihood estimates often converge poorly or may not converge at all when the random effects are multi-dimensional, we develop an efficient expectation-maximization algorithm for fitting multi-dimensional random effects regression models. In addition, we also develop a new methodology for carrying out variable selection with study-level covariates. We examine the performance of the proposed methodology via a simulation study. We apply the proposed methodology to analyze metadata from 26 studies involving statins as a monotherapy and in combination with ezetimibe. In particular, we compare the low-density lipoprotein cholesterol-lowering efficacy of monotherapy and combination therapy on two patient populations (naïve and non-naïve patients to statin monotherapy at baseline), controlling for aggregate covariates. The proposed methodology is quite general and can be applied in any meta-analysis setting for a wide range of scientific applications and therefore offers new analytic methods of clinical importance. Copyright © 2012 John Wiley & Sons, Ltd.