SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: Formal sciences


Machine Learning (ML) methods have been proposed in the academic literature as alternatives to statistical ones for time series forecasting. Yet, scant evidence is available about their relative performance in terms of accuracy and computational requirements. The purpose of this paper is to evaluate such performance across multiple forecasting horizons using a large subset of 1045 monthly time series used in the M3 Competition. After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined. Moreover, we observed that their computational requirements are considerably greater than those of statistical methods. The paper discusses the results, explains why the accuracy of ML models is below that of statistical ones and proposes some possible ways forward. The empirical results found in our research stress the need for objective and unbiased ways to test the performance of forecasting methods that can be achieved through sizable and open competitions allowing meaningful comparisons and definite conclusions.
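
Comparisons like the one above are scored with measures such as the symmetric MAPE (sMAPE) used in the M-competitions. A minimal sketch in plain Python, together with a seasonal-naive forecast of the kind that serves as a simple statistical benchmark (names and the benchmark choice are illustrative):

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error (in percent), one of the
    accuracy measures used in the M-competitions. Assumes no pair where
    both values are zero."""
    return 100.0 / len(actual) * sum(
        abs(f - a) / ((abs(a) + abs(f)) / 2.0)
        for a, f in zip(actual, forecast)
    )

def seasonal_naive(series, horizon, period=12):
    """Forecast by repeating the value observed one seasonal period ago --
    a standard statistical benchmark for monthly data."""
    return [series[len(series) - period + (h % period)] for h in range(horizon)]
```

A perfect forecast yields an sMAPE of 0; over- and under-forecasts of the same size are penalised symmetrically, which is why the M-competitions favour it over plain MAPE.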

Concepts: Scientific method, Statistics, Mathematics, Machine learning, Time series, Biostatistics, Applied mathematics, Formal sciences


Protein or nucleic acid sequences contain a multitude of associated annotations representing continuous sequence elements (CSEs). Comparing these CSEs is needed whenever we want to match identical annotations or integrate distinct ones. Currently, there is no ready-to-use software available that provides a comprehensive statistical readout for comparing two annotations of the same type with each other and that can be adapted to the application logic of the scientific question.
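
As a toy illustration of the kind of statistical readout meant here, the sketch below compares two hypothetical interval annotations of the same sequence. The function name and the choice of statistics are illustrative, not taken from any existing tool:

```python
def compare_annotations(a, b):
    """Compare two lists of (start, end) intervals (0-based, end-exclusive)
    annotating the same sequence. Returns simple agreement statistics."""
    pos_a = {p for s, e in a for p in range(s, e)}
    pos_b = {p for s, e in b for p in range(s, e)}
    union = pos_a | pos_b
    return {
        "identical": len(set(a) & set(b)),  # intervals matching exactly
        "jaccard": len(pos_a & pos_b) / len(union) if union else 1.0,
        "only_a": len(pos_a - pos_b),       # positions covered only by a
        "only_b": len(pos_b - pos_a),       # positions covered only by b
    }
```

In practice the "application logic" would decide, for example, whether partially overlapping intervals count as a match or as a conflict; the position-level Jaccard index above is one neutral choice.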

Concepts: Scientific method, Statistics, Mathematics, Time series, Need, Topological space, Limit of a sequence, Formal sciences


We address Jacoby’s (1991) proposal that strategic control over knowledge requires conscious awareness of that knowledge. In a two-grammar artificial grammar learning experiment, all participants were trained on two grammars, each consisting of a regularity in letter sequences, while two other stimulus dimensions (colours and fonts) varied randomly. Strategic control was measured as the ability to selectively apply the grammars during classification. For each classification, participants also made a combined judgement of (a) decision strategy and (b) relevant stimulus dimension. Strategic control was found for all types of decision strategy, including trials where participants claimed to lack conscious structural knowledge. However, strong evidence of strategic control only occurred when participants knew or guessed that the letter dimension was relevant, suggesting that strategic control might be associated with, or even causally require, global awareness of the nature of the rules even though it does not require detailed knowledge of their content.

Concepts: Game theory, Scientific method, Consciousness, Artificial intelligence, Control theory, String theory, Formal sciences


Evidence from animal models suggests that t-tubule changes may play an important role in the contractile deficit associated with heart failure. However, samples are usually taken at random with no regard to the regional variability present in failing hearts, which leads to uncertainty in the relationship between contractile performance and possible t-tubule derangement. Regional contraction in human hearts was measured by tagged cine MRI and model fitting. At transplant, failing hearts were biopsy sampled in identified regions and immunocytochemistry was used to label t-tubules and sarcomeric z-lines. Computer image analysis was used to assess five different unbiased measures of t-tubule structure/organization. In regions of failing hearts that showed good contractile performance, t-tubule organization was similar to that seen in normal hearts, with worsening structure correlating with the loss of regional contractile performance. Statistical analysis showed that t-tubule direction was most highly correlated with local contractile performance, followed by the amplitude of the sarcomeric peak in the Fourier transform of the t-tubule image. Other area-based measures were less well correlated. We conclude that regional contractile performance in failing human hearts is strongly correlated with the local t-tubule organization. Cluster tree analysis with a functional definition of failing contraction strength allowed a pathological definition of 't-tubule disease'. The regional variability in contractile performance and cellular structure is a confounding issue for analysis of samples taken from failing human hearts, although this may be overcome with regional analysis using tagged cMRI and biopsy mapping.
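
One of the measures mentioned, the amplitude of the sarcomeric peak in the Fourier transform, can be sketched for a 1-D intensity profile as follows. This is a simplified stand-in (real analyses operate on 2-D images), and the function name and period parameter are illustrative:

```python
import math

def peak_amplitude(profile, period):
    """Amplitude of the Fourier component at a given spatial period
    (in samples) of a 1-D intensity profile -- a stand-in for the
    'sarcomeric peak' measure described above."""
    n = len(profile)
    k = n / period  # frequency index corresponding to that period
    re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(profile))
    im = sum(-v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(profile))
    return 2.0 * math.hypot(re, im) / n
```

A well-organised t-tubule lattice gives a strong, regular sarcomeric periodicity and hence a large peak amplitude; a deranged lattice smears the energy across frequencies and the peak shrinks.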

Concepts: Cardiomyopathy, Heart, Fourier transform, Analysis of variance, Fourier analysis, Pattern recognition, Computer vision, Formal sciences


Simple bioinformatic tools are frequently used to analyse time-series datasets regardless of whether those tools can deal with transient phenomena, limiting the meaningful information that may be extracted from the data. This situation requires the development and exploitation of tailor-made, easy-to-use, and flexible tools designed specifically for the analysis of time-series datasets.

Concepts: Statistics, Time series, Debut albums, Formal sciences


Regime shifts are generally defined as the point of ‘abrupt’ change in the state of a system. However, a seemingly abrupt transition can be the product of a system reorganization that has been ongoing much longer than is evident in statistical analysis of a single component of the system. Using both univariate and multivariate statistical methods, we tested a long-term high-resolution paleoecological dataset with a known change in species assemblage for a regime shift. Analysis of this dataset with Fisher Information and multivariate time series modeling showed that there was a ~2000-year period of instability prior to the regime shift. This period of instability and the subsequent regime shift coincide with regional climate change, indicating that the system is undergoing extrinsic forcing. Paleoecological records offer a unique opportunity to test tools for the detection of thresholds and stable states, and thus to examine the long-term stability of ecosystems over periods of multiple millennia.
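
A minimal illustration of one common leading indicator of instability is sliding-window variance, which tends to rise as a system loses stability before a shift. This is a much simpler stand-in for the Fisher Information analysis used in the paper:

```python
from statistics import variance

def rolling_variance(series, window):
    """Sliding-window sample variance: a simple early-warning indicator
    of declining stability ahead of a regime shift."""
    return [variance(series[i:i + window])
            for i in range(len(series) - window + 1)]
```

On a paleoecological record, a sustained rise in this curve well before the assemblage change would be the single-variable analogue of the ~2000-year instability period described above.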

Concepts: Time, Statistics, Actuarial science, Data, Multivariate statistics, Ronald Fisher, Time series, Formal sciences


Clathrin-mediated endocytosis (CME) plays a central role in cellular homeostasis and is mediated by clathrin-coated pits (CCPs). Live-cell imaging has revealed a remarkable heterogeneity in CCP assembly kinetics, which can be used as an intrinsic source of mechanistic information on CCP regulation but also poses several major problems for unbiased analysis of CME dynamics. The backbone of unveiling the molecular control of CME is an imaging-based inventory of the full diversity of individual CCP behaviors, which requires detection and tracking of structural fiduciaries and regulatory proteins with an accuracy of >99.9%, despite very low signals. This level of confidence can only be achieved by combining appropriate imaging modalities with self-diagnostic computational algorithms for image analysis and data mining.

Concepts: DNA, Algorithm, Critical thinking, Structure, Regulation, Logic, IMAGE, Formal sciences


The behavioural data yielded by single subjects in naturalistic and controlled settings likely contain valuable information to scientists and practitioners alike. Although some of the properties unique to these data complicate statistical analysis, progress has been made in developing specialised techniques for rigorous data evaluation. There are no perfect tests currently available to analyse short autocorrelated data streams, but there are some promising approaches that warrant further development. Although many approaches have been proposed, and some appear better than others, they all have some limitations. When data sets are large enough (~30 data points per phase), the researcher has a reasonably rich palette of statistical tools from which to choose. However, when the data set is sparse, the analytical options dwindle. Simulation modelling analysis (SMA; described in this article) is a relatively new technique that appears to offer acceptable Type-I and Type-II error rate control with short streams of autocorrelated data. However, at this point, it is probably too early to endorse any specific statistical approaches for short, autocorrelated time-series data streams. While SMA shows promise, more work is needed to verify that it is capable of reliable Type-I and Type-II error performance with short serially dependent streams of data.
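
The Type-I error concern can be made concrete with a small Monte-Carlo sketch: simulate short AR(1) streams with no true phase effect and count how often a naive phase-mean comparison flags a difference. The test statistic, threshold, and parameters here are illustrative; this is not SMA itself:

```python
import random

def ar1(n, phi, rng):
    """Generate an AR(1) series x[t] = phi * x[t-1] + Gaussian noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out

def type1_error(n_sims=2000, n=20, phi=0.5, crit=1.0):
    """Monte-Carlo Type-I error rate of a naive phase-mean comparison on
    short autocorrelated data with no true effect."""
    rng = random.Random(0)  # fixed seed for reproducibility
    hits = 0
    for _ in range(n_sims):
        x = ar1(n, phi, rng)
        a, b = x[:n // 2], x[n // 2:]
        hits += abs(sum(a) / len(a) - sum(b) / len(b)) > crit
    return hits / n_sims
```

With positive autocorrelation the false-positive rate of such a naive comparison is inflated well above its nominal level, which is exactly why specialised techniques like SMA are needed for short serially dependent streams.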

Concepts: Regression analysis, Statistics, Data, Data set, Time series, Baseball statistics, Time series analysis, Formal sciences


The objective of change-point detection is to discover abrupt property changes lying behind time-series data. In this paper, we present a novel statistical change-point detection algorithm based on non-parametric divergence estimation between time-series samples from two retrospective segments. Our method uses the relative Pearson divergence as a divergence measure, which is accurately and efficiently estimated by direct density-ratio estimation. Through experiments on artificial and real-world datasets including human-activity sensing, speech, and Twitter messages, we demonstrate the usefulness of the proposed method.
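
A crude histogram-based stand-in for the divergence-based change score can convey the idea of comparing two retrospective segments. The paper instead estimates the density ratio directly with kernel models; the binning, the mixture denominator, and the function name below are all illustrative simplifications:

```python
def change_score(x, y, bins=8):
    """Relative Pearson divergence between two samples, estimated from
    histograms, with the mixture 0.5*p + 0.5*q in the denominator to keep
    the density ratio bounded (a crude stand-in for direct
    density-ratio estimation)."""
    lo, hi = min(x + y), max(x + y)
    width = (hi - lo) / bins or 1.0
    def hist(s):
        h = [0.0] * bins
        for v in s:
            h[min(int((v - lo) / width), bins - 1)] += 1.0 / len(s)
        return h
    p, q = hist(x), hist(y)
    return sum(m * (pi / m - 1.0) ** 2
               for pi, qi in zip(p, q)
               if (m := 0.5 * (pi + qi)) > 0)
```

Identical segments score zero and dissimilar segments score higher, so sliding two adjacent windows along a series and flagging peaks of this score yields a rudimentary change-point detector.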

Concepts: Statistics, Mathematics, Data set, Spectral density, Estimation, Time series, Twitter, Formal sciences