SciCombinator

Discover the latest and most talked-about scientific content & concepts.

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

28

We consider the problem of categorizing video sequences of dynamic textures, i.e., nonrigid dynamical objects such as fire, water, steam, and flags. This problem is extremely challenging because the shape and appearance of a dynamic texture change continuously as a function of time. State-of-the-art dynamic texture categorization methods have been successful at classifying videos taken from the same viewpoint and scale by using a Linear Dynamical System (LDS) to model each video, and using distances or kernels in the space of LDSs to classify the videos. However, these methods perform poorly when the video sequences are taken under a different viewpoint or scale. In this paper, we propose a novel dynamic texture categorization framework that can handle such changes. We model each video sequence with a collection of LDSs, each one describing a small spatiotemporal patch extracted from the video. This Bag-of-Systems (BoS) representation is analogous to the Bag-of-Features (BoF) representation for object recognition, except that we use LDSs as feature descriptors. This choice poses several technical challenges in adopting the traditional BoF approach. Most notably, the space of LDSs is not Euclidean; hence, novel methods for clustering LDSs and computing codewords of LDSs need to be developed. We propose a framework that combines nonlinear dimensionality reduction and clustering techniques with the Martin distance for LDSs to tackle these issues. Our experiments compare the proposed BoS approach to existing dynamic texture categorization methods and show that it can be used for recognizing dynamic textures in challenging scenarios that existing methods cannot handle.
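
A minimal sketch of the BoS building blocks, assuming grayscale spatiotemporal patches of shape (T, H, W). The LDS fit is the standard SVD-based (Doretto-style) approximation, the Martin distance is computed from truncated observability matrices, and codeword assignment is reduced to a nearest-codeword search; the paper's codebook step additionally uses nonlinear dimensionality reduction and clustering, which is not reproduced here. All function names are hypothetical.

```python
import numpy as np

def fit_lds(patch, n=5):
    """SVD-based (Doretto-style) LDS fit for a patch of shape (T, H, W)."""
    T = patch.shape[0]
    Y = patch.reshape(T, -1).T                    # pixels x time
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                                  # observation matrix
    X = S[:n, None] * Vt[:n]                      # hidden states, n x T
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])      # state transition matrix
    return C, A

def martin_distance(lds1, lds2, k=10):
    """Martin distance from principal angles between truncated
    observability subspaces span[C; CA; ...; CA^(k-1)]."""
    def obs_basis(C, A):
        blocks, M = [], np.eye(A.shape[0])
        for _ in range(k):
            blocks.append(C @ M)
            M = A @ M
        Q, _ = np.linalg.qr(np.vstack(blocks))    # orthonormal basis
        return Q
    O1, O2 = obs_basis(*lds1), obs_basis(*lds2)
    cosines = np.clip(np.linalg.svd(O1.T @ O2, compute_uv=False), 1e-8, 1.0)
    return np.sqrt(-2.0 * np.log(cosines).sum())

def bos_histogram(patch_ldss, codeword_ldss):
    """Map each patch LDS to its nearest codeword; return a TF histogram."""
    h = np.zeros(len(codeword_ldss))
    for lds in patch_ldss:
        h[np.argmin([martin_distance(lds, c) for c in codeword_ldss])] += 1
    return h / max(h.sum(), 1.0)
```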

Concepts: Mathematics, Logistic map, Dimension, Distance, Dynamical system, Dynamical systems, Dynamical systems theory

28

The objective of this work is recognition and spatiotemporal localization of two-person interactions in video. Our approach is person-centric. As a first stage we track all upper bodies and heads in a video using a tracking-by-detection approach that combines detections with KLT tracking and clique partitioning, together with occlusion detection, to yield robust person tracks. We develop local descriptors of activity based on head orientation (estimated using a set of pose-specific classifiers) and the local spatiotemporal region around each head, together with global descriptors that encode the relative positions of people as a function of interaction type. Learning and inference on the model use a structured output SVM which combines the local and global descriptors in a principled manner. Inference using the model yields information about which pairs of people are interacting, their interaction class, and their head orientation (which is also treated as a variable, enabling mistakes in the classifier to be corrected using global context). We show that inference can be carried out with polynomial complexity in the number of people, and describe an efficient algorithm for this. The method is evaluated on a new dataset comprising 300 video clips acquired from 23 different TV shows and on the benchmark UT-Interaction dataset.
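
As a rough illustration of the pairwise scoring such a model performs, here is a sketch that rates every candidate pair of people under every interaction class with a linear compatibility function. The structured-SVM learning of the weights and the paper's polynomial-time joint inference are not reproduced; the descriptor inputs and all names are hypothetical.

```python
import numpy as np
from itertools import combinations

def pair_score(w, local_i, local_j, global_ij):
    """Linear compatibility of one candidate pair under one interaction class."""
    return float(w @ np.concatenate([local_i, local_j, global_ij]))

def rank_interactions(W, local_desc, global_desc):
    """Score every pair of people under every class; O(K n^2) overall.
    W[k] is the (structured-SVM-learned) weight vector for class k,
    local_desc[i] the per-person descriptor, global_desc[(i, j)] the
    relative-position descriptor for the pair (i, j)."""
    scored = []
    for i, j in combinations(range(len(local_desc)), 2):
        scores = [pair_score(w, local_desc[i], local_desc[j], global_desc[(i, j)])
                  for w in W]
        k = int(np.argmax(scores))
        scored.append(((i, j), k, scores[k]))
    return sorted(scored, key=lambda t: -t[2])    # best-scoring pairs first
```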

Concepts: Interaction, Emergence

27

In this paper, we propose an efficient algorithm to detect dense subgraphs of a weighted graph. The proposed algorithm, called the Shrinking and Expansion Algorithm (SEA), alternates between two phases, an expansion phase and a shrinking phase, until convergence. Given a current subgraph, the expansion phase adds the most related vertices based on the average affinity between each vertex and the subgraph. The shrinking phase considers all pairwise relations in the current subgraph and filters out vertices whose average affinities to the other vertices are smaller than the average affinity of the resulting subgraph. In both phases, SEA operates on small subgraphs and is therefore very efficient. Significant dense subgraphs are robustly enumerated by running SEA from each vertex of the graph. We evaluate SEA on two different applications: solving correspondence problems and cluster analysis. Both theoretical analysis and experimental results show that SEA is very efficient and robust, especially when the edge weights contain a large amount of noise.
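
A compact sketch of the shrink/expand alternation described above, operating on a symmetric, nonnegative affinity matrix with zero diagonal. The expansion and shrinking criteria below are one plausible reading of the abstract's average-affinity description, not the paper's exact rules, and all names are hypothetical.

```python
import numpy as np

def sea(W, seed, tau=0.0, max_iter=100):
    """Shrink/expand alternation on a symmetric, nonnegative affinity
    matrix W with zero diagonal, started from a single seed vertex."""
    S = {seed}
    for _ in range(max_iter):
        idx = sorted(S)
        internal = W[np.ix_(idx, idx)].sum() / max(len(S) * (len(S) - 1), 1)
        # expansion: add outside vertices strongly related to the subgraph
        add = {v for v in range(len(W)) if v not in S
               and W[v, idx].mean() > max(internal, tau)}
        # shrinking: drop members weakly related to the other members
        drop = {v for v in S if len(S) > 1
                and W[v, [u for u in idx if u != v]].mean() < internal}
        if not add and not drop:
            break
        S = (S | add) - drop or {seed}            # never let S become empty
    return S
```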

Concepts: Cluster analysis, Phase, Graph theory, The Current, Proposal, Discrete mathematics, Glossary of graph theory, Vertex

27

Medical image analysis remains a challenging application area for artificial intelligence. When applying machine learning, obtaining ground-truth labels for supervised learning is more difficult than in many more common applications of machine learning. This is especially so for datasets with abnormalities, as tissue types and the shapes of the organs in these datasets differ widely. However, organ detection in such abnormal datasets has many promising potential real-world applications, such as automatic diagnosis, automated radiotherapy planning, and medical image retrieval, where new multi-modal medical images provide more information about the imaged tissues for diagnosis. Here we test the application of deep learning methods to organ identification in magnetic resonance medical images, with visual and temporal hierarchical features learnt to categorise object classes from an unlabelled multi-modal DCE-MRI dataset, so that only weakly supervised training is required for a classifier. A probabilistic patch-based method was employed for multiple organ detection, with the features learnt from the deep learning model. This shows the potential of the deep learning model for application to medical images, despite the difficulty of obtaining libraries of correctly labelled training datasets, and despite the intrinsic abnormalities present in patient datasets.
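
A sliding-window sketch of the probabilistic patch-based detection step, assuming a 2D slice, a feature extractor standing in for the features learnt by the deep model, and an sklearn-style probabilistic classifier. Every name and interface here is an assumption, not the paper's implementation.

```python
import numpy as np

def organ_heat_map(slice2d, feature_fn, patch_clf, patch=16, stride=8):
    """Slide a window over a 2D slice, featurize each patch with the
    learnt features, and accumulate per-organ class probabilities.
    feature_fn and patch_clf (with n_classes / predict_proba) are
    assumed interfaces, not the paper's API."""
    H, W = slice2d.shape
    heat = np.zeros((patch_clf.n_classes, H // stride, W // stride))
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            f = feature_fn(slice2d[y:y + patch, x:x + patch])
            heat[:, y // stride, x // stride] = patch_clf.predict_proba(f)
    return heat    # argmax over axis 0 gives a coarse organ label map
```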

Concepts: X-ray, Medical imaging, Artificial intelligence, Machine learning, Learning, Artificial neural network, IMAGE, Supervised learning

25

In recent years, sparse coding has been widely used in many applications, ranging from image processing to pattern recognition. Most existing sparse coding based applications require solving a class of challenging non-smooth, non-convex optimization problems. Although many numerical methods have been developed for solving these problems, it remains an open problem to find one that is not only empirically fast but also comes with a mathematical guarantee of strong convergence. In this paper, we propose an alternating iteration scheme for solving such problems. A rigorous convergence analysis shows that the proposed method satisfies the global convergence property: the whole sequence of iterates is convergent and converges to a critical point. Beyond this theoretical soundness, the practical benefit of the proposed method is validated in applications including image restoration and recognition. Experiments show that the proposed method achieves results similar to those of widely used methods such as K-SVD, with less computation.
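
For concreteness, here is a generic proximal alternating minimization sketch for the kind of non-smooth, non-convex problem the abstract refers to, min over (D, X) of 0.5 * ||Y - D X||_F^2 + lam * ||X||_0. It illustrates the alternating structure only; it is not the paper's algorithm, and the convergence guarantees above are proved for their scheme, not this one.

```python
import numpy as np

def hard_threshold(X, lam):
    """Proximal map of lam * ||X||_0: zero entries with |x| < sqrt(2 lam)."""
    Z = X.copy()
    Z[np.abs(Z) < np.sqrt(2.0 * lam)] = 0.0
    return Z

def alternating_sparse_coding(Y, k, lam=0.1, iters=50, seed=0):
    """Alternate a proximal gradient step on the codes X with a
    least-squares dictionary update for
        min_{D, X} 0.5 * ||Y - D X||_F^2 + lam * ||X||_0."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Y.shape[0], k))
    D /= np.linalg.norm(D, axis=0)
    X = np.zeros((k, Y.shape[1]))
    for _ in range(iters):
        L = np.linalg.norm(D, 2) ** 2             # gradient Lipschitz constant
        G = D.T @ (D @ X - Y)                     # gradient in X
        X = hard_threshold(X - G / L, lam / L)    # proximal (l0) step
        D = Y @ np.linalg.pinv(X)                 # dictionary update
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-8)
    return D, X
```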

Concepts: Scientific method, Mathematics, Science, Problem solving, Optimization, Numerical analysis, Open problem, Convergence

25

Attributes act as intermediate representations that enable parameter sharing between classes, a must when training data is scarce. We propose to view attribute-based image classification as a label-embedding problem: each class is embedded in the space of attribute vectors. We introduce a function that measures the compatibility between an image and a label embedding. The parameters of this function are learned on a training set of labeled samples to ensure that, given an image, the correct classes rank higher than the incorrect ones. Results on the Animals With Attributes and Caltech-UCSD-Birds datasets show that the proposed framework outperforms the standard Direct Attribute Prediction baseline in a zero-shot learning scenario. Label embedding enjoys a built-in ability to leverage alternative sources of information instead of, or in addition to, attributes, such as class hierarchies or textual descriptions. Moreover, label embedding encompasses the whole range of learning settings, from zero-shot learning to regular learning with a large number of labeled examples.
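
A small sketch of the bilinear compatibility view, F(x, y) = theta(x)^T W phi(y), with prediction as the top-scoring class and a WSABIE-style ranking step standing in for the paper's learning objective. The shapes and the exact update rule are illustrative assumptions.

```python
import numpy as np

def compatibility(W, theta_x, phi_Y):
    """F(x, y) = theta(x)^T W phi(y), evaluated for every class at once."""
    return (theta_x @ W) @ phi_Y.T

def predict(W, theta_x, phi_Y):
    """Zero-shot prediction: the class whose attribute embedding is most
    compatible with the image embedding."""
    return int(np.argmax(compatibility(W, theta_x, phi_Y)))

def rank_update(W, theta_x, phi_Y, y_true, lr=0.01, margin=1.0):
    """One WSABIE-style ranking step: if some wrong class scores within
    the margin of the true class, move W toward the true embedding."""
    s = compatibility(W, theta_x, phi_Y)
    violators = [y for y in range(len(phi_Y))
                 if y != y_true and s[y] + margin > s[y_true]]
    if violators:
        y_bad = max(violators, key=lambda y: s[y])
        W += lr * np.outer(theta_x, phi_Y[y_true] - phi_Y[y_bad])
    return W
```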

Concepts: Skill, Machine learning, C, Parameter, Class, Subroutine, Currying, C++

24

Hashing has attracted a great deal of research in recent years due to its effectiveness for the retrieval and indexing of large-scale high-dimensional multimedia data. In this paper, we propose a novel ranking-based hashing framework that maps data from different modalities into a common Hamming space where the cross-modal similarity can be measured using Hamming distance. Unlike existing cross-modal hashing algorithms, where the learned hash functions are binary space-partitioning functions such as sign and threshold functions, the proposed hashing scheme takes advantage of a new class of hash functions closely related to rank correlation measures, which are known to be scale-invariant, numerically stable, and highly nonlinear. Specifically, we jointly learn two groups of linear subspaces, one for each modality, so that the features' ranking orders in the different linear subspaces maximally preserve the cross-modal similarities. We show that the ranking-based hash function has a natural probabilistic approximation that transforms the original, highly discontinuous optimization problem into one that can be efficiently solved using simple gradient descent algorithms. The proposed hashing framework is also flexible in the sense that the optimization procedure is not tied to any specific form of loss function, as is typical for existing cross-modal hashing methods; rather, we can flexibly accommodate different loss functions with minimal changes to the learning steps. We demonstrate through extensive experiments on four widely used real-world multimodal datasets that the proposed cross-modal hashing method achieves competitive performance against several state-of-the-art methods with only moderate training and testing time.
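
The core idea lends itself to a short sketch: a ranking-based hash value is the index of the largest projection of a feature onto a learned subspace, and a softmax gives the probabilistic relaxation that makes learning differentiable. The winner-take-all form and parameter names below are assumptions in the spirit of the abstract, not the paper's exact construction.

```python
import numpy as np

def rank_hash(x, P):
    """One ranking-based hash value: the index of the largest projection
    of x onto the rows of P. Invariant to positive rescaling of x."""
    return int(np.argmax(P @ x))

def soft_rank_hash(x, P, alpha=10.0):
    """Softmax relaxation of the argmax: a differentiable distribution
    over code values that enables gradient-based learning of P."""
    z = alpha * (P @ x)
    z -= z.max()                                  # numerical stability
    e = np.exp(z)
    return e / e.sum()
```

In a cross-modal setting, one group of such subspaces would be learned per modality so that the induced ranking orders agree across modalities, and the resulting codes are then compared with Hamming distance.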

Concepts: Optimization, Numerical analysis, Gradient descent, Modality, Loss function, Hash function, Cryptographic hash function, Linear subspace

24

Many problems in computer vision can be formulated as multidimensional ellipsoid-specific fitting, i.e., minimizing the residual error subject to the constraint that the underlying quadratic surface is a multidimensional ellipsoid. In this paper, we present a fast and robust algorithm for solving ellipsoid-specific fitting directly. Our method is based on the alternating direction method of multipliers and does not introduce extra positive semi-definiteness constraints. Its computational complexity is thus significantly lower than that of semi-definite programming (SDP) based methods. More specifically, to fit n data points to a p-dimensional ellipsoid, our complexity is O(p^6 + np^4) + O(p^3), where the former term results from preprocessing the data once, while that of the state-of-the-art SDP method is [Formula: see text] for each iteration. The storage complexity of our algorithm is about [Formula: see text], which is at most a quarter of that of SDP methods. Extensive experiments confirm the substantial speed and accuracy advantages of our method over state-of-the-art approaches. The implementation of our method is also much simpler than that of SDP-based methods.
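
To make the ADMM structure concrete, here is a toy splitting for a simplified, centered version of the problem: fit Q with eigenvalues at least eps, minimizing the sum of (x_i^T Q x_i - 1)^2. Note that this naive sketch uses an explicit eigenvalue projection, which is precisely the kind of positive semi-definiteness step the paper's method is engineered to avoid; it only illustrates the generic alternating-direction skeleton.

```python
import numpy as np

def fit_centered_ellipsoid(X, rho=1.0, eps=1e-6, iters=200):
    """ADMM for min_Q sum_i (x_i^T Q x_i - 1)^2 s.t. Q >= eps * I,
    splitting Q from a PSD-projected copy Z (with scaled dual U)."""
    n, p = X.shape
    M = np.stack([np.outer(x, x).ravel() for x in X])   # rows: vec(x x^T)
    lhs = 2.0 * M.T @ M + rho * np.eye(p * p)
    Z, U = np.eye(p), np.zeros((p, p))
    for _ in range(iters):
        rhs = 2.0 * M.T @ np.ones(n) + rho * (Z - U).ravel()
        Q = np.linalg.solve(lhs, rhs).reshape(p, p)     # least-squares step
        S = (Q + U + (Q + U).T) / 2.0                   # symmetrize
        w, V = np.linalg.eigh(S)
        Z = V @ np.diag(np.maximum(w, eps)) @ V.T       # eigenvalue clamp
        U += Q - Z                                      # dual update
    return Z
```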

Concepts: Machine learning, Computer, Real number, Problem solving, Optimization, Computer programming, Errors and residuals in statistics, Quadric

24

This paper studies the estimation of Dirichlet process mixtures (DPMs) over discrete incomplete rankings. The generative model for each mixture component is the generalized Mallows (GM) model, an exponential family model for permutations that extends seamlessly to top-t rankings. While the GM model is remarkably tractable in comparison with other permutation models, its conjugate prior is not. Our main contribution is to derive the theory and algorithms for sampling from the desired posterior distributions under this DPM. We introduce a family of partially collapsed Gibbs samplers, containing at one extreme an exact algorithm based on slice sampling, and at the other a fast approximate sampler with superior mixing that is still very accurate in all but the lowest ranks. We empirically demonstrate the effectiveness of the approximation in reducing mixing time, the benefits of the Dirichlet process approach over alternative clustering techniques, and the applicability of the approach to exploring large real-world ranking datasets.
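
A skeleton of one Gibbs sweep over cluster assignments in a DP mixture, in Chinese-restaurant-process form. The generalized Mallows predictive likelihood, around which the paper's partially collapsed samplers are built, is abstracted into an assumed callable; this is a generic DPM sampling scaffold, not the paper's algorithm.

```python
import numpy as np

def dpm_gibbs_sweep(z, loglik, alpha=1.0, rng=None):
    """One Gibbs sweep over cluster labels z (a list of ints) in a DP
    mixture, CRP form. loglik(members, i) must return the log predictive
    probability of item i under a cluster with the given member indices;
    for the paper this would be a generalized Mallows predictive."""
    rng = rng or np.random.default_rng()
    n = len(z)
    for i in range(n):
        z[i] = -1                                   # remove i from its cluster
        labels = sorted(set(z) - {-1})
        logp = [np.log(sum(1 for j in range(n) if z[j] == c))
                + loglik([j for j in range(n) if z[j] == c], i)
                for c in labels]
        logp.append(np.log(alpha) + loglik([], i))  # open a new cluster
        p = np.exp(np.array(logp) - max(logp))
        choice = rng.choice(len(p), p=p / p.sum())
        z[i] = labels[choice] if choice < len(labels) else max(labels, default=-1) + 1
    return z
```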

Concepts: Mixture, Non-parametric statistics, Estimation, Ranking, Bayes' theorem, Mix, Conjugate prior, Exponential family

24

Convolutional Neural Networks (CNNs) have demonstrated promising performance in single-label image classification tasks. However, how CNNs best cope with multi-label images remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), where an arbitrary number of object segment hypotheses are taken as the inputs, a shared CNN is connected with each hypothesis, and the CNN outputs from the different hypotheses are finally aggregated with max pooling to produce the ultimate multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) the shared CNN is flexible and can be well pre-trained with a large-scale single-label image dataset, e.g., ImageNet; and 4) it may naturally output multi-label prediction results. Experimental results on the Pascal VOC 2007 and VOC 2012 multi-label image datasets demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-art methods. In particular, the mAP reaches 90.5% with HCP alone and 93.2% after fusion with our complementary result in [44] based on hand-crafted features on the VOC 2012 dataset.
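
The aggregation step is simple enough to sketch directly: run the shared network on each hypothesis and max-pool the per-label scores. The shared_cnn callable below is an assumed stand-in for the pre-trained, fine-tuned network.

```python
import numpy as np

def hcp_predict(hypotheses, shared_cnn, n_labels):
    """Score each object-segment hypothesis with the shared CNN and
    max-pool the per-label scores across hypotheses. shared_cnn(crop)
    is an assumed callable returning an n_labels score vector."""
    scores = np.full(n_labels, -np.inf)
    for crop in hypotheses:
        scores = np.maximum(scores, np.asarray(shared_cnn(crop)))
    return scores      # threshold per label for the multi-label decision
```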

Concepts: Scientific method, Mathematics, Prediction, Experiment, Hypothesis, Thought experiment, Open problem, Open problems