SciCombinator

Discover the most talked about and latest scientific content & concepts.

Concept: Data compression

170

A major challenge of current high-throughput sequencing experiments is not only the generation of the sequencing data itself but also their processing, storage and transmission. The enormous size of these data motivates the development of data compression algorithms usable for the implementation of the various storage policies that are applied to the produced intermediate and final result files. In this article, we present NGC, a tool for the compression of mapped short read data stored in the widespread SAM format. NGC enables lossless and lossy compression and introduces the following two novel ideas: first, we present a way to reduce the number of required code words by exploiting common features of reads mapped to the same genomic positions; second, we present a highly configurable way for the quantization of per-base quality values, which takes their influence on downstream analyses into account. NGC, evaluated with several real-world data sets, saves 33-66% of disc space using lossless compression and up to 98% using lossy compression. By applying two popular variant and genotype prediction tools to the decompressed data, we show that the lossy compression modes preserve >99% of all called variants while outperforming comparable methods in some configurations.
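
A minimal sketch of the kind of lossy quality-value quantization the abstract describes: Phred scores are binned so the entropy coder needs fewer code words. The bin boundaries and representatives below are assumptions for illustration, not NGC's actual configuration.

# Minimal sketch of lossy per-base quality quantization (illustrative only;
# the bin boundaries below are assumptions, not NGC's actual configuration).

# Map each Phred quality score to a representative value for its bin.
QUALITY_BINS = [
    (0, 9, 6),     # (low, high, representative)
    (10, 19, 15),
    (20, 29, 25),
    (30, 41, 37),
]

def quantize_quality(phred_scores):
    """Replace each Phred score with its bin representative."""
    out = []
    for q in phred_scores:
        for low, high, rep in QUALITY_BINS:
            if low <= q <= high:
                out.append(rep)
                break
        else:
            out.append(q)  # leave out-of-range values untouched
    return out

# Example: fewer distinct symbols -> fewer code words for the entropy coder.
print(quantize_quality([2, 11, 24, 33, 40]))  # [6, 15, 25, 37, 37]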

Concepts: Code, Information theory, JPEG, Data compression, Huffman coding, Portable Network Graphics, Lossless data compression, Rate–distortion theory

139

Rumor spreading can have a significant impact on people’s lives, distorting scientific facts and influencing political opinions. With technologies that have democratized the production and reproduction of information, the rate at which misinformation can spread has increased significantly, leading many to describe contemporary times as a ‘post-truth era’. Research into rumor spreading has primarily been based either on models of social and biological contagion or on models of opinion dynamics. Here we present a comprehensive model based on information entropy, which allows for the incorporation of considerations such as the role of memory, conformity effects, differences in the subjective propensity to produce distortions, and variations in the degree of trust that people place in each other. Variations in the degree of trust are controlled by a confidence factor β, while the propensity to produce distortions is controlled by a conservation factor K. Simulations were performed using a Barabási-Albert (BA) scale-free network seeded with a single piece of information. The influence of β and K on the temporal evolution of the system was then analyzed in terms of average information entropy, opinion fragmentation, and the range of rumor spread. These results can aid decision-making aimed at limiting the spread of rumors.
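
A toy sketch of the simulation setup described, assuming a simple trust-weighted update with distortion noise; the roles given to β and K and the update rule itself are illustrative assumptions, not the authors' model.

# Toy sketch of rumor spread on a Barabasi-Albert network, tracking average
# information entropy. The update rule and the roles given to beta and K
# below are assumptions for illustration, not the authors' exact model.
import random
import math
import networkx as nx

def binary_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def simulate(n=1000, m=3, beta=0.5, K=0.8, steps=200, seed=0):
    rng = random.Random(seed)
    g = nx.barabasi_albert_graph(n, m, seed=seed)
    # Opinion = subjective probability that the seeded information is true.
    opinion = {v: 0.5 for v in g}          # maximal uncertainty everywhere
    opinion[0] = 1.0                       # single seeded piece of information
    history = []
    for _ in range(steps):
        v = rng.choice(list(g))
        u = rng.choice(list(g[v]))
        # Trust-weighted adoption (beta) plus distortion scaled by (1 - K).
        distortion = (1 - K) * rng.uniform(-0.1, 0.1)
        opinion[v] = min(1.0, max(0.0,
            (1 - beta) * opinion[v] + beta * opinion[u] + distortion))
        history.append(sum(binary_entropy(p) for p in opinion.values()) / n)
    return history

avg_entropy = simulate()
print(f"final average entropy: {avg_entropy[-1]:.3f}")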

Concepts: Entropy, Information theory, Data compression, Arithmetic coding, Self-information

28

The purpose of this article was to examine the effect of equipment scaling, through the modification of tennis ball compression, on elite junior tennis players (aged 10 years) within a match-play context. The two types of ball compressions that were compared were the standard compression (the normal ball) and 75% compression (termed the modified ball). Ten boys and 10 girls participated in the study. Participants were stratified into pairs based on their Australian Age Ranking and gender. Each pair played two two-set matches: one match with standard compression balls and one match with modified balls. The characteristics of each match were analysed and compared. The results showed that the use of the modified ball increased rally speed, allowed players to strike the ball at a lower (more comfortable) height on their groundstrokes and increased the number of balls played at the net. Ball compression had no effect on the relative number of winners, forehands, backhands, first serves in and double faults. The results are discussed in relation to skill acquisition for skilled junior tennis players.

Concepts: Skill, Learning, Data compression, Baseball, Tennis, Dreyfus model of skill acquisition, Balls, Junior tennis

28

The exponential growth of high-throughput DNA sequence data has posed great challenges to genomic data storage, retrieval and transmission. Compression is a critical tool for addressing these challenges, and many methods have been developed to reduce the storage size of genomes and sequencing data (reads, quality scores and metadata). However, genomic data are being generated faster than they can be meaningfully analyzed, leaving a large scope for developing novel compression algorithms that could directly facilitate data analysis beyond data transfer and storage. In this article, we categorize and provide a comprehensive review of the existing compression methods specialized for genomic data and present experimental results on compression ratio, memory usage, and compression and decompression time. We further present the remaining challenges and potential directions for future research.
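
As a rough illustration of the kind of measurement such reviews report, the sketch below times a general-purpose baseline (gzip) on a hypothetical FASTQ file and computes its compression ratio; it is not one of the specialized genomic compressors surveyed.

# Sketch of a compression-ratio and timing measurement using gzip as a
# general-purpose baseline. "reads.fastq" is a hypothetical input file.
import gzip
import os
import time

def gzip_benchmark(path, level=6):
    t0 = time.perf_counter()
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb", compresslevel=level) as dst:
        dst.writelines(src)
    elapsed = time.perf_counter() - t0
    original = os.path.getsize(path)
    compressed = os.path.getsize(path + ".gz")
    return original / compressed, elapsed

ratio, seconds = gzip_benchmark("reads.fastq")
print(f"compression ratio {ratio:.2f}:1 in {seconds:.1f} s")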

Concepts: DNA, Gene, Genome, Computer storage, Data compression, Media technology, Computer data storage, Image compression

28

The discrete cosine transform (DCT) is the orthogonal transform most commonly used in image and video compression. The motion-compensation residual (MC-residual) is also compressed with the DCT in most video codecs. However, the MC-residual has different characteristics from a natural image. In this paper, we develop a new orthogonal transform, the rotated orthogonal transform (ROT), that can perform better on the MC-residual than the DCT for coding purposes. We derive the proposed ROT from an orthogonality-constrained L1-norm minimization problem, chosen for its sparsity-promoting property. Using the DCT matrix as the starting point, a better orthogonal transform matrix is derived. In addition, by exploiting inter-frame dependency and local motion activity, transmission of substantial side information is avoided. The experimental results confirm that, with a small computational overhead, the ROT adapts to changes in the local spatial characteristics of the MC-residual frame and provides higher compression efficiency for the MC-residual than the DCT, especially for high- and complex-motion videos.
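
A plausible formalization of the sparsity criterion described above, with assumed notation (r_i for the vectorized MC-residual blocks, T for the transform sought); it is not taken verbatim from the paper:

% Assumed formalization, not the paper's exact statement:
\min_{T \in \mathbb{R}^{N \times N}} \; \sum_{i=1}^{M} \left\| T r_i \right\|_1
\quad \text{subject to} \quad T^{\mathsf{T}} T = I ,

with the DCT matrix used as the initial value of T in the optimisation.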

Concepts: Discrete cosine transform, Fourier analysis, Data compression, Video compression, Video codec, Orthogonal matrix, Discrete sine transform, Theora

28

This article presents four modifications to the JPEG arithmetic coding (JAC) algorithm, a topic that has not been well studied before. It then compares the compression performance of the modified JPEG with JPEG XR, the latest block-based image coding standard. We first show that the bulk of inter/intra-block redundancy caused by JPEG's block-based approach can be captured by applying efficient predictive coding. We propose the following modifications to JAC to take advantage of our prediction approach. (1) We code an entirely different DC difference. (2) JAC tests a DCT coefficient by considering its bits in increasing order of significance when coding the most significant bit position. This causes considerable redundancy because JAC always begins with the zeroth bit; we modify this coding order and propose corresponding alterations to the JPEG coding procedures. (3) We predict the sign of significant DCT coefficients, a problem not previously addressed from the perspective of the JPEG decoder. (4) We reduce the number of binary tests that JAC codes to mark the end-of-block. We provide experimental results for two sets of 8-bit gray images. The first set consists of nine classical test images, mostly of size 512 x 512 pixels. The second set consists of 13 images of size 2000 x 3000 pixels or more. Our modifications to JAC achieve a substantial code reduction without introducing any loss. More specifically, when the images are quantized with the default quantizers, our modifications reduce the total JAC code size of the two sets by about 8.9% and 10.6%, and the JPEG Huffman code size by about 16.3% and 23.4%, respectively, on average. Gains are even higher for coarsely quantized images. Finally, we compare the modified JAC with two settings of JPEG XR, one with no block overlapping and the other with the default transform (denoted JXR0 and JXR1, respectively). Our results show that for image coding at the finest quality rate, the modified JAC compresses the large-image set by about 5.8% more than JXR1 and by 6.7% more than JXR0, on average. We provide rate-distortion plots for lossy coding, which show that the modified JAC distinctly outperforms JXR0, while JXR1 outperforms it by a similar margin.
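
To illustrate the first modification, the sketch below contrasts baseline JPEG DC differencing (against the previously coded block) with a neighbour-based prediction; the two-neighbour predictor is an assumed example, not the paper's exact rule.

# Sketch of block-DC prediction. Baseline JPEG codes DC[i] - DC[i-1] in raster
# order; the modification described codes a different DC residual. The
# two-neighbour predictor below is an assumed illustration, not the paper's
# exact rule (it also assumes positive DC values for simplicity).
def dc_residuals_raster(dc):
    """Baseline JPEG: difference against the previously coded block."""
    prev = 0
    out = []
    for value in dc:
        out.append(value - prev)
        prev = value
    return out

def dc_residuals_predicted(dc, width):
    """Assumed alternative: predict from the left and upper neighbours."""
    out = []
    for i, value in enumerate(dc):
        left = dc[i - 1] if i % width else 0
        up = dc[i - width] if i >= width else 0
        pred = (left + up) // 2 if left and up else (left or up)
        out.append(value - pred)
    return out

dc = [52, 55, 61, 59, 54, 58, 62, 60]   # hypothetical DC values, 4 blocks/row
print(dc_residuals_raster(dc))
print(dc_residuals_predicted(dc, 4))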

Concepts: Discrete cosine transform, Binary numeral system, JPEG, Data compression, Bit, Huffman coding, Least significant bit, Arithmetic coding

27

This paper introduces an efficient method for lossless compression of depth map images, using a representation of a depth image in terms of three entities: the crack-edges, the constant-depth regions enclosed by them, and the depth value over each region. The starting representation is identical to that used in a very efficient coder for palette images, piecewise-constant image model (PWC) coding, but the techniques used for coding the elements of the representation are more advanced and especially suitable for the type of redundancy present in depth images. First, the vertical and horizontal crack-edges separating the constant-depth regions are transmitted by two-dimensional context coding using optimally pruned context trees. Both the encoder and decoder can reconstruct the regions of constant depth from the transmitted crack-edge image. The depth value in a given region is encoded by utilizing the depth values of the neighboring regions already encoded, exploiting the natural smoothness of the depth variation and the mutual exclusiveness of the values in neighboring regions. The encoding method is suitable for lossless compression of depth images, obtaining compression of about 10 to 65 times, and can additionally be used as the entropy coding stage for lossy depth compression.
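
A minimal sketch of the crack-edge representation itself (not of the context-tree coder): a crack-edge lies between two neighbouring pixels whose depth values differ.

# Sketch of the crack-edge representation described above (not the context-tree
# coder itself): a crack-edge sits between two neighbouring pixels whose depth
# values differ.
import numpy as np

def crack_edges(depth):
    """Return (vertical, horizontal) boolean crack-edge maps."""
    depth = np.asarray(depth)
    vertical = depth[:, 1:] != depth[:, :-1]    # edge between left/right pixels
    horizontal = depth[1:, :] != depth[:-1, :]  # edge between upper/lower pixels
    return vertical, horizontal

depth = np.array([[5, 5, 9],
                  [5, 5, 9],
                  [7, 7, 9]])
v, h = crack_edges(depth)
print(v.astype(int))  # [[0 1], [0 1], [0 1]]
print(h.astype(int))  # [[0 0 0], [1 1 0]]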

Concepts: Information theory, JPEG, Data compression, Huffman coding, Arithmetic coding, Encoder, Encoding, Graphics Interchange Format

27

Autonomous listening devices are increasingly used to study vocal aquatic animals, and there is a constant need to record longer or with greater bandwidth, requiring efficient use of memory and battery power. Real-time compression of sound has the potential to extend recording durations and bandwidths at the expense of increased processing operations and therefore power consumption. Whereas lossy methods such as MP3 introduce undesirable artifacts, lossless compression algorithms (e.g., FLAC) guarantee exact data recovery. But these algorithms are relatively complex due to the wide variety of signals they are designed to compress. A simpler lossless algorithm is shown here to provide compression factors of three or more for underwater sound recordings over a range of noise environments. The compressor was evaluated using samples from drifting and animal-borne sound recorders with sampling rates of 16-240 kHz. It achieves >87% of the compression of more complex methods but requires about 1/10 of the processing operations, resulting in less than 1 mW power consumption at a sampling rate of 192 kHz on a low-power microprocessor. The potential to triple recording duration with a minor increase in power consumption and no loss in sound quality may be especially valuable for battery-limited tags and robotic vehicles.
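
A minimal sketch of the kind of simple predictive lossless scheme described, assuming a fixed second-order predictor followed by Rice coding of the residuals; this is illustrative, not the authors' exact compressor.

# Sketch of a simple lossless audio scheme: fixed second-order prediction
# followed by Rice coding of the residuals. Illustrative only, not the
# authors' exact algorithm.
def predict_residuals(samples):
    """Second-order predictor: pred = 2*x[n-1] - x[n-2]."""
    res = []
    for n, x in enumerate(samples):
        pred = 2 * samples[n - 1] - samples[n - 2] if n >= 2 else 0
        res.append(x - pred)  # first two samples are effectively stored verbatim
    return res

def rice_encode(value, k):
    """Rice code: zig-zag map to non-negative, unary quotient, k-bit remainder."""
    u = 2 * value if value >= 0 else -2 * value - 1
    q, r = u >> k, u & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")

samples = [100, 104, 109, 113, 118, 121]               # hypothetical audio samples
residuals = predict_residuals(samples)                 # small values near zero
bitstream = "".join(rice_encode(r, k=2) for r in residuals)
print(residuals, len(bitstream), "bits")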

Concepts: Complexity, Information theory, Data compression, Media technology, Gramophone record, Kolmogorov complexity, Lossless data compression, Lossless

25

One of the major applications of wireless sensor networks (WSNs) is vibration measurement for the purpose of structural health monitoring and machinery fault diagnosis. WSNs have many advantages over wired networks, such as low cost and reduced setup time. However, their useful bandwidth is limited compared to wired networks, resulting in relatively low sampling rates. One solution to this problem is data compression which, in addition to enhancing the sampling rate, saves valuable power of the wireless nodes. In this work, a data compression scheme based on the Modified Discrete Cosine Transform (MDCT) followed by Embedded Harmonic Components Coding (EHCC) is proposed to compress vibration signals. The EHCC is applied to exploit the harmonic redundancy present in most vibration signals, resulting in an improved compression ratio. This scheme is made suitable for the tiny hardware of wireless nodes and is shown to be fast and effective. The efficiency of the proposed scheme is investigated through several experimental tests.
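
A minimal sketch of the MDCT front end mentioned above (the EHCC stage is not shown); the frame length and sine window are assumptions for illustration.

# Sketch of the MDCT front end (the EHCC stage is not shown). Frame length
# and windowing are illustrative assumptions.
import numpy as np

def mdct(frame):
    """MDCT of a frame of 2N samples, producing N coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)
    basis = np.cos(np.pi / n_half * (n[None, :] + 0.5 + n_half / 2) * (k[:, None] + 0.5))
    return basis @ frame

signal = np.sin(2 * np.pi * 0.05 * np.arange(256))      # a single harmonic
window = np.sin(np.pi * (np.arange(128) + 0.5) / 128)    # standard sine window
coeffs = mdct(window * signal[:128])                     # one windowed frame
print(np.argmax(np.abs(coeffs)))   # energy concentrates in a few coefficients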

Concepts: Signal processing, Information theory, Data compression, Structural engineering, Discrete signal, Wireless sensor network, Structural health monitoring, Modified discrete cosine transform

23

This paper describes a highly efficient method for lossless compression of volumetric sets of medical images, such as CTs or MRIs. The proposed method, referred to as 3D-MRP, is based on the principle of minimum rate predictors (MRP), one of the state-of-the-art lossless compression technologies presented in the data compression literature. The main features of the proposed method include the use of 3D predictors, 3D-block octree partitioning and classification, volume-based optimisation and support for 16-bit depth images. Experimental results demonstrate the efficiency of the 3D-MRP algorithm for the compression of volumetric sets of medical images, achieving gains above 15% and 12% for 8-bit and 16-bit depth content, respectively, when compared to JPEG-LS, JPEG2000, CALIC and HEVC, as well as other proposals based on the MRP algorithm.
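
A minimal sketch of the idea of 3D prediction across slices; 3D-MRP uses adaptively optimised minimum-rate predictors rather than the fixed causal-neighbour average used here.

# Sketch of 3-D prediction across slices (illustrative only; 3D-MRP uses
# adaptively optimised minimum-rate predictors, not this fixed one).
import numpy as np

def predict_voxel(volume, z, y, x):
    """Average of the causal neighbours: left, above, and previous slice."""
    neighbours = []
    if x > 0:
        neighbours.append(volume[z, y, x - 1])
    if y > 0:
        neighbours.append(volume[z, y - 1, x])
    if z > 0:
        neighbours.append(volume[z - 1, y, x])
    return int(round(np.mean(neighbours))) if neighbours else 0

def residual_volume(volume):
    """Prediction residuals, which an entropy coder would then compress."""
    volume = np.asarray(volume, dtype=np.int32)
    res = np.empty_like(volume)
    for z, y, x in np.ndindex(volume.shape):
        res[z, y, x] = volume[z, y, x] - predict_voxel(volume, z, y, x)
    return res

# Toy 12-bit volume; real CT/MRI data are much smoother, giving small residuals.
ct = np.random.default_rng(0).integers(0, 4096, size=(4, 8, 8))
print(np.abs(residual_volume(ct)).mean())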

Concepts: Medical imaging, Information theory, JPEG, Data compression, Portable Network Graphics, Lossless data compression, Graphics Interchange Format