Massively parallel high-throughput sequencing technologies allow us to interrogate the microbial composition of biological samples at unprecedented resolution. The typical approach is to perform high-throughput sequencing of 16S rRNA genes, which are then taxonomically classified based on similarity to known sequences in existing databases. Current technologies pose a dilemma, however: although they enable deep coverage of samples, they are limited in the length of sequence they can produce. As a result, high-throughput studies of microbial communities often do not sequence the entire 16S rRNA gene. The challenge is to obtain reliable representation of bacterial communities through taxonomic classification of short 16S rRNA gene sequences. In this study we explored properties of different study designs and developed specific recommendations for effective use of short-read sequencing technologies for the purpose of interrogating bacterial communities, with a focus on classification using naïve Bayesian classifiers. To assess precision and coverage of each design, we used a collection of ∼8,500 manually curated 16S rRNA gene sequences from cultured bacteria and a set of over one million bacterial 16S rRNA gene sequences retrieved from environmental samples, respectively. We also tested different configurations of taxonomic classification approaches using short-read sequencing data, and provide recommendations for optimal choice of the relevant parameters. We conclude that with a judicious selection of the sequenced region and the corresponding choice of a suitable training set for taxonomic classification, it is possible to explore bacterial communities at great depth using current technologies, with only a minimal loss of taxonomic resolution.
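To illustrate what naïve Bayesian classification of a short read involves, the following is a minimal sketch and not the classifier evaluated in the study. It assumes a k-mer presence model with simple smoothing; the function names, the word size k=4, the smoothing constants and the toy sequences are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def kmers(seq, k=4):
    """Return the set of overlapping k-mers found in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def train(training_set, k=4):
    """Count, per taxon, how many training sequences contain each k-mer.

    training_set is a list of (taxon, sequence) pairs (hypothetical format).
    """
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for taxon, seq in training_set:
        totals[taxon] += 1
        for km in kmers(seq, k):
            counts[taxon][km] += 1
    return counts, totals


def classify(read, counts, totals, k=4):
    """Return the taxon with the highest summed log-probability for the read.

    Each k-mer is treated as independent (the "naive" assumption); the
    constants 0.5 and 1 are an assumed smoothing so that unseen k-mers do
    not produce log(0).
    """
    best_taxon, best_score = None, -math.inf
    for taxon, n in totals.items():
        score = sum(
            math.log((counts[taxon].get(km, 0) + 0.5) / (n + 1))
            for km in kmers(read, k)
        )
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy training set with made-up sequences, not real 16S rRNA genes.
toy_training = [
    ("TaxonA", "ACGTACGTACGTACGT"),
    ("TaxonB", "TTGGCCAATTGGCCAA"),
]
toy_counts, toy_totals = train(toy_training)
label_a = classify("ACGTACG", toy_counts, toy_totals)
label_b = classify("GGCCAAT", toy_counts, toy_totals)
```

The sketch makes the dependence on the training set visible: a read can only be assigned to a taxon whose k-mers were seen during training, which is why the choice of sequenced region and training set matter together.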
Multilocus sequence typing (MLST) is a widely used system for typing microorganisms by sequence analysis of housekeeping genes. The main advantage of MLST in comparison to other typing techniques is the unambiguity and transferability of sequence data. However, a main disadvantage is the high cost of DNA sequencing. Here we introduce a high-throughput MLST (HiMLST) method that employs next-generation sequencing (NGS) technology (Roche 454) to generate large quantities of high-quality MLST data at low cost. The HiMLST protocol consists of two steps. In the first step, MLST target genes are amplified by PCR in multi-well plates. During this PCR, the amplicons of each bacterial isolate are provided with a unique DNA barcode, the multiplex identifier (MID). In the second step, all amplicons are pooled and sequenced in a single NGS run. The MLST profile of each individual isolate can be retrieved easily using its unique MID. With HiMLST we have profiled 575 isolates of Legionella pneumophila, Staphylococcus aureus, Pseudomonas aeruginosa and Streptococcus pneumoniae in mixed-species HiMLST experiments. In conclusion, the introduction of HiMLST paves the way for broad use of MLST as a high-quality and cost-effective method for typing microbial species.
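The retrieval of each isolate's reads from the pooled run by its MID can be sketched as follows. This is an illustrative outline only, assuming the MID sits at the start of each read and must match exactly; the barcode sequences, isolate names and the function `demultiplex` are hypothetical and do not come from the HiMLST protocol.

```python
from collections import defaultdict


def demultiplex(reads, mid_to_isolate, mid_len):
    """Group pooled reads by isolate using a leading barcode (MID).

    Assumes the MID is the first mid_len bases of the read and that only
    exact matches are accepted; real pipelines often tolerate mismatches.
    """
    by_isolate = defaultdict(list)
    unassigned = []
    for read in reads:
        mid, insert = read[:mid_len], read[mid_len:]
        isolate = mid_to_isolate.get(mid)
        if isolate is None:
            unassigned.append(read)  # barcode not recognised
        else:
            by_isolate[isolate].append(insert)  # barcode stripped
    return dict(by_isolate), unassigned


# Made-up barcodes and reads for illustration.
toy_mids = {"ACGAGT": "isolate_01", "TCTCTA": "isolate_02"}
toy_reads = [
    "ACGAGTGGATTCAAGC",
    "TCTCTAGGATTCTTGC",
    "ACGAGTGGATTCAAGG",
    "NNNNNNGGATTCAAGC",
]
grouped, leftover = demultiplex(toy_reads, toy_mids, mid_len=6)
```

After grouping, the reads of each isolate would be assigned to MLST loci and compared with known alleles, a step this sketch leaves out.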
Highly parallel SNP genotyping platforms have been developed for some important crop species, but these platforms typically carry a high cost per sample for first-time or small-scale users. In contrast, recently developed genotyping by sequencing (GBS) approaches offer a highly cost effective alternative for simultaneous SNP discovery and genotyping. In the present investigation, we have explored the use of GBS in soybean. In addition to developing a novel analysis pipeline to call SNPs and indels from the resulting sequence reads, we have devised a modified library preparation protocol to alter the degree of complexity reduction. We used a set of eight diverse soybean genotypes to conduct a pilot scale test of the protocol and pipeline. Using ApeKI for GBS library preparation and sequencing on an Illumina GAIIx machine, we obtained 5.5 M reads and these were processed using our pipeline. A total of 10,120 high quality SNPs were obtained and the distribution of these SNPs mirrored closely the distribution of gene-rich regions in the soybean genome. A total of 39.5% of the SNPs were present in genic regions and 52.5% of these were located in the coding sequence. Validation of over 400 genotypes at a set of randomly selected SNPs using Sanger sequencing showed a 98% success rate. We then explored the use of selective primers to achieve a greater complexity reduction during GBS library preparation. The number of SNP calls could be increased by almost 40% and their depth of coverage was more than doubled, thus opening the door to an increase in the throughput and a significant decrease in the per sample cost. The approach to obtain high quality SNPs developed here will be helpful for marker assisted genomics as well as assessment of available genetic resources for effective utilisation in a wide number of species.
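The reported proportions can be converted into approximate SNP counts. The short calculation below is a sketch that assumes "52.5% of these" refers to the genic SNPs, which the abstract does not state explicitly, and rounds to whole SNPs; the variable names are illustrative.

```python
total_snps = 10120        # high-quality SNPs reported
genic_fraction = 0.395    # share of SNPs in genic regions
coding_fraction = 0.525   # share of genic SNPs in coding sequence (assumed reading)

# Approximate counts, rounded to whole SNPs.
genic_snps = round(total_snps * genic_fraction)
coding_snps = round(total_snps * genic_fraction * coding_fraction)

# Coding-sequence SNPs as a share of all SNPs, in percent.
coding_share_of_all = 100 * genic_fraction * coding_fraction

print(genic_snps, coding_snps, round(coding_share_of_all, 1))
```

Under that reading, roughly 4,000 SNPs fall in genic regions and about 2,100 of them, around a fifth of all SNPs, lie in coding sequence.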
BACKGROUND: PCR amplification and high-throughput sequencing theoretically enable the characterization of the finest-scale diversity in natural microbial and viral populations, but each of these methods introduces random errors that are difficult to distinguish from genuine biological diversity. Several approaches have been proposed to denoise these data but lack either speed or accuracy. RESULTS: We introduce a new denoising algorithm that we call DADA (Divisive Amplicon Denoising Algorithm). Without training data, DADA infers both the sample genotypes and error parameters that produced a metagenome data set. We demonstrate performance on control data sequenced on Roche’s 454 platform, and compare the results to the most accurate denoising software currently available, AmpliconNoise. CONCLUSIONS: DADA is more accurate and over an order of magnitude faster than AmpliconNoise. It eliminates the need for training data to establish error parameters, fully utilizes sequence-abundance information, and enables inclusion of context-dependent PCR error rates. It should be readily extensible to other sequencing platforms such as Illumina.
Analysis of microbial communities by high-throughput pyrosequencing of SSU rRNA gene PCR amplicons has transformed microbial ecology research and led to the observation that many communities contain a diverse assortment of rare taxa, a phenomenon termed the Rare Biosphere. Multiple studies have investigated the effect of pyrosequencing read quality on operational taxonomic unit (OTU) richness for contrived communities, yet there is limited information on the fidelity of community structure estimates obtained through this approach. Given that PCR biases are widely recognized, and further unknown biases may arise from the sequencing process itself, a priori assumptions about the neutrality of the data generation process are at best unvalidated. Furthermore, post-sequencing quality control algorithms have not been explicitly evaluated for the accuracy of recovered representative sequences and its impact on downstream analyses, reducing useful discussion on pyrosequencing reads to their diversity and abundances. Here we report on community structures and sequences recovered for in vitro-simulated communities consisting of twenty 16S rRNA gene clones tiered at known proportions. PCR amplicon libraries of the V3-V4 and V6 hypervariable regions from the in vitro-simulated communities were sequenced using the Roche 454 GS FLX Titanium platform. Commonly used quality control protocols resulted in the formation of OTUs with >1% abundance composed entirely of erroneous sequences, while over-aggressive clustering approaches obfuscated real, expected OTUs. The pyrosequencing process itself did not appear to impose significant biases on overall community structure estimates, although the detection limit for rare taxa may be affected by PCR amplicon size and the quality control approach employed. Meanwhile, PCR biases associated with the initial amplicon generation may impose greater distortions in the observed community structure.
Fully drought-resistant crop plants would be beneficial, but selection breeding has not produced them. Genetic modification of species by introduction of very many genes is claimed, predominantly, to have given drought resistance. This review analyses the physiological responses of genetically modified (GM) plants to water deficits, the mechanisms, and the consequences. The GM literature neglects physiology and is unspecific in definitions, which are considered here, together with methods of assessment and the type of drought resistance resulting. Experiments in soil with cessation of watering demonstrate drought resistance in GM plants as later stress development than in wild-type (WT) plants. This is caused by slower total water loss from the GM plants, which have (or may have; morphology is often poorly defined) smaller total leaf area (LA) and/or decreased stomatal conductance (g(s)), associated with thicker laminae (denser mesophyll and smaller cells). Non-linear soil water characteristics result in extreme stress symptoms in WT before GM plants. Then, WT and GM plants are rewatered: faster and better recovery of GM plants is taken to show their greater drought resistance. Mechanisms targeted in genetic modification are then, incorrectly, considered responsible for the drought resistance. However, this is not valid as the initial conditions in WT and GM plants are not comparable. GM plants exhibit a form of ‘drought resistance’ for which the term ‘delayed stress onset’ is introduced. Claims that specific alterations to metabolism give drought resistance [for which the term ‘constitutive metabolic dehydration tolerance’ (CMDT) is suggested] are not critically demonstrated, and experimental tests are suggested. Small LA and g(s) may not decrease productivity in well-watered plants under laboratory conditions but may in the field. Optimization of GM traits to environment has not been analysed critically and is required in field trials, for example of recently released oilseed rape and maize which show ‘drought resistance’, probably due to delayed stress onset. Current evidence is that GM plants may not be better able to cope with drought than selection-bred cultivars.
In microbial ecology, a fundamental question relates to how community diversity and composition change in response to perturbation. Most studies have had limited ability to deeply sample community structure (e.g. Sanger-sequenced 16S rRNA libraries), or have had limited taxonomic resolution (e.g. studies based on 16S rRNA hypervariable region sequencing). Here, we combine the higher taxonomic resolution of near-full-length 16S rRNA gene amplicons with the economics and sensitivity of short-read sequencing to assay the abundance and identity of organisms that represent as little as 0.01% of sediment bacterial communities. We used a new version of EMIRGE optimized for large data size to reconstruct near-full-length 16S rRNA genes from amplicons sheared and sequenced with Illumina technology. The approach allowed us to differentiate the community composition among samples acquired before perturbation, after acetate amendment shifted the predominant metabolism to iron reduction, and once sulfate reduction began. Results were highly reproducible across technical replicates, and identified specific taxa that responded to the perturbation. All samples contain very high alpha diversity and abundant organisms from phyla without cultivated representatives. Surprisingly, at the time points measured, there was no strong loss of evenness, despite the selective pressure of acetate amendment and change in the terminal electron accepting process. However, community membership was altered significantly. The method allows for sensitive, accurate profiling of the “long tail” of low abundance organisms that exist in many microbial communities, and can resolve population dynamics in response to environmental change.
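Evenness and alpha diversity can be quantified in several ways, and the abstract does not say which indices were used. As one common choice, the sketch below computes the Shannon index and Pielou's evenness from a vector of taxon counts; the function names and the toy counts are illustrative assumptions.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def pielou_evenness(counts):
    """Pielou's evenness J = H / ln(S), where S is the number of observed taxa.

    Returns 0.0 when fewer than two taxa are observed, where J is undefined
    (a convention chosen here for convenience).
    """
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0
    return shannon_index(counts) / math.log(observed)


# Made-up communities: one perfectly even, one dominated by a single taxon.
even_community = [25, 25, 25, 25]
skewed_community = [97, 1, 1, 1]
j_even = pielou_evenness(even_community)
j_skewed = pielou_evenness(skewed_community)
```

A strong loss of evenness after perturbation would appear as a drop in J toward the skewed case, whereas a change in membership alone can leave J largely unchanged.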
BACKGROUND: Many proteins form insoluble protein aggregates, called “inclusion bodies”, when overexpressed in E. coli. This is a major obstacle in biotechnology. Ever since the reversible denaturation of proteins by chaotropic agents such as urea or guanidinium hydrochloride was shown, these compounds have been predominantly used to dissolve inclusion bodies. Other denaturants exist but have received much less attention in protein purification. While the anionic, denaturing detergent sodium dodecyl sulfate (SDS) is used extensively in analytical SDS-PAGE, it has rarely been used in preparative purification. RESULTS: Here we present a simple and versatile method to purify insoluble, hexahistidine-tagged proteins under denaturing conditions. It is based on dissolution of overexpressing bacterial cells in a buffer containing SDS and whole-lysate denaturation of proteins. The excess of detergent is removed by cooling and centrifugation prior to affinity purification. Host and overexpressed proteins do not co-precipitate with SDS and the residual concentration of detergent is compatible with affinity purification on Ni/NTA resin. We show that SDS can be replaced with another ionic detergent, Sarkosyl, during purification. Key advantages over denaturing purification in urea or guanidinium are speed, ease of use, low cost of denaturant and the compatibility of buffers with automated FPLC. CONCLUSION: Ionic, denaturing detergents are useful in breaking the solubility barrier, a major obstacle in biotechnology. The method we present yields detergent-denatured protein. Methods to refold proteins from a detergent-denatured state are known and therefore we propose that the procedure presented herein will be of general application in biotechnology.
A sharp decline in the availability of arable land and of irrigation water, along with a continuous steep increase in food demand, has put pressure on farmers to produce more with fewer resources. A viable way to relieve this pressure is to speed up the plant breeding process by employing biotechnology in breeding programs. The majority of biotechnological applications rely on information generated from various -omic technologies. Recent improvements in proteomic platforms, together with related advances in plant biotechnology techniques, offer plant scientists new ways to apply these technologies in crop improvement programs. A combinatorial approach of accelerated gene discovery through genomics, proteomics, and other associated -omic branches of biotechnology, as an applied approach, is proving to be an effective way to speed up crop improvement programs worldwide. Swift improvements in -omic databases are becoming critical and demand immediate attention if these techniques are to be used effectively to produce next-generation crops for farmers. Here, we review recent advances in proteomics, as tools of biotechnology, which offer great promise and lead the path toward crop improvement for sustainable agriculture.
Effect of low-frequency KRAS mutations on the response to anti-EGFR therapy in metastatic colorectal cancer
- Annals of oncology : official journal of the European Society for Medical Oncology / ESMO
- Published about 8 years ago
BACKGROUND: Only patients with wild-type (WT) KRAS tumors benefit from anti-epidermal growth factor receptor (EGFR) monoclonal antibodies (Mabs) in metastatic colorectal cancer (mCRC). Pyrosequencing is now widely used for the determination of KRAS mutation burden and a conservative cut-off point of 10% has been defined. Up until now, the impact of low-frequency KRAS mutations (<10%) on the response to anti-EGFR Mabs has yet to be evaluated. PATIENTS AND METHODS: Tumors from patients receiving anti-EGFR Mabs based on a WT genotype for KRAS, as determined using direct sequencing, have been retrospectively analyzed by pyrosequencing. Patients were categorized as WT (no KRAS mutation) or low-frequency mutation when KRAS mutation was <10% (KRAS low MT). RESULTS: A total of 168 patients treated by anti-EGFR Mabs for mCRC were analyzed. According to pyrosequencing, 138 tumors remained KRAS WT, while 30 tumors were KRAS low MT. In the KRAS low MT and KRAS WT groups, the response rates were 6.7% and 37.0%, respectively, while stabilization amounted to 23.3% versus 32.6% and progression to 70% versus 29% (P < 0.01). Progression-free survival (PFS) was 2.7 ± 0.5 months for KRAS low MT and was 6.0 ± 0.3 months for KRAS WT (P < 0.01). CONCLUSIONS: These results appear to validate consideration of low-frequency KRAS mutation tumors as positive, and justify a large-scale prospective study.