Concept: Genealogical DNA test
DNA testing is an established part of the investigation and prosecution of sexual assault. The primary purpose of DNA evidence is to identify a suspect and/or to demonstrate sexual contact. However, due to highly uneven proportions of female and male DNA in typical stains, routine autosomal analysis often fails to detect the DNA of the assailant. To evaluate the forensic efficiency of the combined application of autosomal and Y-chromosomal short tandem repeat (STR) markers, we present a large retrospective casework study of probative evidence collected in sexual-assault cases. We investigated up to 39 STR markers by testing combinations of the 16-locus NGMSElect kit with both the 23-locus PowerPlex Y23 and the 17-locus Yfiler kit. Using this dual approach we analyzed DNA extracts from 2077 biological stains collected in 287 cases over 30 months. To assess the outcome of the combined approach in comparison to stand-alone autosomal analysis we evaluated informative DNA profiles. Our investigation revealed that Y-STR analysis added up to 21% additional, highly informative (complete, single-source) profiles to the set of reportable autosomal STR profiles for typical stains collected in sexual-assault cases. Detection of multiple male contributors was approximately three times more likely with Y-chromosomal profiling than with autosomal STR profiling. In summary, 1/10 cases would have remained inconclusive (and could have been dismissed) if Y-STR analysis had been omitted from DNA profiling in sexual-assault cases.
Sharing sequencing data sets without identifiers has become a common practice in genomics. Here, we report that surnames can be recovered from personal genomes by profiling short tandem repeats on the Y chromosome (Y-STRs) and querying recreational genetic genealogy databases. We show that a combination of a surname with other types of metadata, such as age and state, can be used to triangulate the identity of the target. A key feature of this technique is that it entirely relies on free, publicly accessible Internet resources. We quantitatively analyze the probability of identification for U.S. males. We further demonstrate the feasibility of this technique by tracing back with high probability the identities of multiple participants in public sequencing projects.
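The lookup step described above, comparing a Y-STR profile against genealogy records that pair haplotypes with surnames, can be illustrated with a toy sketch. This is not the authors' pipeline: the function names, the mismatch threshold, the placeholder surnames and the repeat counts are all hypothetical, and only the marker names (DYS19, DYS390, DYS391) refer to real Y-STR loci.

```python
def count_mismatches(query, record):
    """Return (number of differing markers, number of shared markers)."""
    shared = query.keys() & record.keys()
    differing = sum(query[m] != record[m] for m in shared)
    return differing, len(shared)


def candidate_surnames(query, database, max_mismatches=1, min_shared=3):
    """Rank surnames whose recorded haplotype is close to the query.

    The thresholds are illustrative assumptions, not values from the study.
    """
    hits = []
    for surname, haplotype in database:
        differing, shared = count_mismatches(query, haplotype)
        if shared >= min_shared and differing <= max_mismatches:
            hits.append((differing, surname))
    # Closest haplotypes first; ties are broken alphabetically.
    return [surname for _, surname in sorted(hits)]


# Hypothetical records: (surname, {marker: repeat count}).
toy_database = [
    ("Alpha", {"DYS19": 14, "DYS390": 24, "DYS391": 11}),
    ("Beta", {"DYS19": 15, "DYS390": 23, "DYS391": 10}),
    ("Gamma", {"DYS19": 14, "DYS390": 24, "DYS391": 10}),
]
toy_query = {"DYS19": 14, "DYS390": 24, "DYS391": 11}

ranked = candidate_surnames(toy_query, toy_database)
print(ranked)
```

A real search would involve many more markers and a probabilistic treatment of mutation; the sketch only shows why a shared patrilineal haplotype can point to a surname.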
High-frequency microsatellite haplotypes of the male-specific Y-chromosome can signal past episodes of high reproductive success of particular men and their patrilineal descendants. Previously, two examples of such successful Y-lineages have been described in Asia, both associated with Altaic-speaking pastoral nomadic societies, and putatively linked to dynasties descending, respectively, from Genghis Khan and Giocangga. Here we surveyed a total of 5321 Y-chromosomes from 127 Asian populations, including novel Y-SNP and microsatellite data on 461 Central Asian males, to ask whether additional lineage expansions could be identified. Based on the most frequent eight-microsatellite haplotypes, we objectively defined 11 descent clusters (DCs), each within a specific haplogroup, that represent likely past instances of high male reproductive success, including the two previously identified cases. Analysis of the geographical patterns and ages of these DCs and their associated cultural characteristics showed that the most successful lineages are found both among sedentary agriculturalists and pastoral nomads, and expanded between 2100 BCE and 1100 CE. However, those with recent origins in the historical period are almost exclusively found in Altaic-speaking pastoral nomadic populations, which may reflect a shift in political organisation in pastoralist economies and a greater ease of transmission of Y-chromosomes through time and space facilitated by the use of horses.
To determine the genetic diversity and paternal origin of Chinese cattle, 302 males from 16 Chinese native cattle breeds as well as 30 Holstein males and four Burma males as controls were analysed using four Y-SNPs and two Y-STRs. In Chinese bulls, the taurine Y1 and Y2 haplogroups and indicine Y3 haplogroup were detected in seven, 172 and 123 individuals respectively, and these frequencies varied among the Chinese cattle breeds examined. Y2 dominates in northern China (91.4%), and Y3 dominates in southern China (90.8%). Central China is an admixture zone, although Y2 predominates overall (72.0%). The geographical distributions of the Y2 and Y3 haplogroup frequencies revealed a pattern of male indicine introgression from south to north China. The three Y haplogroups were further classified into one Y1 haplotype, five Y2 haplotypes and one Y3 haplotype in Chinese native bulls. Due to the interplay between taurine and indicine types, Chinese cattle represent an extensive reservoir of genetic diversity. The Y haplotype distribution of Chinese cattle exhibited a clear geographical structure, which is consistent with mtDNA, historical and geographical information.
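The overall haplogroup frequencies implied by the reported counts (seven Y1, 172 Y2 and 123 Y3 among 302 Chinese bulls) can be recomputed with a short sketch. Only the counts come from the abstract; the variable names are illustrative.

```python
# Counts of each Y haplogroup among the 302 Chinese bulls, as reported above.
counts = {"Y1": 7, "Y2": 172, "Y3": 123}

total = sum(counts.values())

# Relative frequency of each haplogroup, as a percentage rounded to one decimal.
percentages = {hap: round(100 * n / total, 1) for hap, n in counts.items()}

for hap, pct in percentages.items():
    print(f"{hap}: {counts[hap]}/{total} = {pct}%")
```

These are pooled frequencies across all 16 breeds; the regional figures quoted in the abstract (for example 91.4% Y2 in northern China) depend on per-region counts that the abstract does not give.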
In this paper we consider a problem from hematopoietic cell transplant (HCT) studies where there is interest in assessing the effect of donor-patient haplotype match on the cumulative incidence function for right-censored competing risks data. In the HCT study, the donor’s and patient’s genotypes are fully observed and matched, but their haplotypes are missing. We describe how to deal with such missing covariates for each individual in competing risks data. We suggest a procedure for estimating the cumulative incidence functions for a flexible class of regression models when there are missing data, and establish the large sample properties. Small sample properties are investigated using simulations in a setting that mimics the motivating haplotype matching problem. The proposed approach is then applied to the HCT study.
Aim: The purpose of this study was to characterize Y-chromosome diversity in Tajiks from Tajikistan and in Persians and Kurds from Iran. Method: Y-chromosome haplotypes were identified in 40 Tajiks, 77 Persians and 25 Kurds, using 12 short tandem repeats (STR) and 18 binary markers. Results: High genetic diversity was observed in the populations studied. Six of 12 haplogroups were common in Persians, Kurds and Tajiks, but only three haplogroups (G-M201, J-12f2 and L-M20) were the most frequent in all populations, comprising together ∼ 60% of the Y-chromosomes in the pooled data set. Analysis of genetic distances between Y-STR haplotypes revealed that the Kurds showed a great distance to the Iranian-speaking populations of Iran, Afghanistan and Tajikistan. The presence of Indian-specific haplogroups L-M20, H1-M52 and R2a-M124 in both Tajik samples from Afghanistan and Tajikistan demonstrates an apparent genetic affinity between Tajiks from these two regions. Conclusions: Despite the marked similarities between Y-chromosome gene pools of Iranian-speaking populations, there are differences between them, defined by many factors, including geographic and linguistic relationships.
Y chromosome single nucleotide polymorphisms (Y-SNPs) are indispensable markers for haplogroup determination. Since Y chromosome haplogroups show a highly specific geographical distribution, they play a major role in population genetics but can also benefit forensic investigations. Although haplogroup prediction methods based on Y chromosome short tandem repeats (Y-STRs) exist and are frequently used, caution is required in this regard. In this study we determine the Y chromosome haplogroups of a Nicaraguan population using several Y-SNP multiplex reactions. Y chromosome haplogroups have been predicted before, but our results show that confirmation by Y-SNP typing is necessary. These results revealed a 4.8% error rate in haplogroup prediction based on Y-STR haplotypes using Athey’s Haplogroup Predictor. The Nicaraguan Mestizo population displays a majority of Eurasian lineages, mainly represented by haplogroup R-M207 (46.7%). Other Eurasian lineages have been observed, especially J-P209 (13.3%), followed by I-M170 (3.6%) and G-M201 (1.8%). Haplogroup E-P170 was also observed in 15.2% of the sample, particularly subhaplogroup E1b1b1-M35. Finally, the Native American haplogroup Q-M242 was found in 15.2% of the sample, with Q1a3a-M3 being the most frequent.
The aim of this study was to explore whether prostaglandin D2 receptor (PTGDR) polymorphisms confer susceptibility to asthma. A meta-analysis was conducted on the associations between the PTGDR -549 C/T, -441 C/T, and -197 C/T polymorphisms and asthma using: (1) allele contrast, (2) the recessive model, (3) the dominant model, and (4) the additive model. Three-polymorphism haplotypes were constructed in the order -549/-441/-197. Meta-analysis was performed on the haplotypes CCC (high transcriptional activity) and TCT (low transcriptional activity). A total of 13 separate comparative studies in 9 articles involving 7,155 patients with asthma and 7,285 control subjects were included in this meta-analysis. An association between asthma and the PTGDR -549 C/T polymorphism was found by allele contrast (OR = 1.133, 95 % CI = 1.004-1.279, P = 0.043). Ethnicity-specific meta-analysis showed an association between asthma and the PTGDR -549 C allele in Europeans (OR = 1.192, 95 % CI = 1.032-1.377, P = 0.017). Furthermore, stratifying subjects by age indicated an association between the PTGDR -549 C allele and asthma in adults (OR = 1.248, 95 % CI = 1.076-1.447, P = 0.003), but no association in children (OR = 0.933, 95 % CI = 0.756-1.154, P = 0.324). Analyses using the dominant and additive models showed a pattern similar to that observed for the PTGDR -549 C allele, that is, a significant association in Europeans and adults, but not in children. No association was found between asthma and the PTGDR -441 C/T or -197 C/T polymorphisms, and meta-analysis stratified by ethnicity and age also revealed no association between asthma and these polymorphisms. Furthermore, no association was found between asthma and the CCC and TCT haplotypes of PTGDR, and meta-analysis stratified by ethnicity and age revealed no association between asthma and the CCC and TCT PTGDR haplotypes.
This meta-analysis demonstrates that the PTGDR -549 C/T polymorphism confers susceptibility to asthma in Europeans and adults. However, no association was found between the PTGDR -441 C/T and -197 C/T polymorphisms or the CCC and TCT haplotypes and asthma susceptibility.
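The relationship between a reported odds ratio, its 95 % confidence interval and the P value can be sketched as follows. This is a minimal illustration that assumes a Wald-type interval on the log-odds scale and a two-sided normal test; the abstract does not state which method the authors used, and the function name is hypothetical.

```python
import math


def z_and_p_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back-calculate the standard error, z statistic and two-sided P value.

    Assumes the interval was built as exp(log(OR) +/- z_crit * SE),
    which is an assumption and not something stated in the source.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Allele-contrast result for the -549 C/T polymorphism, as reported above.
se, z, p = z_and_p_from_ci(1.133, 1.004, 1.279)
print(f"SE(log OR) = {se:.4f}, z = {z:.3f}, two-sided P = {p:.3f}")
```

Under these assumptions the back-calculated P value for the allele-contrast result lands close to the reported 0.043. A lower confidence bound only just above 1 corresponds to a P value only just below 0.05.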
Supplementary short tandem repeats (STRs) can be added to forensic DNA analyses when core markers fail to provide sufficient discrimination power in identity and relationship testing. We combined D6S1043 and Penta B with Promega’s PowerPlex CS7 supplementary STR kit, comprising Pentas D and E plus LPL, F13A01, FES/FPS, F13B, and Penta C. The nine STRs were typed in 941 individuals from 51 diverse populations of the CEPH Human Genome Diversity Panel (HGDP-CEPH), and we report allele frequency estimates plus rare alleles identified. Both Penta B and D6S1043 show highly informative variation in all populations, exceeding most CS7 STRs and raising cumulative random match probabilities by at least two orders of magnitude. However, Penta B genotype distributions show an excess of homozygotes across all HGDP-CEPH population groups indicating likely allele dropout from uncharted SNP or Indel variation at the primer sites chosen to type this STR. The first sequence analysis of common regular and rare intermediate D6S1043 alleles is reported. D6S1043 .3 intermediate alleles were found to occur at a high frequency in Native Americans, providing scope for differentiation of this group.
Fifteen autosomal short tandem repeat (STR) markers [D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818 and FGA] were analyzed in 501 unrelated, randomly selected Turkish Cypriot individuals from the island of Cyprus. While no locus duplications or null alleles were detected in these samples, eight allelic variants were observed in total, 75% of which were intermediate allelic variants that were absent in the system allelic ladder. Allelic frequencies and statistical parameters of forensic interest were calculated at each locus. For the 15 STR loci tested, combined matching probability (pM) was 2.15717×10⁻¹⁸ and combined power of exclusion (PE) was 0.9999995213. No deviations from Hardy-Weinberg equilibrium were observed, except at the vWA locus, where the deviation became non-significant after Bonferroni correction for multiple testing. Locus-by-locus comparisons of the Turkish Cypriot allelic frequencies with those published for the neighboring and/or historically related populations with similar loci coverage (Turkish, Greek, Greek Cypriot, Italian and Lebanese) revealed some statistically significant differences at one to five loci. In general, an increase in the number of such significant differences between the Turkish Cypriot data and those for other populations correlated closely with an increase in the geographic distance and/or a decrease in the amount of historical contact. The Turkish Cypriot autosomal STR population study will find immediate use in the Committee on Missing Persons in Cyprus Project on the “Exhumation, Identification and Return of Remains of Missing Persons” and it will also be available for criminal, parentage and other missing person investigations.
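How per-locus statistics are conventionally combined into figures such as the combined pM and PE above can be sketched as follows, assuming the loci are independent. The per-locus values in the example are hypothetical placeholders, since the abstract reports only the combined results, and the function names are illustrative.

```python
import math


def combined_match_probability(per_locus_pm):
    """Product of per-locus matching probabilities (assumes independent loci)."""
    return math.prod(per_locus_pm)


def combined_power_of_exclusion(per_locus_pe):
    """1 minus the probability that every locus fails to exclude."""
    return 1.0 - math.prod(1.0 - pe for pe in per_locus_pe)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical per-locus values, NOT taken from the study.
example_pm = [0.05, 0.08, 0.10]
example_pe = [0.60, 0.55, 0.70]

print(combined_match_probability(example_pm))
print(combined_power_of_exclusion(example_pe))

# With 15 loci tested at a nominal alpha of 0.05 (an assumed level),
# each Hardy-Weinberg test is judged against alpha / 15.
print(bonferroni_threshold(0.05, 15))
```

Because matching probabilities multiply, each additional informative locus shrinks the combined pM, which is how 15 loci can reach values on the order of 10⁻¹⁸.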