Murray P. Cox, corresponding author. E-mail: mpcox@email.arizona.edu; phone: +1-520-621 9791; fax: +1-520-626 8050. Published online 28 January 2006 in J-STAGE (www.jstage.jst.go.jp). DOI: 10.1537/ase.050712
Binary polymorphisms on the Y-chromosome are highly informative anthropological markers for reconstructing the prehistory of men (Underhill et al., 2000). The Y-Chromosome Consortium (YCC) has inferred a detailed tree of global Y-chromosome diversity from over 250 polymorphisms (YCC, 2002; Jobling and Tyler-Smith, 2003), and currently recognizes 18 primary branches called haplogroups or paragroups (labelled A to R). One or more unique polymorphisms characterize each major monophyletic clade, whose distributions are often geographically restricted. Most Y-chromosome studies first classify male samples to a major branch of the Y-chromosome tree, followed by more detailed classification via screening of additional single nucleotide polymorphisms (SNPs) and short tandem repeat (STR) sequences. This paper presents a set of minimal protocols that allow unknown men to be assigned to every major lineage defined by the YCC.
Various methodologies are currently available for screening binary polymorphisms on the Y-chromosome (Budowle, 2004), including DHPLC (Underhill et al., 1997), MALDI-TOF mass spectrometry (Paracchini et al., 2002), flow cytometry (Vallone and Butler, 2004), and SNaPshot minisequencing (Brion et al., 2004). However, such procedures typically utilize proprietary reagents or specialized equipment that are inaccessible to many small genetics laboratories. For some time, a need has been apparent for lineage assays that utilize a less sophisticated methodology. Here I present 15 tests that employ a more widely accessible technique—Polymerase Chain Reaction-Restriction Fragment Length Polymorphism (PCR-RFLP). In conjunction with eight similar protocols published previously, these tests characterize the global array of Y-chromosome diversity, including the 15 haplogroups and three paragroups defined by the Y-Chromosome Consortium (Figure 1).
Figure 1. Parsimony tree of global Y-chromosome diversity (modified from Jobling and Tyler-Smith, 2003) showing the phylogenetic position of binary polymorphisms mentioned in this paper.
While not intended to supplant high-throughput methodologies, PCR-RFLP protocols that require only a minimal molecular biology setup fulfil two niche roles. Firstly, many existing multiplexes target Y-chromosome diversity from specific geographical regions (e.g. Africa and Europe: Brion et al., 2004). Singleplex PCR-RFLP tests will allow the screening of individuals derived from geographical regions outside the scope of an adopted multiplex. Secondly, PCR-RFLP assays will allow less well-resourced research groups to type Y-chromosome haplogroup diversity in many human populations that are currently only poorly studied.
The Y-SNP tests presented here exploit two related methods: conventional and forced PCR-RFLP. Conventional PCR-RFLP utilizes polymorphic SNPs that occur naturally within a restriction endonuclease recognition site. Simple presence or absence of the restriction site discriminates between Y-chromosomes that lack (‘ancestral’) or carry (‘derived’) the nucleotide state characteristic of that Y-chromosome lineage. Unfortunately, phylogenetically informative SNPs seldom occur within pre-existing restriction sites, so a novel restriction site must be engineered into the PCR product. This forced PCR-RFLP approach utilizes one PCR primer with a single, predetermined nucleotide mismatch relative to the genuine DNA sequence. PCR amplification incorporates this mismatch primer into the final DNA product, in which the mismatch nucleotide lies adjacent to the haplogroup-defining SNP. Together with the engineered mismatch nucleotide, one allele of the SNP completes an artificial restriction site, so the character state (ancestral or derived) is distinguished by the presence or absence of that site.
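The forced PCR-RFLP principle can be illustrated with a minimal sketch. All sequences below are hypothetical and are not taken from Table 1. TaqI (recognition sequence TCGA) is used only as an example enzyme, and the choice of the derived allele as the one that completes the site is an assumption made for illustration. The model is deliberately simplified to a single strand in which the primer sequence replaces the template bases it anneals to.

```python
# Hypothetical sequences for illustration only; not taken from Table 1.
TAQI_SITE = "TCGA"  # TaqI recognition sequence

UPSTREAM = "GGCATTAGCAGGTCATT"        # genuine sequence 5' of the SNP
DOWNSTREAM = "ACCTGAAGGTTCAGGCTAAC"   # genuine sequence 3' of the SNP

# Two alleles that differ only at the SNP position (A = ancestral, G = derived).
ANCESTRAL = UPSTREAM + "A" + DOWNSTREAM
DERIVED = UPSTREAM + "G" + DOWNSTREAM

# The conventional primer matches the genuine sequence; the mismatch primer
# changes its 3'-terminal base (T to C) immediately next to the SNP.
CONVENTIONAL_PRIMER = UPSTREAM
MISMATCH_PRIMER = UPSTREAM[:-1] + "C"


def amplify(template: str, primer: str) -> str:
    """Return a simplified PCR product in which the primer sequence replaces
    the template bases it anneals to, so a primer mismatch is copied in."""
    return primer + template[len(primer):]


def is_cut(product: str, site: str = TAQI_SITE) -> bool:
    """Report whether the recognition site occurs anywhere in the product."""
    return site in product


for label, allele in (("ancestral", ANCESTRAL), ("derived", DERIVED)):
    product = amplify(allele, MISMATCH_PRIMER)
    print(label, "cut" if is_cut(product) else "uncut")
```

With the conventional primer neither allele contains the site, which is why the mismatch is needed in this toy example.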
Both PCR-RFLP variants require only basic experimental apparatus (e.g. a PCR thermal cycler, agarose gel electrophoresis equipment), and Y-chromosome lineages are easily assigned by visualizing discrete differences in DNA fragment size on an agarose gel. With the inclusion of ancestral and derived control DNAs, PCR-RFLP methodologies are a cheap and robust alternative to more costly methods of Y-SNP analysis.
Test protocols for lineage-specific Y-chromosome SNPs were validated on control DNAs from the YCC cell line repository (YCC, 2002), a collection of 74 male samples representing a global range of Y-chromosome diversity (kindly provided by Dr Michael Hammer, University of Arizona, USA). All major clades of the Y-chromosome tree are represented by these control DNAs except haplogroup L. Genomic DNA (kindly provided by Dr Chris Tyler-Smith, Wellcome Trust Sanger Institute, UK) classified by DHPLC to haplogroup L-M20 was used to validate the haplogroup L test developed here. To generate sufficient DNA for extensive protocol testing and validation, whole genome amplification was performed on 5 ng aliquots of genomic DNA with Φ29 DNA polymerase (GenomiPhi DNA Amplification Kit, Amersham Biosciences) yielding a final product of 3–4 μg of whole genome amplified DNA (wgaDNA). SNP profiles generated from wgaDNA have greater than 99.8% fidelity relative to SNP profiles generated from unamplified genomic DNA (Hosono et al., 2003; Barker et al., 2004). No discrepancies have been detected by this research group.
PCR amplifications were carried out on 10–20 ng of DNA template with primers that flank short stretches of DNA containing a haplogroup-defining polymorphism. PCR reactions (25 μl) incorporated 1 U of Taq DNA polymerase (Roche Molecular Biochemicals). Taq DNA polymerase lacks proofreading 3′-to-5′ exonuclease activity, thereby preventing undesired ‘correction’ of engineered primer mismatch nucleotides during the PCR reaction. The reaction buffer consisted of 2 mM Tris-HCl pH 7.5, 10 mM KCl, 100 μM dithiothreitol, 10 μM EDTA, 5% v/v glycerol, 2.5 mM MgCl2, 200 μM each dNTP, and 300 nM of the appropriate forward and reverse primers (Proligo Oligonucleotides). PCR amplification was improved by adding 160 ng/μl bovine serum albumin and 0–1.5% formamide (Sarkar et al., 1990). Table 1 lists the sequences of primers that flank each Y-chromosome binary polymorphism together with the size of the amplified PCR products.
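As a worked example of setting up the 25 μl reaction, the sketch below converts the final concentrations given above into pipetting volumes using C1·V1 = C2·V2. The stock concentrations (10 μM primers, 10 mM each dNTP, 25 mM MgCl2) are assumptions chosen for illustration; the paper does not state them.

```python
def stock_volume_ul(final_conc: float, stock_conc: float,
                    reaction_ul: float = 25.0) -> float:
    """Volume of stock (in µl) needed to reach final_conc in the reaction.

    Both concentrations must be given in the same unit (C1 * V1 = C2 * V2).
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * reaction_ul / stock_conc


# Final concentrations are from the text; stock concentrations are assumed.
primer_ul = stock_volume_ul(300, 10_000)   # 300 nM from a 10 µM stock (nM)
dntp_ul = stock_volume_ul(200, 10_000)     # 200 µM from a 10 mM stock (µM)
mgcl2_ul = stock_volume_ul(2.5, 25)        # 2.5 mM from a 25 mM stock (mM)

print(f"each primer: {primer_ul} µl, each dNTP: {dntp_ul} µl, "
      f"MgCl2: {mgcl2_ul} µl")
```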
PCR amplifications were performed in 96-well thermal cyclers produced by Hybaid (OmniGene) or Techne (TC-512 Gradient PCR) using either standard or touchdown PCR protocols (Table 1). All PCR amplifications had an initial denaturation step of 94°C for 3 min, a final elongation step of 72°C for 5 min, and a final indefinite hold step of 4°C. Standard PCR conditions were 35–40 cycles of 30 s at 94°C, 30 s at the primer-specific annealing temperature (indicated in Table 1), and 60 s at 72°C. Touchdown PCR protocols were designed to avoid the need to optimize primer-annealing temperatures, and more importantly, to circumvent non-specific PCR products by favoring primer annealing at the intended binding site (Don et al., 1991). Three versions of the touchdown PCR protocols were adopted: TD-A, TD-B, and TD-C. TD-A conditions involved 18 cycles starting at 94°C for 20 s, 63°C for 30 s, and 72°C for 60 s with the annealing temperature decreasing by 0.5°C per cycle, followed by 22 cycles with an annealing temperature of 54°C. TD-B conditions involved 20 cycles starting at 94°C for 20 s, 58°C for 30 s, and 72°C for 60 s with the annealing temperature decreasing by 0.5°C per cycle, followed by 20 cycles with an annealing temperature of 48°C. TD-C conditions involved 18 cycles starting at 94°C for 20 s, 55°C for 30 s, and 72°C for 60 s with the annealing temperature decreasing by 0.5°C per cycle, followed by 22 cycles with an annealing temperature of 46°C.
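The three touchdown programs can be written out cycle by cycle with a short sketch. It assumes that the first touchdown cycle anneals at the stated starting temperature and that each later touchdown cycle is 0.5°C lower; the function and variable names are illustrative only. Under that reading, the last touchdown cycle sits 0.5°C above the fixed annealing temperature of the second phase, and every program totals 40 cycles.

```python
from typing import Dict, List


def touchdown_annealing(start_c: float, touchdown_cycles: int,
                        final_c: float, final_cycles: int,
                        step_c: float = 0.5) -> List[float]:
    """Return the annealing temperature (°C) used in each PCR cycle."""
    # Touchdown phase: the temperature drops by step_c after every cycle.
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    # Second phase: fixed annealing temperature.
    return touchdown + [final_c] * final_cycles


# Parameters as described in the text for TD-A, TD-B, and TD-C.
PROGRAMS: Dict[str, List[float]] = {
    "TD-A": touchdown_annealing(63.0, 18, 54.0, 22),
    "TD-B": touchdown_annealing(58.0, 20, 48.0, 20),
    "TD-C": touchdown_annealing(55.0, 18, 46.0, 22),
}

for name, temps in PROGRAMS.items():
    print(name, len(temps), "cycles:", temps)
```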
Amplified products ranged from 88 to 709 bp in length (Table 1), and amplification of the correct fragment was confirmed by electrophoresis on 1–3% w/v SeaKem LE agarose gels (Cambrex). Aliquots of the unpurified PCR product (10 μl) were incubated with an excess (2 U) of the appropriate restriction enzyme (Table 1) in total volumes of 20 μl, under reaction conditions recommended by the manufacturer. Digest mixtures were electrophoresed on 1–3% w/v SeaKem LE agarose gels, and the observed fragments were sized by comparison with a 100 bp DNA ladder (DNA Molecular Weight Marker XIV, Roche Molecular Biochemicals). Ancestral and derived polymorphic states were determined by comparing observed fragment sizes with expected banding patterns (Table 1).
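The comparison of observed fragment sizes with expected banding patterns can be sketched as follows. The expected pattern uses the M-M106 TaqI fragment sizes reported in the Figure 2 caption. The 5% sizing tolerance and the function name are assumptions made for illustration and are not part of the published protocol.

```python
from typing import Dict, Iterable, Optional, Tuple

# Expected TaqI banding pattern for M-M106, from the Figure 2 caption (bp).
M106_EXPECTED: Dict[str, Tuple[int, ...]] = {
    "ancestral": (348, 224),
    "derived": (224, 185, 163),
}


def call_state(observed_bp: Iterable[float],
               expected: Dict[str, Tuple[int, ...]] = M106_EXPECTED,
               tolerance: float = 0.05) -> Optional[str]:
    """Return 'ancestral' or 'derived' if the observed bands match exactly
    one expected pattern within the relative tolerance, otherwise None."""
    observed = sorted(observed_bp)
    matches = []
    for state, pattern in expected.items():
        wanted = sorted(pattern)
        if len(wanted) != len(observed):
            continue  # a different number of bands cannot match
        if all(abs(o - e) <= tolerance * e for o, e in zip(observed, wanted)):
            matches.append(state)
    return matches[0] if len(matches) == 1 else None


print(call_state([350, 220]))        # bands estimated from the gel
print(call_state([225, 180, 165]))
print(call_state([572]))             # undigested product matches nothing
```

Returning `None` for an unmatched pattern, such as undigested product, flags the sample for repeat digestion rather than forcing a call.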
Here I describe novel test procedures for 15 Y-chromosome binary polymorphisms that utilize a PCR-RFLP methodology. Together with pre-existing tests for eight additional binary polymorphisms, these protocols allow the assignment of unknown male samples to every major branch of the Y-chromosome tree, including the 15 haplogroups and three paragroups defined by the Y-Chromosome Consortium (YCC, 2002). These assays have been validated against fully-profiled YCC control wgaDNAs, including direct DNA sequencing of ancestral and derived controls, and are now used routinely to screen non-whole genome amplified genomic DNAs from Africa, Southeast Asia, and Oceania (Cox, 2003). For a practical application of these tests on real-world samples, see Cox and Lahr’s (2006) analysis of Y-chromosome diversity in the Solomon Islands.
Each PCR primer set amplifies a single 88–709 bp DNA fragment. Mismatch primers undergo PCR amplification as efficiently as conventional primers. Two binary polymorphisms (N-Lly22g and P-92R7) occur within segmental duplications (Skaletsky et al., 2003), and these PCR products therefore represent a composite sequence from two Y-chromosome regions. Only one repeat carries the Lly22g and 92R7 polymorphisms, and thus only a portion of the amplified PCR product represents the phylogenetically informative marker. Consequently, these PCR products are not completely digested even when the restriction endonuclease site is present; uncleaved PCR product is always observed following restriction endonuclease digestion. However, ancestral and derived character states can always be assigned correctly for N-Lly22g and P-92R7 (Table 1). Restriction endonuclease cleavage is complete for the remaining assays.
Digested DNA fragments vary from 19 to 709 bp, but no single assay requires this range of DNA fragment sizes to be discriminated. All 23 Y-chromosome lineages can be assigned correctly by examining restriction fragments between 88 and 197 bp, and these fragment sizes are easily distinguished following standard agarose gel electrophoresis (Figure 2). Conventional PCR-RFLP typically generates larger DNA fragments, whereas forced PCR-RFLP tests—in which the mismatch primer forms part of the engineered restriction site—cleave only the primer sequence (19–32 bp) from the PCR product. Electrophoresis times should be increased with forced PCR-RFLP tests to discriminate between cleaved and uncleaved DNA fragments.
Figure 2. A PCR-RFLP test for Y-chromosome lineage M-M106. Lane 1 (left) is a DNA sizing ladder (marker) with 100 bp increments. Lane 2 shows an undigested 572 bp PCR product. Lanes 3–9 show TaqI digested PCR products from seven unrelated Y Chromosome Consortium DNAs (sample ID numbers and haplogroup assignments indicated above the gel; YCC, 2002). Six samples show two DNA fragments of 348 and 224 bp, and are therefore ancestral (A) for Y-chromosome haplogroup M. Only one sample shows the three DNA fragments of 224, 185, and 163 bp created by the second lineage M-specific TaqI restriction site, and therefore carries the derived (D) haplogroup M polymorphism.
All binary polymorphisms described here, except P36, demarcate unique, monophyletic clades on the global Y-chromosome tree. P36 (a G→T polymorphism described originally as G→A: Karafet et al., 2005: p. 98) defines haplogroup Q, a lineage with North American and Central Asian affinities (Zegura et al., 2004). However, P36 recurs on a limited subset of African Y-chromosome backgrounds, including four San bushmen carrying lineage A2 Y-chromosomes (YCC DNAs 5, 22, 34 and 35). Nevertheless, haplogroups A and Q are readily distinguished with the protocols presented here. Lineage A Y-chromosomes can be identified by their A-M91 background; lineage Q Y-chromosomes can be identified by their P-92R7 background. A PCR-RFLP test is presented here for P-92R7, and although the A-M91 polymorphism cannot be characterized by PCR-RFLP, direct DNA sequencing distinguishes the ancestral state (a 9-residue polyadenine tract) from the derived state (an 8- or 10-residue polyadenine tract).
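The logic for separating the two backgrounds on which P36 occurs can be sketched as a small decision function. The state labels, function names, and the choice to return `None` for unresolved or contradictory combinations are assumptions made for illustration; only the marker relationships come from the text above.

```python
from typing import Optional


def m91_state(poly_a_length: int) -> str:
    """Classify A-M91 from the sequenced polyadenine tract length."""
    if poly_a_length == 9:
        return "ancestral"
    if poly_a_length in (8, 10):
        return "derived"
    raise ValueError(f"unexpected poly-A tract length: {poly_a_length}")


def resolve_p36(p36: str, p_92r7: str, a_m91: str) -> Optional[str]:
    """Assign a P36-derived chromosome to haplogroup Q or lineage A.

    Each argument is 'ancestral' or 'derived'. Returns None when P36 is
    ancestral or when the background markers do not resolve the lineage.
    """
    if p36 != "derived":
        return None
    if p_92r7 == "derived" and a_m91 == "ancestral":
        return "Q"   # P36 on a P-92R7 background
    if a_m91 == "derived" and p_92r7 == "ancestral":
        return "A"   # recurrent P36 on an A-M91 background
    return None


print(resolve_p36("derived", "derived", m91_state(9)))
print(resolve_p36("derived", "ancestral", m91_state(8)))
```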
These 15 novel Y-chromosome binary polymorphism assays (in association with eight previously published PCR-based tests) allow full resolution of the major branches of the global Y-chromosome tree. Because many PCR products are smaller than 500 bp and can be amplified from fragmented and degraded DNA samples (unpublished data), some of these tests may find utility in forensic and ancient DNA applications. However, these protocols are intended primarily for modern anthropological surveys of Y-chromosome diversity. PCR-RFLP protocols require little experimental equipment and limited technical expertise. I hope that these tests will encourage new research groups, perhaps previously dissuaded by the high costs traditionally associated with Y-chromosome screening, to implement a broader range of anthropological surveys, thereby contributing to our growing understanding of global Y-chromosome diversity.
I extend my appreciation to Dr Michael Hammer (University of Arizona) for providing control DNAs from the Y Chromosome Consortium’s cell line repository; Dr Chris Tyler-Smith (Wellcome Trust Sanger Institute) for supplying haplogroup L DNAs and for guidance with Y-SNP typing protocols; and Dr Peter Underhill (Stanford University) and Dr Matthew Hurles (Wellcome Trust Sanger Institute) for helpful discussion regarding Y-chromosome typing methods. I also thank the Department of Zoology (University of Oslo), the Department of Biochemistry (University of Otago), and the Leverhulme Centre for Human Evolutionary Studies (University of Cambridge) for their patronage of this research. Funding from the Foundation for Research, Science and Technology (New Zealand), the University of Otago Research Committee (New Zealand), the Kon-Tiki Museum (Norway), and the Isaac Newton Trust (United Kingdom) supported this research.