Edited by Yoko Satta. Heui-Soo Kim: Corresponding author. E-mail: khs307@pusan.ac.kr. Eun-Sil Park and Jae-Won Huh: These authors contributed equally to this work.

Index
INTRODUCTION
MATERIALS AND METHODS
Preparation of genomic DNA samples
PCR amplification
Data analyses
RESULTS
Identification of AluYj subfamilies from GenBank database
PCR analysis of AluYj4 elements
Sequence analysis of AluYj4 element
Evolutionary analysis
DISCUSSION
References

INTRODUCTION

Mobile genetic elements make up over 45% of the human genome (International Human Genome Sequencing Consortium, 2001). Alu elements, one family of SINEs (Short Interspersed Nuclear Elements), are short repeat sequences about 300 bp in length and form the largest family of mobile genetic elements in the human genome. With a copy number of over 1 million, amplified via RNA-dependent retroposition, these elements account for 10% of the human genome (Hattori et al., 2000; International Human Genome Sequencing Consortium, 2001; Li et al., 2001).

Alu elements have no coding capacity and encode no enzyme for their own retroposition. They must therefore borrow the enzyme (reverse transcriptase) needed for amplification from LINE (Long Interspersed Nuclear Element) or HERV (Human Endogenous Retrovirus) elements (Mathias et al., 1991; Feng et al., 1996). The original Alu element was derived from 7SL RNA (Ullu et al., 1982). Alu elements have a dimeric structure formed by dimerization of the oldest Alu-related elements, FRAM (Free Right Arm Monomer) and FLAM (Free Left Arm Monomer). Phylogenetic studies of Alu expansion suggest that Alu elements evolved from only a small number of elements termed “master” or “source” genes. Over time, diagnostic mutations in the master genes created new Alu subfamilies during retroposition, and the gradual accumulation of further mutations subdivided these subfamilies (Deininger et al., 1992; Batzer et al., 1996).

Alu elements can be divided into 3 subfamilies based on key diagnostic mutation sites shared by subfamily members: the oldest, AluJ; the intermediate, AluS; and the youngest, AluY. Alu elements were inserted sequentially into the primate genome, and the amplification of Alu subfamilies occurred at different times during primate evolution (Britten, 1994; Kapitonov and Jurka, 1996). The oldest AluJ subfamilies are estimated to be ~80 million years old, the AluS subfamilies 30~50 million years old, and the youngest AluY subfamilies less than 15 million years old (Mighell et al., 1997). Recently, new AluYa, b, c, d, e, f, g, and i subfamilies were identified in turn by the analysis of diagnostic mutations (Xing et al., 2004). In the Alu subfamily nomenclature system, a subfamily name is formed from successive letters of the alphabet and the number of diagnostic sites that differ from the consensus sequence (Batzer et al., 1996). Some of these elements are polymorphic for presence or absence in human population studies (Roy-Engel et al., 2001; Xing et al., 2003; Salem et al., 2003a).

The large number of Alu elements had long been regarded as ‘junk DNA’ of no benefit to the human host genome. However, 0.1% of Alu insertions have contributed to genetic disorders, either by disrupting a functional region on insertion or through unequal homologous recombination (Deininger and Batzer, 1999). Recent findings also suggest that Alu elements are involved in several regulatory functions (Mighell et al., 1997; Deininger and Batzer, 1999; Dagan et al., 2004). Parts of Alu sequences include specific motifs that resemble donor and acceptor splice sites. They can therefore be incorporated into mature messenger RNAs via a splicing-mediated process called exonization (Lev-Maor et al., 2003). Exonization of Alu elements accounts for more than 5% of human alternatively spliced exons (Li et al., 2001; Sorek et al., 2002; Kreahling and Graveley, 2004).

Identification and molecular analysis of Alu elements and their transcript variants in humans and other primates will be of great importance for understanding their genetic features. The human-specific AluYj4 element (accession no. AL163282), located on human chromosome 21q22, was identified by PIP (Percent Identity Plot) analysis (Schwartz et al., 2000) and was not detected in the chimpanzee genome. This finding was confirmed by PCR analysis. Related AluY elements were also identified across the whole human genome and analyzed together with other Alu families.


MATERIALS AND METHODS

Preparation of genomic DNA samples

DNA was isolated from blood samples following a standard protocol (Sambrook et al., 1989) from the following species: 11 humans (African, Caucasian, Korean, Japanese, Chinese-yunnan, Mongolian-khalkh, Mongolian-buryate, Chinese-beijing, Indonesian, Filipino, and Bhutanese); the hominoid primates, chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), orangutan (Pongo pygmaeus), and gibbon (Hylobates agilis); the Old World monkeys, Japanese monkey (Macaca fuscata) and crab-eating monkey (Macaca fascicularis); and the New World monkeys, night monkey (Aotus trivirgatus) and common marmoset (Callithrix jacchus).

PCR amplification

The genomic DNA samples were subjected to PCR amplification. The new AluYj4 elements were amplified with four locus-specific primer pairs: A21S (5'-CTG TGG CAT ATC CAA GGA ACT-3', bases 205973~205993) and A21AS (5'-GTT GTG GGT GCA CAA TAC AG-3', bases 206696~206715) from human chromosome 21q22 (GenBank accession no. AL163282); A2S (5'-TTG ACT GTG AAT GTA CAG GTG-3', bases 35080~35100) and A2AS (5'-TAA CAG TAG TTG TCT GAC CTC-3', bases 35603~35623) from human chromosome 2 (GenBank accession no. AC017101); A10S (5'-GAG ATA AGC AGA GCA GAA CAA-3', bases 15703~15723) and A10AS (5'-CAG TGA GGC AAA CAG CTA TTA-3', bases 16161~16181) from human chromosome 10 (GenBank accession no. AC044786); and A122S (5'-TTG GCT TTG TCC TAC AAG TCT-3', bases 71214~71234) and A122AS (5'-CTT AAC AGT AAC CAC TCA CTC-3', bases 71801~71821) from human chromosome 12 (GenBank accession no. AC007656). These primers were designed to distinguish the human-specific new AluYj4 elements among primate samples. The PCR conditions were those of Kim et al. (1996), with an annealing temperature of 58°C.
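As a rough cross-check of the primer design, the span covered by each primer pair can be computed from the base coordinates listed above. The sketch below assumes the coordinates are 1-based and inclusive within each GenBank record; the dictionary and function names are illustrative only. The computed span describes the Alu-containing (human) allele in the database record and need not match a band size estimated from a gel exactly.

```python
# Coordinates copied from the primer list above: (first base of the sense
# primer, last base of the antisense primer). Assumed 1-based and inclusive.
PRIMER_SPANS = {
    "AL163282 (chromosome 21q22)": (205973, 206715),
    "AC017101 (chromosome 2)": (35080, 35623),
    "AC044786 (chromosome 10)": (15703, 16181),
    "AC007656 (chromosome 12)": (71214, 71821),
}


def amplicon_length(sense_start: int, antisense_end: int) -> int:
    """Length in bp of the region from the sense primer start to the antisense primer end."""
    if antisense_end < sense_start:
        raise ValueError("antisense primer must lie downstream of the sense primer")
    return antisense_end - sense_start + 1


if __name__ == "__main__":
    for locus, (start, end) in PRIMER_SPANS.items():
        print(f"{locus}: {amplicon_length(start, end)} bp")
```

An allele lacking the Alu insertion would be expected to give a product shorter by roughly the length of the element.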

Data analyses

The new AluY element on human chromosome 21 was compared with the other AluY subfamilies in Repbase Update (Jurka, 2000). Other new AluY elements were retrieved from the GenBank database using the BLAST network server (Altschul et al., 1997). Nucleotide sequences were aligned with MULTALIN (Corpet, 1988) and verified manually using the BioEdit program (Hall, 1999). The integration time of AluY elements was estimated using the formula T = d/2μ, where μ is the evolutionary rate of Alu sequences, d is the pairwise distance, and T is the integration time (Tajima and Nei, 1984).
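The formula T = d/2μ can be written as a small helper. This is a minimal sketch: the function name is hypothetical, the distance in the usage line is an invented value for illustration, and the rate of 0.0015 substitutions per site per million years is the non-CpG rate quoted later in the text.

```python
def integration_time_myr(d: float, mu_per_myr: float) -> float:
    """Integration time T (million years) from T = d / (2 * mu).

    d          : pairwise distance (substitutions per site) between two copies
    mu_per_myr : evolutionary rate (substitutions per site per million years)
    """
    if mu_per_myr <= 0:
        raise ValueError("the evolutionary rate must be positive")
    if d < 0:
        raise ValueError("the pairwise distance cannot be negative")
    return d / (2.0 * mu_per_myr)


if __name__ == "__main__":
    # Hypothetical pairwise distance of 0.6% with a rate of 0.15% per Myr.
    print(integration_time_myr(0.006, 0.0015))
```

The factor of 2 reflects that a pairwise distance accumulates along both lineages since the two sequences shared a common source.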


RESULTS

Identification of AluYj subfamilies from GenBank database

Forty-five human-specific AluY elements were identified through comparative PIP analysis of human chromosome 21 and chimpanzee chromosome 22. These elements had, however, already been described as human-specific by another group (Watanabe et al., 2004). For detailed analysis, the 45 AluY elements were classified into published AluY subfamilies based on Repbase Update (Jurka, 2000; Jurka et al., 2002). Only one of them (AluY_AL163282) could not be assigned to any known AluY subfamily. We therefore hypothesized that AluY_AL163282, located on human chromosome 21q22, represents a new Alu subfamily. To test this hypothesis, the GenBank database was searched using the AluY_AL163282 sequence as the query, and hundreds of AluY elements with over 97% similarity were identified. The defining feature of a new AluY subfamily is a set of diagnostic mutations, relative to the AluY consensus sequence, that its members share. To discriminate a particular set of diagnostic mutation sites among the hundreds of AluY elements with 97% similarity, we applied molecular phylogenetic analysis. In total, 21 AluY elements (AluYj3, 8 elements; AluYj4, 13 elements) were identified as belonging to the same subfamily using an alignment program. They shared 3 (AluYj3) or 4 (AluYj4) additional specific mutation sites as well as the 6 diagnostic mutations of the AluY consensus sequence (Fig. 1). These sequences therefore appear to form new AluY subfamilies. Following the standardized nomenclature for Alu repeats, the new AluY subfamilies, including AluY_AL163282, can be designated AluYj3 and AluYj4 (Batzer et al., 1996).
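The distinction between diagnostic mutations (shared by all members of a subfamily) and random mutations (private to individual copies) can be illustrated with a short sketch. The sequences below are invented 20 bp toy strings, not real Alu sequences, and the function names are hypothetical; the code assumes the members are already aligned to the consensus without gaps.

```python
def substitutions(consensus: str, element: str) -> set:
    """Set of (1-based position, base) pairs where element differs from consensus."""
    if len(consensus) != len(element):
        raise ValueError("sequences must be aligned to the same length")
    return {
        (i + 1, b)
        for i, (a, b) in enumerate(zip(consensus.upper(), element.upper()))
        if a != b
    }


def shared_substitutions(consensus: str, members: list) -> set:
    """Substitutions present in every member: candidate diagnostic mutations."""
    per_member = [substitutions(consensus, m) for m in members]
    return set.intersection(*per_member) if per_member else set()


if __name__ == "__main__":
    toy_consensus = "ACGTACGTACGTACGTACGT"  # invented, for illustration only
    toy_members = [
        "ACGTTCGTACGAACGTACGT",  # carries the two shared changes
        "ACGTTCGTACGAACGTACCT",  # shared changes plus one private change
        "AGGTTCGTACGAACGTACGT",  # shared changes plus a different private change
    ]
    diagnostic = shared_substitutions(toy_consensus, toy_members)
    print("shared (diagnostic) sites:", sorted(diagnostic))
    for m in toy_members:
        print("private sites:", sorted(substitutions(toy_consensus, m) - diagnostic))
```

In this toy data set the changes at positions 5 and 12 are shared by all members, whereas the remaining changes occur in single copies only.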


Fig. 1.
Sequence alignment of various human AluY subfamilies. The AluY subfamilies show different diagnostic mutation sites. Yj3 and Yj4 share three diagnostic mutation sites, and Yj4 has one additional site. Dotted boxes mark the diagnostic mutations of Yj3 and Yj4. ‘–’ indicates identity with the consensus sequence (AluY).


PCR analysis of AluYj4 elements

Comparative analysis of the nucleotide sequences of human chromosome 21 and chimpanzee chromosome 22 indicated that the AluYj4 element is present in the human genome only. PCR amplification was performed using primers specific for the AluYj4 element and genomic DNA from primates (human, chimpanzee, gorilla, orangutan, gibbon, Japanese monkey, crab-eating monkey, night monkey, and common marmoset). A 745 bp band containing the AluYj4 element was detected in humans only (Fig. 2A). Other members of the AluYj4 family (AC017101, AC044786, and AC007656) were also analyzed by PCR amplification. As shown in Fig. 2B, C and D, the products from human samples were 282 bp larger than those from non-human primate samples. The human-specific AluYj4 elements (AL163282, AC017101, AC044786, and AC007656) were further analyzed in individuals from different human populations (African-American, Caucasian, Korean, Japanese, Chinese-yunnan, Mongolian-khalkh, Mongolian-buryate, Chinese-beijing, Indonesian, Filipino, and Bhutanese) to test for polymorphism, but identical bands were detected in all human populations examined.


Fig. 2.
PCR analysis of new AluYj4 elements in various primate genomic DNAs. (A) Yj4_21_AL163282, (B) Yj4_2_AC017101, (C) Yj4_10_AC044786, (D) Yj4_12_AC007656. Lanes containing PCR products are as follows: M: marker (pUC18/Taq?), HU: human, CH: chimpanzee, GO: gorilla, OR: orangutan, GI: gibbon, JM: Japanese monkey, CE: crab-eating monkey, NM: night monkey, and CM: common marmoset. The larger DNA band indicates the presence of the new AluYj4 element; the smaller band indicates its absence.


Sequence analysis of AluYj4 element

The sequence of the human-specific AluYj4 element on chromosome 21 was identical to that of accession number AL163282. Using the MULTALIN program, the AluYj4 element was compared with various AluY subfamilies. The element showed the 4 diagnostic mutation sites of AluYj4 and 3 random mutations relative to the AluY consensus sequence (see also Fig. 1). Using the BLAST search program and bioinformatic tools, members of the AluYj4 and AluYj3 subfamilies showing the same diagnostic mutation sites were collected. Their TSDs (target site duplications), 6~17 bp in length, were well conserved except in 3 elements (Yj3_4_AC079768, Yj3_1_BX664740, and Yj4_7_AC092647) (Table 1). TSD composition analysis of the AluYj subfamilies (A+T, 151 (67%); G+C, 73 (33%)) indicated that new insertions preferred AT-rich regions. A TSD composition analysis of human-specific elements taken from a reference study (A+T, 175 (67%); G+C, 86 (33%)) showed the same preference (Batzer et al., 1990).
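The TSD composition figures can be reproduced with a simple base count. The sketch below is illustrative: the function name is hypothetical, and the example sequence is the TSD of the AL163282 element as quoted in the Discussion. The pooled counts (151 and 73; 175 and 86) are taken directly from the text above.

```python
def at_fraction(seq: str) -> float:
    """Fraction of A+T among the unambiguous bases (A, C, G, T) of a sequence."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return at / (at + gc)


if __name__ == "__main__":
    # TSD of the AluYj4 element AL163282 as given in the Discussion.
    print(f"AL163282 TSD A+T fraction: {at_fraction('GAAAATAGAACTGA'):.2f}")

    # Pooled counts reported in the text, converted to percentages.
    for label, at, gc in [("AluYj subfamilies", 151, 73), ("reference set", 175, 86)]:
        print(f"{label}: A+T {100 * at / (at + gc):.0f}%, G+C {100 * gc / (at + gc):.0f}%")
```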


Table 1.
Summary of sequence analysis of AluYj subfamily


Evolutionary analysis

To analyze the AluYj subfamily, we reconstructed the consensus sequences of the AluYj3 and AluYj4 subfamilies. The substitution rate was estimated in order to apply a molecular clock to the 21 AluYj elements (Table 1). For an accurate estimate, we distinguished CpG mutations (CG => CA or TG) from non-CpG mutations and removed the poly A tail. All non-CpG mutations were counted for the overall estimate. Among 1598 nucleotides, a total of 37 were mutated at non-CpG sites. A neutral evolutionary rate of 0.15% per million years for non-CpG positions was then applied (Xing et al., 2003). The AluYj3 subfamily may have been inserted into the genome of our common ancestor approximately 15.5 million years ago, and the AluYj4 subfamily may then have proliferated after the divergence of human and chimpanzee.
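The non-CpG counting step and the conversion to an age can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the toy sequences are invented, the function names are hypothetical, and CpG sites are defined here simply as positions inside a CG dinucleotide of the consensus. Dividing the reported mutation density (37 of 1598 non-CpG nucleotides) by the rate of 0.15% per million years gives roughly 15.4 Myr, close to the approximately 15.5 Myr stated above; whether the reported figure was derived in exactly this way is an assumption.

```python
def non_cpg_positions(consensus: str) -> list:
    """0-based positions of the consensus that are not part of a CG dinucleotide."""
    s = consensus.upper()
    cpg = set()
    for i in range(len(s) - 1):
        if s[i:i + 2] == "CG":
            cpg.update((i, i + 1))
    return [i for i in range(len(s)) if i not in cpg]


def count_non_cpg_substitutions(consensus: str, element: str) -> tuple:
    """Return (substitutions at non-CpG positions, number of non-CpG positions)."""
    if len(consensus) != len(element):
        raise ValueError("sequences must be aligned to the same length")
    sites = non_cpg_positions(consensus)
    muts = sum(1 for i in sites if element[i].upper() != consensus[i].upper())
    return muts, len(sites)


def age_myr(mutations: int, sites: int, rate_per_myr: float = 0.0015) -> float:
    """Age in million years from mutation density divided by the per-site rate."""
    if sites <= 0 or rate_per_myr <= 0:
        raise ValueError("sites and rate must be positive")
    return (mutations / sites) / rate_per_myr


if __name__ == "__main__":
    # Invented toy alignment: the C->T change inside the CG dinucleotide is
    # ignored, while the C->G change near the end is counted.
    print(count_non_cpg_substitutions("AACGTTAGCA", "AATGTTAGGA"))

    # Counts reported in the text: 37 non-CpG mutations among 1598 nucleotides.
    print(f"{age_myr(37, 1598):.1f} Myr")
```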


Fig. 3.
PCR analysis of new AluYj4 elements in various human genomic DNAs. (A) Yj4_21_AL163282, (B) Yj4_2_AC017101, (C) Yj4_10_AC044786, (D) Yj4_12_AC007656. Lanes containing PCR products are as follows: M: marker (pUC18/Taq?), 1: African, 2: Caucasian, 3: Korean, 4: Japanese, 5: Chinese-yunnan, 6: Mongolian-khalkh, 7: Mongolian-buryate, 8: Chinese-beijing, 9: Indonesian, 10: Filipino, and 11: Bhutanese.



DISCUSSION

Alu elements are stratified into the AluJ, AluS and AluY families over the course of human and primate evolution. Many AluY subfamilies have been identified in comparative studies. Because AluY subfamilies proliferated recently, some AluY elements are absent from the genomes of non-human primates and are polymorphic in human populations (Batzer and Deininger, 2002; Dagan et al., 2004). The newly amplified AluYa - AluYi subfamilies have been characterized on the basis of their diagnostic mutations (Jurka, 2000; Jurka et al., 2002; Salem et al., 2003a). We analyzed Alu elements with various bioinformatic tools by comparing the nucleotide sequences of human chromosome 21 and chimpanzee chromosome 22. Forty-five human-specific AluY elements were isolated from the sequence comparison. Of these, 44 belonged to the published AluY subfamilies AluYa, b, c, d, e, f, g, h and i. Only one AluY element (AL163282) could not be assigned to any of these subfamilies. It was a new AluY member with direct repeats in its flanking regions, known as target site duplications (TSDs). TSDs are a useful marker for confirming a recent integration event by retrotransposition, as opposed to a conversion event (Hutchinson et al., 1993; Jurka and Klonowski, 1996; Roy-Engel et al., 2002). The AluYj4 element (AL163282) had clear TSDs (GAAAATAGAACTGA) (see Table 1). From the whole-genome analysis, sixteen human-specific AluY elements were identified and classified as AluYj4 and AluYj3 (Fig. 1).

To test for human-specific insertion and polymorphism, we selected 4 elements (Yj4_2_AC017101, Yj4_10_AC044786, Yj4_12_AC007656 and Yj4_21_AL163282) from human chromosomes 2, 10, 12 and 21. We examined various human population and primate samples; the insertions were human-specific at all loci examined and showed no polymorphism (Fig. 2, Fig. 3). These data suggest that the AluYj4 elements may have been fixed before the divergence of human populations. In Alu sequence analysis it is important to take CpG mutations into account, because CpG sites mutate more readily than non-CpG sites through deamination and DNA duplication mechanisms. As shown in Table 1, the integration time of the AluYj elements was estimated using a mutation rate of 0.15% per site per million years, for non-CpG sites only. The average ages of the AluYj3 and AluYj4 subfamilies were 15.5 and 2.1 Myr, respectively, suggesting that the AluYj3 subfamily was first retroposed into the genome during primate evolution and that a re-amplification event then occurred during human evolution. Old AluYj3 elements could be evaluated as molecular cladistic markers for analyzing the phylogenetic affiliations among the primate infraorders (Schmitz et al., 2001; Salem et al., 2003b).

We thank Prof. O. Takenaka, Primate Research Institute, Kyoto University, Japan, for providing primate samples. This research was supported by the KRIBB Research Initiative Program.


References
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Batzer, M. A., Kilroy, G. E., Richard, P. E., Shaikh, T. H., Desselle, T. D., Hoppens, C. L., and Deininger, P. L. (1990) Structure and variability of recently inserted Alu family members. Nucleic Acids Res. 18, 6793–6798.
Batzer, M. A., and Deininger, P. L. (2002) Alu repeats and human genomic diversity. Nat. Rev. Genet. 3, 370–379.
Batzer, M., Deininger, P. L., Hellmann-Blumberg, U., Jurka, J., Labuda, D., Rubin, C. M., Schmid, C. W., Zietkiewicz, E., and Zuckerkandl, E. (1996) Standardized nomenclature for Alu repeats. J. Mol. Evol. 42, 3–6.
Britten, R. J. (1994) Evidence that most human Alu sequences were inserted in a process that ceased about 30 million years ago. Proc. Natl. Acad. Sci. USA 91, 6148–6150.
Corpet, F. (1988) Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res. 16, 10881–10890.
Dagan, T., Sorek, R., Sharon, E., Ast, G., and Graur, D. (2004) AluGene: a database of Alu elements incorporated within protein-coding genes. Nucleic Acids Res. 32, D489–492.
Deininger, P. L., and Batzer, M. A. (1999) Alu repeats and human disease. Mol. Genet. Metab. 67, 183–193.
Deininger, P. L., Batzer, M. A., Hutchison, C. A., and Edgell, M. H. (1992) Master genes in mammalian repetitive DNA amplification. Trends Genet. 8, 307–311.
Feng, Q., Moran, J. V., Kazazian, H. H. Jr., and Boeke, J. D. (1996) Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell 87, 905–916.
Hall, T. A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids. Symp. Ser. 41, 95–98.
Watanabe, H., Fujiyama, A., Hattori, M., Taylor, T. D., Toyoda, A., Kuroki, Y., Noguchi, H., BenKahla, A., Lehrach, H., Sudbrak, R., Kube, M., Taenzer, S., Galgoczy, P., Platzer, M., Scharfe, M., Nordsiek, G., Blocker, H., Hellmann, I., Khaitovich, P., Paabo, S., Reinhardt, R., Zheng, H. J., Zhang, X. L., Zhu, G. F., Wang, B. F., Fu, G., Ren, S. X., Zhao, G. P., Chen, Z., Lee, Y. S., Cheong, J. E., Choi, S. H., Wu, K. M., Liu, T. T., Hsiao, K. J., Tsai, S. F., Kim, C. G., Oota, S., Kitano, T., Kohara, Y., Saitou, N., Park, H. S., Wang, S. Y., Yaspo, M. L., and Sakaki, Y. (2004) DNA sequence and comparative analysis of chimpanzee chromosome 22. Nature 429, 382–388.
Hattori, M., Fujiyama, A., Taylor, T. D., Watanabe, H., Yada, T., Park, H. S., Toyoda, A., Ishii, K., Totoki, Y., Choi, D. K., Groner, Y., Soeda, E., Ohki, M., Takagi, T., Sakaki, Y., Taudien, S., Blechschmidt, K., Polley, A., Menzel, U., Delabar, J., Kumpf, K., Lehmann, R., Patterson, D., Reichwald, K., Rump, A., Schillhabel, M., Schudy, A., Zimmermann, W., Rosenthal, A., Kudoh, J., Schibuya, K., Kawasaki, K., Asakawa, S., Shintani, A., Sasaki, T., Nagamine, K., Mitsuyama, S., Antonarakis, S. E., Minoshima, S., Shimizu, N., Nordsiek, G., Hornischer, K., Brant, P., Scharfe, M., Schon, O., Desario, A., Reichelt, J., Kauer, G., Blocker, H., Ramser, J., Beck, A., Klages, S., Hennig, S., Riesselmann, L., Dagand, E., Haaf, T., Wehrmeyer, S., Borzym, K., Gardiner, K., Nizetic, D., Francis, F., Lehrach, H., Reinhardt, R., and Yaspo, M. L., Chromosome 21 mapping and sequencing consortium. (2000) The DNA sequence of human chromosome 21. Nature 405, 311–319.
Hutchinson, G. B., Andrew, S. E., McDonald, H., Goldberg, Y. P., Graham, R., Rommens, J. M., and Hayden, M. R. (1993) An Alu element retroposition in two families with Huntington disease defines a new active Alu subfamily. Nucleic Acids Res. 21, 3379–3383.
Jurka, J., and Klonowski, P. (1996) Integration of retroposable elements in mammals: selection of target sites. J. Mol. Evol. 43, 685–689.
Jurka, J. (2000) Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420.
Jurka, J., Krnjajic, M., Kapitonov, V. V., Stenger, J. E., and Kokhanyy, O. (2002) Active Alu elements are passed primarily through paternal germlines. Theor. Popul. Biol. 61, 519–530.
Kapitonov, V., and Jurka, J. (1996) The age of Alu subfamilies. J. Mol. Evol. 42, 59–65.
Kim, H. S., Hirai, H., and Takenaka, O. (1996) Molecular features of the TSPY gene of gibbons and Old World monkeys. Chromosome Res. 4, 500–506.
Kreahling, J., and Graveley, B. R. (2004) The origins and implications of Aluternative splicing. Trends Genet. 20, 1–4.
Lander, E. S., Linton, L. M., Birren, B., Nusbaum, C., Zody, M. C. et al., International Human Genome Sequencing Consortium. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Lev-Maor, G., Sorek, R., Shomron, N., and Ast, G. (2003) The birth of an alternatively spliced exon: 3' splice-site selection in Alu exons. Science 300, 1288–1291.
Li, W. H., Gu, Z., Wang, H., and Nekrutenko, A. (2001) Evolutionary analyses of the human genome. Nature 409, 847–849.
Mathias, S. L., Scott, A. F., Kazazian, H. H. Jr., Boeke, J. D., and Gabriel, A. (1991) Reverse transcriptase encoded by a human transposable element. Science 254, 1808–1810.
Mighell, A. J., Markham, A. F., and Robinson, P. A. (1997) Alu sequences. FEBS Lett. 417, 1–5.
Roy-Engel, A. M., Carroll, M. L., Vogel, E., Garber, R. K., Nguyen, S. V., Salem, A. H., Batzer, M. A., and Deininger, P. L. (2001) Alu insertion polymorphisms for the study of human genomic diversity. Genetics 159, 279–290.
Roy-Engel, A. M., Carroll, M. L., El-Sawy, M., Salem, A. H., Garber, R. K., Nguyen, S. V., Deininger, P. L., and Batzer, M. A. (2002) Non-traditional Alu evolution and primate genomic diversity. J. Mol. Biol. 316, 1033–1040.
Salem, A. H., Kilroy, G. E., Watkins, W. S., Jorde, L. B., and Batzer, M. A. (2003a) Recently integrated Alu elements and human genomic diversity. Mol. Biol. Evol. 20, 1349–1361.
Salem, A. H., Ray, D. A., Xing, J., Callinan, P. A., Myers, J. S., Hedges, D. J., Garber, R. K., Witherspoon, D. J., Jorde, L. B., and Batzer, M. A. (2003b) Alu elements and hominid phylogenetics. Proc. Natl. Acad. Sci. USA 100, 12787–12791.
Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) Molecular cloning: A laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York.
Schmitz, J., Ohme, M., and Zischler, H. (2001) SINE insertions in cladistic analyses and the phylogenetic affiliations of Tarsius bancanus to other primates. Genetics 157, 777–784.
Schwartz, S., Zhang, Z., Frazer, K. A., Smit, A., Riemer, C., Bouck, J., Gibbs, R., Hardison, R., and Miller, W. (2000) PipMaker--A web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586.
Sorek, R., Ast, G., and Graur, D. (2002) Alu-containing exons are alternatively spliced. Genome Res. 12, 1060–1067.
Tajima, F., and Nei, M. (1984) Estimation of evolutionary distance between nucleotide sequences. Mol. Biol. Evol. 1, 269–285.
Ullu, E., and Tschudi, C. (1984) Alu sequences are processed 7SL RNA genes. Nature 312, 171–172.
Xing, J., Salem, A. H., Hedges, D. J., Kilroy, G. E., Watkins, W. S., Schienman, J. E., Stewart, C. B., Jurka, J., Jorde, L. B., and Batzer, M. A. (2003) Comprehensive analysis of two AluYd subfamilies. J. Mol. Evol. 57, S76–S89.
Xing, J., Hedges, D. J., Han, K., Wang, H., Cordaux, R., and Batzer, M. A. (2004) Alu element mutation spectra: molecular clocks and the effect of DNA methylation. J. Mol. Biol. 344, 675–682.