Edited by Ryo K. Takahashi. Yoshiki Yasukochi: Corresponding author. E-mail: hyasukou@proof.ocn.ne.jp. Note: Supplementary materials for this article are available at http://www.jstage.jst.go.jp/browse/ggs
Cytochrome P450 2D6 (CYP2D6) is an important enzyme involved in the metabolism of about 25% of commonly used therapeutic drugs (Ingelman-Sundberg, 2005) and shows a high affinity for alkaloids (Fonne-Pfister and Meyer, 1988). CYP2D6 belongs to the CYP2D subfamily, a gene cluster occupying a contiguous region of about 45 kb on chromosome 22 (Kimura et al., 1989), which in humans comprises the CYP2D6 gene and two pseudogenes, CYP2D7P and CYP2D8P. The nucleotide sequences of these pseudogenes are highly similar to that of the CYP2D6 gene.
In recent years, whole-genome sequences have become available for many species. In particular, whole-genome assemblies of four non-human primates have been released since 2006: the chimpanzee (Pan troglodytes), released by the Washington University Genome Sequencing Center (Pan_troglodytes-2.1; The Chimpanzee Sequencing and Analysis Consortium, 2005); the Sumatran orangutan (Pongo pygmaeus abelii), produced by the Genome Sequencing Center at Washington University School of Medicine in St. Louis in July 2007 (WUSTL version Pongo_abelii-2.0.2); the rhesus monkey (Macaca mulatta), released by the Macaque Genome Sequencing Consortium in February 2006 (v.1.0, Mmul_051212; Rhesus Macaque Genome Sequencing and Analysis Consortium, 2007); and a draft assembly of the common marmoset (Callithrix jacchus), produced by the WUSTL School of Medicine Genome Sequencing Center [WUGSC 3.2 (GCA_000004665.1)]. According to the released information, the CYP2D6 gene is located on chromosome 22 in the chimpanzee genome (following the chromosome naming system proposed by McConkey, 2004) and in the orangutan genome, and on chromosome 10 in the rhesus monkey genome (following the chromosome numbering system of Rogers et al., 2006). No annotation is available for CYP2D6 in the common marmoset, or for other CYP2D genes in any of these species.
The genetic variability of CYP2D6 has been extensively studied in human populations because of its clinical importance (Xie et al., 2001; Bradford, 2002; Mizutani, 2003; Raimundo et al., 2004; Sistonen et al., 2007), but to our knowledge no study to date has explored the molecular evolution of the CYP2D subfamily in the human genome. Studying the origin and evolution of this subfamily is important for understanding drug metabolism in humans, because it can show when and how we acquired a metabolic system for exogenous substrates. Such knowledge can also reveal differences in the metabolism of substrates such as drugs between humans and other animals (e.g., experimental animals). Here, we present a preliminary comparison of the organization of the CYP2D subfamily in the human genome with that in the genomes of the chimpanzee, orangutan, rhesus monkey and common marmoset, and attempt to trace the evolutionary origin of the subfamily in animals.
Genomic sequences of the chimpanzee, Sumatran orangutan, rhesus monkey and common marmoset were used to identify CYP2D genes. Sequence data for humans, chimpanzees and rhesus monkeys were obtained from the NCBI genome database (http://www.ncbi.nlm.nih.gov/), whereas those for the orangutan and common marmoset were obtained from the UCSC genome database (http://genome.ucsc.edu/). CYP2D genes of non-human primates were identified by BLAST and BLAT homology searches, and gene order was examined with reference to the human CYP2D sequences (GenBank accession numbers: M33387 and M33388). To search for the origin of the CYP2D subfamily, we used the genome databases of eight eutherians (house mouse, Mus musculus; Norway rat, Rattus norvegicus; rabbit, Oryctolagus cuniculus; cattle, Bos taurus; pig, Sus scrofa; horse, Equus caballus; giant panda, Ailuropoda melanoleuca; and dog, Canis lupus familiaris), one marsupial (gray short-tailed opossum, Monodelphis domestica), one monotreme (platypus, Ornithorhynchus anatinus), one bird (chicken, Gallus gallus), one reptile (green anole lizard, Anolis carolinensis), two amphibians (African clawed frog, Xenopus laevis; and western clawed frog, Xenopus tropicalis), three fish (zebrafish, Danio rerio; medaka, Oryzias latipes; and puffer fish, Takifugu rubripes) and one urochordate (sea squirt, Ciona intestinalis). Synteny with the human CYP2D genes was predicted using the NCBI Map Viewer (http://www.ncbi.nlm.nih.gov/mapview/) and the Ensembl Genome Browser (http://www.ensembl.org/index.html). CYP2D genomic sequences of the gorilla (Gorilla gorilla), the mouse lemur (Microcebus murinus) and the tarsier (Tarsius syrichta) were also available, but they were incomplete (i.e., they contained many undetermined nucleotides) and were therefore excluded from the analysis.
The chimpanzee, orangutan and rhesus monkey sequences also contained some undetermined nucleotides, although far fewer than those of the gorilla, mouse lemur and tarsier. In addition, some CYP2D sequences carried frameshift or nonsense mutations in their putative coding regions. We therefore sequenced the undetermined regions and verified the putative deleterious mutations. Genomic DNA samples of the chimpanzee, orangutan and rhesus monkey were provided by the Primate Research Institute of Kyoto University and the Max Planck Institute for Biology. DNA was amplified by PCR with the primers listed in Table 1. PCR amplification was carried out using a DNA Thermal Cycler in a 25 μl reaction mixture with TaKaRa LA Taq Hot Start Version (TaKaRa Bio Inc.) or PCR Master Mix (Promega), under the conditions recommended by the manufacturers. PCR products for sequencing were purified with ExoSAP-IT (USB), and cycle sequencing was performed with the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems). Sequencing was conducted on an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems). Sequences were aligned using MEGA ver. 4.1 Beta (Tamura et al., 2007); the alignment was then refined by hand, and positions containing deletions or insertions (indels) were excluded from subsequent analyses. A neighbor-joining (NJ) tree (Saitou and Nei, 1987) was reconstructed using the empirical JTT amino acid substitution matrix (Jones et al., 1992), and bootstrap analysis was performed with 1,000 replicates. Maximum likelihood (ML) and Bayesian phylogenetic trees were reconstructed with the PHYLIP 3.69 package (Felsenstein, 2009) and MrBayes ver. 3.1.2 (Ronquist and Huelsenbeck, 2003), respectively. For the ML trees, bootstrap analysis used 100 replicates, distances were corrected by the JTT matrix-based method, global rearrangements were allowed, and the input order of OTUs was randomized with three jumbles.
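The two alignment-level steps described above, excluding indel positions and resampling columns for bootstrap analysis, can be illustrated with a minimal Python sketch. It assumes the alignment is held as a dict of equal-length strings with "-" marking gaps; the function names drop_gap_columns and bootstrap_replicate are hypothetical, and the sketch does not reproduce the MEGA or PHYLIP implementations.

```python
import random

def drop_gap_columns(alignment):
    """Remove every column that has a gap ('-') in any sequence (assumes equal-length rows)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [i for i in range(length)
            if all(alignment[n][i] != "-" for n in names)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}

def bootstrap_replicate(alignment, rng):
    """Resample columns with replacement to build one pseudo-alignment of the same length."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in cols) for n in names}

# Toy amino acid alignment (not real CYP2D data); column 4 contains gaps.
aln = {"seqA": "MKV-LA", "seqB": "MKVQLA", "seqC": "MRV-LS"}
clean = drop_gap_columns(aln)
print(clean)  # {'seqA': 'MKVLA', 'seqB': 'MKVLA', 'seqC': 'MRVLS'}

rng = random.Random(1)
replicates = [bootstrap_replicate(clean, rng) for _ in range(1000)]
```

Each replicate would then be passed to the tree-building step; the bootstrap value of a clade is the percentage of replicate trees that contain it.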
The Bayesian analysis was run for 7.1 × 10⁵ generations, with trees sampled every 100 generations, and the first 1,775 trees were discarded as burn-in. The WAG substitution matrix with gamma correction for among-site rate variation was used (Whelan and Goldman, 2001). The ML and Bayesian trees were visualized with TreeView version 1.6.6 (Page, 1996). Transposable elements were predicted with Repbase Update (Jurka et al., 2005) using the CENSOR (Kohany et al., 2006) and RepeatMasker (Smit et al., 1996–2010) programs.
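The burn-in setting can be checked with simple arithmetic. Assuming one tree is written every 100 generations, and ignoring the extra sample that MrBayes stores at generation 0, the 1,775 discarded trees correspond to the first quarter of the sample:

```latex
N_{\text{trees}} = \frac{7.1 \times 10^{5}}{100} = 7{,}100,
\qquad
\frac{N_{\text{burn-in}}}{N_{\text{trees}}} = \frac{1{,}775}{7{,}100} = 0.25
```

Under the same assumption, 7,100 − 1,775 = 5,325 trees would remain for summarizing the posterior.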
Table 1 Primer list used in this study
The gene organization of the CYP2D subfamily in five primate species was determined by homology searches using each human counterpart (Fig. 1). In the reference genome databases of the non-human primates, eight fragments of CYP2D genes contained undetermined nucleotides or putative deleterious mutations. To fill these sequence gaps and confirm the deleterious mutations, we sequenced the following fragments: 837, 611 and 775 bp in the rhesus monkey CYP2D8P (-like) gene; 1,153 bp in the chimpanzee CYP2D7P; 121 bp in the chimpanzee CYP2D8P; 2,113 bp in the orangutan CYP2D6; 979 bp in the orangutan CYP2D7P; and 2,011 bp in the orangutan CYP2D8P. These sequences have been deposited in the DNA Data Bank of Japan (DDBJ) (accession numbers: AB594492–AB594499). Each CYP2D gene consists of nine exons and eight introns and ranges in length from 4,000 bp (marmoset CYP2D6) to 5,200 bp (orangutan CYP2D7P); these length differences among species result from variation in intron size. Orthologs of human CYP2D8P were present in all primates, and those in the rhesus monkey and marmoset genomes appear to be functional, because their coding regions contain no putative premature stop codons. In contrast, no CYP2D7P ortholog was found in the rhesus monkey or marmoset, and the chimpanzee ortholog does not appear to be a pseudogene; we therefore named the chimpanzee ortholog without the suffix “P”, which denotes a pseudogene. In the orangutan, this gene appears to have been pseudogenized by frameshift mutations independently of the human lineage (data not shown).
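The functionality criterion used above (no putative premature stop codon in the coding region), together with the frameshift check mentioned for the orangutan gene, can be sketched as a simple in-frame scan. This minimal Python illustration assumes the exons have already been joined into one coding sequence beginning at the first base of the start codon; the helper names are hypothetical, and the sketch ignores alignment gaps and ambiguous bases.

```python
# Stop codons of the standard nuclear genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def premature_stops(cds):
    """Return 0-based indices of in-frame stop codons that occur before the last codon."""
    cds = cds.upper()
    # Split into complete codons; any trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    return [idx for idx, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]

def has_frameshift(cds):
    """Flag a coding sequence whose length is not a multiple of three (a possible indel)."""
    return len(cds) % 3 != 0

# Toy example (not a CYP2D sequence): the third codon, TAG, is a premature stop.
print(premature_stops("ATGAAATAGCCCTGA"))  # [2]
print(has_frameshift("ATGAAACCCTG"))       # True
```

A real screen would also need to handle undetermined nucleotides (N), which is one reason the gaps in the reference assemblies were filled by sequencing.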
Fig. 1 Diagram of the organization of the CYP2D subfamily in five primate species. Only transposable elements detected by both methods (RepeatMasker and CENSOR) are shown; they are named following the RepeatMasker nomenclature, and asterisks indicate CENSOR nomenclature. Closed triangles represent Alu elements, gray triangles represent L1 or L2 elements, and open triangles represent other elements. Names of transposable elements in non-human primates whose location or order is identical to that in humans are not shown.
Kimura et al. (1989) reported that the exonic sequence of human CYP2D7P is more similar to CYP2D6 than to CYP2D8P. Our sequence comparisons show the same trend (Fig. 2). Nucleotide similarities between CYP2D6 and CYP2D7 and between CYP2D6 and CYP2D8P were 97% and 93%, respectively, in the chimpanzee, and 95% and 93%, respectively, in the orangutan. These results indicate that CYP2D7 arose by duplication of CYP2D6 before the divergence of humans and great apes during the Miocene. In contrast, CYP2D6 and CYP2D8 or CYP2D8P are present in all five primates, indicating that these genes originated no later than the stem lineage of New World monkeys and Catarrhini. Although CYP2D genes are annotated in the mouse genome database, the mouse orthologs of CYP2D6 and CYP2D8P cannot be identified unambiguously, because the mouse has nine active CYP2D genes (Nelson et al., 2004).
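The text does not state exactly how the similarity values were computed. One common definition, assumed here, is the percentage of identical nucleotides over aligned positions where neither sequence has a gap, which is consistent with the exclusion of indel positions described in the methods. A minimal Python sketch with a hypothetical function name:

```python
def percent_identity(seq1, seq2, gap="-"):
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only columns without a gap in either sequence (indels are excluded).
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)

# Toy aligned fragments (not real CYP2D sequences): 9 of 10 positions match.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```

Other definitions (for example, dividing by the full alignment length including gaps) give lower values, so the choice should be stated whenever such percentages are compared across studies.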
Fig. 2 Alignment of the CYP2D6 and CYP2D7 or CYP2D7P genes from humans and chimpanzees. Hosa and Patr represent the human and chimpanzee, respectively. Dots indicate identity with the nucleotides of Patr CYP2D6 (DQ282164). Nucleotide position numbers at the top of the figure represent variable sites. Gray boxes indicate sequences that are putatively exchanged between the Patr CYP2D6 and CYP2D7 genes. The asterisk indicates the sequence modified by this study. Part of the modified sequence was confirmed by PCR with gene-specific primer pairs followed by sequencing (heavy underlines). In the actual sequences, the upper four and the lower three sequences of the alignment are located downstream and upstream, respectively.
To investigate the origin of the CYP2D subfamily in humans, we examined synteny among the CYP2D candidate gene clusters of seventeen vertebrates and one urochordate. In amniotes (i.e., mammals, birds and reptiles), the organization of genes surrounding the CYP2D candidate genes was relatively similar to that surrounding the human CYP2D genes, and the adjacent genes NDUFA6 and TCF20 were detected in all cases. In contrast, the corresponding regions in fish and the sea squirt showed no significant similarity to those of amniotes. In the amphibian genomes, NDUFA6 and TCF20 were not found, but two other adjacent genes, SREBF2 and WBP2NL, were found together with the CYP2D orthologs. A BLAST search showed that the human CYP2D6 gene has relatively high similarity to the Cyp2k and Cyp2j genes of the zebrafish and to the Cyp2j genes of the medaka and puffer fish. However, according to the Ensembl Genome Browser, human CYP2J2 is probably an ortholog of the zebrafish Cyp2j gene, and human CYP2W1 may be an ortholog of the zebrafish Cyp2k. An ML phylogeny based on Clan 2 (CYP1, CYP2, CYP17 and CYP21) amino acid sequences from the zebrafish and human genomes showed that the fish Cyp2k and Cyp2j sequences (the Cyp2j genes were described as CYP2N, CYP2P, CYP2V and CYP2AD in that analysis) formed monophyletic groups with human CYP2W1 and CYP2J2, respectively, and that the fish genome contains no CYP2D candidate (Goldstone et al., 2010). That study also showed that the fish Cyp2j genes share synteny with human CYP2J2. Nelson (2011) has likewise reported that some fish Cyp2k sequences are syntenic, and therefore orthologous, to CYP2W1 in chicken and mammals, although the similarity between the Cyp2k and CYP2W sequences is low. Together, these results suggest that the origin of the CYP2D subfamily in primates might be traced back to the stem lineage of amniotes and amphibians.
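The synteny criterion applied above, whether anchor genes such as NDUFA6 and TCF20 lie near a CYP2D candidate, can be expressed as a window search over an ordered gene list. The Python sketch below is illustrative only: the function name is hypothetical, the gene order in the example is a toy list rather than map data from NCBI or Ensembl, and the window size is an arbitrary assumption.

```python
def flanking_anchors(gene_order, target_prefix, anchors, window=3):
    """Return anchor genes within `window` positions of any gene whose name starts with target_prefix.

    Names are compared case-insensitively, since fish gene symbols are usually lower case.
    """
    target = target_prefix.upper()
    anchors = {a.upper() for a in anchors}
    found = set()
    for i, gene in enumerate(gene_order):
        if not gene.upper().startswith(target):
            continue
        # Inspect the neighborhood on both sides of the candidate gene.
        lo, hi = max(0, i - window), min(len(gene_order), i + window + 1)
        found |= {g.upper() for g in gene_order[lo:hi] if g.upper() in anchors}
    return found

# Hypothetical gene order around a CYP2D candidate (illustration only).
toy_order = ["geneA", "TCF20", "CYP2D_candidate", "ndufa6", "geneB"]
print(sorted(flanking_anchors(toy_order, "CYP2D", {"NDUFA6", "TCF20"})))  # ['NDUFA6', 'TCF20']
```

The window size trades sensitivity against false positives; a fixed gene-count window is a simplification of comparing physical distances along the chromosome.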
The CYP2D candidate of the sea squirt is annotated as CYP2D6 by automated computational analysis in the NCBI annotation (GenBank accession number: XM_002128104). However, our results indicate that this annotated CYP2D6 of the sea squirt is more closely related to another vertebrate CYP2 gene.
We reconstructed NJ, ML and Bayesian trees based on the amino acid sequences of CYP2D candidate genes together with human CYP2W1 and CYP2J2 (Fig. 3 and Supplementary Figs. S1–S3). Although the topologies of the three trees differed slightly, the CYP2D candidate genes of amniotes and amphibians formed a monophyletic cluster distinct from the Cyp2k/Cyp2j genes of fish and the CYP2W/CYP2J genes of humans (Supplementary Figs. S1–S3). This implies that the CYP2D subfamily could already have been present before amniotes diverged from amphibians. Whereas the lizard and the chicken each have a single CYP2D gene, amphibians and the major mammalian orders possess multiple CYP2D genes, reflecting independent expansions of the subfamily. Primates have two to three CYP2D genes, whereas rodents have five to seven, rabbits five, and horses six. This expansion of the CYP2D subfamily in herbivores is of interest and might be related to the very high affinity of the CYP2D6 enzyme for plant toxins such as alkaloids (Fonne-Pfister and Meyer, 1988). Kubota et al. (2011) have reported that the anole lizard appears to have a much larger set of CYP2 genes, especially CYP2G and CYP2AG, than the chicken or the zebra finch (Taeniopygia guttata). The lizard also appears to have many more CYP2 genes than humans. Nevertheless, the lizard genome has a single CYP2D candidate. It would be interesting to examine why lineage-specific gene expansion occurred in the lizard CYP2G and CYP2AG subfamilies but not in the CYP2D subfamily. Although this is difficult to prove, one may hypothesize that the lizard has had no need to increase its detoxification capacity against plant compounds, but has needed to expand the functions of its CYP2G and CYP2AG enzymes.
CYP2G enzymes are known to be expressed specifically in the olfactory mucosa of several mammals (Larsson et al., 1989; Nef et al., 1989; Reed, 1993; Hua et al., 1997), whereas the CYP2AG enzyme has been identified only in the anole lizard. Many tetrapods have a vomeronasal organ, which is involved in CYP2G gene expression, and this organ is particularly well developed in lizards and snakes (Schwenk, 1995). Although the functions of CYP2G and CYP2AG in the lizard are not yet known, the extent of expansion in each subfamily is likely to reflect the demand for these genes imposed by the species' environment.
Fig. 3 Neighbor-joining trees of CYP2D genes of amniotes and amphibians based on amino acid sequences of the full-length coding region. Distances were corrected by the JTT matrix-based method. Only bootstrap values over 50% are shown. Numbers in parentheses are GenBank accession numbers, and numbers with an asterisk are Ensembl gene IDs.
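For readers unfamiliar with the method, the core of the neighbor-joining algorithm (Saitou and Nei, 1987) can be sketched in Python. This sketch assumes a precomputed symmetric distance matrix (for Fig. 3 these would be JTT-corrected distances, which the sketch does not compute), does not reproduce MEGA's implementation, and resolves ties by taking the first minimal pair found; the function name is hypothetical.

```python
def neighbor_joining(names, dist):
    """Build an unrooted NJ tree from a symmetric distance matrix; return a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Q-criterion: join the pair (i, j) minimizing (n - 2) * d_ij - r_i - r_j.
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        # Branch lengths from the new internal node to the two joined nodes.
        li = 0.5 * d[bi][bj] + (totals[bi] - totals[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        new_node = f"({nodes[bi]}:{li:.3f},{nodes[bj]}:{lj:.3f})"
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distances from the new node to every remaining node.
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[r][s] for s in keep] for r in keep]
        for row, dk in zip(d, new_row):
            row.append(dk)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [new_node]
    # Join the last three nodes at a single central node.
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({nodes[0]}:{la:.3f},{nodes[1]}:{lb:.3f},{nodes[2]}:{lc:.3f});"

# Toy five-taxon matrix (not CYP2D data).
toy = [[0, 5, 9, 9, 8],
       [5, 0, 10, 10, 9],
       [9, 10, 0, 8, 7],
       [9, 10, 8, 0, 3],
       [8, 9, 7, 3, 0]]
print(neighbor_joining(["a", "b", "c", "d", "e"], toy))
# (d:2.000,e:1.000,(c:4.000,(a:2.000,b:3.000):3.000):2.000);
```

Bootstrap support, as reported in Fig. 3, would come from repeating this procedure on distance matrices computed from resampled alignments.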
We next focused on the mode of CYP2D evolution in primates. Because CYP2D genes have not yet been annotated in non-human primates, we examined CYP2D orthology among primate species using LINEs and SINEs in the CYP2D clusters as cladistic markers. Various transposable elements, including Alu elements, were found in intergenic regions (Fig. 1). High rates of gene conversion have been reported to be associated with a dense distribution of Alu elements in the human genome (Chen et al., 2007), and indeed many studies have described gene conversion among human CYP2D genes (Kimura et al., 1989; Gonzalez and Nebert, 1990; Hanioka et al., 1990; Heim and Meyer, 1992; Masimirembwa et al., 1996). However, LINEs and SINEs were observed at identical sites in several primate genomes (Fig. 1). This indicates that gene conversion between the flanking regions of different CYP2D genes is not evident; rather, each flanking region has maintained its own pattern of LINE and SINE insertions.
In intron 1 of CYP2D8 or CYP2D8P, we detected AluSx-AluY-AluSg4 in the hominoid genomes, AluSx-AluSg4-AluYRa1 in the rhesus monkey genome, and AluSx-AluSg4 in the marmoset genome (Fig. 1). This result indicates that AluSx and AluSg4 were inserted into intron 1 before the divergence of New World monkeys and Catarrhini, but after the divergence of the CYP2D6 and CYP2D8 genes.
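The reasoning above treats Alu insertions as presence/absence characters: an element shared by all lineages was most parsimoniously inserted before they diverged, whereas an element found in only one lineage was inserted after that lineage split off. A minimal Python sketch using the intron 1 elements reported above; it assumes that elements with the same name occupy orthologous insertion sites (in practice this must be confirmed from the insertion position and flanking sequence), ignores element order, and uses hypothetical function names.

```python
# Alu elements in intron 1 of CYP2D8/CYP2D8P, as reported in the text (Fig. 1).
intron1 = {
    "hominoids": {"AluSx", "AluY", "AluSg4"},
    "rhesus": {"AluSx", "AluSg4", "AluYRa1"},
    "marmoset": {"AluSx", "AluSg4"},
}

def shared_insertions(element_sets):
    """Elements present in every lineage: candidates for insertion before all of them diverged."""
    return set.intersection(*element_sets.values())

def lineage_specific(element_sets, lineage):
    """Elements present only in `lineage`: candidates for insertion after it split off."""
    others = [s for name, s in element_sets.items() if name != lineage]
    return element_sets[lineage] - set().union(*others)

print(sorted(shared_insertions(intron1)))              # ['AluSg4', 'AluSx']
print(sorted(lineage_specific(intron1, "rhesus")))     # ['AluYRa1']
print(sorted(lineage_specific(intron1, "hominoids")))  # ['AluY']
```

Because Alu insertions are rarely lost precisely, shared presence at an orthologous site is a strong cladistic signal, which is why such markers are useful for resolving orthology here.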
In the chimpanzee, the CYP2D genes on the chromosome 22 genomic contig (GenBank accession number: NW_001230982.1) are annotated as CYP2D6, CYP2D7 (Entrez Gene ID: 745989) and CYP2D8P (Entrez Gene ID: 470229). However, according to this database annotation, the gene order in the chimpanzee genome differs from that in the human and orangutan genomes: in the chimpanzee the genes are ordered, from proximal to distal, as CYP2D8P-CYP2D6-CYP2D7, whereas in humans and orangutans they are ordered as CYP2D8P-CYP2D7P-CYP2D6. Comparison of the genomic CYP2D6 nucleotide sequence with the complete cds (GenBank accession number: DQ282164) revealed that a middle region of the genomic CYP2D6 in the database differs from the cds sequence (Fig. 2). We further performed PCR amplification of chimpanzee genomic DNA using specific primer pairs designed from the annotated chimpanzee CYP2D6 and CYP2D7 sequences (Table 1). The sequences obtained with the CYP2D7- and CYP2D6-specific primers based on the NCBI annotation were orthologous to human CYP2D6 and CYP2D7P, respectively. These results indicate that the gene annotated as CYP2D6 in the chimpanzee reference genome is in fact CYP2D7, and the predicted CYP2D7 is in fact CYP2D6; correcting this swap gives the order CYP2D8P-CYP2D7-CYP2D6, consistent with the human and orangutan genomes. This conclusion is strongly supported by the orthologous positions of SINEs and LINEs in the chimpanzee and human genomes (Fig. 1).
In summary, we examined the gene organization of the CYP2D subfamily in primates, other vertebrates and an invertebrate. Our results yield three findings. First, the origin of this subfamily could be traced back to the stem lineage of amniotes and amphibians. Second, CYP2D6 and CYP2D8P of humans were already present before the divergence of New World monkeys and Catarrhini. Third, the expansion of CYP2D genes seems to reflect, or be affected by, the environment. Detailed investigation of the gene organization of the CYP2D subfamily in various other species will be needed to confirm these findings. Future work, including phylogenetic analyses and the detection of gene conversion, should further elucidate the molecular evolution of this gene cluster.
The authors are indebted to the Max Planck Institute for Biology and the Primate Research Institute of Kyoto University for their kind help with sample collection. This work was supported by a Grant-in-Aid for Scientific Research (B) (21370106).