Mitochondrial genomes and divergence times of crocodile newts: Inter-islands distribution of Echinotriton andersoni and the origin of a unique repetitive sequence found in Tylototriton mt genomes

Crocodile newts, which constitute the genera Echinotriton and Tylototriton, are known as living fossils, and these genera include many endangered species. To identify mitochondrial (mt) genes suitable for future population genetic analyses of endangered taxa, we determined the complete nucleotide sequences of the mt genomes of the Japanese crocodile newt Echinotriton andersoni and the Himalayan crocodile newt Tylototriton verrucosus. Although the control region (CR) is known as the most variable mtDNA region in many animal taxa, the CRs of crocodile newts are highly conserved. Rather, the genes of NADH dehydrogenase subunits and ATPase subunit 6 were found to have high sequence divergences and to be usable for population genetic studies. To estimate the inter-population divergence ages of E. andersoni, which is endemic to the Ryukyu Islands, we performed molecular dating analysis using whole and partial mt genomic data. The estimated divergence ages of the inter-island individuals are older than the paleogeographic segmentation ages of the islands, suggesting that the lineage splits of E. andersoni populations were not caused by vicariant events. Our phylogenetic analysis with partial mt sequence data also suggests the existence of at least two more undescribed species in the genus Tylototriton. We also found unusual repeat sequences containing the 3′ region of the cytochrome b apoenzyme gene, the whole tRNAThr gene, and a noncoding region (the T-P noncoding region characteristic of caudate mtDNAs) in T. verrucosus mtDNA. Similar repeat sequences were found in two other Tylototriton species. The Tylototriton taxa with the repeats form a monophyletic group, indicating a single origin of the repeat sequences. The intra- and inter-specific comparisons of the repeat sequences suggest the occurrence of homologous recombination-based concerted evolution among the repeat sequences.


INTRODUCTION
Members of the genera Echinotriton and Tylototriton (family Salamandridae, order Caudata, class Amphibia) are generally called crocodile newts. Because crocodile newts seem to have retained the primitive morphology of old fossil newts, they are often referred to as "living fossils" and "primitive newts" (Estes, 1981; Zhang et al., 2008). Crocodile newts are distributed in South, Southeast, and East Asia. Due to their unique characters, they are popular as pets not only in Asian countries but also in Western nations. Yet sadly, over-hunting and environmental destruction have recently devastated the populations of some crocodile newt species. For these reasons, and also because of their highly restricted distribution areas, three crocodile newt species have been listed as endangered (E. chinhaiensis is critically endangered, and E. andersoni and T. hainanensis are endangered), and five Tylototriton species are listed as near threatened or vulnerable on the IUCN Red List. At present, 11 nominal crocodile newt species are known (two Echinotriton and nine Tylototriton species; AmphibiaWeb). The phylogenetic relationships among these species have not been well elucidated. Yet last year, Stuart et al. (2010) performed a comprehensive molecular phylogenetic analysis using all nominal crocodile newts. In that paper, they described a new crocodile newt species (T. notialis) and suggested the occurrence of further undescribed species. Accordingly, they emphasized the hidden species diversity in this newt group. Although this finding and the tragic situation of crocodile newts should prompt further genetic surveys and measures to conserve this newt group, usable information on inter- and intra-population genetic divergence is not yet available.
Among crocodile newts, E. andersoni is endemic to six small islands of the Ryukyus, Japan. For some amphibian taxa distributed in the Ryukyu Islands (e.g., ranid, microhylid, and rhacophorid frogs), lineage splits have been suggested to correlate with paleogeographic island segmentation events (i.e., vicariant lineage splitting) (e.g., Matsui et al., 2005). By contrast, Tominaga et al. (2010) revealed that the split of inter-island populations of another newt endemic to the Ryukyus, Cynops ensicauda, occurred before island segmentation. Thus, the split ages of inter-island E. andersoni populations should be informative for understanding the different lineage split modes between frogs and salamanders in island areas. To date, however, the divergence ages of E. andersoni populations have not been estimated, because only E. andersoni individuals from one island (Tokunoshima) have been used in molecular phylogenetic studies (Weisrock et al., 2006; Zhang et al., 2008; Stuart et al., 2010).
In multicellular animals, mitochondrial (mt) DNA sequences are strong candidates for polymorphic markers for investigating genetic divergences and phylogenetic relationships. Animal mtDNA consists of a closed circular molecule typically 16 kb in size, and multiple copies of mtDNA are included in every cell (Wolstenholme, 1992; but see Kurabayashi and Ueshima, 2000; Voigt et al., 2008). The small size and multiple copy number make this genome easy to handle. The mt gene content is nearly identical across all multicellular animals, with 12S and 16S ribosomal RNA genes (12S and 16S), 22 transfer RNA genes (trns), and 13 protein encoding genes (ATPase subunits 6 and 8 [atp6 and 8], cytochrome oxidase subunits I, II, and III [cox1-3], cytochrome b apoenzyme [cob], and nicotinamide adenine dinucleotide dehydrogenase subunits 1-6 and 4L [nad1-6 and 4L]). Most vertebrate mt genomes contain one long non-coding region (approximately 500 bp-9 kb; Kurabayashi et al., 2008) called the control region (CR; also referred to as the D-loop region), which includes the signals for regulation of mtDNA replication and transcription (Wolstenholme, 1992). Another, short, non-coding sequence called the light-strand replication origin (OL) has also been identified in most vertebrate mt genomes (Wolstenholme, 1992; Boore, 1999). It is widely accepted that the nucleotide substitution rates within the mt genes and CR are much faster than in the nuclear genome, and it is also generally known that the 37 mt genes and CR have different substitution rates with respect to one another (Kumazawa and Nishida, 1993; San Mauro et al., 2004). Because of these characteristics, such as easy handling in experiments and fast and/or multiple nucleotide substitution rates, mt genomic sequences have been widely used in genetic and evolutionary studies, especially for closely related taxa. Nearly 70% of molecular phylogenetic studies on animal taxa have used mt gene data (Sato et al., 2005), and all molecular phylogenetic studies including the crocodile newt taxa employed mt gene data (e.g., Frost et al., 2006; Weisrock et al., 2006; Steinfartz et al., 2007; Zhang et al., 2007; Zhang et al., 2008; Stuart et al., 2010). Consequently, before this study, nearly complete mtDNA sequences had been available from four crocodile newts (E. andersoni [from Tokunoshima Island], E. chinhaiensis, T. asperrimus, and T. wenxianensis), and a partial mt sequence (approx. 2.7 kb region from trnL-UUR to cox1) had also been available for all nominal crocodile newt species.
In the present study, we sequenced the complete mt genomes of two crocodile newts, E. andersoni (from Okinawa Island) and T. verrucosus, in order to compare nucleotide divergences of the mt genes (and CR) within and between species and to identify mt genes with high sequence divergence. This information can be applied to inter-population genetic analyses in the crocodile newts. We additionally sequenced the 2.7 kb mt region of six crocodile newt taxa and estimated their divergence ages to check the correlation between the lineage splits and paleogeographic events. Further, we found unique repeat sequences between cob and trnP in three Tylototriton species. This paper discusses the systematic significance and possible molecular mechanisms of the evolution of the repeat sequences.

MATERIALS AND METHODS
Specimens
Experiments were performed using eight crocodile newt specimens. Two specimens of Echinotriton andersoni were collected from Okinawa and Amami Islands (Ryukyus, Japan). Six Tylototriton specimens, T. verrucosus, T. asperrimus, T. taliangensis, T. kweichowensis, T. shanjing, and T. cf. wenxianensis, which are distributed in China, were collected via the pet trade. Zhang et al. (2007) recognized that T. shanjing is conspecific with T. verrucosus (i.e., T. shanjing is a junior synonym of T. verrucosus) on the basis of minimal sequence divergence (average cob divergence is 1.2%), and Frost (2011) followed this view. However, we regarded these taxa as distinct species according to Stuart et al. (2010) and based on our results (the T. shanjing and T. verrucosus specimens were each recovered as a clade, and the average nucleotide divergences of their nad1 and nad2 were found to be high [7.1 and 6.4%, respectively]).
For the specimen of E. andersoni from Okinawa Island and for that of T. verrucosus, we determined the complete mtDNA sequences. Partial mt genomic regions (see below) from the other specimens were also amplified and sequenced.
PCR and sequencing
Total DNAs were extracted from a toe tip of each specimen using the DNeasy Tissue Kit (QIAGEN). From the total DNAs, the complete mt genomes of E. andersoni and T. verrucosus were amplified by long-and-accurate (LA) PCR. The resultant PCR fragments were sequenced by the primer walking method. The procedures for the LA-PCR and DNA sequencing were almost the same as those reported by Kurabayashi et al. (2006) and Kurabayashi and Sumida (2009). Sixty-seven primers were used for the LA-PCR amplification and primer walking; 44 primers were newly designed and 23 primers were from previous studies (Kocher et al., 1989; Palumbi et al., 1991; Zardoya and Meyer, 1997; San Mauro et al., 2004; Kurabayashi and Sumida, 2009). Details of these primers are presented in Supplementary Table S1.
Heavy and Light (H and L) strands of the E. andersoni and T. verrucosus mt genomes were sequenced, and two mt genomic regions, from trnL(UUR) to cox1 and from cob to trnP, were PCR amplified and sequenced from the other six crocodile newts. The former region corresponds to the region used in the comprehensive molecular phylogenetic analyses of newts (Weisrock et al., 2006; Stuart et al., 2010). The latter region corresponds to a triplicated sequence region found in the T. verrucosus mtDNA (see below). The sequences determined here were deposited in the GenBank/EMBL/DDBJ data libraries with accession numbers from AB689008 to AB689021.

Molecular phylogenetic analysis
To perform phylogenetic analyses, we produced two alignment datasets. One dataset (Aln-all) includes all 37 mitochondrial gene sequences (13,519 nucleotide sites in total) from 50 caudates and a primitive frog, Ascaphus. Except for the mt genomic data of the two crocodile newts determined here, the sequence data were taken from two previous studies (Zhang et al., 2008; Zhang and Wake, 2009). The other dataset (Aln-part) includes 60 caudate taxa covering all nominal crocodile newt species and consists of 2,497 nucleotide sites of nine trns (A, C, I, L-UUR, M, N, Q, Y, and W) and three protein encoding genes (nad1, nad2, and cox1). The sequence data not determined here were taken from Weisrock et al. (2006) and Stuart et al. (2010). In this dataset, Siren, for which a primitive phylogenetic position among caudates has been suggested (e.g., Zhang and Wake, 2009), was treated as the outgroup. To make these alignments, we initially aligned each gene portion using MUSCLE (Edgar, 2004) implemented in SeaView ver. 3.2 (Galtier et al., 1996). The resultant alignments of protein encoding genes were revised by eye using amino acid alignments as the guide. For rrn and trn data, ambiguous alignment sites were removed using Gblocks ver. 0.91b (Castresana, 2000) with default parameters. We treated the portions of each of the two rrns, 13 protein encoding genes, and one combined sequence of the 22 trns as different partitions. Consequently, Aln-all and Aln-part had 16 and 4 partitions, respectively. The datasets used in this study are available upon request from the authors.
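The partition counts stated above (16 for Aln-all, 4 for Aln-part) follow directly from the partitioning rule: one partition per rrn, one per protein encoding gene, and one for all trns combined. The short Python sketch below only checks that bookkeeping. The function and variable names are illustrative, and the names cox2 and cox3 are assumed from the standard vertebrate mt gene set rather than taken from the text.

```python
# Illustrative bookkeeping only; names are hypothetical, not from the paper.
RRNS = ["12S", "16S"]
PROTEIN_GENES = [
    "atp6", "atp8", "cox1", "cox2", "cox3", "cob",
    "nad1", "nad2", "nad3", "nad4", "nad4L", "nad5", "nad6",
]


def build_partitions(rrns, protein_genes, has_trns=True):
    """One partition per rrn, one per protein gene, one for all trns combined."""
    partitions = list(rrns) + list(protein_genes)
    if has_trns:
        partitions.append("trns_combined")
    return partitions


# Aln-all: two rrns, 13 protein encoding genes, combined trns.
aln_all = build_partitions(RRNS, PROTEIN_GENES)
# Aln-part: nad1, nad2, cox1, and the nine trns treated as one partition.
aln_part = build_partitions([], ["nad1", "nad2", "cox1"])

print(len(aln_all), len(aln_part))
```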
Based on these alignment datasets, phylogenetic trees were constructed by the Maximum-Likelihood (ML) and Bayesian inference (BI) methods. For ML and BI analyses, partitioned models were applied. The most appropriate substitution model for each gene portion was estimated based on the Akaike and Bayesian Information Criteria (Akaike, 1974; Schwarz, 1978) implemented in Kakusan4 (Tanabe, 2011) for ML and BI analyses, respectively. In ML analyses, the parameters for nucleotide frequency and gamma distribution (G; with eight categories) were optimized using the Treefinder program ver. Oct. 2008 (Jobb, 2008). The estimated best-fit models for each partition (only for Bayesian analyses) are shown in Supplementary Fig. S1.
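For readers unfamiliar with the two criteria, the following Python sketch shows the standard definitions, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, and how they can prefer different models for the same partition. It is not a reproduction of Kakusan4. The log-likelihoods, the site count, and the candidate list are hypothetical values chosen purely for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# HYPOTHETICAL candidates for one partition: (ln L, free substitution parameters).
candidates = {
    "HKY+G": (-5230.4, 5),
    "GTR+G": (-5221.9, 9),
}
n_sites = 1000  # hypothetical partition length

# Lower score is better under both criteria.
best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_sites))
print("AIC prefers", best_aic, "/ BIC prefers", best_bic)
```

With these made-up numbers, the richer model wins under AIC while the simpler one wins under BIC, because BIC penalizes each extra parameter by ln n rather than by 2.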
ML analyses were performed using Treefinder. To evaluate node support, bootstrap (BP) values were calculated with 1000 pseudo-replications. BI analyses were performed using MrBayes ver. 3.1.2 (Ronquist and Huelsenbeck, 2003). Two independent Markov chain Monte Carlo (MCMC) runs were conducted for 10 million generations (sampling frequency: one tree per 100 generations) for the two datasets. Parameter estimates and convergence were checked with Tracer ver. 1.4 (Rambaut and Drummond, 2007; available from http://beast.bio.ed.ac.uk/Tracer/), and the first 10% of trees (1 million) were discarded for both Aln-all and Aln-part datasets. Node credibility of the BI tree was evaluated by Bayesian posterior probabilities (BPP).
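As a quick check on the sampling scheme described above, the sketch below counts how many trees each MCMC run yields and how many remain after burn-in. It assumes that the "1 million" in the text refers to generations (equivalent to 10,000 sampled trees per run at one tree per 100 generations). The function name is illustrative.

```python
def mcmc_sample_counts(generations, sample_every, burnin_percent):
    """Return (sampled, discarded, kept) tree counts for a single run."""
    sampled = generations // sample_every
    discarded = sampled * burnin_percent // 100  # integer arithmetic, no rounding issues
    return sampled, discarded, sampled - discarded


# Settings from the text: 10 million generations, one tree per 100, 10% burn-in.
sampled, discarded, kept = mcmc_sample_counts(10_000_000, 100, 10)
print(sampled, discarded, kept)
```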

Molecular dating
We performed divergence time estimation analyses using the Bayesian relaxed clock method. In the analyses, we used the Aln-all and Aln-part datasets and employed seven calibration points as follows.
(1) The split of Cryptobranchidae and Salamandroidae: ~151-170 million years ago (Ma), based on the fossil record of salamandroid-like Iridotriton (Evans et al., 2005) for the minimum bound and a proposed origin of caudates (Marjanović and Laurin, 2007) for the maximum bound.
(2) The split of Cryptobranchidae and Hynobiidae: >145 Ma, based on the fossil salamander Chunerpeton; according to Roelants et al. (2007) and Zhang et al. (2008), a slightly younger date was applied compared with the original date suggested by Gao and Shubin (2003).
(3) The occurrence of the common ancestor of the Salamandridae: ~55-151 Ma, based on the fossil records of newt-like Koalliella (Estes, 1981) for the minimum bound and the salamandroid-like Iridotriton for the maximum bound.
(4) The split of Tylototriton and Pleurodeles: >44 Ma, based on the Tylototriton related fossil, Chelotriton (Milner, 2000).
(5) The split of Taricha and Notophthalmus: >23 Ma, based on a nearly complete fossil of Taricha (Estes, 1981).
(6) The split of Cynops and Paramesotriton: 15 Ma, based on a nearly complete fossil of Paramesotriton (Estes, 1981).
(7) The split of the Corsica-Sardinian taxon Euproctus and continental Triturus: ~20-30 Ma, based on an indirect biogeographic inference of disjunction of Corsica-Sardinia from the Iberian Peninsula (Zhang et al., 2008).
These calibration points are identical to those used by Zhang et al. (2008). To perform the dating analysis, we applied two distinct reference topologies for the Aln-all dataset: one topology is that of the ML tree from the dataset, and the other is modified from the ML tree to include the Cynops clade and the most basal position of Salamandrina among salamandrid taxa according to Zhang and Wake (2009) (Supplementary Fig. S2). For the Aln-part dataset, we also used two reference topologies (Supplementary Fig. S2): these topologies have the ML or BI tree topology for the crocodile newt taxa, but have a modified topology for the deep caudate branchings taken from a previous mitogenomic tree reconstructed by Zhang and Wake (2009), due to the low resolution of this dataset when used to determine deep phylogeny. Consequently, four distinct dating analyses were performed (Analyses 1-4; Supplementary Fig. S2). In the divergence time analyses, we first estimated the Hessian matrix using the BASEML program implemented in PAML ver. 4 (Yang, 2007), and then the divergence times were estimated using the Multidivtime program package (Thorne and Kishino, 2002). The gene partitions applied in the dating analyses were basically the same as those in the phylogenetic analyses. However, the cox1 and nad2 partitions of the Aln-part dataset were combined, because the cox1 portion is very short (27 bp) and such short data did not allow us to estimate the variance-covariance matrix for this portion. The F84 + G nucleotide substitution model, which is the most complicated model usable in Multidivtime, was applied for each partition. For each analysis, MCMC runs were conducted for 10 million cycles, sampling once every 100 cycles, with a 10% burn-in.
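The seven calibration points listed above can be written down as simple (minimum, maximum) age bounds, which makes it easy to check whether an estimated node age respects a constraint. The Python sketch below is only an illustration of that bookkeeping and is unrelated to the Multidivtime input format. Treating calibration (6), given in the text as "15 Ma", as a minimum bound is an assumption made here.

```python
# (label, minimum age in Ma, maximum age in Ma); None means no bound was stated.
CALIBRATIONS = [
    ("Cryptobranchidae / Salamandroidae split", 151, 170),
    ("Cryptobranchidae / Hynobiidae split", 145, None),
    ("Common ancestor of Salamandridae", 55, 151),
    ("Tylototriton / Pleurodeles split", 44, None),
    ("Taricha / Notophthalmus split", 23, None),
    ("Cynops / Paramesotriton split", 15, None),  # ASSUMPTION: read as a minimum bound
    ("Euproctus / Triturus split", 20, 30),
]


def satisfies(age_ma, min_ma, max_ma):
    """True if an estimated age lies within the stated bounds."""
    if min_ma is not None and age_ma < min_ma:
        return False
    if max_ma is not None and age_ma > max_ma:
        return False
    return True


# Example: an age of 45 Ma for the Tylototriton / Pleurodeles split
# respects the >44 Ma minimum bound of calibration (4).
label, lo, hi = CALIBRATIONS[3]
print(label, satisfies(45, lo, hi))
```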

Genome organization of Echinotriton andersoni and Tylototriton verrucosus mtDNAs
We determined the complete nucleotide sequences of the mtDNAs of Echinotriton andersoni (Okinawa Island specimen) and Tylototriton verrucosus. Their genomic organization is shown in Fig. 1.
The mtDNA of E. andersoni (from Okinawa) is 16,268 bp in length and contains the 37 genes (13 protein-encoding genes, 12S and 16S rrns, and 22 trns) that are typically found in animal mt genomes. There is one major noncoding region (736 bp) between trnP and trnF in this genome. Because this region contains a termination-associated sequence (TAS) as well as conserved sequence blocks II and III (CSBs II and III) (Supplementary Fig. S3), which are common features of the control region of vertebrate mtDNAs, we identified it as the control region (CR, alternatively known as the D-loop region). We also found the light-strand replication origin (OL) at the gene boundary of trnN-trnC. This OL has the potential to form a hairpin secondary structure with a 12 bp stem and a 13 base loop (three nucleotides overlap with the downstream encoded trnC) and possesses the 'CCGGC' consensus sequence of vertebrate OLs ('GCCGG' in L-strand representation; e.g., Zhang et al., 2003b; San Mauro et al., 2004) at the 3′ outside of the stem. We also found a 121 bp noncoding region at the gene boundary of trnT-trnP in the E. andersoni mt genome. A noncoding sequence between trnT and trnP is not general in animal mtDNAs, but it is a common feature of caudate mtDNAs (e.g., Zhang et al., 2008), although the function of this region has not been well investigated (see below; Zhang et al., 2003a).
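The two spellings of the OL consensus motif given above, 'CCGGC' and 'GCCGG', are reverse complements of each other, which is why the same motif reads differently depending on the strand used for representation. A minimal Python sketch of that conversion follows; the function name is illustrative and only unambiguous A/C/G/T bases are handled.

```python
# Complement table for unambiguous DNA bases only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


motif = "CCGGC"
print(motif, "->", reverse_complement(motif))
```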
The mt genome of E. andersoni from Okinawa has the same gene composition and gene arrangement as that of the conspecific specimen from Tokunoshima Island (16,275 bp; Zhang et al., 2008). Their nucleotide similarity is 94.8% throughout the genome, and there are 24 indel sites (10 in 16S, 6 in the loop structures of trnD, trnH, trnS(AGY), and trnS(UCN), 5 in the noncoding region between trnT and trnP, 2 in the CR, and 1 in the OL).
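A nucleotide similarity such as the 94.8% reported above is typically computed as the percentage of identical sites in a pairwise alignment. The Python sketch below shows one common convention, in which columns containing a gap in either sequence are skipped; whether the authors treated indel sites this way is not stated, so this is an assumption. The two short sequences are invented toy data, not real mtDNA.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical sites between two aligned sequences, skipping gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # ASSUMPTION: indel columns are excluded from the comparison
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * matches / compared


# TOY aligned sequences (hypothetical): 12 comparable sites, 11 identical.
toy_a = "ATGCC-TTAGGCTA"
toy_b = "ATGCTATTAGG-TA"
print(round(percent_identity(toy_a, toy_b), 1))
```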
As in the E. andersoni case, the T. verrucosus mtDNA possesses typical features of vertebrate and caudate mtDNAs. This genome is 17,100 bp in length and contains the typical 37 mt genes, one OL, and one CR (737 bp). This genome also has a noncoding sequence (186 bp) downstream of trnT, similar to other caudate mtDNAs. Besides these general features, the T. verrucosus mtDNA shows unusual 439 bp repeat sequences consisting of 3′ cob (185 bp) - trnT (68 bp) - noncoding sequence (186 bp) (Rep1 and Rep1′; Figs. 1 and 2). The repeats occur directly and in tandem, and the nucleotide sequences of these repeats are almost the same (Fig. 2 and Supplementary Fig. S4). Further, an additional incomplete repeat directly follows the two repeat sequences. This additional repeat (named Rep2) is 324 bp in length and contains a 3′ cob like sequence (164 bp), a pseudo trnT (60 bp), and a noncoding sequence (100 bp). Rep2 has 66.2 and 68.4% nucleotide similarity with respect to Rep1 and Rep1′, respectively, but the first 57 bp (corresponding to the 3′ cob sequence) of Rep2 shows more similarity with the corresponding portion of Rep1 and Rep1′ (87.7 and 98.2%). Consequently, the T. verrucosus mtDNA contains one complete cob and two additional 3′ cob like sequences, two complete trnTs, one pseudo trnT, and three noncoding sequences at the 5′ side of trnP (Fig. 1). Tandem repeat sequences within the noncoding region between trnT and trnP have been reported for some caudate mtDNAs (Andrias, Mertensiella, and Ranodon: Zardoya et al., 2003; Zhang et al., 2003a, 2003b). Yet tandem repeats including not only a part of the noncoding sequence but also the upstream trnT and cob have not been reported, and the direct repeats involving the complete trnT found in the T. verrucosus mtDNA are the first such findings from caudate mtDNAs (a non-tandem duplication of mt trnT is known from a plethodontid salamander, Plethodon elongatus; Mueller and Boore, 2005).
Fig. 2. Repeated regions found in three Tylototriton species. Detailed information of the repeated regions is shown. Abbreviations of gene and region names are the same as those in Fig. 1.
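The component lengths reported for the T. verrucosus repeat units can be summed to confirm the stated unit sizes (439 bp for Rep1 and Rep1′, 324 bp for Rep2). The Python sketch below simply performs that arithmetic; the dictionary keys are illustrative labels.

```python
# Component lengths (bp) as reported in the text for T. verrucosus.
REPEAT_UNITS = {
    "Rep1": {"cob_3prime": 185, "trnT": 68, "noncoding": 186},
    "Rep2": {"cob_3prime_like": 164, "pseudo_trnT": 60, "noncoding": 100},
}


def unit_length(parts):
    """Total length of a repeat unit from its component lengths."""
    return sum(parts.values())


for name, parts in REPEAT_UNITS.items():
    print(name, unit_length(parts), "bp")
```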
To survey the occurrence of the repeats of the cob-noncoding region in other crocodile newts, we amplified the cob-trnP region from six crocodile newts, E. andersoni (from Amami Island), T. asperrimus, T. taliangensis, T. cf. wenxianensis, T. kweichowensis, and T. shanjing. The resultant PCR fragments from the former four specimens showed similar length (approx. 550 bp) to that of the nonduplicated fragment of E. andersoni (from Okinawa). By contrast, the fragments from T. kweichowensis and T. shanjing were longer than the fragment without repeats. Sequencing analysis revealed that both T. kweichowensis and T. shanjing, like T. verrucosus, have tandemly duplicated sequences in the cob-noncoding region (Fig. 2). It should be mentioned that the taxa possessing the repeat sequences of the cob-trnT-noncoding region form a monophyletic group (see below). The systematic significance and possible evolutionary mechanism of the repeat sequences in Tylototriton mtDNAs are discussed below.

Phylogeny of caudates and crocodile newts
To confirm the phylogenetic position of crocodile newts among caudates and to make the reference tree topologies necessary for the following molecular dating analysis, we performed molecular phylogenetic analysis using an alignment dataset of whole mt genomic sequences (consisting of 13,519 nucleotide sites from 51 OTUs). The resultant ML tree (-log likelihood = 289975.8) is shown in Fig. 3, and the BI tree recovered almost the same topology, with only one exception: the position of Pachytriton brevipes. Although several different relationships among caudate families have been proposed based on partial mt gene data or combined data of partial mt and several nuclear genes (e.g., Weisrock et al., 2005; Roelants et al., 2007), the family level relationships are supported by high statistical values (MLBP = ~85-100, BPP = 1.0), and they are congruent with a previous mitogenomic tree reconstructed by Zhang and Wake (2009).
It is known that crocodile newts (Echinotriton and Tylototriton) belong to the subfamily Pleurodelinae (newts) of the family Salamandridae, and the genus Pleurodeles is their sister taxon. These three genera are occasionally referred to as "primitive newts" (Estes, 1981; Zhang et al., 2008). Our mitogenomic tree, showing the (Echinotriton + Tylototriton) + Pleurodeles clade and the initial split of this clade among newts, confirms the traditional phylogenetic view (Fig. 3). The inter-generic and inter-subfamilial relationships are almost congruent with those based on whole and partial mtDNA data (Zhang et al., 2008; Weisrock et al., 2006). However, the position of Salamandrina (subfamily Salamandrininae) is different between our trees and those of Zhang et al. (2008), where this genus is the sister taxon of all other salamandrids (i.e., the most primitive salamandrid). Also, our tree suggests paraphyly of the genus Cynops, which is not the case in the two previous studies (where Cynops monophyly is suggested). Statistical AU and KH tests could not reject these alternative hypotheses (the p values in the AU and KH tests are 0.383 and 0.235 for the primitive Salamandrina position, and 0.446 and 0.330 for Cynops monophyly, respectively). Thus, we used two reference tree topologies (the ML topology and a modified ML topology with the primitive Salamandrina position and Cynops monophyly) in the molecular dating analyses with the whole mt genomic data (Supplementary Fig. S2).
Fig. 3. Mitogenomic tree of caudates. The ML tree is shown here. The different topology from the Bayesian tree is shown in the box. Bootstrap support (BP) and Bayesian posterior probability (BPP, ** > 99 and * > 95) of each node are denoted. Excluding E. andersoni (from Okinawa Island) and T. verrucosus, the mtDNA data used were taken from previous studies (see MATERIALS AND METHODS).
We performed additional analyses more focused on crocodile newt phylogeny using an alignment dataset consisting of 2,497 nucleotide sites from 60 OTUs, including all nominal crocodile newt species (11 species). The dataset consists of a partial mtDNA region (including nad1, nad2, cox1, and 9 trns), which has been employed in previous studies on salamandrid and crocodile newt phylogenies (Weisrock et al., 2006; Stuart et al., 2010). The resultant ML tree from the dataset is shown in Fig. 4. In the tree, deep phylogenies (at family level) of caudates are not well resolved (BPs < 50), possibly due to the small number of informative sites and the fast nucleotide substitution rates of the genes in the dataset (thus, we basically used the ML tree topology for the deep relationships in the following molecular dating analyses; see below). The ML tree recovered the primitive newt clade [(Echinotriton + Tylototriton) + Pleurodeles]. The BI tree from the same dataset recovered a very similar topology to that of the ML tree. Yet, within the Tylototriton taxa, the BI tree shows three uncertain relationships: one "trichotomy" (among T. kweichowensis, T. shanjing, and T. verrucosus) and two incongruent topologies compared with the ML tree (the positions of T. vietnamensis and T. notialis) (Fig. 4). Excluding these uncertain relationships, the relationships among the crocodile newts are well resolved, and they are supported by very high statistical values (MLBP ≥ 95, BPP = 100). Our trees are also well congruent with those of Stuart et al. (2010), as the uncertain relationships are also unresolved in their trees.
In our trees (Fig. 4), Tylototriton taxa are divided into two major clades. One clade corresponds to the T. asperrimus group (subgenus Yaotriton; Dubois and Raffaëlli, 2009), which includes T. asperrimus, T. wenxianensis, T. notialis, T. hainanensis, and T. vietnamensis. A T. cf. taxon (found by Weisrock et al., 2006; Stuart et al., 2010) and T. cf. wenxianensis (the present study), whose species affiliations are uncertain, are also nested in this group. Because all nominal Tylototriton species are included in our study and because these uncertain taxa did not show close affinity to other nominal Tylototriton species, our result clearly indicates the presence of two undescribed species in this genus. The other major Tylototriton clade corresponds to the T. verrucosus group (subgenus Tylototriton), consisting of T. taliangensis, T. kweichowensis, T. verrucosus, and T. shanjing. In this group, T. taliangensis diverges first, and the other three species form a monophyletic group (MLBP = 100, BPP = 1.0). It is remarkable that these three species commonly have repeat sequences of the cob-noncoding region.
Of the three individuals of E. andersoni, the individuals from Amami and Tokunoshima Islands are more closely related to each other than to that from Okinawa Island, seemingly reflecting the physical distances among the islands but not the paleogeographic history of island formation (see below).

ages of crocodile newts and of distribution
Based on the two alignment datasets (Aln-all and Aln-part), we performed molecular dating analyses with the Bayesian relaxed clock method. In the analyses, we employed two reference tree topologies for each dataset; consequently, four dating analyses were performed. The divergence ages of crocodile newts are shown in Fig. 5, A and B, and detailed divergence ages are available in Supplementary Fig. S2 and Supplementary Table S2. Within the same dataset, analyses with different reference topologies estimated very similar divergence ages throughout the caudate taxa, indicating that the reference tree topologies employed here did not substantially affect the resultant divergence ages (Supplementary Fig. S2 and Supplementary Table S2). The whole and partial datasets also estimated similar divergence ages for a wide range of caudate taxa, but the latter dataset estimated slightly younger ages for some divergences of crocodile newts (Fig. 5B). In our analyses, the split of the crocodile newt lineage from its sister taxon (Pleurodeles) and the split of Echinotriton and Tylototriton (i.e., the last common ancestor of crocodile newts) are estimated to have occurred around 45 and 25 Ma, respectively (Fig. 5, A and B). These dates are largely congruent with the ages estimated in a previous mitogenomic study (~45-52 Ma and ~27-33 Ma; Zhang et al., 2008).
One of the aims of this study is to examine the divergence ages of Japanese crocodile newts (E. andersoni) endemic to the central Ryukyu Islands. It is known that the Ryukyu archipelago was once a continuous landmass connected to the Eurasian continent until the Pliocene or lower Pleistocene, and also that the islands of the central Ryukyus (including Okinawa, Amami, and Tokunoshima) formed a super island until the middle or upper Pleistocene (Fig. 5C; Kizaki and Oshiro, 1977, 1980; Hikida and Ota, 1997; Ota, 1998). This paleogeographic history is considered to have affected the lineage splits and distributions of land animals including amphibians (e.g., Ota, 1998; Matsui et al., 2005; Nishizawa et al., 2011; Kuramoto et al., 2011). Indeed, the divergence ages between inter-island populations (or sister species distributed on different islands) of some frogs endemic to the Ryukyus were roughly congruent with the vicariance ages of the corresponding islands (e.g., Odorrana, Babina, Fejervarya, Buergeria, and Microhyla: Matsui, 2005; Matsui et al., 2005; Nishizawa et al., 2011; Kuramoto et al., 2011). In our analyses, the split age of E. andersoni from its sister taxon, E. chinhaiensis (only known from Zhejiang province, Eastern China), is estimated at ~17-18 Ma (Fig. 5, A and B). This age, which precedes the disruption of the land connection between Eurasia and the Ryukyus, suggests that the ancestral stock of E. andersoni entered the ancient super island from Eurasia via the land connection in the same manner as the Ryukyu frog taxa. Yet this age is much older than the vicariance age of Eurasia and the Ryukyus, indicating that the vicariant event was not the cause of the lineage split of these Echinotriton taxa. Among E. andersoni individuals, split ages of ~7-9 Ma and 1.5 Ma are estimated between Okinawa and the Amami-Tokunoshima islands and between Amami and Tokunoshima islands, respectively. These ages are clearly older than the vicariance ages of the islands (Fig. 5C). These older split ages suggest that segmentation of the islands was not the cause of the lineage splits in E. andersoni. Rather, it seems that some E. andersoni populations without gene flow had formed in the continuous landmasses before the formation of the present Ryukyu Islands. This situation is different from the cases of the frog taxa. Yet Tominaga et al. (2010) showed that in the other newt species endemic to the central Ryukyus, Cynops ensicauda, the split of the Okinawa and Amami populations also preceded island segmentation (~3.3-5.2 Ma). The difference in the lineage split modes between the newts and the frogs may be attributable to the low dispersal ability known in newts (e.g., Joly et al., 2001). Because dispersal ability determines the potential for gene flow between local populations (e.g., Parsons, 1997), the low dispersal capacity of newts would easily lead to a reduction of gene flow among the populations in Cynops and Echinotriton and might cause lineage splits among these populations without obvious geographic barriers like island formation.
The split age of the two major Tylototriton clades (corresponding to the subgenera Yaotriton and Tylototriton) is estimated to be around 16 Ma (Fig. 5, A and B). In the subgenus Tylototriton clade, the last common ancestor of the group possessing the unique cob-noncoding repeats (T. kweichowensis, T. verrucosus, and T. shanjing) is estimated to have occurred around 8 Ma, and this group split from T. taliangensis, whose mtDNA does not have the repeats, at ~11 Ma. Thus, duplication of the cob-noncoding region seems to have arisen in the common ancestral lineage of this group between ~8 and 11 Ma.
Origin and possible evolutionary mechanism of the tandem-repeat sequence found in the Tylototriton mt genomes
As mentioned above, we found unique tandem repeat sequences consisting of the 3′ cob-trnT-noncoding region in the mtDNAs of T. kweichowensis, T. shanjing, and T. verrucosus (Fig. 2). The former two species have two repeat units, and the repeats within each species are divergent (Rep1 vs. Rep2 similarity: 79.1 and 64.6% in T. kweichowensis and T. shanjing, respectively). T. verrucosus has three repeats. Two of the three repeat units show very high sequence similarity (98.2% between Rep1 and Rep1′), but the third (Rep2) is divergent from the others (Rep2 vs. Rep1 and 1′ = 66.2 and 68.4%, respectively). The two repeat sequences (Rep1 and 2) commonly observed among these species seem to have originated in an ancestral lineage of these species (lineage M-N in Fig. 5A) and were likely caused by a single duplication event (between ~8 and 11 Ma; Fig. 5B), because (1) the same mt region is repeated in all of these species, and (2) these species are monophyletic, and similar repeats are not found in other Tylototriton taxa. An additional duplication event in a recent ancestor of T. verrucosus would explain the third repeat unit (Rep1′), which therefore retains high nucleotide similarity with the original sequence (Rep1).
If the single origin of the two repeats is correct, the Rep1 and Rep2 divergences are theoretically expected to be nearly identical among species, because the repeats have experienced the same evolutionary time since the common ancestor. This expectation seemingly holds between T. shanjing and T. verrucosus, whose Rep1 and Rep2 sequences show similar divergence values (uncorrected p value, 64.6%: Rep1 vs. Rep2 in T. shanjing; 66.2 and 68.4%: Rep2 vs. Rep1 and Rep1′ in T. verrucosus; Supplementary Table S3). However, χ-squared and Fisher's exact tests clearly reject the homogeneity of the nucleotide divergence between these species (mainly due to numerous indels in the T. shanjing repeats; Supplementary Fig. S4). In addition, the nucleotide divergence of the T. kweichowensis repeats (79.1%) is clearly not homogeneous with those of the former two species (Supplementary Table S3).
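The comparison above rests on pairwise differences between aligned repeat units. The following is a minimal sketch, assuming two aligned sequences in which "-" marks an indel, of how an uncorrected p-distance (and the corresponding percent identity, 1 minus p) can be computed, and of how the treatment of gapped sites changes the value. The function name `p_distance`, its `gaps` option, and the toy sequences are illustrative assumptions; they are not the data or the procedure used in this study.

```python
def p_distance(seq_a, seq_b, gaps="exclude"):
    """Uncorrected p-distance between two aligned sequences.

    gaps="exclude":  sites with a gap in either sequence are skipped.
    gaps="mismatch": a gap paired with a nucleotide counts as a difference
                     (sites gapped in both sequences are still skipped).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if gaps not in ("exclude", "mismatch"):
        raise ValueError("gaps must be 'exclude' or 'mismatch'")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # nothing to compare at this site
        if (a == "-" or b == "-") and gaps == "exclude":
            continue  # indel site ignored under this option
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy aligned sequences (hypothetical, for illustration only).
rep1 = "ACGTACGTAC"
rep2 = "ACGTTCG--C"

d_excl = p_distance(rep1, rep2, gaps="exclude")
d_mism = p_distance(rep1, rep2, gaps="mismatch")
print(f"gaps excluded:      p = {d_excl:.3f}, identity = {100 * (1 - d_excl):.1f}%")
print(f"gaps as mismatches: p = {d_mism:.3f}, identity = {100 * (1 - d_mism):.1f}%")
```

In this toy case the same pair of sequences gives p = 0.125 when indel sites are excluded and p = 0.300 when they are counted as differences, which illustrates why repeats with many indels can yield divergence values that are hard to compare across species.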
The difference in nucleotide divergence of the two repeats among the three Tylototriton species could have resulted from differences in the frequency, timing, and/or extent of sequence homogenization events, a process called concerted evolution. Concerted evolution is well known in multicopy genes of the nuclear genome (e.g., the ribosomal RNA gene cluster; Futuyma, 2005) and also in duplicated control regions (CRs) of animal mt genomes (Kurabayashi et al., 2008 and references therein). Specifically, the two repeats of T. kweichowensis, which have relatively high sequence similarity (79.1%), would have experienced more frequent and/or more recent sequence homogenization events than those in the other two species (with about 65% sequence similarity), and differences in the homogenized portions would have led to the different sequence divergences of the two repeats between T. verrucosus and T. shanjing. In particular, in the T. shanjing Rep2, almost all of the trnT sequence, where sequence homogenization would not have occurred, has been deleted (Supplementary Fig. S4).
Evidence that this concerted evolution has actually occurred is found in the three repeats of T. verrucosus. As indicated above, the T. verrucosus Rep1 and Rep1′ have almost the same nucleotide sequence. There are only eight nucleotide substitution sites (among 439 nucleotide sites) between these repeats, and seven of the eight sites are concentrated in the initial 57 bp of these repeats (Fig. 2 and Supplementary Fig. S4). By contrast, the 57 bp sequence of Rep1′ is almost identical to its counterpart in Rep2 (only one substitution site), although the other parts of Rep1′ and Rep2 are considerably divergent (62.8% in the comparable 128 bp sequence). This unusual pattern indicates a recent concerted evolution event between Rep1′ and Rep2. Specifically, the 57 bp sequence of Rep1′ (which originally had almost the same sequence as that of Rep1) was replaced by that of Rep2 (already divergent from those of Rep1 and 1′) by a recent concerted evolution event. Consequently, only the 57 bp portion of Rep1′ became homogeneous with the Rep2 sequence, while the remaining portion of Rep1′ remained almost identical to the sequence of Rep1. Two alternative mechanisms of concerted evolution have been suggested in animal mt genomes: homologous recombination, and illicit DNA replication accompanied by nascent strand slippage and looping out of an extra-copied region (Kumazawa et al., 1998). Homologous recombination seems to be involved in the concerted evolution of the Tylototriton repeats, because the latter mechanism is unlikely to produce and/or maintain multiple, noncontiguous homologous sequences among multiple copied regions (as in the case of the 57 bp sequences of Rep1′ and 2 and the other sequence portions of Rep1 and 1′; Kurabayashi et al., 2008).
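The concentration of substitutions described above can be checked with simple arithmetic. The sketch below uses only the counts stated in the text (eight substitution sites among 439 sites between Rep1 and Rep1′, seven of them within the initial 57 bp); the variable names are illustrative, and the calculation is a plain proportion rather than a statistical test.

```python
total_sites = 439   # sites compared between Rep1 and Rep1'
total_subs = 8      # substitution sites between Rep1 and Rep1'
head_sites = 57     # initial segment of the repeats
head_subs = 7       # substitution sites within that segment

# Everything outside the initial 57 bp segment.
tail_sites = total_sites - head_sites
tail_subs = total_subs - head_subs

identity = 100 * (total_sites - total_subs) / total_sites
head_rate = 100 * head_subs / head_sites
tail_rate = 100 * tail_subs / tail_sites

print(f"overall identity: {identity:.1f}%")
print(f"first {head_sites} sites: {head_rate:.1f}% of sites differ")
print(f"remaining {tail_sites} sites: {tail_rate:.2f}% of sites differ")
```

The overall identity comes out at 98.2%, matching the similarity reported for Rep1 and Rep1′, while about 12.3% of sites differ within the first 57 bp against about 0.26% in the remaining 382 bp, a roughly 47-fold difference between the two portions.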
A long noncoding region (ranging from ~89 to 885 bp) between trnT and trnP is found in all caudate mt genomes currently available. The noncoding region has repetitive sequences in some caudates (e.g., Zhang et al., 2003a). Zardoya et al. (2003) mentioned that the occurrence of the repeats might be related to the close proximity of the noncoding region to the H-strand replication origin embedded in the CR (located immediately 3' of trnP). Similarly, we postulated a molecular mechanism (high-frequency recombination mediated by a replication fork barrier) to explain the frequent occurrence of concerted evolution between duplicated CRs and their 5' flanking regions (Kurabayashi et al., 2008). The Tylototriton repeats involving the noncoding region, and the occurrence of concerted evolution between the repeats, seem to fit well with both Zardoya et al.'s and our own considerations. Finally, the noncoding region harbors a sequence with a possible stem-loop structure in some caudates (Zhang et al., 2003a). Yet the noncoding regions in the crocodile newts did not have any possible stem-loop structures; instead, they share a pentanucleotide motif "CCGGG" (Supplementary Fig. S5), and this motif is conserved in a wide range of caudate taxa (data not shown). Although the biological function of the trnT-trnP noncoding region has not been elucidated (Zhang et al., 2008), this conserved motif might play a role in caudate mtDNAs.

Nucleotide diversity of crocodile newt mt genes
One of the aims of this study was to find mt genes with high nucleotide divergence that will be useful as genetic markers among the populations of the crocodile newts, including three endangered species (E. andersoni, E. chinhaiensis, and T. hainanensis). Thus, we compared the nucleotide sequences of all mt genes (two rrns, 13 protein-coding genes, and a concatenated sequence of 22 trns) and the control region among the six crocodile newt taxa currently available (Fig. 6).
The average nucleotide diversity of the mt genes and CR varied from 4.4 to 14.0% (12S to nad3; Fig. 6). The rrns and trns have small nucleotide divergences, while those of the protein-coding genes are large. It is generally known that the CR is the most variable mtDNA region, and thus this region is often used as a polymorphic marker for closely related taxa including intra-species populations and/or individuals (e.g., Avise, 2004). In the crocodile newt mtDNAs, however, the CR is the second-most conserved of the mt regions, and its average divergence (5.1%) is lower than that of trns and 16S (5.6 and 5.7%). Similarly conservative CRs were reported from ambystomatid salamanders (Samuels et al., 2005), suggesting that the CR is not a good candidate for a polymorphic marker in caudates. It should be noted that a 3% sequence divergence of 16S is often used as a species threshold value in frog taxa (Fouquet et al., 2007; Vieites et al., 2009). The 16S of the inter-island individuals of E. andersoni (from Okinawa and Tokunoshima Islands) shows a divergence (2.9%) close to this threshold, which is consistent with their old split age (~7-9 Ma).
Among the mt protein-coding genes, cox1, cox3, and cob have been reported to have high nucleotide substitution rates in plethodontid salamanders and their related taxa (Mueller, 2006). Yet, in crocodile newt mtDNAs, these genes are relatively conservative (~10.5-11.2%). Rather, the NADH dehydrogenase subunit and ATPase genes (nad1-6 and atp6) have large nucleotide divergences. These genes, especially nad3 and nad4, the most and second-most variable genes, seem to be suitable candidates for polymorphic markers in crocodile newt taxa. If we look only at the E. andersoni populations, cob and cox3 also seem to be good candidates because the divergence of these genes among the inter-island individuals (6.8 and 6.7%) is similar to or higher than the corresponding sequence divergence of the above candidate genes (5.5% in nad5 to ~9.3% in nad3).

Conclusion
This study revealed the ancient origin of the inter-island populations of the endangered species E. andersoni and their high genetic diversity, which is comparable to the inter-specific level. In the conservation context, these results suggest that at least the three inter-island populations (Okinawa, Amami, and Tokunoshima Islands) should be regarded as distinct conservation units and emphasize the necessity for further genetic surveys on inter- and intra-island populations of this species. The divergence of the crocodile newt mt genes shown here would be applicable to similar studies not only on E. andersoni but also on other endangered newt groups. Further, as shown in this study, the phylogenetic relationships and cryptic species diversity of the genus Tylototriton are not completely understood. Information on the nucleotide divergence of the mt genes and on the unique repeat sequences synapomorphic to some Tylototriton species also seems to be beneficial to future phylogenetic and taxonomic studies on this crocodile newt group.
We are grateful to the Boards of Education of both Kagoshima and Okinawa prefectures for allowing us to collect living crocodile newts protected by law. The present study was supported by a Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (No. 20510216 to M. Sumida).

Fig. 6. Box plot of nucleotide divergences of each mitochondrial gene and control region. Each box plot contains the nucleotide divergences of each mt gene or CR among the six crocodile newt taxa so far available (see Fig. 3). Abbreviations of gene names are shown in the text. Box plots show a line at the median divergence, a surrounding box containing the middle 50% (25-75%) of the data, and whiskers extending to 1.5 times the interquartile range. The values in the boxes show average nucleotide divergences of the mt genes (and CR). The genes and CR are ordered from the smallest average divergence to the largest. Dots with parenthetic values indicate inter-island divergences of E. andersoni (Tokunoshima vs. Okinawa islands).

Fig. 1. Genome organization of two crocodile newts' mtDNAs. The transcriptional direction of H-strand encoded genes and the upstream and downstream notations used in this paper are shown by an open arrow and open arrowheads, respectively. The H- and L-strand encoded genes are denoted above and below in each gene box, respectively. The sizes of the boxes do not reflect actual gene length. Transfer RNA genes (trns) are designated by single-letter amino acid codes. L1, L2, S1, and S2 indicate trns for Leu(UUR), Leu(CUN), Ser(AGY), and Ser(UCN), respectively. The trnT box with "ps" indicates a pseudogene. Control region is abbreviated as CR. OL indicates the region of the L-strand replication origin including a typical stem-loop structure. Other genes are abbreviated as follows: 12S and 16S, 12S and 16S ribosomal RNAs; cox1-3, cytochrome c oxidase subunits 1-3; cob, cytochrome b; nad1-6 and 4L, NADH dehydrogenase subunits 1-6 and 4L. Rep1, 1', and 2 indicate the repeats; Rep1 and Rep1' have almost identical nucleotide sequences.

Fig. 4. Salamandrid phylogeny inferred from the trnL(UUR)-cox1 sequence. The ML tree is shown here. The tree topology of crocodile newts in the Bayesian tree, which differs from that of the ML tree, is indicated in the box (only one representative is shown for each species). Bootstrap support (> 50 only) and Bayesian posterior probability (BPP, ** > 99, * > 95, and - < 95) are shown. "tri" and "dt" indicate the condition of the node in the Bayesian tree: the former designates a trichotomy and the latter a topology different from the ML tree, including the Tylototriton topology mentioned above. The three Tylototriton species whose mtDNAs have the cob-NC repeats form a monophyletic group. The taxa whose trnL(UUR)-cox1 regions were sequenced in this study are shown in bold.
Fig. 5. Divergence ages of crocodile newts and split ages of Ryukyu Islands. A. Time tree of crocodile newts resulting from the analysis. The node labels (A-O) correspond to those of B. B. Divergence ages of crocodile newts estimated from four distinct analyses. NA indicates not applicable due to lacking taxa or a distinct reference topology. C. Ryukyu Archipelago. The paleogeographic split ages of Ryukyu Islands are shown.