Edited by Yoko Satta. Nobuyoshi Shimizu: Corresponding author. E-mail: shimizu@dmb.med.keio.ac.jp
Genome size is highly diverse among species (Gregory, 2005), and it can differ even between very closely related species (Wendel and Cronn, 2003; Hickey and Clements, 2005; Boulesteix et al., 2006). This phenomenon is classically referred to as the “C-value paradox” (Thomas, 1971): the discrepancy between the amount of genomic DNA and the developmental complexity of an organism. Previous studies attempted to resolve this paradox by examining the adaptive significance of relationships between genome size and biological traits such as cell size, metabolic rate, and longevity (Chipman et al., 2001; Griffith et al., 2003; Cavalier-Smith, 2005; Hughes and Piontkivska, 2005). However, few or no causal links to such traits were found, and the paradox remains unsolved.
In contrast to biological traits, the genomic DNA sequences of different species can be compared directly to find out which types of nucleotide sequence have increased or decreased over evolutionary time. In eukaryotic genomes, coding sequences make up a much smaller fraction than non-coding sequences, so the latter should have exerted a greater influence on changes in genome size. In general, non-coding sequences change mainly through insertions and deletions (indels) of short nucleotide sequences or through amplification of repetitive elements. Indeed, small indels have been proposed as a major driving force of genome size evolution (Petrov et al., 1996; Petrov, 2002a), and the rate of DNA loss through accumulation of small deletions has been emphasized as a major cause of genome shrinkage (Petrov, 2001; Petrov, 2002b). However, those studies used a limited set of sequences, such as transposons and pseudogenes, to investigate indel bias, so the information was insufficient to clarify how genomic architecture changed as genome size evolved.
Recently, amplification of repetitive elements has received more attention as another driving force (Kidwell, 2002; Neafsey and Palumbi, 2003; Boulesteix et al., 2006), because repetitive elements occupy a significant portion of eukaryotic genomes, as shown for human (International Human Genome Sequencing Consortium, 2001) and other organisms (SanMiguel et al., 1996). Pufferfishes are an exception: they contain only small amounts of repetitive elements and have the smallest genomes (~400 Mb) among vertebrates (Crollius et al., 2000; Aparicio et al., 2002).
However, it is still unknown how genome size expands or shrinks through changes in the amounts of small indels and repetitive elements. One way to answer this question is to compare DNA sequences directly among appropriate species. Thanks to genome sequencing projects, enormous amounts of genomic DNA sequence are now available for various species, especially mammals (Thomas et al., 2003; Chapman et al., 2004). However, genome sizes differ relatively little among mammalian species (human 3.4 Gb, chimpanzee 3.7 Gb, mouse 3.3 Gb, rat 3.0 Gb, and cow 3.6 Gb; calculated from the data in the Animal Genome Size Database at http://www.genomesize.com (Gregory, 2005)). The situation is different in fish. Two pufferfishes, Takifugu rubripes (Aparicio et al., 2002) and Tetraodon nigroviridis (Jaillon et al., 2004), have genomes of almost equal size (400 Mb), whereas the medaka genome is twice as large (800 Mb) and the zebrafish genome about four times as large (1700 Mb). Furthermore, substantial amounts of DNA sequence are available for these fishes, which makes them well suited to investigating genome size evolution. For a meaningful comparison, it is essential to select species whose genomic DNA sequences share a high degree of homology. Comparing genomic DNA sequences between fish and mammals may not be feasible, because the two lineages show little homology outside coding sequences and regulatory elements (Goode et al., 2003; Thomas et al., 2003). By contrast, medaka and pufferfishes show a high degree of sequence homology despite their 2-fold difference in genome size. We therefore considered medaka and Takifugu an ideal combination for evaluating the effects of indels and repetitive elements on genome size evolution. Ohtsuka et al. (2004) compared a 229 kb medaka sequence with Takifugu, human, and mouse; however, it was a gene-poor region, and the analytical methods used were not sufficient to analyze the amplification of repetitive elements and the accumulation of indels.
In this study, we utilized approximately 1 Mb genomic DNA sequence of medaka chromosome LG22 (Sasaki et al., 2004; Shimizu et al., 2006; Sasaki et al., 2007) and the corresponding sequence of Takifugu genome (Aparicio et al., 2002) to evaluate the basis for genome size difference and the genomic architecture.
For the medaka LG22 DNA sequence, we filled sequence gaps in BAC clones and determined a contiguous sequence for precise comparison of genomic DNA sequences. For the present study, we selected a particular 1 Mb sequence consisting of five BAC clones (Md0172F16, Md0159H14, Md0170F19, Md0147C05, and Md0200E16) from the Medaka BAC Library (Matsuda et al., 2001). These clones were sequenced with a 3730xl DNA Analyzer and a 3100 Genetic Analyzer (Applied Biosystems) as described previously (Kawasaki et al., 1997). DNA sequence assembly was performed using the Phred/Phrap/Consed program (Ewing and Green, 1998; Ewing et al., 1998; Gordon et al., 1998) and sequence gaps were filled by primer walking.
The coding sequence was analyzed with BLASTN (Altschul et al., 1990) against the nr database at NCBI and the medaka EST database (Naruse et al., 2004; The TIGR Gene Index Databases, The Institute for Genomic Research, Rockville, MD 20850 http://www.tigr.org/tdb/tgi; Heinz Himmelbauer, unpublished data). GENSCAN was used for gene prediction (Burge and Karlin, 1997). To identify orthologous genes between medaka and human, whose genome is the most precisely annotated among the vertebrates sequenced so far, putative genes predicted by GENSCAN were searched by BLASTP against human genes in the public database. The genomic structures of medaka genes with identified human orthologs were determined with Wise2 (available at http://www.ebi.ac.uk/Wise2/). Finally, exons were determined with the est2genome program (Mott, 1997), and the exon-intron boundaries of each medaka gene were confirmed with the DOTTER program (Sonnhammer and Durbin, 1995). To mask repetitive elements in the medaka genome sequence, we developed a Medaka Repeat Database (ver. 1.0, available at http://biol1.bio.nagoya-u.ac.jp:8000/). For this database, we used fish repetitive elements (T. rubripes, T. nigroviridis, Lepidiolamprologus elongatus, and Danio rerio) from the public Repbase database of the Genetic Information Research Institute (http://www.girinst.org/~server/repbase.html), together with repetitive elements found previously (Naruse et al., 1992; Koga et al., 2002; Matsuo and Nonaka, 2004) and those newly found in 19 Mb of medaka LG22 genome sequence. The 6914 repetitive-element entries were classified into 6 categories (“LTR”, “LINE”, “SINE”, “DNA transposon”, “Simple Repeat and Low Complexity”, and “Unclassified”) based on their structures or their homology with known repetitive elements. Using this database, repetitive elements were identified with the RepeatMasker2 program (Smit, A. F. A. and Green, P. RepeatMasker at http://www.repeatmasker.org).
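The six-way classification of masked hits can be sketched as follows. This is a minimal illustration, assuming the whitespace-separated RepeatMasker `.out` table layout (score, divergence, deletion and insertion percentages, query name, query begin and end, bases left, strand, repeat name, class/family, ...) and its usual class labels such as `LINE/L2`, `DNA/hAT`, `Simple_repeat`, `Low_complexity`, and `Unknown`. The prefix table, the `RepeatHit` type, and the function names are hypothetical and do not reproduce the procedure actually used to build the Medaka Repeat Database.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from RepeatMasker class prefixes to the six categories.
CATEGORY_BY_PREFIX = {
    "LTR": "LTR",
    "LINE": "LINE",
    "SINE": "SINE",
    "DNA": "DNA transposon",
    "Simple_repeat": "Simple Repeat and Low Complexity",
    "Low_complexity": "Simple Repeat and Low Complexity",
}


@dataclass
class RepeatHit:
    query: str
    start: int  # 1-based, inclusive (as printed by RepeatMasker)
    end: int    # 1-based, inclusive
    name: str
    category: str


def categorize(repeat_class: str) -> str:
    """Map a label such as 'LINE/L2' or 'Simple_repeat' to one of six categories."""
    prefix = repeat_class.split("/", 1)[0]
    return CATEGORY_BY_PREFIX.get(prefix, "Unclassified")


def parse_rm_out_line(line: str) -> Optional[RepeatHit]:
    """Parse one data line of a RepeatMasker .out table; header/blank lines give None."""
    fields = line.split()
    # Data lines start with an integer alignment score and have >= 11 columns.
    if len(fields) < 11 or not fields[0].isdigit():
        return None
    return RepeatHit(
        query=fields[4],
        start=int(fields[5]),
        end=int(fields[6]),
        name=fields[9],
        category=categorize(fields[10]),
    )
```

Anything that does not match a known prefix (including RepeatMasker's `Unknown`) falls into "Unclassified", mirroring the catch-all category described above.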
The Takifugu genome sequence corresponding to the 1 Mb medaka sequence was searched mainly in the Takifugu rubripes ver. 3.0 database at the Joint Genome Institute (JGI) (Aparicio et al., 2002), and 5 scaffolds (Scaffold940, 1291, 788, 3768, and 183) were identified by BLASTN. For Scaffold183, we used a more accurate sequence registered in GenBank (accession number AF411956). Takifugu sequences were masked with the Medaka Repeat Database using RepeatMasker2.
We used BLASTZ (Schwartz et al., 2000), a pairwise local alignment program. The program was downloaded from http://pipmaker.bx.psu.edu/pipmaker/ and run locally with the parameters B = 0, C = 2, H = 2200, T = 0, and W = 6. The resulting alignment was visualized with PipMaker.
To compare the genomic sequences in detail, we classified them into aligned and unaligned regions based on the BLASTZ alignment of the two sequences. The unaligned sequences were divided into two classes. The first comprised “indels”, whose boundaries could be clearly assigned within an aligned region (Fig. 1a), but for which it was difficult to determine whether they were insertions in medaka and deletions in Takifugu or vice versa. We therefore defined medaka insertions (or Takifugu deletions) as medaka extra sequences (MESs) and Takifugu insertions (or medaka deletions) as Takifugu extra sequences (TESs). The second class comprised unaligned sequences lying between two consecutive aligned sequences, which we defined as “unaligned sequences between aligned sequences” (USBASs; Fig. 1b).
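The bookkeeping that separates MESs, TESs, and USBASs can be sketched as below. It assumes, hypothetically, that each BLASTZ alignment block is available as a pair of gapped rows with 0-based start coordinates on each sequence, and that the blocks are already sorted and collinear; the `Block` type and the function names are illustrative and are not part of BLASTZ or PipMaker.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple


@dataclass
class Block:
    """One gapped alignment block (hypothetical representation).

    Starts are 0-based positions on each ungapped sequence; '-' marks a gap.
    """
    med_start: int
    tak_start: int
    med_row: str
    tak_row: str

    def med_end(self) -> int:
        """0-based exclusive end of the block on the medaka sequence."""
        return self.med_start + len(self.med_row) - self.med_row.count("-")

    def tak_end(self) -> int:
        """0-based exclusive end of the block on the Takifugu sequence."""
        return self.tak_start + len(self.tak_row) - self.tak_row.count("-")


def gap_runs(row: str) -> List[int]:
    """Lengths of consecutive '-' runs in one alignment row."""
    return [len(list(run)) for is_gap, run in groupby(row, key=lambda c: c == "-") if is_gap]


def classify_indels(block: Block) -> Tuple[List[int], List[int]]:
    """MES lengths (gaps in the Takifugu row) and TES lengths (gaps in the medaka row)."""
    return gap_runs(block.tak_row), gap_runs(block.med_row)


def usbas_pairs(blocks: List[Block]) -> List[Tuple[int, int]]:
    """(medaka, Takifugu) lengths of the unaligned stretches between consecutive blocks."""
    return [
        (right.med_start - left.med_end(), right.tak_start - left.tak_end())
        for left, right in zip(blocks, blocks[1:])
    ]
```

Because a gap in the Takifugu row means extra bases in medaka, MESs are read from the Takifugu row and TESs from the medaka row; USBASs are simply the coordinate differences between the end of one block and the start of the next, so each USBAS automatically comes as a medaka/Takifugu pair.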
Fig. 1. Alignment by BLASTZ produces aligned sequences (filled boxes) and two types of unaligned sequence (blank boxes). (a) Indels clearly assigned within aligned sequences, named medaka extra sequences (MESs) and Takifugu extra sequences (TESs). (b) Unaligned sequences between aligned sequences (USBASs): corresponding sequences that cannot be aligned because of low homology between medaka and Takifugu, but that are comparable because they are flanked by corresponding aligned sequences.
Because the Takifugu sequences are still in draft status, there were 30 sequence gaps in the studied Takifugu region. All of these gaps lay within Takifugu USBASs, whose precise lengths therefore could not be determined. Before the analyses, we thus excluded every USBAS pair containing a gap in Takifugu, from both species, so that the genomic sequences could be compared as accurately as possible.
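The exclusion step can be sketched as a paired filter. This assumes, hypothetically, that assembly gaps in the draft Takifugu scaffolds appear as runs of `N` and that each USBAS pair is held as a (medaka, Takifugu) tuple of sequences; the function name is illustrative. Note that this simple version also treats a single ambiguous `N` as a gap.

```python
import re
from typing import List, Tuple

# Assumption: assembly gaps are encoded as one or more 'N' (or 'n') characters.
GAP = re.compile(r"[Nn]+")


def drop_gapped_pairs(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only (medaka, Takifugu) USBAS pairs whose Takifugu member has no gap.

    Both members of a gapped pair are discarded, so the comparison stays paired.
    """
    return [(med, tak) for med, tak in pairs if not GAP.search(tak)]
```

Dropping the whole pair, rather than only the Takifugu member, keeps the later paired statistics (one medaka length per Takifugu length) valid.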
Recently, we determined the 19 Mb DNA sequence of medaka chromosome LG22. We selected the region covered by five unique BAC clones because annotation of the draft sequence predicted that its gene density is approximately the same as that of the entire LG22 sequence (34 genes per 1 Mb; Sasaki et al., 2007). The selected 918.9 kb region was annotated precisely using the gene prediction program GENSCAN and BLAST searches against public databases. We identified 37 genes, including 7 novel genes (Fig. 2). The presence of each gene was confirmed by identifying the corresponding sequences in the medaka EST database. The GC content of this chromosomal region was 40.7%, nearly identical to the average GC content of the entire chromosome LG22 (40.9%). The exons of these 37 genes totaled 50.1 kb, of which 38.4 kb corresponded to open reading frames (ORFs). In addition, this medaka chromosomal region was roughly twice as long as the corresponding region of the Takifugu chromosome (see below), reflecting the 2-fold genome size difference between medaka and Takifugu.
Fig. 2. Contig and gene map of medaka and Takifugu. Medaka and Takifugu maintain high synteny, except for the black bar in Takifugu scaffold 788, where synteny appears to be disrupted. Open triangles indicate genes found in both medaka and Takifugu; filled triangles indicate genes found only in medaka. The gene names are 1: Ppp1r3b, 2: Ptp4a, 3: Md0172F16_novel_1, 4: Gjb5, 5: Mlp, 6: NM_018045, 7: Olig2, 8: Md0170F19_novel_1, 9: Fndc5, 10: Md0170F19_novel_2, 11: Md0170F19_novel_3, 12: Arh, 13: Rhce, 14: Rhd, 15: Smp1, 16: Mgst3, 17: Md0147C05_novel_1, 18: Md0147C05_novel_2, 19: Gcpip, 20: Runx3, 21: Clic4, 22: Srrm1, 23: Tdh, 24: Mtmr9, 25: C8orf13L, 26: Lck, 27: Hdac1, 28: Bclp, 29: Md0200E16_novel_1, 30: Pabpc4, 31: Ppie, 32: Gjb4, 33: Gjb3, 34: Hmgcl, 35: Zbtb5, 36: Gale, 37: Insm1.
BLASTN searches against the whole genome shotgun database of Takifugu (JGI Takifugu rubripes ver. 3.0) identified five Takifugu scaffolds (scaffold940, 788, 1291, 3768, and 183) with high homology to the selected 918.9 kb medaka DNA sequence. The orientation of each Takifugu scaffold was determined by comparison with the medaka genomic sequence (Fig. 3). The 90 kb Takifugu Scaffold788 was an exception to this collinear arrangement, because it showed high homology with a different region of medaka LG22. We assume that an intra-chromosomal rearrangement occurred in the Takifugu (or medaka) lineage during evolution. Of the 37 medaka genes, 33 were found in Takifugu, located in the same order and orientation on the same chromosomal DNA. The remaining four genes were not found in the Takifugu database, for unknown reasons. We identified all the ORFs of the 33 Takifugu genes, which totaled 36.4 kb. This high degree of similarity suggests that these chromosomal regions of medaka and Takifugu were derived from the same region of a common ancestral chromosome.
Fig. 3. High synteny between medaka and Takifugu, plotted by PipMaker. The arrow indicates the region where synteny between medaka and Takifugu is disrupted, as also shown in Fig. 2.
The sizes of the corresponding regions of the medaka and Takifugu chromosomes were 789.6 kb and 387.4 kb, respectively (Table 1). The ratio of these sizes, 2.04, matches the genome size ratio between medaka and Takifugu, so a detailed comparison of these sequences should be informative about genome size diversity. In these chromosomal regions, BLASTZ aligned a total of 178.2 kb of sequence in common, including 36 kb of ORFs.
Table 1. Sequence categories and lengths for medaka and Takifugu
To determine whether the sequences aligned outside ORFs are also conserved in other species, we examined the corresponding region of zebrafish (Ensembl: zebrafish assembly ver. 4). In zebrafish, the corresponding region is split into several small sub-regions located on at least three different chromosomes (13, 17, and 19). One such sub-region was 91.1 kb long, and the corresponding regions were 28.1 kb in Takifugu and 56.6 kb in medaka. These sub-regions share six genes (Clic4, Srrm1, Tdh, Mtmr9, C8orf13L, and Lck; see Fig. 2 for the medaka and Takifugu genes), and the total ORF length was 7.6 kb in each of the three fishes. Pairwise comparison of these sequences by BLASTZ gave a total homologous sequence of 16.2 kb between medaka and Takifugu, 9.4 kb between medaka and zebrafish, and 9.2 kb between Takifugu and zebrafish. The total length of well-aligned sequence outside ORFs was 8.6 kb for the medaka-Takifugu alignment, 2.6 kb for medaka-zebrafish, and 2.7 kb for Takifugu-zebrafish; that is, the medaka-Takifugu comparison contained about three times as much homologous non-ORF sequence as either of the other two comparisons.
To clarify the contribution of repetitive elements to the 2-fold genome size difference between medaka and Takifugu, we analyzed the amount and composition of repetitive elements in the studied regions. The 789.6 kb medaka region contained a total of 229.2 kb of repetitive elements (29.0%), whereas the 387.4 kb Takifugu region contained a total of 13.8 kb (3.6%). This difference (215.4 kb) accounts for 53.6% of the total sequence length difference (402.2 kb) in the studied region. The types of repetitive elements also differed greatly between medaka and Takifugu (Table 2). About one-third of the Takifugu repetitive elements were classified by RepeatMasker2 as “simple repeats” and “low complexity sequences”, consistent with a previous whole-genome analysis (Aparicio et al., 2002), whereas only 3.0% of the medaka repetitive elements fell into these categories. As many as 648 types of repetitive elements were identified in medaka, at 1,422 different sites in the studied region, whereas only 29 types were found, at 51 sites, in the Takifugu region. Some repetitive elements were common to medaka and Takifugu, including DNA transposons, SINEs, and non-LTR retrotransposons such as Chaplin, SINE_FR, Maui, REX3, and Expander. Most notably, the medaka genome contains many copies of various “unclassified repeats”, most of which were found only in medaka and not in Takifugu. In the studied medaka region, we identified 68 copies of one particular type of unclassified repeat, totaling 16.7 kb. As many as 566 types of unclassified repeats were found at 1,090 sites, totaling 134.0 kb of medaka DNA sequence and accounting for 58.5% of all medaka repetitive elements. In Takifugu, by contrast, only 7 types of unclassified repeats were found, at 12 sites totaling 1.8 kb. These results indicate that a large variety of low-copy repeats is a characteristic feature of the medaka chromosome.
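Because RepeatMasker hits can overlap, masked fractions such as 29.0% and 3.6% are safest to compute from merged intervals, so that overlapping bases are counted once. The following is a minimal sketch, assuming hits are given as half-open (start, end, element name, category) tuples; the function names are hypothetical, "sites" here simply counts hits (a fragmented repeat would count more than once), and this is not necessarily how the totals above were obtained.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[int, int, str, str]  # (start, end, element name, category); half-open


def merged_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Bases covered by half-open [start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def summarize_repeats(hits: List[Hit], region_len: int) -> Dict[str, object]:
    """Masked fraction, number of element types and sites, and masked bp per category."""
    categories = {cat for _, _, _, cat in hits}
    return {
        "fraction_masked": merged_length((s, e) for s, e, _, _ in hits) / region_len,
        "n_types": len({name for _, _, name, _ in hits}),
        "n_sites": len(hits),
        "bp_by_category": {
            cat: merged_length((s, e) for s, e, _, c in hits if c == cat)
            for cat in categories
        },
    }
```

Summing raw hit lengths instead would inflate the masked fraction wherever two library entries match the same stretch of sequence.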
Table 2. Repetitive elements in medaka and Takifugu
Large amounts of sequence are not shared between medaka and Takifugu. These “unaligned sequences” totaled 611.3 kb in medaka and 209.2 kb in Takifugu and were classified into three types. The first, sequence present in medaka but absent from Takifugu within an aligned region, was defined as medaka extra sequence (MES); the second, present in Takifugu but absent from medaka, was defined as Takifugu extra sequence (TES). The unaligned sequences contained 2661 MESs and 2571 TESs, but their average lengths were only 7.47 bp and 7.07 bp, respectively. Most MESs and TESs are therefore small indels rather than transposable elements. The total lengths of MESs and TESs were only 19.9 kb and 18.2 kb, respectively, so we concluded that MESs and TESs were not major factors determining the difference in genome size between medaka and Takifugu. The remaining unaligned sequences lay between two aligned sequences and were designated USBASs (“unaligned sequences between aligned sequences”). The USBASs were therefore considered responsible for most of the size difference in the studied region.
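Because aligned sequence, MESs or TESs, and USBASs are mutually exclusive categories, their lengths should add up to the size of each studied region. The quick check below uses only the figures reported in this section; it is arithmetic on the published totals, not a re-analysis, and the names are illustrative.

```python
# Lengths in kb reported in the text for the studied region.
REGIONS = {
    # species: (total, aligned, extra indels (MES or TES), USBAS)
    "medaka": (789.6, 178.2, 19.9, 591.5),
    "Takifugu": (387.4, 178.2, 18.2, 191.0),
}


def unexplained_kb(total: float, aligned: float, extra: float, usbas: float) -> float:
    """Length not covered by the three mutually exclusive categories."""
    return total - (aligned + extra + usbas)


for species, parts in REGIONS.items():
    print(f"{species}: residual {unexplained_kb(*parts):.3f} kb")

# The MES and TES totals also follow from count x mean length (bp -> kb).
print(round(2661 * 7.47 / 1000, 1), round(2571 * 7.07 / 1000, 1))  # 19.9 18.2
```

Both residuals are zero up to floating-point rounding, and the unaligned totals (611.3 kb and 209.2 kb) are likewise MES + USBAS and TES + USBAS.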
The total length of USBASs in medaka was 591.5 kb, 3.10 times that in Takifugu (191.0 kb; Wilcoxon matched-pairs signed-rank test, z = –11.92, p < 0.0001), a difference of 400.5 kb. The USBASs contained 228.1 kb of repetitive elements in medaka and 12.3 kb in Takifugu, so approximately half of the USBAS length difference was attributable to repetitive elements. We therefore excluded these repetitive elements and re-evaluated the remaining sequence as USBAS with no repetitive elements (USBAS-nr). The maximum USBAS-nr length was 6,655 bp in medaka and 3,304 bp in Takifugu. Note that not every medaka USBAS-nr was longer than its Takifugu counterpart: in 75 of the 296 pairs, the Takifugu USBAS-nr was the longer one. Nevertheless, medaka USBAS-nrs were still significantly longer than those of Takifugu (Wilcoxon matched-pairs signed-rank test, z = –10.43, p < 0.0001), with total lengths of 363.3 kb in medaka and 178.8 kb in Takifugu, a ratio of 2.03. In the Clic4-Lck region, there were 36 USBAS-nrs between medaka and Takifugu and 38 between medaka and zebrafish. Further comparison identified 16 USBAS-nrs whose positions are conserved among all three species (medaka, Takifugu, and zebrafish). The total length of these position-conserved USBAS-nrs in medaka was 0.59 times that in zebrafish and 2.35 times that in Takifugu. Thus, the position-conserved USBAS-nrs also reflect genome size.
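The z values above come from the large-sample normal approximation of the Wilcoxon matched-pairs signed-rank test applied to paired medaka and Takifugu lengths. A stdlib-only sketch follows: it drops zero differences, assigns average ranks to tied absolute differences, and uses the smaller rank sum so that z is non-positive, matching the sign of the reported values. Whether the original analysis applied a tie correction to the variance is not stated, so the uncorrected variance here is an assumption, and `wilcoxon_z` is a hypothetical helper rather than the software used in the study.

```python
import math
from typing import Sequence


def wilcoxon_z(x: Sequence[float], y: Sequence[float]) -> float:
    """Normal-approximation z for the Wilcoxon matched-pairs signed-rank test.

    x[i] and y[i] are paired observations (e.g. medaka and Takifugu USBAS
    lengths). Zero differences are dropped, tied |differences| get average
    ranks, and the variance is NOT tie-corrected (an assumption).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank |d| from smallest to largest, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    w_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    t = min(w_plus, w_minus)  # the smaller rank sum gives z <= 0
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean) / sd
```

A signed-rank test is a natural fit here because each USBAS has a medaka and a Takifugu member, and the length differences are far from normally distributed.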
The length ratio of each medaka-Takifugu USBAS-nr pair is shown in Fig. 4. The ratios were log2-transformed, so a value larger than 1 means that the medaka USBAS-nr is more than twice as long as its Takifugu counterpart. The mean log2 ratio was 0.863 with a standard deviation of 1.543, significantly larger than 0 (t = 9.627, n = 296, p < 0.0001; Fig. 4a). In addition, the distribution departed from normality: its kurtosis was significantly greater than that of a normal distribution (b2 = 6.36, p < 0.05), that is, the distribution is heavier-tailed than normal.
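The summary statistics in this paragraph can be reproduced from paired lengths with the standard library. This sketch assumes all USBAS-nr lengths are positive (a zero length would make the log ratio undefined), uses the sample standard deviation for the one-sample t-test against 0, and computes Pearson's moment kurtosis b2 = m4/m2², which equals 3 for a normal distribution. The function name is hypothetical, and the significance test for b2 is not reproduced.

```python
import math
import statistics
from typing import Dict, Sequence


def log2_ratio_stats(medaka: Sequence[float], takifugu: Sequence[float]) -> Dict[str, float]:
    """Summaries of log2(medaka / Takifugu) over paired USBAS-nr lengths (all > 0)."""
    r = [math.log2(m / t) for m, t in zip(medaka, takifugu)]
    n = len(r)
    mean = statistics.mean(r)
    sd = statistics.stdev(r)  # sample SD, n - 1 denominator
    t = mean / (sd / math.sqrt(n))  # one-sample t against a mean of 0
    m2 = sum((x - mean) ** 2 for x in r) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in r) / n  # fourth central moment
    b2 = m4 / m2 ** 2  # Pearson kurtosis; 3 for a normal distribution
    return {"n": n, "mean": mean, "sd": sd, "t": t, "b2": b2}
```

With the published summary values (mean 0.863, SD 1.543, n = 296), t = 0.863 / (1.543 / √296) ≈ 9.62, which agrees with the reported t = 9.627 up to rounding of the mean and SD.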
Fig. 4. Distribution of the log2-transformed length ratios of medaka and Takifugu USBAS-nrs. (a) All USBAS-nrs; medaka USBAS-nrs tend to be about twice as long as their Takifugu counterparts (mean 0.863, standard deviation 1.543). (b) USBAS-nrs within introns. (c) USBAS-nrs within intergenic regions. The mean within introns was significantly larger than that within intergenic regions (t-test, t = 2.248, df = 294, p < 0.05).
We also analyzed the relationship between the location of each intergenic USBAS-nr and its length ratio. No significant correlation was found between the length ratio of an intergenic USBAS-nr and its distance to the neighboring gene (medaka: r = 0.019, n = 182, p > 0.05; Takifugu: r = –0.049, n = 67, p > 0.05). We therefore assume that the length of each USBAS-nr was affected largely at random across the region, with no positional bias within intergenic regions. Interestingly, however, the average log2 length ratio of USBAS-nrs within introns (1.094, Fig. 4b) was significantly larger than that within intergenic regions (0.719, Fig. 4c) (t-test, t = 2.248, df = 294, p < 0.05). Thus, although length changes showed no positional bias within intergenic regions, the distribution of length ratios between medaka and Takifugu was heavier-tailed than a normal distribution, and intronic and intergenic USBAS-nrs differed significantly in length ratio. Although intronic USBAS-nrs had higher length ratios, the proportion of conserved sequence was higher in introns (medaka 22.1%, Takifugu 43.0%) than in intergenic regions (medaka 18.2%, Takifugu 39.0%).
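The two tests in this paragraph can be sketched with the standard library. The reported df = 294 equals 296 − 2, which is what a pooled-variance (Student) two-sample t-test over the 296 USBAS-nrs split into intronic and intergenic groups would give, so that test is assumed here; the correlation is the ordinary Pearson r. Both helper names are hypothetical, and p-values are not computed.

```python
import math
import statistics
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. length ratio vs. distance to the nearest gene."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t with pooled variance; returns (t, df = n_a + n_b - 2)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2
```

Given the heavy tails noted for Fig. 4, a rank-based alternative such as the Mann-Whitney U test would be a reasonable robustness check, although the text reports only the t-test.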
The studied region of medaka chromosome LG22 was twice as large as the corresponding region of Takifugu, the same ratio as between the medaka and Takifugu genome sizes. Moreover, the gene density and GC content of this region were almost the same as those of the entire chromosome LG22, so this 1 Mb region can be regarded as representative of the genome and is well suited as a case study of genome size difference. In these regions, 33 genes were located in the same order and orientation, so gene number was not related to the genome size difference. Furthermore, a small sub-region containing the same 6 genes was shared by three fishes, yet its size differed markedly among them (28.1 kb in Takifugu, 56.6 kb in medaka, and 91.1 kb in zebrafish). We therefore conclude that the genome size differences among these three fishes were probably caused by gain or loss of small nucleotide sequences in non-coding regions, not by drastic gain or loss of large DNA fragments. Taking zebrafish as an outgroup, we infer that genome size decreased in the lineage leading to medaka and Takifugu and that this tendency was stronger in Takifugu.
In general, conserved chromosomal regions are suitable for direct comparison at the nucleotide sequence level. About a quarter of the medaka DNA sequence was aligned to about half of the Takifugu DNA sequence, both inside and outside ORFs. Most of the homologous sequences inside ORFs, and some of those outside ORFs, were presumably under functional constraint during evolution. However, two-thirds of the homologous sequences outside ORFs were lost in zebrafish, so these sequences are not conserved among all three fishes. Some of them might be functional only in medaka and Takifugu, but we believe that most are not functional and that their presence in both species may simply reflect evolutionary divergence time: medaka and Takifugu diverged 184 Myr ago, much more recently than medaka and zebrafish (277 Myr ago) (Inoue et al., 2005; Yamanoue et al., 2006).
The genome sequence comparison between medaka and Takifugu revealed a large difference in the amount of repetitive elements, accounting for about half of the genome size difference. We then examined how their composition is involved in genome size evolution, as reported for other species (Boulesteix et al., 2006). Although the classification of medaka repetitive elements is not comprehensive, their composition clearly differs between medaka and Takifugu. In particular, 58.5% of the medaka repetitive elements are currently unclassified, and even the most frequent repetitive element accounts for only 2.1% (16.7 kb) of the studied region of medaka chromosome LG22. By comparison, the most frequent repetitive element in the human genome, Alu, occupies 10.6% of the total genome sequence (International Human Genome Sequencing Consortium, 2001). Moreover, most of the unclassified repeats found in medaka were not detected in Takifugu; medaka has many types of low-copy, unknown repetitive sequences. Takifugu, too, contains a greater variety of repeats, such as transposable elements, than human (Aparicio et al., 2002). Taken together, these data suggest that fish genomes may generally be rich in diverse types of repetitive elements, and further analysis of these “unclassified repeats” in medaka and related species should provide insights into the evolutionary significance of their relative abundance in particular species.
There was no significant difference in the total lengths of MESs and TESs between medaka and Takifugu, and most of the genome size difference lay within the USBASs. Half of the difference in USBAS length was attributable to the difference in the amounts of repetitive elements, and the USBAS-nrs accounted for the remaining half. We assume that most of the sequences corresponding to USBAS-nrs in both species were derived from the same region of a common ancestral chromosome and changed independently by mutation in each lineage. The USBAS-nrs may include ancient repetitive elements that have undergone so many mutations over a long period of evolution that RepeatMasker, which relies on repeat sequences from contemporary organisms, can no longer identify them. The variation in the log2 ratio of USBAS-nr lengths suggests that medaka USBAS-nrs changed in such a way that each became, on average, about twice as long as its Takifugu counterpart. This variation fits the idea of gradual accumulation of small indels, although the USBAS-nr length difference between medaka and Takifugu may also include ancient repetitive elements. Small indels presumably accumulated in sequences that can no longer be aligned between medaka and Takifugu. As discussed above, the ancestral USBAS-nr sequences were probably degraded after divergence from the common ancestor, leaving little or no detectable homology in the present-day USBAS-nrs.
Furthermore, we found a difference in GC content between USBAS-nrs (medaka 37.2%, Takifugu 42.8%) and aligned sequences outside ORFs (medaka 44.4%, Takifugu 47.6%). This suggests that mutations erased sequence homology outside ORFs, and that this effect was much weaker in GC-rich sequences (aligned sequences outside ORFs) than in AT-rich sequences (USBAS-nrs). Differences in evolutionary rate may thus underlie the uneven degeneration of homology in non-functional sequences outside ORFs, leaving higher homology in GC-rich regions and no homology in AT-rich regions. Because MESs outside ORFs were relatively AT-rich (GC 42.4%), deletions in the Takifugu lineage may have been AT-biased, making the Takifugu genome GC-rich. These may be the reasons why some non-coding sequences have diverged among species while others have not.
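GC percentages for a sequence class (USBAS-nrs, aligned sequence outside ORFs, MESs) can be computed by pooling bases across all segments of that class, so that long segments carry proportionally more weight than a per-segment average would give them. This is a minimal sketch under that pooling assumption, which is ours: the text does not state how its percentages were computed. Ambiguous bases such as `N` and alignment gaps are ignored, and the function name is hypothetical.

```python
from typing import Iterable


def gc_percent(seqs: Iterable[str]) -> float:
    """GC% pooled over sequences; ambiguous bases (e.g. N) and gaps are ignored."""
    gc = at = 0
    for s in seqs:
        s = s.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / (gc + at)
```

Comparing, for example, `gc_percent(usbas_nr_seqs)` with `gc_percent(aligned_non_orf_seqs)` for each species would give the kind of contrast reported above; both argument names are placeholders.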
The distribution of length ratios of USBAS-nrs between medaka and Takifugu was heavier-tailed than a normal distribution. Within intergenic regions, the length ratio showed no positional bias, but it differed significantly between intronic and intergenic USBAS-nrs. These results indicate that the driving forces altering USBAS-nr length differ between introns and intergenic regions. We infer that different indel rates, resulting from these different driving forces, produced the current difference in USBAS-nr length ratios between intronic and intergenic regions of medaka and Takifugu. The details of these driving forces are difficult to deduce from this study alone. However, several other studies have shown a positive correlation between intron length and genome size (Moriyama et al., 1998; Vinogradov, 1999; McLysaght et al., 2000).
In summary, our study suggests that amplification of repetitive elements and gradual accumulation of indels were the main contributors to genome size evolution. The “2-fold” concordance between medaka and Takifugu does not mean that gradual indel changes in the USBAS-nrs are solely responsible for variation in genome size. The contribution of repetitive elements was estimated at 54%, and that of non-coding sequences, including MESs, TESs, and the length difference in USBAS-nrs, at 46% (Fig. 5). Most of the non-coding sequences probably changed gradually in both directions, gain or loss, through indels throughout the genome, allowing the genome to expand or shrink. Further analysis of repetitive elements and indels will be necessary to better understand the relationship between the amplification of repetitive elements and the accumulation of indels over long evolutionary time.
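The 54% and 46% shares follow from the totals reported for the studied region (lengths in kb). The first line divides the difference in repetitive elements by the total length difference; the second adds the MES/TES difference to the USBAS-nr difference, which recovers the non-repeat share. This is arithmetic on the published figures only, and the two shares sum to approximately 1 up to rounding:

$$
\frac{229.2 - 13.8}{789.6 - 387.4} = \frac{215.4}{402.2} \approx 0.536
$$

$$
\frac{(19.9 - 18.2) + (363.3 - 178.8)}{402.2} = \frac{1.7 + 184.5}{402.2} = \frac{186.2}{402.2} \approx 0.463
$$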
Fig. 5. The composition of the medaka and Takifugu genome sequences in the studied region. The 2-fold length difference in USBAS-nr between the two species and the greater abundance of repetitive elements in medaka each account for approximately half of the total length difference between the two species.
The authors thank S. K. Ishikawa for technical assistance with DNA sequencing. This work was supported by Grants-in-Aid for Scientific Research on the Priority Areas “Study of Medaka as a Model for Organization and Evolution of the Nuclear Genome” (#813) and “Comparative Genomics” (#015) from the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT).