The majority of eukaryotic genomes contain a large fraction of repetitive sequences that primarily originate from transpositional bursts of transposable elements (TEs). Repbase serves as a database for eukaryotic repetitive sequences and has now become the largest collection of eukaryotic TEs. During the development of Repbase, many new superfamilies/lineages of TEs, which include Helitron, Polinton, Ginger and SINEU, were reported. The unique composition of protein domains and DNA motifs in TEs sometimes indicates novel mechanisms of transposition, replication, anti-suppression or proliferation. In this review, our current understanding regarding the diversity of eukaryotic TEs in sequence, protein domain composition and structural hallmarks is introduced and summarized, based on the classification system implemented in Repbase. Autonomous eukaryotic TEs can be divided into two groups: Class I TEs, also called retrotransposons, and Class II TEs, or DNA transposons. Long terminal repeat (LTR) retrotransposons, including endogenous retroviruses, non-LTR retrotransposons, tyrosine recombinase retrotransposons and Penelope-like elements, are well-accepted groups of autonomous retrotransposons. They share reverse transcriptase for replication but are distinct in the catalytic components responsible for integration into the host genome. Similarly, at least three transposition machineries have been reported in eukaryotic DNA transposons: DDD/E transposase, tyrosine recombinase and HUH endonuclease combined with helicase. Among these, TEs with DDD/E transposase are dominant and are classified into 21 superfamilies in Repbase. Non-autonomous TEs are either simple derivatives generated by internal deletion, or are composed of several units that originated independently.
Riboviruses are viruses that have RNA genomes and replicate only via RNA intermediates. Although they do not require a DNA phase for replication and do not encode reverse transcriptase, the presence of DNA forms of riboviral sequences in ribovirus-infected cells has been reported since the 1970s. Additionally, heritable ribovirus-derived sequences, called riboviral endogenous viral elements (EVEs), have been found in the genomes of many eukaryotes. These are now thought to be formed by the reverse transcription machineries of retrotransposons, which reside within eukaryotic genomes and are sometimes referred to as selfish elements. Surprisingly, some reverse-transcribed riboviral DNAs (including EVEs) provide physiological functions for their hosts, suggesting the occurrence of novel interactions among eukaryotic genomes, retrotransposons and riboviruses, and opening the door to new avenues of investigation. Here I review current knowledge on these triangular interactions, and discuss future directions in this field.
As a growing number of animal genome sequence assemblies are reported, in-depth analysis of transposable elements (TEs) has become one of the most fundamental tasks in evolutionary genomics. Although TEs have, in general, been regarded as non-functional junk/selfish DNA, parasitic elements or harmful mutagens, studies have revealed that TEs have had a substantial and sometimes beneficial impact on host genomes in several ways. First, TEs are themselves diverse and thus provide lineage-specific characteristics to the genomes. Second, because TEs constitute a substantial fraction of animal genomes, they are a major contributing factor to evolutionary changes in genome size and composition. Third, host organisms have co-opted many repetitive sequences as genes, cis-regulatory elements and chromatin domain boundaries, which alter gene regulatory networks and are also partly involved in morphological evolution, as has been well documented in mammals. Here, I review the impact of TEs on various aspects of the genome, such as genome size and diversity in animals, as well as the evolution of gene networks and genome architecture in mammals. Given that a number of TE families probably remain to be discovered in many non-model organisms, unknown TEs may have contributed to gene networks in a much wider variety of animals than previously considered.
The two-dimensional site frequency spectrum (2D SFS) was investigated to describe the intra-allelic variability (IAV) maintained within a derived allele (D) group that has undergone an incomplete selective sweep against an ancestral allele group. We observed that recombination certainly muddles the ancestral relationships of allelic lineages between the two allele groups; however, the 2D SFS reveals intriguing signatures of recombination as well as the genealogical structure of the D group, particularly the size of a mutation and the time to the most recent common ancestor (TMRCA). Coalescent simulations were performed to develop powerful and robust 2D SFS-based statistics with special reference to accurate evaluation of IAV, significance of recombination effects, and distinction between hard and soft selective sweeps. These studies were extended to a case wherein an incomplete selective sweep is no longer in progress, having ceased in the recent past. The 2D SFS-based method was applied to 100 intronic linkage disequilibrium regions randomly chosen from the East Asian population of modern humans to examine the P value distributions of the summary statistics under the null hypothesis of neutrality in a nonequilibrium demographic model. We argue that about 96% of intronic variants are non-adaptive with a 10% false discovery rate. Furthermore, this method was applied to six genomic regions in Eurasian populations that were claimed to have experienced recent selective sweeps. We found that two of these genomic regions did not have significant signals of selective sweeps, but the remaining four had undergone hard and soft sweeps and were dated, in terms of TMRCA, after the major out-of-Africa dispersal of modern humans.
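To make the idea of a 2D SFS concrete, the following is a minimal sketch rather than the authors' implementation. It assumes one plausible reading of the abstract: chromosomes are split into a D group and an ancestral (A) group by their allele at a focal site, and every other segregating site is tallied by the number of derived alleles it carries in each group. The function name `two_dim_sfs`, the 0/1 haplotype encoding (0 = ancestral, 1 = derived) and the toy data are all hypothetical.

```python
from typing import List, Sequence


def two_dim_sfs(haplotypes: Sequence[Sequence[int]], focal: int) -> List[List[int]]:
    """Tabulate a 2D site frequency spectrum around a focal site.

    haplotypes: one row per chromosome, one column per site
                (assumed encoding: 0 = ancestral, 1 = derived).
    focal:      column index of the site that defines the two allele groups.

    Returns a matrix sfs where sfs[i][j] is the number of segregating sites
    carrying i derived alleles in the D group and j in the A group.
    """
    # Partition chromosomes by their allele at the focal site.
    d_group = [h for h in haplotypes if h[focal] == 1]
    a_group = [h for h in haplotypes if h[focal] == 0]
    n_d, n_a = len(d_group), len(a_group)

    sfs = [[0] * (n_a + 1) for _ in range(n_d + 1)]
    for site in range(len(haplotypes[0])):
        if site == focal:
            continue  # the focal site only defines the groups
        i = sum(h[site] for h in d_group)
        j = sum(h[site] for h in a_group)
        if i + j == 0 or i + j == n_d + n_a:
            continue  # not segregating in the whole sample
        sfs[i][j] += 1
    return sfs


# Toy sample: 5 chromosomes, 4 sites; site 0 is the focal site.
toy = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
]
spectrum = two_dim_sfs(toy, focal=0)
```

In this toy sample, a site counted in column 0 is private to the D group, whereas a site with derived alleles in both groups is the kind of pattern that recombination between the groups (or mutation predating the focal allele) can produce. Summary statistics such as those mentioned in the abstract would be computed from this matrix; they are not reproduced here.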
Centromere protein B (CENP-B), a protein participating in centromere formation, binds to centromere satellite DNA by recognizing a 17-bp motif called the CENP-B box. This motif is found in hominids (humans and great apes) at an identical location in repeat units of their centromere satellite DNA. We have recently reported that the CENP-B box exists at diverse locations in three New World monkey species (marmoset, squirrel monkey and tamarin). However, the evolutionary origin of the CENP-B box in these species was not determined. It could have been present in a common ancestor, or emerged multiple times in different lineages. Here we present results of a phylogenetic analysis of centromere satellite DNA that support the multiple emergence hypothesis. Repeat units almost invariably formed monophyletic groups in each species and the CENP-B box location was unique for each species. The CENP-B box is not essential for the immediate survival of its host organism. On the other hand, it is known to be required for de novo centromere assembly. Our results suggest that the CENP-B box confers a long-term selective advantage. For example, it may play a pivotal role when a centromere is accidentally lost or impaired.
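As an illustration of how the location of a CENP-B box within a repeat unit might be determined, the sketch below scans both strands of a sequence for a 17-bp motif. The abstract does not give the motif sequence; the pattern used here, `NTTCGNNNNANNCGGGN`, is the core recognition sequence commonly cited in the literature and is an assumption that should be checked against the original study. The function names and the test sequence are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed core motif (not stated in the abstract); N matches any base.
CENPB_CORE = "NTTCGNNNNANNCGGGN"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_boxes(seq: str, motif: str = CENPB_CORE) -> List[Tuple[int, str]]:
    """Return (0-based start on the given strand, strand) for every motif hit.

    A lookahead is used so that overlapping matches are also reported.
    """
    seq = seq.upper()
    pattern = re.compile("(?=(" + motif.replace("N", "[ACGT]") + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    # Scan the reverse complement and map hits back to forward coordinates.
    for m in pattern.finditer(revcomp(seq)):
        hits.append((len(seq) - (m.start() + len(motif)), "-"))
    return sorted(hits)


# Hypothetical fragment with one motif instance starting at position 3.
fragment = "AAA" + "CTTCGTTGGAAACGGGA" + "TTT"
forward_hits = find_boxes(fragment)
reverse_hits = find_boxes(revcomp(fragment))
```

Applied to aligned repeat units from different species, the reported start positions would show whether the box sits at the same offset in each unit or, as the abstract describes for the New World monkeys, at a location unique to each species.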