The archaeal DNA replication machinery: past, present and future

Living organisms are divided into three domains: Bacteria, Archaea, and Eukarya. Although Bacteria and Archaea are both prokaryotes, the proteins involved in information processing (replication, transcription, and translation) are more similar between Archaea and Eukarya. Here the history of research on archaeal DNA replication is summarized and the future of the field is discussed.


INTRODUCTION
Archaea is one of the three domains of life. Archaeal organisms have been found in a wide range of habitats, including extreme environments characterized by strict anaerobic conditions, high temperature, or high salinity. The archaeal domain can be divided into five phyla: Aigarchaeota, Crenarchaeota, Euryarchaeota, Korarchaeota, and Thaumarchaeota.
DNA replication plays an essential role in all life forms. The study of archaeal DNA replication began shortly after Archaea were recognized as the third domain of life. At that time, however, studies focused mainly on the isolation and characterization of thermostable DNA polymerases. Major advances in the field began about 15 years ago, when the first complete genome of an archaeon was determined (Bult et al., 1996). As it became apparent that the archaeal replication machinery is similar to that of Eukarya, a concentrated effort to elucidate it began. Early studies focused on the biochemical and structural characterization of the enzymes and factors needed for the initiation and elongation phases of DNA replication, while genomic studies identified homologs of bacterial and eukaryal replication factors.
In the past few years, genetic tools have enabled functional analysis of replication proteins in vivo and the identification of new replication enzymes. Here we offer a perspective on the past, present, and future of archaeal DNA replication research. Owing to space limitations, only key papers and reviews are cited.

THE BIOCHEMISTRY ERA
The study of archaeal DNA replication started in the 1980s (Fig. 1). It soon became apparent that archaeal replication differs from that in Bacteria, because aphidicolin, a specific inhibitor of eukaryotic DNA replication, also inhibits DNA synthesis and cell growth in haloarchaea and methanogenic archaea (Forterre et al., 1984; Zabel et al., 1985). These findings suggested that archaeal DNA polymerases may resemble the eukaryotic replicative polymerases [DNA polymerase (Pol) α, δ, and ε]. After these initial studies, however, the focus shifted to the isolation and characterization of DNA polymerases from hyperthermophilic archaea because of their commercial value for the polymerase chain reaction (PCR). These studies used polymerase activity to follow the enzymes during purification from cell extracts; the corresponding genes were then cloned, and the enzymes were expressed in and purified from Escherichia coli (Perler et al., 1996). All of these enzymes belong to family B DNA polymerases (PolB), as do eukaryotic Pol α, δ, and ε, and it was therefore suggested that PolB homologs are the replicative enzymes in Archaea. This is true in some archaeal species but not in others (discussed below). Biochemical studies of the replication machinery during this period were limited to DNA polymerases and a few other enzymes with well-defined activities that could be easily followed during purification [e.g., topoisomerase (Forterre and Elie, 1993)].

THE GENOMICS ERA
The field of archaeal DNA replication expanded after the completion of the genome sequence of Methanocaldococcus jannaschii (Bult et al., 1996). Analysis of this and subsequent genomes suggested that archaeal replication proteins are similar to those of Eukarya but less complex (Edgell and Doolittle, 1997) (Fig. 2). These in silico observations prompted a wave of biochemical and structural studies of the archaeal replication machinery (Grabowski and Kelman, 2003). The genome sequences, together with biochemical studies, also revealed differences in the replication machinery among species and kingdoms. Current knowledge is summarized in a recent review (Ishino and Ishino, 2012).
The genome sequences were also instrumental in identifying archaeal origins of replication (oriC) (Fig. 2). The first report of an archaeal oriC came from a skew analysis of genome sequences (Lopez et al., 1999). These in silico analyses were followed by in vivo studies demonstrating that some archaea, like Bacteria, contain a single origin on a circular chromosome. It was later shown that Sulfolobus has three origins, and multiple origins have also been reported in other species (Kelman and Kelman, 2004). Studies on the termination of DNA replication are also in progress. Most archaea have a single homolog of the bacterial XerC/D proteins, the site-specific recombinases involved in resolving replicated chromosomes. In Thermococcales, the dif-like sequences recognized by this recombinase are predicted to lie in the replication termination region (Cortez et al., 2010), whereas in Sulfolobus, with its multiple origins, the dif-like sequence lies outside the fork fusion zone (ffz) (Duggin et al., 2011). However, the mechanisms of fork fusion and chromosome resolution in archaea are not yet known.
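Skew analysis, mentioned above for the first archaeal oriC prediction, exploits the compositional asymmetry between leading and lagging strands. The Python sketch below computes one common variant, cumulative GC skew, (G − C)/(G + C) summed over consecutive windows. It is not necessarily the exact method used by Lopez et al. (1999); the function names, the 1,000-nt window, and the assumption that the leading strand is G-rich (so that the cumulative skew reaches its minimum near the origin and its maximum near the terminus) are illustrative choices.

```python
"""Cumulative GC-skew sketch for locating a putative replication origin.

Assumption: the leading strand is slightly G-rich, so the cumulative skew
is lowest near the origin and highest near the terminus. Skew signals are
weak in some archaea, so treat the output as a candidate position only.
"""


def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of per-window GC skew, (G - C) / (G + C).

    Only complete, non-overlapping windows are used; a trailing partial
    window is ignored. Windows containing no G or C contribute 0.
    """
    seq = seq.upper()
    total = 0.0
    cumulative = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        cumulative.append(total)
    return cumulative


def predict_origin_terminus(seq: str, window: int = 1000) -> tuple[int, int]:
    """Return (origin, terminus) offsets in nt, taken from the skew extremes."""
    cum = cumulative_gc_skew(seq, window)
    if not cum:
        raise ValueError("sequence is shorter than one window")
    i_min = min(range(len(cum)), key=cum.__getitem__)
    i_max = max(range(len(cum)), key=cum.__getitem__)
    # An extremum at index i is reached at the end of window i,
    # i.e. at nucleotide offset (i + 1) * window.
    return (i_min + 1) * window, (i_max + 1) * window


if __name__ == "__main__":
    # Toy "chromosome": C-rich, then G-rich, then C-rich again.
    toy = "C" * 3000 + "G" * 5000 + "C" * 2000
    print(predict_origin_terminus(toy))  # (3000, 8000)
```

Coordinates are linear offsets into the input string; on a real circular chromosome an origin near the sequence start may appear at either end, and experimental validation (as in the in vivo studies above) is still required.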
One of the first proteins identified and studied on the basis of genome sequence was the single-stranded DNA (ssDNA)-binding protein, replication protein A (RPA), from M. jannaschii (Kelly et al., 1998). Subsequent studies focused on the polymerases and their accessory factors, proliferating cell nuclear antigen (PCNA) and replication factor C (RFC), and on proteins involved in lagging-strand processing, such as DNA ligase and Fen1. All of these factors exhibited biochemical properties similar to those of their eukaryotic counterparts (Ishino and Ishino, 2012).
Some replication proteins could not be readily identified by similarity to their bacterial or eukaryotic counterparts. For example, no archaeal protein showed strong sequence similarity to known DNA primases. However, a candidate with limited similarity to the catalytic subunit of the eukaryotic primase was identified (Desogus et al., 1999). In some archaea, the gene encoding this putative catalytic subunit lies in an operon with another gene, and careful examination of the protein it encodes revealed weak similarity to the non-catalytic subunit of the eukaryotic primase (Liu et al., 2001). Subsequent studies demonstrated that the complex of the two subunits constitutes the archaeal primase (Liu et al., 2001). Although similar to the eukaryotic enzyme, the archaeal primase has features not found in other DNA primases; for example, in vitro it can incorporate both ribo- and deoxyribonucleotides. This observation underscores that archaeal DNA replication, although similar to that of Eukarya, is not merely a simpler version of the eukaryotic machinery.

Although many DNA replication proteins were identified during the genomics era on the basis of sequence similarity to proteins in Eukarya and Bacteria and then characterized biochemically, biochemical approaches remained important for identifying new replication factors. For example, to find new DNA polymerases in Pyrococcus furiosus, a genomic library was constructed and expressed in E. coli. The expressed proteins were screened for polymerase activity at high temperature, and a new heterodimeric DNA polymerase, designated PolD, was identified (Uemori et al., 1997). PolD is archaea-specific and therefore could not have been identified by sequence similarity to other DNA polymerases. After the characterization of the enzymes and factors involved in elongation, the focus of the field shifted to the proteins needed for the initiation of DNA replication. The first to be characterized was the minichromosome maintenance (MCM) helicase (Sakakibara et al., 2009), followed by the initiator protein Cdc6, also referred to as Cdc6/Orc1 (Costa et al., 2013).
During this period, the structures of most known archaeal replication proteins were also determined, either at high resolution by X-ray crystallography or at low resolution by electron microscopy (EM), small-angle neutron scattering (SANS), or small-angle X-ray scattering (SAXS). Structures of several protein complexes and protein-DNA complexes were also determined (Brewster and Chen, 2010; Costa et al., 2013; Ishino and Ishino, 2013). Two important structures remain unsolved: those of PolD and of the full-length MCM protein.

THE GENETICS ERA
Until recently, genetic tools were not readily available for Archaea, but several species can now be genetically manipulated (Leigh et al., 2011). These tools opened new avenues in archaeal research. For example, bioinformatic analysis identified two putative primases in Archaea: a two-subunit enzyme homologous to the eukaryotic primase (discussed above) and an enzyme with limited amino acid sequence similarity to the bacterial primase, DnaG. Gene deletion studies showed that only the eukaryotic-like primase is essential for cell viability (Le Breton et al., 2007). Together with the biochemical data, this observation strongly suggests that the eukaryotic-like two-subunit enzyme is the archaeal DNA primase. Another example concerns the cellular roles of PolB and PolD. All archaea except members of the Crenarchaeota contain both PolB and PolD. The biochemical properties of both enzymes are consistent with those expected of a replicative polymerase, including stimulation by PCNA and 3′-5′ exonuclease proofreading activity (Ishino and Ishino, 2012), so it was unclear whether both are required for chromosomal replication. On the basis of in vitro characterization of primer usage and strand displacement activity, it was suggested that PolB replicates the leading strand while PolD replicates the lagging strand. This division of labor would parallel that in Eukarya, where Pol ε replicates the leading strand and Pol δ the lagging strand. Gene knockouts, however, showed that, at least in some species, PolB is not required for cell viability (Cubonova et al., 2013; Sarmiento et al., 2013), suggesting that PolD alone is the replicative polymerase in those species.
Genetic tools were also used to expand the pool of proteins known to participate in archaeal replication. While many replication factors have been identified in silico, it is clear that some factors and enzymes involved in archaeal replication remain to be discovered. In vivo tagging of known replication proteins and isolation of the complexes that copurified with them identified several putative new replication proteins (Li et al., 2010). For example, this strategy identified an archaeal protein that interacts with the GINS complex (Li et al., 2011). The protein, called GAN (GINS-associated nuclease), shares sequence similarity and biochemical properties with the bacterial RecJ nuclease. Sequence analysis revealed that GAN, RecJ, and the eukaryotic Cdc45 protein all contain a DHH phosphodiesterase domain, suggesting that they are homologs (Sanchez-Pulido and Ponting, 2011; Makarova et al., 2012). It had long been hypothesized that archaea possess a functional Cdc45 homolog, and genetic tools enabled its isolation and characterization.

THE FUTURE
Many of the proteins needed for archaeal DNA replication have been identified and characterized in the last 15 years. However, several goals in the field have not yet been achieved. One is the reconstitution of an in vitro DNA replication system, which has not been established for any archaeon despite the efforts of several laboratories. Once developed, such a system will provide a new tool for detailed mechanistic studies of the replication machinery. It will enable elucidation of the roles played by the different proteins and complexes in the initiation and elongation phases, the regulation of the process, the coupling of leading- and lagging-strand synthesis, and more. Notably, a partial reconstitution of rolling-circle replication using synthetic DNA and proteins from Thermococcus kodakarensis has been published (Chemnitz Galal et al., 2012). It will be important to investigate the elongation process of archaeal replication in more detail. Progress is also expected in determining the structures of higher-order complexes of proteins and DNA, which will improve our understanding of the initiation and elongation phases of DNA replication.
Although progress has been made, we are still far from a complete understanding of the initiation and termination of DNA replication in archaea, and these are two of the main subjects for future research. It will also be important to study initiation in species with single and multiple origins, and the coordination of initiation at different origins. Recently, it was shown that Haloferax volcanii can survive without a canonical origin (Hawkins et al., 2013). Further work is needed to show how replication initiates in those cells and to determine whether this is halophile-specific or a more widespread phenomenon.
The information accumulated on DNA replication in different archaeal species will also help clarify the evolutionary relationships among archaeal phyla and between Archaea and the other domains of life.