Evolution of Folded Domains from Short, Oligomerizing Peptides through Coacervation

Liam M. LONGO

doi:10.2142/biophys.64.181

Abstract

In 1966, Margaret Dayhoff reasoned that duplication and fusion is a fundamental mechanism for generating protein complexity. Her insight inspired generations of scientists, several of whom would demonstrate this trajectory with short peptides that symmetrically assemble into a contemporary protein architecture. But how did these oligomerizing peptides, able to adopt complex conformations, emerge in the first place? In the present review, the evolution of an ancient and ubiquitous nucleic acid-binding element is traced from a simple, heterochiral peptide that coacervates with RNA to a folded domain that binds with high affinity to the minor groove of double-stranded DNA.

Translated Abstract

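Contemporary protein structures can be recapitulated by symmetrically assembling short peptides, but how such peptides came to oligomerize and acquire complex structures remains unclear. This article explores the possibility that an ancient, universal nucleic acid-binding element evolved from a simple heterochiral peptide that coacervates with RNA into a folded domain that binds with high affinity to the minor groove of dsDNA.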

1. Dayhoff’s Insight

In 1966, Margaret Dayhoff reported internal repetitions within the sequence of ferredoxin¹⁾, an observation that would inspire protein scientists for decades to come. Above all, Dayhoff’s insight provided a basis for understanding how complex, contemporary proteins could have emerged from simple beginnings: by building up complexity through the duplication and fusion of peptide motifs. Indeed, the preponderance of repetitive protein architectures, particularly those with an axis of pseudo-rotational symmetry, may suggest that such architectures are mechanistically privileged with respect to emergence, a privilege that could ultimately be understood within the framework of Dayhoff’s thinking.

The evolution of repetitive protein folds involves three key structural states, illustrated in Figure 1A: (i) a short peptide that oligomerizes into a globular domain; (ii) a single domain with a high degree of internal sequence repetition, the product of a duplication and fusion event; and (iii) a contemporary domain that retains repetition in structure but has lost much of its sequence repetition through the joint action of selection and drift. Over the past 25 years, this evolutionary trajectory has been realized, in part or in full, by several groups studying different protein architectures (Figure 1B). The first full demonstration was described by the Blaber Group, who systematically symmetrized the sequence of a natural β-trefoil protein and then fragmented it to obtain a 42-residue peptide that trimerized to recapitulate the β-trefoil fold²⁾. Parallel studies of symmetrization and fragmentation were underway on the TIM barrel (e.g., refs.³⁾,⁴⁾). Considering only rotationally symmetric architectures, trajectories were subsequently reported for the β-propeller⁵⁾,⁶⁾, ferredoxin⁷⁾, the (HhH)₂-Fold⁸⁾,⁹⁾, and the double-ψ β-barrel¹⁰⁾, providing significant and varied support for the model presented in Figure 1A. Moreover, folding studies performed on a symmetrized β-trefoil demonstrated that repetitive sequences are advantageous with respect to a common form of structural perturbation, circular permutation¹¹⁾. Because circularly permuting a repetitive sequence regenerates the sequences of the interior subdomains, the perturbation to folding is minimized.
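The circular-permutation argument can be checked with a toy string model. The sketch below is a minimal illustration, assuming (i) a perfectly repetitive, made-up sequence built from a hypothetical 6-residue repeat rather than any real β-trefoil sequence, (ii) that circular permutation can be modeled as joining the termini and reopening the chain elsewhere (a string rotation), and (iii) that the set of overlapping 3-residue windows is a crude proxy for local sequence context. The helper names `circular_permutation` and `local_contexts` are illustrative only.

```python
"""Toy string model of circular permutation (hypothetical sequences only)."""


def circular_permutation(seq: str, cut: int) -> str:
    """Link the N- and C-termini of `seq`, then reopen the chain at index `cut`."""
    if not 0 <= cut < len(seq):
        raise ValueError("cut must lie within the sequence")
    return seq[cut:] + seq[:cut]


def local_contexts(seq: str, k: int = 3) -> set[str]:
    """All length-k windows of the linear chain (a crude proxy for local context)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Hypothetical 6-residue repeat; real beta-trefoil repeats are ~42 residues.
REPEAT = "GKLTVE"
symmetric = REPEAT * 3               # perfectly repetitive toy sequence
asymmetric = "GKLTVEQRSNMWDHPAFY"    # same length, no internal repetition

if __name__ == "__main__":
    # Cutting at a repeat boundary regenerates the original sequence exactly.
    print(circular_permutation(symmetric, len(REPEAT)) == symmetric)  # True
    # Cutting elsewhere keeps every local context of the repetitive sequence...
    print(local_contexts(circular_permutation(symmetric, 4)) == local_contexts(symmetric))  # True
    # ...but creates new junctions in the non-repetitive control.
    print(local_contexts(circular_permutation(asymmetric, 4)) == local_contexts(asymmetric))  # False
```

For the repetitive toy sequence, the junction formed by linking the old termini is identical to the junctions already present between internal repeats, so no new local context appears; the non-repetitive control acquires junctions it never had.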

Fig. 1

A. Evolution of a rotationally symmetric protein architecture from a homo-oligomerizing peptide, a model that emerged following Dayhoff’s insight¹⁾. B. Timeline of studies that recapitulate, in part or in full, the evolutionary trajectory described in Panel A. The (HhH)₂-Fold (red line) is the primary topic of the remainder of this review. For oligomers, each chain is rendered in a different color; for symmetric monomers, rainbow coloring is used. Due to space constraints, this timeline is not exhaustive; first demonstrations for a given protein architecture are given priority. Where crystal structures were not reported, AlphaFold2 models¹⁶⁾ are shown. References for each study are provided in the main text. Protein structure figures were prepared in PyMOL (pymol.org).

The emergence of symmetric protein architectures from short, oligomerizing peptides with well-defined structures, however, does little to explain where the oligomerizing peptide itself came from. Moreover, these initial emergence events may be more idiosyncratic than the trajectory that follows: ferredoxin, for example, may have emerged from a simple peptide wrapped around a 4Fe-4S cluster¹²⁾, whereas the β-trefoil may have emerged by “budding off” from a more ancient β-protein, such as the IgG-like β-sandwich¹³⁾ or the β-propeller¹⁴⁾. For other folds, the mechanisms of emergence are even less well understood, raising two questions: To what extent is functional continuity possible when even more primitive states are considered? And how can we bridge the divide between flexible peptides, of the type that may have been produced on the early Earth, and folded domains?

2. The (HhH)₂-Fold as a Model of Fold Emergence

The helix-hairpin-helix (HhH) motif is an ancient and ubiquitous nucleic acid-binding element that, upon duplication and fusion, can form an (HhH)₂-Fold¹⁵⁾, illustrated in Figure 1B as a dimer. The (HhH)₂-Fold binds specifically to dsDNA, contacting the minor groove in a sequence-nonspecific fashion. To study the earliest evolutionary stages of this fold, a systematic deconstruction effort was carried out⁸⁾, akin to Corey’s retrosynthetic analysis in synthetic organic chemistry. By retracing evolutionary processes in a stepwise fashion, a plausible evolutionary starting point (or starting material, in the case of retrosynthesis) can be identified. This approach has the benefit of creating a trajectory through sequence-structure space that links the putative intermediates through common evolutionary processes (Figure 2A).

Fig. 2

Retracing the history of the (HhH)₂-Fold and HhH motif. A. Experimental deconstruction of the (HhH)₂-Fold starting from natural (HhH)₂-Fold domains, mostly reported in ref.⁸⁾. Deconstruction yielded an (HhH)₂-Fold protein with a reduced amino acid alphabet and 100% sequence identity between its two structural subdomains. Cleaving this domain in half produced a single-HhH construct, Precursor-HhH, that phase separated with RNA. B. Robustness of the Primordial-(HhH)₂ and Precursor-HhH constructs to various perturbations (chiral inversion or scrambling), taken from refs.⁸⁾,¹⁸⁾. These data highlight the interplay between specific and nonspecific interactions that gives rise to the observed biochemical properties, as described in the main text. C. The observation that coacervate formation and RNA binding promote dimerization of an HhH motif into the (HhH)₂-Fold⁹⁾ points to the coacervate as a potential ‘cradle’ for domain evolution.

The first step was to reconstruct the ancestor of all duplicated (HhH)₂-Fold domains using ancestral sequence reconstruction. The resulting sequence, which was folded and functional, showed an approximately twofold increase in sequence identity between its two HhH subdomains. This increase in sequence symmetry was not imposed by design, supporting the evolutionary model in Figure 1A. In the next step, the sequences of the first and second structural subdomains (that is, the two single HhH motifs) were made identical, a process that, by chance, reduced the amino acid alphabet to just 13 amino acid types. The resulting construct, Symmetric-(HhH)₂, retained binding affinity for dsDNA.
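Both quantities tracked in this step, sequence identity between the two HhH subdomains and the size of the amino acid alphabet, are straightforward to compute once the subdomains are aligned. A minimal sketch, assuming the two halves are already aligned to equal length (gaps written as '-'); the 10-residue sequences are made-up placeholders rather than the constructs of ref.⁸⁾, and symmetrizing by copying the first half onto the second is only one hypothetical way to make the subdomains identical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("subdomains must be pre-aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def alphabet_size(seq: str) -> int:
    """Number of distinct amino acid types, ignoring gap characters."""
    return len(set(seq) - {"-"})


def symmetrize(first_half: str) -> str:
    """Hypothetical tandem duplicate in which both subdomains share one sequence."""
    return first_half + first_half


# Hypothetical, made-up subdomain sequences (placeholders only).
half_1 = "LKELPGVGPK"
half_2 = "LEKIPGIGEK"

if __name__ == "__main__":
    n = len(half_1)
    print(percent_identity(half_1, half_2))       # 50.0
    print(alphabet_size(half_1 + half_2))         # 7
    sym = symmetrize(half_1)
    print(percent_identity(sym[:n], sym[n:]))     # 100.0
    print(alphabet_size(sym))                     # 6
```

Copying one half onto the other can only keep or shrink the alphabet, because the duplicate contains no residue types beyond those of the retained half; whether it shrinks, and by how much, depends on the particular sequences.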

Next, following earlier research that pioneered the concept of prebiotic protein design¹⁷⁾, the (HhH)₂-Fold was simplified to an alphabet of just 10 “primordial” amino acid types, with either ornithine or arginine (which can be considered the product of ornithine guanidination) serving as the sole basic amino acid type. Both the ornithine and arginine variants of Primordial-(HhH)₂ bound to dsDNA, as did constructs in which only a fraction of the ornithine residues had been stochastically converted into arginine. In general, both dsDNA affinity and folding improved with increasing arginine content.
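Partial conversion of ornithine to arginine produces a population of molecules rather than a single species. A minimal sketch of that heterogeneity, assuming (hypothetically) that every ornithine is converted independently and with the same probability `p`, in which case the number of arginines per chain follows a binomial distribution; the site count below is a placeholder, not the number of ornithines in Primordial-(HhH)₂.

```python
from math import comb


def arginine_count_distribution(n_orn: int, p: float) -> list[float]:
    """P(exactly k of n_orn ornithines are converted to arginine), k = 0..n_orn.

    Assumes independent, equally reactive sites (binomial model).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return [comb(n_orn, k) * p**k * (1 - p) ** (n_orn - k) for k in range(n_orn + 1)]


def mean_arginine_count(n_orn: int, p: float) -> float:
    """Expected arginines per chain under the same model (n_orn * p)."""
    return n_orn * p


if __name__ == "__main__":
    # Placeholder: 4 convertible sites, 50% average conversion.
    print(arginine_count_distribution(4, 0.5))  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
    print(mean_arginine_count(4, 0.5))          # 2.0
```

Under this model, a preparation described by its average arginine content still contains chains with fewer and more arginines than the average, which is worth keeping in mind when relating arginine content to affinity and folding.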

Note, however, that the successful simplification of several protein folds to date (as also performed in ref.¹⁰⁾) need not reflect historical reality, i.e., that there was a time in protein evolution when only a restricted alphabet of amino acids was available to construct proteins. Rather, it may indicate that the folds explored by evolution are amenable to profound simplification. That is, natural folds are robust to perturbation and easy to discover, so amino acid alphabets of widely varying composition can encode them. It is therefore unlikely that the absence of any particular amino acid precluded the discovery of stable protein conformations, even very early in life’s history. The real conundrums are the processes that first happened upon sequences with favorable properties (e.g., folding into a compact conformation), the classes of structures that these peptides adopted, and the subsequent, adventitious recruitment of these nascent folds into biological processes. In this regard, folding is an essential property when it protects a peptide from hydrolysis, because the resulting persistence translates into opportunity for recruitment into a biological process.

Intriguingly, total chiral inversion of Primordial-(HhH)₂ (i.e., using D-amino acids instead of L-amino acids) did not abolish binding (Figure 2B¹⁸⁾), even though the (HhH)₂-Fold binds the minor groove of dsDNA in a chiral fashion. An analysis of dissociation kinetics suggested that the “functional ambidexterity” of the (HhH)₂-Fold likely stems from the existence of multiple binding conformations. Whereas the highest-affinity binding mode is sensitive to the chirality of the substrate, several additional kinetic phases (presumably distinct binding modes) are shared between the natural (L-protein, D-nucleic acid) and cross-chiral (L-protein, L-nucleic acid) pairs and are thus effectively ambidextrous. It is tantalizing to hypothesize that the less specific binding modes predate the more specific one in the evolutionary history of this domain.
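A common way to describe dissociation data with several kinetic phases is a multi-exponential model, in which each phase is an independent first-order process with its own amplitude and off-rate. A minimal sketch, assuming that model; the amplitudes and rate constants below are hypothetical numbers chosen only to mimic the qualitative picture described above (a slow, chirality-sensitive phase that is absent from the cross-chiral pair, plus faster phases shared by both), not values from ref.¹⁸⁾.

```python
import math

# (amplitude, k_off in s^-1) for each kinetic phase; all values are hypothetical.
NATURAL = [(0.5, 0.001), (0.3, 0.05), (0.2, 1.0)]  # includes a slow, high-affinity phase
CROSS_CHIRAL = [(0.6, 0.05), (0.4, 1.0)]           # only the faster, shared phases


def fraction_bound(t: float, phases: list[tuple[float, float]]) -> float:
    """Fraction of complex remaining at time t: sum of A_i * exp(-k_i * t)."""
    return sum(amp * math.exp(-k_off * t) for amp, k_off in phases)


def half_life(k_off: float) -> float:
    """Half-life (s) of a single first-order dissociation phase."""
    return math.log(2) / k_off


if __name__ == "__main__":
    for t in (0.0, 10.0, 100.0):
        print(t, round(fraction_bound(t, NATURAL), 4), round(fraction_bound(t, CROSS_CHIRAL), 4))
    print([round(half_life(k), 1) for _, k in NATURAL])
```

In this picture, losing only the slowest phase lowers overall affinity without abolishing binding, because the faster phases still contribute bound complex at short times.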

Finally, Primordial-(HhH)₂ was fragmented at the boundary between its structural subdomains to yield a construct with just a single HhH motif, Precursor-HhH (Figure 2A). Binding to dsDNA was disrupted, and the peptide adopted a random-coil conformation in solution. Upon mixing with RNA, however, coacervation was observed. Liquid-liquid phase separation (LLPS) has attracted considerable interest in the past few years, emerging as a common physical phenomenon with diverse biological roles and unique physicochemical properties. Membrane-less organelles formed via LLPS have featured prominently in origin-of-life scenarios (with proposals dating as far back as Oparin himself!¹⁹⁾) and are thought to have been composed of disordered peptides and RNA. Indeed, nucleic acids and proteins likely enjoyed a long history of collaboration²⁰⁾. A core observation of this work⁸⁾, then, is that modern nucleic acid-binding domains may be descended from primordial peptides that coacervated with RNA. Put differently, coacervate-forming peptides were a potential resource for the discovery of folded domains that bind to nucleic acids.

Is the coacervation potential of Precursor-HhH defined by amino acid sequence or by composition? Scrambling the sequence, either completely or while retaining the positions of the basic amino acids, abolished coacervation with RNA and instead yielded insoluble aggregates. This result suggests that sequence, and not just composition, is essential for defining the coacervation potential of the HhH peptide. Chiral inversion of Precursor-HhH, however, did not abolish coacervation, suggesting that the interactions formed between the peptide and RNA are relatively nonspecific. Likewise, inverting the chirality of every other amino acid, which precludes folding into the canonical (HhH)₂-Fold conformation, also did not abolish coacervation.
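The control peptides described above can be specified programmatically. A minimal sketch, assuming a made-up placeholder peptide (not the actual Precursor-HhH sequence), treating K, R and O as the basic residues (with 'O' used here as an ad hoc symbol for ornithine; in standard IUPAC usage it denotes pyrrolysine), and recording stereochemistry as a separate string of 'L'/'D' labels with achiral glycine marked '-'. All function names are hypothetical.

```python
import random

# 'O' is used here as an ad hoc one-letter symbol for ornithine (assumption).
BASIC = {"K", "R", "O"}


def full_scramble(seq: str, rng: random.Random) -> str:
    """Randomly reorder every residue: composition is kept, sequence is not."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def scramble_keep_basic(seq: str, rng: random.Random) -> str:
    """Shuffle only non-basic residues; basic residues stay at their positions."""
    movable = [i for i, aa in enumerate(seq) if aa not in BASIC]
    pool = [seq[i] for i in movable]
    rng.shuffle(pool)
    out = list(seq)
    for i, aa in zip(movable, pool):
        out[i] = aa
    return "".join(out)


def all_d(seq: str) -> str:
    """Total chiral inversion: every chiral residue is D (glycine is achiral)."""
    return "".join("-" if aa == "G" else "D" for aa in seq)


def alternating_chirality(seq: str) -> str:
    """Invert every other residue: even positions L, odd positions D."""
    labels = []
    for i, aa in enumerate(seq):
        if aa == "G":
            labels.append("-")  # glycine has no stereocenter
        else:
            labels.append("L" if i % 2 == 0 else "D")
    return "".join(labels)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the controls are reproducible
    peptide = "ALKRLPGVGPKLAEKIR"  # made-up placeholder sequence
    print(full_scramble(peptide, rng))
    print(scramble_keep_basic(peptide, rng))
    print(all_d(peptide))
    print(alternating_chirality(peptide))
```

Keeping the basic residues in place while shuffling everything else separates the role of charge spacing from that of the remaining sequence; in the experiments described above, both scrambled controls failed to coacervate, whereas both chirality variants still did.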

While folding into an (HhH)₂-Fold may not be strictly required for coacervate formation, how does the coacervate context affect the conformation of Precursor-HhH? To address this question, electron paramagnetic resonance (EPR) spectroscopy was used. At high RNA-to-peptide ratios, the spin-labeled peptide became sufficiently dilute within the coacervate for double electron-electron resonance (DEER) spectroscopy, which revealed dimers with a distance distribution matching that expected for the (HhH)₂-Fold⁹⁾. The coacervate context thus promoted the formation of HhH dimers (Figure 2C), bringing this fold into concordance with the other symmetric folds (Figure 1B) for which an oligomerizing intermediate was observed. This work highlights how coacervates may have been important sites for protein structure evolution by promoting the exploration of oligomeric states.

3. From So Simple a Beginning

Synthesizing the results summarized in Figure 2⁸⁾,⁹⁾,¹⁸⁾, an updated evolutionary trajectory of the HhH motif, from a flexible peptide to a folded domain, is proposed (Figure 3; see caption for details). All of the states within this trajectory have been experimentally validated, and the trajectory maintains a degree of functional continuity in the transition from a coacervate-forming peptide (simple function) to a domain that binds with specificity to dsDNA (more complex function). It is important to remember, however, that the sequences used within this trajectory (and all others like it, Figure 1B) almost certainly never existed in nature, even where reconstruction approaches were used. The value, but also the limitation, of research of this type is to identify plausible trajectories through protein sequence-structure space that can inform our understanding of how proteins evolve, revealing strong constraints as well as the potential for surprising continuities and functional transitions. Nevertheless, these studies support the view that complex structures can emerge from “so simple a beginning,” to echo Darwin.

Fig. 3

Evolutionary trajectory bridging a simple, heterochiral peptide that phase separates with RNA and a folded domain that binds with high specificity to the minor groove of dsDNA. This trajectory is inferred from refs.⁸⁾,⁹⁾,¹⁸⁾. Although heterochiral peptides can adopt folded conformations, encoding well-defined conformations without chiral control would be difficult because the pattern of chirality would vary from molecule to molecule, even if the exact same sequence were retained. Once chiral control emerges, the possibility of folding (in this case, upon dimerization) increases significantly. These initial steps could have served to tune the properties of the RNA-peptide coacervate. Once folding upon oligomerization occurs, these states can be recruited and optimized by a duplication and fusion process, as in Figure 1. In the case shown here, binding to the new ligand, the minor groove of dsDNA, was not yet optimized and instead proceeded through a lower-affinity, more promiscuous binding mode (note that the primitive dsDNA binding mode presented here is illustrative only). Eventually, a higher-affinity, well-defined binding mode emerges. This new binding mode need not entirely displace earlier binding modes, which may confer benefits, for example, to the association rate.

The field of early protein evolution must now move beyond demonstrating possible trajectories. We must begin to assess the space of alternative trajectories and rank their relative likelihoods. Only then can the impact of contingency in early protein evolution, and thus the surprisingness or predictability of our own protein universe, be revealed.

References

Biographies

Liam M. LONGO

Specially Appointed Associate Professor / Associate PI, Earth-Life Science Institute, Tokyo Institute of Technology

Corresponding author
