2022 Volume 19 Article ID: e190011
Design principles of phenotypes in organisms are fundamental issues in physical biology. So far, understanding of the “systems” of living organisms has been promoted chiefly by studying the underlying biomolecules, such as genes and proteins, and their intra- and inter-molecular relationships and regulation. After a long period of refinement, biophysics and molecular biology have established a general framework for understanding “molecular systems” in organisms regardless of species, so that findings from fly studies can be applied to mouse studies. However, little attention has been paid to exploring “phenotypic systems” in organisms, and thus a general framework for them remains poorly developed. Here I review concepts, methods, and case studies using butterfly and moth wing patterns to explore phenotypes as systems. First, I present a unifying framework for phenotypic traits as systems, termed multi-component systems. Second, I describe how to define components of phenotypic systems, and also show how to quantify interactions among phenotypic parts. Subsequently, I introduce the concept of the macro-evolutionary process, which illustrates how complex traits are generated. At this point, I also introduce mathematical methods, “phylogenetic comparative methods”, which model trait evolution as stochastic processes running along a molecular phylogeny treated as a set of bifurcating paths. Finally, I propose two key concepts, macro-evolutionary pathways and the genotype-phenotype loop (GP loop), which will be needed for future directions. I hope these efforts in phenotypic biology will become a major target of biophysics and shape the next generation of textbooks. This review article is an extended version of the Japanese article, Biological Physics in Phenotypic Systems of Living Organisms, published in SEIBUTSU-BUTSURI Vol. 61, p. 31–35 (2021).
Design principles of phenotypes in organisms are fundamental issues in physical biology. In contrast to the considerable progress in understanding the “molecular systems” of living organisms, “phenotypic systems” remain poorly understood. Here I review concepts, methods, and case studies using butterfly and moth wing patterns to explore phenotypes as systems. This review provides a unifying framework termed multi-component systems, including how to find phenotypic components (parts and interactions). In addition, the evolutionary dynamics of multi-component systems are described. Finally, I propose two key concepts for future work, macro-evolutionary pathways and genotype-phenotype loops. I hope these efforts will shape the next generation of textbooks.
Are there common principles or general rules for organisms’ phenotypes (e.g., morphology, behaviors, life history)? Can we establish systems biology for phenotypic traits? It could be said that the 20th century was the era in which we accumulated a large amount of knowledge about biomolecules (genes, RNA, proteins) and their intra- and inter-molecular mechanisms. Recent progress in measurement technologies and physics approaches has added quantitative biological approaches and deepened our understanding of systems inside the bodies of organisms [1–5]. However, phenotypic traits are considered merely as end products of such molecular mechanisms, and few attempts have been made to find systems outside the body of organisms [6–9]. Unfortunately, the discipline of biological physics still lacks a conceptual framework for understanding phenotypic traits as systems.
A large body of knowledge on phenotypic traits is scattered across the fields of macrobiology, such as zoology, taxonomy, ecology, and evolutionary biology, and thus has not been centralized into a common descriptive framework [e.g., 10–13]. This situation prevents the construction of a general theory of phenotypic biology. For example, it is difficult to answer questions about common principles, such as “Do components that are used repeatedly in evolution readily generate a modular structure?” and “What kinds of constraints exist on the evolutionary order in which constituent components are combined during the evolution of complex traits?” If we establish a unifying framework for the description of phenotypic traits, it will allow us to compare traits between different taxa (e.g., insects and vertebrates), appreciate structural principles in phenotypic traits, and explore general rules in the evolution of trait diversification and complexity.
Here I review concepts, methods, and case studies using butterfly and moth wing patterns to explore phenotypes as systems. First, I will introduce a unifying framework for understanding phenotypic traits as systems, termed multi-component systems. Next, I will show how to define components of phenotypic systems, and then how to quantify the interactions among parts. Furthermore, I will introduce the macro-evolutionary process, which describes the order in which components are assembled during phenotypic evolution. Following this introduction, I will briefly explain the mathematical background, phylogenetic comparative methods (PCMs), in which trait evolution is calculated as a stochastic process on a phylogenetic tree. Finally, I propose two key concepts, macro-evolutionary pathways and the genotype-phenotype loop (GP loop), which will be needed for future directions. The present manuscript mainly deals with morphological traits, but the framework proposed here can be applied equally to various phenotypic traits such as behavior and life history.
Here I propose a conceptual framework for understanding organismal phenotypes as systems, each consisting of an assembly of several parts and their interactions. Such a framework is familiar in molecular biology (Fig. 1a). In this framework, biomolecules such as genes and proteins are described abstractly as circles and squares. In addition, interactions among biomolecules (e.g., gene regulatory networks and protein-protein interactions) are described as arrows and t-shaped lines [14,15]. However, we face difficulties when trying to apply similar schemes to phenotypes (Fig. 1b). For example, what are the components of phenotypes? How can the interactions among the constituent parts be quantified? These are fundamental issues for establishing phenotypic systems, yet they remain entirely unclear.
Schematic illustration of molecular and phenotypic systems. (a, b) Systems are composed of several parts and their interactions, as observed in molecules (a) and phenotypes (b). Parts are abstractly shown by circles, triangles, squares and stars, and interactions are shown by arrows and t-shaped lines.
To establish phenotypic biology equivalent to molecular biology, I proposed a unifying concept, termed multi-component systems (Fig. 2a), which are defined as traits that satisfy two properties, “reducibility” and “composability” (Fig. 2b, c) [16]. Reducibility describes the idea that traits are partly decomposable into an assembly of components [sensu 17]. Composability describes the idea that complex traits that can perform advantageous functions can be formed from various combinatorial arrangements of components. These two concepts describe a design principle of phenotypic traits (note: it does not explain the mechanisms by which traits are generated). The concept of reducibility allows us to identify components of traits and list their characteristics. Reducibility is considered within the time scale of a single generation, because components are generated through genetic and developmental mechanisms. It is important to note that the composability of traits says that components can be flexibly combined to evolve complex traits, but it does not state that any combinations of components are permitted during evolution. I hereby propose a unifying conceptual framework to describe “phenotypic systems.”
The concept of multi-component systems. (a) Multi-component systems consist of parts and interactions. The parts are categorized into kinds and states. (b, c) Multi-component systems satisfy two properties, reducibility (b) and composability (c). (d) Conceptual hierarchy of phenotypic systems. Parts are abstractly shown by circles, triangles, squares and stars, and interactions are shown by arrows and t-shaped lines. Several panels of this figure are reproduced with modification from [16].
In this framework, a conceptual hierarchy for describing phenotypes is summarized in Figure 2d. Phenotypic traits are biological entities, and phenotypic systems are their abstract representations. In theory, a phenotypic trait could be represented by several different types of phenotypic systems (i.e., not necessarily by a single system). One such phenotypic system is a multi-component system consisting of a set of components. What counts as a phenotypic trait depends on the focus of the study. For example, in one case the entire limb of a mouse is the phenotypic trait, with the humerus and phalanges as its components; in another case, the phalanges themselves are the phenotypic trait, and the individual finger bones are its components.
In the scheme of multi-component systems, components include not only parts but also interactions (Fig. 2a) [16]. In general, interactions are generated through the relationships between elements in hierarchically lower levels of parts. For example, in genetic molecular systems, parts include genes and cells, whereas interactions include genetic regulations generated via several bases (i.e., lower elements) in genes, and cellular interactions generated via molecules (i.e., lower elements) expressed in cells. Similarly, in morphological traits in phenotypic systems, parts include beetle horns and butterfly wing patterns, whereas interactions include growth interactions among horns generated via physiological resource competitions [i.e., lower elements, 18], and spatial occupation interactions among pattern components generated via molecular developmental regulations [i.e., lower elements, 19]. In summary, components of multi-component systems indicate all elements constituting a system you are interested in.
Furthermore, in the scheme of multi-component systems, parts are described in two ways, by kinds and by states [16]. This view has been discussed in the fields of systematic/phylogenetic/taxonomic biology [20,21]. For example, in genes, kinds are the presence (or absence) of genes (e.g., wingless), whereas states are mutational differences and splicing variants among homologous genes (e.g., wingless in Drosophila melanogaster and Bombyx mori). Similarly, in morphological traits in phenotypic systems, kinds are the presence (or absence) of morphological components (e.g., a beetle horn of Onthophagus sharpi), whereas states are the numbers, shapes, sizes, and colors among homologous components (e.g., beetle horns within the genus Onthophagus) [18]. In summary, components consist of kinds, states, and interactions (Fig. 2a).
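The description above (kinds, states, and interactions) can be sketched as a small data structure. This is only an illustrative sketch, not an implementation from the cited work: the class names, the `sign` encoding of interactions, the part names, and the numerical values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Part:
    name: str
    present: bool = True  # kind: presence or absence of the part
    states: Dict[str, object] = field(default_factory=dict)  # e.g. number, shape, size, color


@dataclass
class Interaction:
    source: str
    target: str
    sign: int  # +1 for a positive interaction (arrow), -1 for a negative one (t-shaped line)


@dataclass
class MultiComponentSystem:
    parts: Dict[str, Part] = field(default_factory=dict)
    interactions: List[Interaction] = field(default_factory=list)

    def add_part(self, part: Part) -> None:
        self.parts[part.name] = part

    def add_interaction(self, source: str, target: str, sign: int) -> None:
        # An interaction is only meaningful between parts that are present.
        for name in (source, target):
            if name not in self.parts or not self.parts[name].present:
                raise ValueError(f"part {name!r} is absent from this system")
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        self.interactions.append(Interaction(source, target, sign))

    def kinds(self) -> set:
        # The set of parts that are present in this system.
        return {name for name, part in self.parts.items() if part.present}


# Hypothetical example: two horns competing for resources (negative interaction).
beetle = MultiComponentSystem()
beetle.add_part(Part("head_horn", states={"size_mm": 4.0}))
beetle.add_part(Part("thoracic_horn", states={"size_mm": 2.5}))
beetle.add_part(Part("wing_horn", present=False))  # a kind that is absent
beetle.add_interaction("head_horn", "thoracic_horn", -1)
```

Keeping presence (kind) separate from the state dictionary mirrors the distinction drawn in the text: two species can share a kind while differing in its states.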
How can we define components constituting phenotypic traits? In genetic molecular systems, parts (e.g., genes) could be identified using open reading frames and sequences of transcription products. Unfortunately, unlike genetic systems, there are no common principles and technologies to identify phenotypic components. Under such a situation, how should we find phenotypic components?
What kinds of frameworks are available for defining components? Ideally, such a framework should be applicable to a wide range of organisms rather than to specific taxonomic groups. For this purpose, I used knowledge from comparative biology, where components are defined with evolutionary thinking and categorized into three types, homology, homoplasy, and novelty (Fig. 3) [22–26]. Homology is defined as traits continuously inherited from the common ancestor [26–29]. Thus, homologous traits can be detected as a phylogenetic cluster (Fig. 3b). Homoplasy is defined as traits that are similar across species but not continuously inherited from the common ancestor [30,31]. Thus, homoplasious traits can be detected as a phylogenetic scatter (Fig. 3c). Novelty is defined as traits that are newly evolved rather than inherited from ancestors [32–34]. Thus, novel traits can be detected as phylogenetically unique occurrences (Fig. 3d). In summary, phenotypic components are classified according to their phylogenetic origins.
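The three phylogenetic signatures (cluster, scatter, unique) can be illustrated with a deliberately crude heuristic. The sketch below assumes a rooted tree written as nested tuples of hypothetical species names and labels a component only from which tips carry it. It is not the procedure of the cited studies: for instance, a homologous component that was secondarily lost in one species would be mislabeled as homoplasy, which is one reason statistical ancestral state estimation is used in practice.

```python
def tip_set(tree):
    """Return the set of tip names below a node (a tip is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tip_set(child) for child in tree))


def clades(tree):
    """Yield the tip set of every clade in the tree, including single tips."""
    yield tip_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def classify(tree, carriers):
    """Label a component from the set of species (tips) that carry it."""
    carriers = frozenset(carriers)
    if not carriers or not carriers <= tip_set(tree):
        raise ValueError("carriers must be a non-empty subset of the tips")
    if len(carriers) == 1:
        return "novelty"      # phylogenetically unique
    if carriers in set(clades(tree)):
        return "homology"     # phylogenetic cluster (carriers form one clade)
    return "homoplasy"        # phylogenetic scatter


# Hypothetical phylogeny of eight species.
tree = ((("sp1", "sp2"), ("sp3", "sp4")), (("sp5", "sp6"), ("sp7", "sp8")))

cluster = classify(tree, {"sp1", "sp2", "sp3", "sp4"})
scatter = classify(tree, {"sp2", "sp5", "sp8"})
unique = classify(tree, {"sp7"})
```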
Phylogenetic views for the classification of phenotypic components. (a) An example of phylogeny and species. (b) Homology. Homologous components are shown by gray circles. (c) Homoplasy. Homoplasious components are shown by white triangles. (d) Novelty. Three novel components are shown by a white star, a white pentagon and a black square. (e) Multi-component systems of eight species are summarized as an ensemble of all components.
The comparative biological view of components tells us whether they are common or unique among species (Fig. 3e). Common components are found as homology, a concept widely used in molecular biology to identify genes and proteins shared across different organisms such as flies and mice [35]. This characteristic allows us, for example, to transfer molecular mechanisms (e.g., WNT signaling) elucidated in flies to research in mice [36]. The same situation can also be found in phenotypic traits [e.g., 37]. For example, phenotypic mechanisms (e.g., limb skeletal systems) elucidated in mice can be transferred to research in elephants (e.g., sixth toes [38]). In this way, homology can be seen not only in genes and proteins but also in phenotypes. In contrast, unique components are found as novelty, which is observed in a specific taxonomic group. Typical examples of novelty are beetle horns and the vertebrae of vertebrates. Note that all homologous traits were novel when they first evolved. In other words, if novel traits are continuously inherited through speciation, they come to be classified as homologous traits. Moreover, repeatedly used components are found as homoplasy, being repeatedly gained and lost in different lineages. A typical evolutionary mode of homoplasy is convergence, such as the eyes of humans and squids. Homoplasious traits are often discussed in relation to environmental adaptations or evolutionary constraints within design space [39]. In this manner, the characteristics of components are thought to be reflected in the patterns of their phylogenetic relationships.
In general, homologous components tend to be inherited from ancestors as a set of several components rather than as a single component [22,24]. This tendency is known to be particularly pronounced in morphological traits. Such a definite set of homologous components is termed a “ground plan.” It is only an empirical rule, and the mechanisms by which a set of components is inherited together are not yet explained by genetics, developmental biology, or physical biology. A typical example of a ground plan is the skeletal system of mammalian limbs (e.g., human hands for grabbing, bat wings for flying, and whale fins for swimming) [37]. These differ in appearance and function, but their structures are composed of the “same” (=homologous) components (humerus, ulna, radius, and phalanges). A ground plan is also found in the wing patterns of moths and butterflies (Fig. 4a) [40,41]. Furthermore, ground plans have been found in various traits of various animals and plants [26].
Multi-component systems in butterfly wing patterns. (a) Ground plan. (b) Venous patterns. (c) Filled-venous patterns. (d) Color fields. (e) Ripple patterns. (f) Leaf masquerade (Kallima inachus). The wing pattern shows a leaf-venation pattern (left), composed of several types of components, homology (red), homoplasy (blue) and novelty (green). (g–j) The ground plan of camouflage for leaves (K. inachus, g; Memphis philumena, h), lichen (Hamadryas feronia, i) and rock (Oeneis melissa, j). The states at rest are shown in the insets (h, j). (k, l) The components of Müllerian mimicry between two species, Danaus plexippus (k) and Limenitis archippus (l). The lines (g–l) are colored the same as those in panel (a). This figure is reproduced with modification from [42,43].
Interestingly, the components of a ground plan are known to preserve their relative positional relationships over evolution. These relative positional relationships are called topographical relationships, and this rule is called the “principe des connexions” [44]. This rule can be used to identify homologous components of traits across species, and it was eventually incorporated into Remane’s criteria as the first criterion [45]. Remane’s criteria are the rules for identifying homologous components across different species, and they consist of three principal rules: (1) similarity of topographical relationships, (2) similarity of special features, and (3) transformational continuity through intermediate ontogeny or phylogeny. Thus, once a ground plan is established, homologous components can be identified based on their relative positional relationships (i.e., the ground plan).
When finding the ground plan, it is first necessary to identify the fixed point(s), because the relative positions of parts can be described in several ways, as explicitly noted by Rieppel [22]. For example, in the ground plan of lepidopteran wing patterns, the fixed point is a discal spot placed at the end of the discoidal cell, an area surrounded by wing veins (Fig. 4a). Since the discoidal cell is highly conserved across almost all families of Lepidoptera, the pattern component placed at the end of the discoidal cell can be identified reliably. In this manner, the fixed points are determined using pre-existing components (e.g., the discoidal cell in this case).
To examine multi-component systems of traits in animals, I focused on camouflage and mimicry patterns in butterfly wings [41–43]. We first investigated a representative leaf pattern in butterflies (Kallima inachus) and found that the wing pattern is composed of various types of components, homology, homoplasy, and novelty (Fig. 4f) [42,43]. For example, in the leaf vein pattern, the main vein is composed of homologous components (i.e., ground plan); the left-lateral vein is composed of a combination of components of homology (ground plan) and homoplasy (venous pattern); the right-lateral vein is composed of a combination of components of homology (ground plan) and novelty. In summary, the results revealed that the leaf pattern of the Kallima butterfly is a mosaic structure that originated by combining various components with distinct evolutionary origins. This finding was, somewhat surprisingly, well received in the field of camouflage and mimicry research and has already been introduced in a representative textbook [46].
I here explain in detail how these three types of components (i.e., homology, homoplasy, novelty) were identified in the dead leaf pattern [42,43]. First, we identified homologous components based on the ground plan (Fig. 4a, g). As explained above, the fixed point (i.e., discoidal spot; DS) was identified. Then, since the relative positions among homologous components are conserved within the scheme of the ground plan, other components are identified in order starting from the DS. Next, we identified homoplasious components. Since previous studies suggested that some pattern components are repeatedly used in butterfly wing patterns [40,43,47], these components are thought to be candidates for homoplasious components, including venous patterns (Fig. 4b), filled-venous patterns (Fig. 4c), color fields (Fig. 4d), ripple patterns (Fig. 4e). Although there is no standard method for identifying homoplasious components, we proposed a methodology based on Rieppel’s fixed points and Remane’s second criterion [43]. In this method, venous patterns are identified using Rieppel’s fixed point, because these components of patterns are placed on the wing veins. We first identified wing veins as the fixed points, then observed the pigmentation on the wing veins, and finally identified the venous patterns. Ripple patterns and color fields are identified using Remane’s second criterion. Since ripple patterns are defined as rhythmical patterns in which each element of ripples runs perpendicular to the orientation of the wing scales and wing veins, we used these characteristics as special features to find the components of ripple patterns. Since color fields are defined as color patterns that are distinctly different from the heretofore described banding and spotting patterns in that they involve large areas of the wing surface that contrast in coloration with other areas, we used these characteristics as special features to find the components of color fields. 
Finally, we identified novel components. Novel components were here considered as components other than those of homology and homoplasy, though precise descriptions of novel components require dense sampling within specific taxa [24]. In this manner, the components of butterfly wing patterns are identified by a series of procedures in comparative biology.
We then compared two leaf patterns in butterflies (Kallima inachus and Memphis philumena), and found that the leaf patterns are made of a highly flexible combination of components (Fig. 4g, h) [43]. For example, in the leaf patterns of these species, the main veins are composed of different components (Cd and BOp in K. inachus; in contrast, Cd and DS in M. philumena). This flexibility of building logic in leaf patterns can be discussed in terms of the tinkering concept proposed by François Jacob [48]. This concept was described as “a tinkerer who does not know exactly what he is going to produce, but uses whatever he finds around him, whether it be pieces of string, fragments of wood, or old cardboards; in short it works like a tinkerer who uses everything at his disposal to produce some kind of workable object.” Thus, the flexible building logic of leaf patterns in butterfly wings might reflect the tinkering logic of the evolutionary processes behind them [41].
Subsequently, we investigated various types of camouflage (dead leaves, bark-covered lichens, and rocks) (Fig. 4g–j) [43]. Interestingly, different types of camouflage are mainly made by the “same” components (i.e., ground plan) but with changes in the state of components (e.g., straight lines/curves). For example, in leaf patterns, one component (Cd) of the ground plan represents a straight line to mimic the leaf-venation (Fig. 4g, h); in lichen patterns, the Cd component represents a zig-zagged line to mimic toggle patterns of lichen (Fig. 4i); in rock patterns, the Cd component shows a curved line to mimic crack patterns of rock (Fig. 4j). In summary, these results revealed combinatorial building logic of multi-component structures of camouflage in butterfly wing patterns.
Furthermore, we compared the mimicry patterns of two species (Fig. 4k, l) [43]. Müllerian mimicry is a survival strategy in which toxic species enhance their ability to escape predators by imitating each other’s patterns. The wing patterns of Danaus plexippus and Limenitis archippus mimic each other (inset of Fig. 4k, l). Interestingly, the orange pattern is built differently in the two species. In D. plexippus, the orange patterns are generated only by the color fields, whereas in L. archippus they are generated by a combination of the color fields and the ground plan. These results reveal that the mimicry patterns have evolved using different components to imitate each other.
How can we grasp the inter-relationships among the parts? Phenotypic traits must be organized, because phenotypes are adapted to environmental conditions and generated through the corresponding genetic and developmental mechanisms [49–51]. Genetic mechanisms include genetic variations such as mutations and linkage disequilibrium, whereas developmental mechanisms include gene regulatory networks and cellular dynamics. In general, the time scale is pivotal to understanding ecological and evolutionary dynamics [52]. Since genetic variations (e.g., mutational differences among individuals) have potential effects on interactions among parts, phenotypic interactions should be understood at the population level. In this framework, positive interactions, shown as arrows in Fig. 2a, include not only the activation of gene regulation at the ontogenetic level but also increases in gene expression caused by genetic mutations at the population level. This organization is crystallized in the concept of morphological integration, which postulates that functionally related elements are tightly coupled [49–51]. A special form of morphological integration is modularity, in which units are tightly coupled internally but can be decoupled from one another [53,54]. In summary, under the concept of morphological integration, morphological shapes are understood as structures in which parts are coupled and decoupled through variations at the population level.
To quantify the interactions among components, I developed a mathematical method, termed the morphological correlation network, which uses phenotypic variations of morphological shapes in populations (Fig. 5a, b) [4,55]. Morphological correlation networks are composed of nodes, which are landmarks set on morphological shapes, and links, which are interactions between such landmarks (Fig. 5a, b). The landmarks are representative points placed on morphological shapes [56]. For example, in the wing patterns of butterflies and moths, the landmarks are the intersection points of wing veins and patterns (Fig. 5c). Note that the choice of which points on a morphological shape are representative currently relies on biological knowledge, though quantitative methods (e.g., deep learning techniques [57,58]) would be more suitable for identifying landmarks without prior knowledge.
Phenotypic interactions among parts. (a, b) Landmarks set on the ground plan and the morphological correlation network in Thyas juno (a) and Oraesia excavata (b). (c) The landmarks are placed at the intersection points of phenotypic components and pre-existing structures (here, wing veins shown by gray lines). (d) Spin-glass model. (e, f) Strategies for lepidopteran wing pattern diversification. (e) Individualization. (f) Rewiring. The landmarks, shown by circles, are colored the same as those in Fig. 4a. Modules detected in morphological correlation networks are colored light blue. This figure is reproduced with modification from [55].
The interactions between parts (i.e., landmarks) are quantified using correlations arising from phenotypic variations. Although there are several ways to quantify correlations, here I introduce a geometric morphometric approach for quantifying correlated variations in the spatial positions of landmarks. First, the spatial positions (i.e., x and y coordinates) of each landmark are digitized for several individuals and then standardized using geometric morphometrics [56]. Subsequently, correlated variations are computed from the standardized positions [4,55]. The correlated variations are calculated as follows:
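$$ \mathrm{Rv}(X_i, X_j) = \frac{\mathrm{tr}\left(\Sigma_{ij}\,\Sigma_{ji}\right)}{\sqrt{\mathrm{tr}\left(\Sigma_{ii}^{2}\right)\,\mathrm{tr}\left(\Sigma_{jj}^{2}\right)}}, \qquad \Sigma_{ij} = \mathrm{Cov}(X_i, X_j) \qquad (1) $$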
where Xi (i = 1, 2, …) denotes a random vector observed as a matrix with two rows (the x and y coordinates of landmark i) and N columns (N, the number of samples). Briefly, the Rv coefficient is the correlation between two landmarks i and j, each consisting of x and y variables in two dimensions [59,60]. The Rv coefficients are summarized as the Rv matrix (Rvij), a symmetric matrix of the Rv coefficients between landmarks i and j. The correlation matrix Rvij is then converted to a weighted adjacency matrix Wij, where wij = Rvij if the Rv coefficient satisfies a given threshold, and wij = 0 otherwise. Thus, correlations between parts can be represented as undirected graphs with weights (i.e., the strengths of correlations).
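As a rough illustration of this step, the sketch below computes the Rv coefficient between two-dimensional landmarks and thresholds the resulting matrix into a weighted adjacency matrix. It uses the standard definition of the Rv coefficient, tr(Σij Σji) / sqrt(tr(Σii²) tr(Σjj²)), where Σij is the 2×2 cross-covariance of landmarks i and j. The function names and toy coordinates are hypothetical, and two choices are assumptions rather than statements from the source: "satisfies a given threshold" is read as "greater than or equal to", and the diagonal is set to zero so the graph has no self-loops.

```python
from math import sqrt


def cross_cov(a, b):
    """2x2 cross-covariance of two landmarks, each a list of (x, y) per specimen."""
    n = len(a)
    mean_a = [sum(p[k] for p in a) / n for k in range(2)]
    mean_b = [sum(p[k] for p in b) / n for k in range(2)]
    return [[sum((p[r] - mean_a[r]) * (q[c] - mean_b[c]) for p, q in zip(a, b)) / (n - 1)
             for c in range(2)]
            for r in range(2)]


def frob2(m):
    """Squared Frobenius norm; equals tr(M M^T) for any real matrix M."""
    return sum(v * v for row in m for v in row)


def rv(a, b):
    """Rv coefficient between two landmarks (undefined if a landmark never varies)."""
    return frob2(cross_cov(a, b)) / sqrt(frob2(cross_cov(a, a)) * frob2(cross_cov(b, b)))


def rv_matrix(landmarks):
    return [[rv(a, b) for b in landmarks] for a in landmarks]


def adjacency(rvm, threshold):
    """Keep Rv values at or above the threshold (assumed reading); zero elsewhere."""
    n = len(rvm)
    return [[rvm[i][j] if i != j and rvm[i][j] >= threshold else 0.0 for j in range(n)]
            for i in range(n)]


# Toy data for four specimens (hypothetical coordinates).
lm_a = [(1, 0), (-1, 0), (0, 1), (0, -1)]
lm_b = [(5, 5), (1, 5), (3, 7), (3, 3)]        # lm_a scaled by 2 and shifted
lm_c = [(1, 1), (1, 1), (-1, -1), (-1, -1)]    # varies independently of lm_a

W = adjacency(rv_matrix([lm_a, lm_b, lm_c]), threshold=0.5)
```

Because tr(Σij Σji) is the squared Frobenius norm of Σij, no general matrix multiplication is needed for two-dimensional landmarks.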
To find modules (i.e., tightly-coupled nodes) from networks, a spin-glass model in statistical physics can be used. Spin-glass is a multi-body system consisting of multiple spins and their interactions with each other, and thus could be understood as graphs (Fig. 5d) [61]. In statistical physics, this model is used to solve the global optimization of a given function derived from the spin-to-spin interaction systems, which results in a good approximation in an ample search space by reaching the minimal state of the spins. To find modules from morphological correlation networks, I used a Reichardt-Bornholdt method, in which the modules of the network are extracted by seeking the spin configuration that minimizes the energy of the spin-glass with the spin states being the module indices [62]. This method partitions the nodes into modules that minimize a quality function (“energy”):
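$$ H(\{\sigma\}) = -\sum_{i \neq j} \left( W_{ij} - \gamma\, p_{ij} \right) \delta(\sigma_i, \sigma_j) \qquad (2) $$
where Wij denotes the weighted adjacency matrix of the network calculated above, σi denotes the spin state (i.e., the module index) of node i, δ(σi, σj) equals 1 when nodes i and j are assigned to the same module and 0 otherwise, and γ is a parameter that weights the null model (γ = 1 in the simplest case). pij denotes the edge probability between nodes i and j according to the null model. The null model should reflect the connection probability between nodes in a network having no apparent modules (i.e., community structure) [62]. In this manner, multi-component systems can be implemented through spin-glass models, where parts are represented by spins and the interactions among parts are represented by spin-spin links (Fig. 5a, b, d).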
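To make the energy minimization concrete, the sketch below evaluates a Reichardt-Bornholdt-style energy, H = −Σ_{i≠j} (Wij − γ pij) δ(σi, σj), and finds the minimizing partition by exhaustive search. Two points are assumptions rather than details from the source: the null model is taken to be the configuration model, pij = si sj / 2m (si, the summed weight of node i; 2m, the total of all si), and exhaustive enumeration replaces the stochastic optimization used by practical implementations, so it is feasible only for very small networks. The toy network is hypothetical.

```python
def energy(W, sigma, gamma=1.0):
    """Spin-glass energy of a module assignment sigma for weight matrix W."""
    n = len(W)
    strength = [sum(row) for row in W]
    two_m = sum(strength)  # must be positive, i.e. the network needs at least one edge
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and sigma[i] == sigma[j]:
                p_ij = strength[i] * strength[j] / two_m  # assumed configuration null model
                total -= W[i][j] - gamma * p_ij
    return total


def partitions(n):
    """Yield every partition of n nodes once, as a canonical list of module labels."""
    def grow(labels):
        if len(labels) == n:
            yield list(labels)
            return
        # A new node may join any existing module or open exactly one new module.
        for label in range(max(labels, default=-1) + 2):
            yield from grow(labels + [label])
    yield from grow([])


def best_modules(W, gamma=1.0):
    """Exhaustive search for the minimum-energy partition (small networks only)."""
    return min(partitions(len(W)), key=lambda sigma: energy(W, sigma, gamma))


# Toy network: two tightly coupled triangles joined by one weak link (2-3).
W = [[0.0] * 6 for _ in range(6)]
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[i][j] = W[j][i] = w

modules = best_modules(W)
```

For this toy network the search separates the two triangles into two modules. The number of partitions grows as the Bell numbers (203 for six nodes), which is why heuristic optimization is needed for realistic landmark sets.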
I used the method of morphological correlation network and analyzed two types of moth wing patterns (Fig. 5a, b) [4,55]. I first investigated the morphological integration of a simple pattern of the noctuid moth, Thyas juno (Fig. 5a). This moth displays a cryptic pattern to tree bark. This wing pattern was found to be governed by the ground plan. The mathematical analyses detected four modules corresponding to each component, strongly suggesting that the coupling states reflect the ancestral states of wing patterns. I then investigated the morphological integration of the complex pattern of the noctuid moth, Oraesia excavata (Fig. 5b). This moth displays a masquerade pattern to dead leaves. This wing pattern was found to be governed by the ground plan. The mathematical analyses detected four modules, but not corresponding to each component, rather corresponding to functional units such as the main vein, left vein, and right vein of the leaf vein pattern. This result strongly suggested that the coupling states of this leaf pattern have evolved through rewiring the ancestral states of coupling and decoupling patterns. These results suggest that complex and simple patterns evolve differently in morphological integration in wing patterns (Fig. 5e, f). Simple patterns maintain the modularity of each pattern by evolution, and are thought to reflect the state of their ancestors (individualization strategy; Fig. 5a, e). In contrast, complex patterns were reorganized modularity for each component, generating new modular structures through coupling and uncoupling (rewiring strategy; Fig. 5b, f). Since the dead leaf pattern fluctuates due to genetic mutation and developmental noise, it was speculated that it has acquired modules corresponding to functional units to avoid destruction of the leaf pattern for precise patterning.
The macro-evolution process is “an evolutionary path that shows the temporal order of gains, losses, and modifications of components toward the establishment of complex adaptive traits” (Fig. 6a) [16,41]. To elucidate the macro-evolutionary process, previous studies used fossil records showing ancestral states and/or intermediate states. For example, Friedman discovered the intermediate states of flatfishes (eyes at the top of the head) and revealed the stepwise evolutionary process toward a current state of flatfishes (one-sided eyes) [16,63]. It is a powerful way to explore the evolutionary process of complex traits. However, this approach has severe limitations. One major limitation is that fossil records we can obtain are strongly biased. For example, we cannot find the fossil records of these examples, butterfly wing patterns and animal eyes, because such soft tissues are too difficult to remain. How can we overwhelm this difficulty?
Figure 6. Macro-evolutionary process. (a) An illustrative example of a macro-evolutionary process. (b–d) Statistical models for analyzing the evolution of discrete traits. (b) Mk model for transitions between two kinds (or states) of a part or an interaction. (c) Mk model for transitions between four states of two parts or interactions. (d) BiSSE model for transitions between two kinds (or states) of a part or an interaction, occurring together with speciation (λ0 and λ1) and extinction (μ0 and μ1) of species. (e) Schematic illustration of an analysis using phylogenetic comparative methods. The macro-evolutionary process can be reconstructed by tracing the ancestral states of components at the nodes (A to D) of the phylogeny. Parts are shown abstractly as circles, triangles, squares, and stars, and interactions are shown as arrows and T-shaped lines. This figure is reproduced with modification from [16].
The solution for resolving the macro-evolutionary process of any trait comes from a phylogenetic point of view. The classical method for resolving the evolutionary order of traits is to use “character polarity,” which refers to the biased placement of traits (or components) on a phylogenetic tree (e.g., component 2 (triangles) in Fig. 6e) [64,65]. If the traits of interest show character polarity, their macro-evolutionary process can be readily inferred [e.g., 66,67]. In Fig. 6e, component 2 was gained at node C. Thus, the reconstructed macro-evolutionary process is that component 1 (squares) was gained first and component 2 was gained afterwards. However, in many cases of evolution there is no character polarity, and traits are scattered across a phylogenetic tree in a more complicated way (e.g., components 3 and 4 (circles and stars) in Fig. 6e). To resolve such complicated evolution, we need quantitative and statistical approaches that estimate the ancestral states of traits even when there is no clear character polarity.
Currently, analytical methods termed phylogenetic comparative methods (PCMs) are sweeping the research fields of evolution and ecology [68–70]. Since they were proposed by Felsenstein in 1985 [71], PCMs have come to provide a wide range of applications, such as ancestral state estimation, analysis of correlated trait evolution, evolutionary model selection, and detection of phylogenetic inertia [72–75]. The fundamental mathematical idea is to model the state change of traits as a stochastic process: a phylogenetic tree is regarded as a set of bifurcating paths representing evolutionary paths, and a stochastic process is run on the tree. PCMs can analyze discrete traits (0, 1, 2, …) or continuous traits. For discrete traits, Markov chains are widely used (Fig. 6b), and for continuous traits, Brownian motion is used. The parameters of the model are then estimated within the framework of maximum likelihood or Bayesian methods.
Here, I introduce the mathematical foundation of ancestral state reconstruction using PCMs [see review for details, 76]. For ancestral state reconstruction, the most fundamental model is the Mk model, which assumes that trait states can change along phylogenetic branches [77,78]. The Mk model is based on a continuous-time Markov process, where the probability Pij(t+Δt) that the trait has changed from state i to state j after time t plus an infinitesimal interval Δt is given by

$$P_{ij}(t+\Delta t) = P_{ij}(t)\Bigl(1 - \sum_{k \neq j} q_{jk}\,\Delta t\Bigr) + \sum_{k \neq j} P_{ik}(t)\,q_{kj}\,\Delta t \tag{3}$$

where qij is the transition rate from state i to state j. For all possible combinations of i and j, equation 3 can be summarized in matrix form, providing the Chapman-Kolmogorov equation:

$$\mathbf{P}(t+\Delta t) = \mathbf{P}(t)\,(\mathbf{I} + \mathbf{Q}\,\Delta t) \tag{4}$$

where

$$\mathbf{Q} = (q_{ij}), \qquad q_{ii} = -\sum_{j \neq i} q_{ij}$$

and I is the identity matrix.
To solve for P(t) in terms of the rate parameters Q, we transform equation 4 into a differential equation

$$\frac{d\mathbf{P}(t)}{dt} = \mathbf{P}(t)\,\mathbf{Q} \tag{5}$$

from which we derive the general solution

$$\mathbf{P}(t) = c\,e^{\mathbf{Q}t} \tag{6}$$

where c is a constant; the initial condition P(0) = I gives P(t) = e^{Qt}. Thus, the Chapman-Kolmogorov equation provides a closed-form solution for P(t) in terms of the rate parameters in Q and the time, or phylogenetic branch length, t.
In addition, stochastic models based on Markov chains have been extended to transition models among three or more states of one trait [78,79] and to a transition model among four states that captures the correlation between two traits (Fig. 6c) [77,80]. The four-state transition model can test whether a pair of components evolves independently or dependently. Furthermore, the Markov chain model has been combined with the birth-death model to estimate speciation and extinction rates, yielding the SSE (State, Speciation and Extinction) models (Fig. 6d) [e.g., 81]. PCMs have also been implemented with other stochastic processes, such as the birth-death process and the Ornstein-Uhlenbeck process [for review, 82].
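As an illustration of the four-state model, the sketch below builds the 4×4 rate matrix for two binary components and checks whether a given set of rates corresponds to independent evolution. It assumes the common convention that only one component changes at a time; all function names and rate values are hypothetical.

```python
# Combined states of two binary components (X, Y).
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def build_q(rates):
    """Build the 4x4 rate matrix from a dict {(from_state, to_state): rate}.

    Only single-component changes receive a rate; simultaneous changes of both
    components are fixed at zero. Each diagonal entry makes its row sum to zero.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(STATES):
        for j, b in enumerate(STATES):
            if sum(x != y for x, y in zip(a, b)) == 1:
                q[i][j] = rates[(a, b)]
        q[i][i] = -sum(q[i])
    return q


def independent_rates(x_gain, x_loss, y_gain, y_loss):
    """Rates of the independent model: each component ignores the other's state."""
    per_component = [(x_gain, x_loss), (y_gain, y_loss)]
    rates = {}
    for a in STATES:
        for b in STATES:
            changed = [k for k in range(2) if a[k] != b[k]]
            if len(changed) == 1:
                gain, loss = per_component[changed[0]]
                rates[(a, b)] = gain if b[changed[0]] == 1 else loss
    return rates


def is_independent(rates, tol=1e-12):
    """True if each gain/loss rate of one component is equal in both states of the other."""
    for k in range(2):          # component that changes
        other = 1 - k
        for new in (0, 1):      # new == 1 is a gain, new == 0 is a loss
            pair = []
            for background in (0, 1):
                a = [0, 0]
                a[k], a[other] = 1 - new, background
                b = list(a)
                b[k] = new
                pair.append(rates[(tuple(a), tuple(b))])
            if abs(pair[0] - pair[1]) > tol:
                return False
    return True


# Hypothetical rates: the dependent model lets X be gained faster when Y is present.
indep = independent_rates(x_gain=0.2, x_loss=0.1, y_gain=0.3, y_loss=0.05)
dep = dict(indep)
dep[((0, 1), (1, 1))] = 0.9
print(is_independent(indep), is_independent(dep))
```

In practice the independent model (four free rates) and the dependent model (eight free rates) are both fitted to the data and compared statistically; this sketch only shows how the two rate structures differ.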
To investigate the evolution of traits, PCMs have been used to estimate the ancestral states of traits [72,82]. However, many previous studies could not reconstruct the evolutionary process by which complex traits are built up, because they treated the overall phenotype as a structurally integrated unit. For example, a PCM analysis estimates the presence/absence of leaf mimicry at the ancestral nodes of a phylogeny [e.g., 83], which reveals when and how many times leaf mimicry evolved. This approach is termed “whole-integrated PCMs” [16]. The whole-integrated approach has thus focused on revealing when complex traits originated, but it cannot reconstruct the sequential evolutionary process that builds up complex traits from simple traits.
One simple solution comes from the framework of multi-component structures. In this framework, traits can be decomposed into an assembly of components. PCMs can then estimate the ancestral states of each component, and the ancestral states of all components can be combined to reveal the macro-evolutionary process toward complex traits. This approach is termed “component-based PCMs” (Fig. 6e) [16].
The analytical framework for reconstructing the macro-evolutionary process using PCMs consists of five steps: (1) reconstruction of a molecular phylogenetic tree by standard methods such as maximum likelihood (e.g., RAxML) [84] or Bayesian inference (e.g., MrBayes) [85]; (2) decomposition of traits into an assembly of components based on a comparative biological approach; (3) character coding of each component as binary or quantitative states; (4) estimation of the ancestral states of each component using PCMs; and (5) reconstruction of the macro-evolutionary process by component-based analyses (Fig. 6e) [16]. The input data required for these analyses are gene data for the molecular phylogeny and phenotypic trait data for character coding. Thus, these analyses can be readily applied to a broad range of traits (e.g., not only morphology but also genes and proteins).
This analytical framework works even when the phylogenetic polarity of traits is unclear, because it reveals the states of traits (= ancestral states) at the nodes of a phylogenetic tree (Fig. 6e). For example, at node A, component 1 (squares) was present, component 2 (triangles) was absent, component 3 showed the state of open circles, and so on. By putting these together, the ancestral state at node A (a square and an open circle) can be estimated. Similarly, the ancestral states at nodes B, C, and D can be estimated. By tracing the ancestral states from node A to B to C to D, we can reconstruct the macro-evolutionary process by which a trait evolved step by step. This method can be applied to various organisms. In addition, it is applicable to any type of phenotypic trait, including behaviors and life histories as well as morphologies, and to genetic traits, including genomes, genes, proteins, and protein-protein interactions. This method will provide a general framework for quantitative analyses.
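The component-based idea can be sketched on a toy example. The code below estimates the marginal probability of each state at every internal node under a symmetric two-state Mk model, one component at a time, and then assembles the per-node states. It is a sketch under strong assumptions: the tree, branch lengths, tip codings, and the fixed rate q are all hypothetical, and brute-force enumeration of node histories stands in for the pruning algorithms and rate estimation used by real PCM software.

```python
import itertools
import math

# Hypothetical toy tree: child -> (parent, branch length). "A" is the root.
TREE = {
    "B": ("A", 1.0), "sp1": ("A", 3.0),
    "C": ("B", 1.0), "sp2": ("B", 2.0),
    "sp3": ("C", 1.0), "sp4": ("C", 1.0),
}
INTERNAL = ["A", "B", "C"]


def p_change(i, j, t, q):
    """Transition probability P_ij(t) of a symmetric two-state Mk model with rate q."""
    decay = math.exp(-2.0 * q * t)
    return 0.5 + 0.5 * decay if i == j else 0.5 - 0.5 * decay


def marginal_ancestral_states(tip_states, q=0.1):
    """Posterior probability of state 1 at each internal node (flat root prior)."""
    weight_state1 = {node: 0.0 for node in INTERNAL}
    total = 0.0
    # Enumerate every assignment of states to the internal nodes.
    for combo in itertools.product((0, 1), repeat=len(INTERNAL)):
        states = dict(tip_states)
        states.update(zip(INTERNAL, combo))
        likelihood = 0.5  # flat prior on the root state
        for child, (parent, t) in TREE.items():
            likelihood *= p_change(states[parent], states[child], t, q)
        total += likelihood
        for node in INTERNAL:
            if states[node] == 1:
                weight_state1[node] += likelihood
    return {node: w / total for node, w in weight_state1.items()}


# Hypothetical presence (1) / absence (0) coding of two components in four species.
COMPONENTS = {
    "component_1": {"sp1": 1, "sp2": 1, "sp3": 1, "sp4": 1},
    "component_2": {"sp1": 0, "sp2": 0, "sp3": 1, "sp4": 1},
}


def reconstruct(components, q=0.1):
    """Most probable state of every component at every internal node."""
    posteriors = {
        name: marginal_ancestral_states(tips, q) for name, tips in components.items()
    }
    return {
        node: {name: int(post[node] > 0.5) for name, post in posteriors.items()}
        for node in INTERNAL
    }


for node, states in reconstruct(COMPONENTS).items():
    print(node, states)
```

In this toy data set, component 1 is inferred as present at all three nodes, whereas component 2 is inferred as absent at nodes A and B and present at node C, which mirrors the kind of stepwise gain described above. Real analyses would estimate q from the data and use dedicated software rather than enumeration, which scales exponentially with the number of nodes.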
Using this mathematical method, we elucidated how the leaf patterns on butterfly wings (Kallima spp.) evolved from a simple pattern that does not look like a leaf [42]. Since the butterfly pattern follows the ground plan, both the dead leaf patterns and the non-dead leaf patterns can be decomposed into the components of the ground plan (Fig. 4a, g). Thus, the differences between the dead leaf patterns and the non-dead leaf patterns are just differences in the states of the “same” components. For example, in the dead leaf patterns, the component that makes up the main vein of the leaf pattern forms a straight line (red line of the hind wings in Fig. 4g), whereas in other species it forms a curved line. The state of each component was coded as binary; for example, the red line of the hind wing was coded as 1 if it is a straight line and as 0 if it is not. As a result, the dead leaf pattern was decomposed into 11 homologous components, each coded as 0 or 1. The same coding was applied to about 40 closely related butterfly species.
We then used PCMs to estimate the ancestral patterns at the nodes of the phylogenetic tree and reconstructed the evolutionary process from non-leaf patterns to leaf patterns by tracing the states of all components from node A to B to C to D (Fig. 7). The dead leaf pattern was found to have evolved as follows. First, the red line on the hind wing became a straight line, and the yellow line degenerated. Next, the red line at the base of the hind wing became fragmented, the blue line degenerated, and the red line on the forewing became a polygonal line. Finally, the red and yellow lines of the forewing came into contact, the green line became a straight line, and the eyespot patterns of the forewing and hind wing degenerated.
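The tracing step itself is simple bookkeeping once the states at each node are available. The following sketch lists which components change state at each step along a path of nodes. The component names (c1 to c4) and their states are hypothetical placeholders and do not reproduce the 11 components or the results of the actual analysis.

```python
def trace_changes(node_order, node_states):
    """Return, for each step along the path, the components whose state changed."""
    steps = []
    for parent, child in zip(node_order, node_order[1:]):
        changed = {
            name: (node_states[parent][name], node_states[child][name])
            for name in node_states[parent]
            if node_states[parent][name] != node_states[child][name]
        }
        steps.append((parent, child, changed))
    return steps


# Hypothetical reconstructed states of four binary components at nodes A to D.
NODE_STATES = {
    "A": {"c1": 0, "c2": 0, "c3": 0, "c4": 0},
    "B": {"c1": 1, "c2": 0, "c3": 0, "c4": 0},
    "C": {"c1": 1, "c2": 1, "c3": 1, "c4": 0},
    "D": {"c1": 1, "c2": 1, "c3": 1, "c4": 1},
}

for parent, child, changed in trace_changes(["A", "B", "C", "D"], NODE_STATES):
    print(f"{parent} -> {child}: {changed}")
```

Each reported tuple gives the old and new state of a component, so gains (0 to 1) and losses (1 to 0) can be read off directly for every step of the path.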
Figure 7. Macro-evolutionary process from non-mimetic patterns to mimetic leaf patterns. The PCM analysis reconstructed the evolutionary process from an early standard pattern (at node A) through two intermediate patterns (at nodes B and C) to the final leaf pattern (at node D). Each component changes from state 0 to state 1 between nodes A and D. This figure is partially reproduced with modification from [42].
This review has presented concepts, methods, and case studies using butterfly and moth wing patterns to understand phenotypes as systems. As one future direction, I would like to apply these concepts and methods to other traits of various organisms. As another, I would like to explore the common rules and design principles of phenotypic systems. As a further direction, I would like to develop a conceptual foundation for phenotypic systems biology. What kinds of concepts will be needed? In this final section, I propose two key concepts: macro-evolutionary pathways and the genotype-phenotype loop (GP loop).
The complexity and diversification of traits are generated through evolution with modifications from ancestors (note: here I use the term complexity simply to indicate the number of components [sensu 86,87]). The evolution of complexity is thought to be a series of evolutionary paths [e.g., 88]. For example, I introduced a case study of the evolutionary emergence of dead leaf patterns on butterfly wings, uncovered by analyzing evolutionary paths from a simple pattern to a more complex one (Fig. 7). In this manner, the “macro-evolutionary process” exhibits evolutionary paths along which traits have gained and lost their components (Fig. 6a) [16,41]. However, such an evolutionary process is not sufficient to represent the evolution of traits, because traits evolve not only in complexity but also in diversity. If the traits of organisms are regarded as multi-component systems, they have diversified through evolutionary changes in the combinations of their components. To illustrate such diversifying evolution, I propose a key concept, macro-evolutionary pathways: an ensemble of evolutionary paths that bifurcates and produces various traits (Fig. 8a).
Figure 8. Macro-evolutionary pathways. (a) Macro-evolutionary pathways. (b–d) Elementary paths of evolutionary pathways. (b) Weighted paths. The evolutionary path from A to B is twice as weighted as that from B to C. (c) Loop structures. This structure is generated by an ensemble of two evolutionary paths, a path from A to B to C and a path from B to C to A. (d) Bi-directional paths. The evolutionary path between A and B is generated by an ensemble of two paths, a path from A to B and a path from B to A. The components of traits are denoted by A, B, and C.
Although these two concepts are closely related, the concept of macro-evolutionary pathways is not a simple extension of that of macro-evolutionary processes. One of the key differences is that macro-evolutionary pathways are weighted ensembles of paths, not just bifurcated paths that trace the nodes of a phylogenetic tree. For example, evolutionary paths that appear multiple times in macro-evolutionary processes (i.e., convergent evolutionary paths) are weighted in macro-evolutionary pathways by the number of times they appear (Fig. 8b). In addition, an ensemble of several paths can generate loop structures (Fig. 8c) and bi-directional structures (Fig. 8d). These topological structures are unique to macro-evolutionary pathways and cannot be found by reconstructing macro-evolutionary processes. Therefore, reconstructing macro-evolutionary pathways can reveal the network structure of trait evolution beyond the bifurcated structure of the phylogenetic tree.
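One possible way to represent such an ensemble is as a weighted directed graph in which each edge counts how often a transition between two trait states appears across reconstructed paths. The sketch below is only an illustration of this idea, not an established method: the function names are hypothetical, and the example paths follow the abstract states A, B, and C used in the figure description.

```python
from collections import Counter


def pathway_weights(paths):
    """Count how many times each directed transition appears across all paths."""
    weights = Counter()
    for path in paths:
        for source, target in zip(path, path[1:]):
            weights[(source, target)] += 1
    return weights


def bidirectional_pairs(weights):
    """Pairs of states connected by transitions in both directions."""
    return sorted({tuple(sorted(edge)) for edge in weights if (edge[1], edge[0]) in weights})


def find_loops(weights, min_len=3):
    """Directed simple cycles with at least min_len states.

    Each cycle is reported once, starting from its smallest state label.
    """
    graph = {}
    for source, target in weights:
        graph.setdefault(source, []).append(target)
    loops = set()

    def walk(start, node, visited):
        for nxt in graph.get(node, []):
            if nxt == start and len(visited) >= min_len:
                loops.add(tuple(visited))
            elif nxt not in visited and nxt > start:
                walk(start, nxt, visited + [nxt])

    for start in sorted(graph):
        walk(start, start, [start])
    return sorted(loops)


# Two paths that together form a loop, with the shared step B -> C counted twice.
loop_weights = pathway_weights([["A", "B", "C"], ["B", "C", "A"]])
print(dict(loop_weights), find_loops(loop_weights))

# Two opposite paths that together form a bi-directional structure.
bidir_weights = pathway_weights([["A", "B"], ["B", "A"]])
print(bidirectional_pairs(bidir_weights))
```

Counting transitions captures the weighting of repeated (convergent) paths, while the loop and bi-directional checks recover the two topological structures that a single bifurcating tree cannot contain.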
From a molecular biological point of view, phenotypes are the end products of genetic and developmental processes. This seems true if we think of phenotypes on an ontogenetic timescale. However, in evolution, we need to consider phenotypes on a longer timescale (e.g., several million years spanning the origins of new species). The relationships between phenotypes and genotypes are generated through repeated rounds of evolutionary processes (i.e., natural selection or genetic drift). As a result, these relationships have gained specific structures such as modular structures, the architecture of which is conceptualized as the Genotype-Phenotype map (GP map) (Fig. 9a) [53]. The GP map has emphasized how evolvable developmental architectures have emerged. Unlike the GP map perspective, I would like to focus on the boundary conditions exhibited by the repeated G-P rounds in long-term evolution. Namely, the phenotypes act as the boundary conditions under which the genotypes are shaped in the next round: phenotypic traits are determined first, and genetic and developmental mechanisms are then shaped. Notably, as a result of longer-term evolution, many species have acquired specific types of phenotypic structures, the body plan or ground plan [89–91], including the ground plan of butterfly and moth wing patterns (Fig. 4a) [40,41]. Thus, it is necessary to consider both processes: a phenotype is generated from a genotype, and the phenotype in turn channels the genotype. To illustrate this phenomenon, I here propose a crucial concept, the “genotype-phenotype loop (GP loop)” (Fig. 9b).
Figure 9. Genotype-Phenotype loop (GP loop). (a) Genotype-Phenotype map (GP map). (b) Genotype-Phenotype loop (GP loop). Phenotypic components are shown by circles, triangles, and squares. Genotypic components are shown by diamonds. Newly fixed GP relationships are shown by red arrows. The causal relationships between genotype and phenotype are shown by arrows.
In support of the above idea, it is well known that homologous phenotypes are often formed by different developmental mechanisms [92]. In addition, recent studies have clearly shown that homologous phenotypes can be generated through different genetic mechanisms [93] and developmental mechanisms, a phenomenon termed developmental system drift (DSD) [94,95]. For example, the developmental processes that form segments (e.g., the process involving gap, pair-rule, and segment polarity genes in Drosophila) differ considerably among arthropod species [96]. As another example, one component (ESs) of the ground plan of butterfly wing patterns is generated by different molecular mechanisms [97]. DSD focuses on the flexibility of developmental mechanisms with respect to genotype-phenotype relationships. Such flexible properties might allow the evolutionary reorganization of developmental mechanisms. However, it is entirely unclear under what conditions the weights of the G-to-P and P-to-G boundary conditions change. If phenotypes predate genotypes, we could detect newly acquired relationships from phenotype to genotype (Fig. 9b). I hope that understanding the GP loop will contribute significantly to understanding the complex system formed by the loop relationship between parts and the whole [98].
The authors have no conflicts of interest to declare.
TKS designed the study and wrote the paper.
This work was supported by Grant-in-Aid for Scientific Research (C), Japan (MEXT/JSPS KAKENHI Grant Number 18K06371).