Genes & Genetic Systems
Online ISSN : 1880-5779
Print ISSN : 1341-7568
ISSN-L : 1341-7568
Full papers
Genealogical characterization of regional populations and dorsal coat color variation in the house mouse Mus musculus from Asia based on haplotype structure analysis of a gene-rich region harboring Mc1r
Kazuhiro ZakohKazumichi FujiwaraToyoyuki TakadaNaoki OsadaHitoshi Suzuki
著者情報
ジャーナル オープンアクセス HTML
電子付録

2023 年 98 巻 2 号 p. 73-87

詳細
ABSTRACT

We analyzed 196 haplotype sequences from a gene-rich region (250 kb) that includes Mc1r, a gene involved in coat color regulation, to gain insight into the evolution of coat color variation in subspecies of the house mouse Mus musculus. Phylogenetic networks revealed haplotype groups from the major subspecies of M. m. castaneus (CAS), M. m. domesticus (DOM), and M. m. musculus (MUS). Using haplotype sequences assigned to each of CAS and MUS through phylogenetic analysis, we proposed migration routes associated with prehistoric humans from west to east across Eurasia. Comparing nucleotide diversity among subspecies-specific haplotypes in different geographic areas showed a marked reduction during migration, particularly in MUS-derived haplotypes from Korea and Japan, suggesting intensive population bottlenecks during migration. We found that a C>T polymorphism at site 302 (c.302C>T) in the Mc1r coding region correlated with a darkening of dorsal fur color in both CAS and MUS. However, C/C homozygous mice in MUS showed marked variation in lightness, indicating the possibility of another genetic determinant that affects the lightness of dorsal fur color. Detailed sequence comparisons of haplotypes revealed that short fragments assigned to DOM were embedded in CAS-assigned fragments, indicating ancient introgression between subspecies. The estimated age of c.302C>T also supports the hypothesis that genetic interaction between subspecies occurred in ancient times. This suggests that the genome of M. musculus evolved through gene flow between subspecies over an extended period before the movement of the species in conjunction with prehistoric humans.

INTRODUCTION

The house mouse (Mus musculus) shows considerable morphological variation among geographic populations, particularly in terms of coat color (e.g., Schwarz and Schwarz, 1943; Marshall, 1998; Lai et al., 2008; Kishimoto et al., 2021; Takeishi et al., 2021). It was once divided into >10 subspecies based on morphological characteristics, including variation in dorsal and ventral coat color (Schwarz and Schwarz, 1943). The species is currently integrated into three or four subspecies based on phylogenetic analysis using various molecular phylogenetic markers (e.g., Moriwaki et al., 1986; She et al., 1990; Boursot et al., 1993; Yonekawa et al., 1994; Awasthi et al., 1998; Suzuki et al., 2013; Jing et al., 2014; Hamid et al., 2017). The M. musculus species includes three subspecies: M. m. domesticus (DOM), which is mainly distributed in western Europe and northern Africa; M. m. musculus (MUS), found mainly in northern Eurasia, but not in western Europe; and M. m. castaneus (CAS), which is mainly distributed in South and Southeast Asia, as well as southern China. CAS is believed to have a large homeland and considerable genetic variation. In fact, mice from Afghanistan and Nepal are sometimes classified as a distinct subspecies with the taxonomic name M. m. bactrianus (Boursot et al., 1993; Adhikari et al., 2018). Additionally, the analysis of mtDNA phylogeny indicates the existence of five distinct lineages: the three subspecies, a lineage representing the southern tip of Yemen (M. m. gentilulus; Prager et al., 1998), and a lineage representing Nepal (Terashima et al., 2006; Suzuki et al., 2013).

Genome-wide analyses have focused on the evolutionary history of the house mouse (Frazer et al., 2007; Takada et al., 2013, 2021; Harr et al., 2016). Whole genome sequences have been determined for 98 individuals of wild origin, mainly of the two Asian subspecies (MUS and CAS; Fujiwara et al., 2022a). Although the Yemeni population has not yet been examined, whole-genome sequencing has revealed that the genetic diversity of this species is described by the three major subspecies (CAS, DOM and MUS), with an estimated divergence time of 200,000 years ago (Fujiwara et al., 2022a). Whole mtDNA sequences showed that the five distinct mtDNA lineages diverged nearly simultaneously 400,000 to 600,000 years ago (Li et al., 2021; Fujiwara et al., 2022b), suggesting the presence of the mice in these regions since those times. Therefore, genetic differentiation began in the homelands approximately 500,000 years ago, and genetic exchange subsequently occurred among geographic populations. Mus musculus dispersed to a broad area of Eurasia, associated with the prehistoric movement of humans practicing agriculture (Moriwaki et al., 1986; Bonhomme et al., 2011; Suzuki, 2013; Li et al., 2021). A phylogenetic analysis of 98 whole mtDNA sequences revealed the details of dispersal from their native range to the Japanese archipelago, through gradual regional expansion events during the development of agriculture in prehistory (i.e., during the Holocene) (Li et al., 2021).

The variation in ventral coat coloration among regions of the present distribution area presumably occurred during the stepwise eastward migration (Takeishi et al., 2021). CAS has white ventral fur in its homelands of Iran, Pakistan and northern India; it has dark gray fur in newly established areas, namely the southern Indian subcontinent, Southeast Asia and southern China (Takeishi et al., 2021). The Russian MUS population tends to have a black abdomen, whereas populations from western to northern China, the Korean Peninsula and the Japanese archipelago tend to have a white abdomen (Takeishi et al., 2021). In the Japanese archipelago, where CAS mice were introduced, a mixed pattern with light and dark ventral fur is present (Takeishi et al., 2021). However, the dorsal coat coloration has not been analyzed within the evolutionary dynamics of the house mouse.

Lai et al. (2008) found that dorsal coat coloration in the house mouse is negatively correlated with precipitation. The three major subspecies of house mouse tend to have darker coats in areas to which they spread in prehistory; in their homelands, which include desert areas, their fur is typically lighter. One explanation for the geographic variation in house mouse coat coloration is natural selection driving convergent evolution (Slábová and Frynta, 2007; Lai et al., 2008). Analyses of coat color-related genes have revealed that strong natural selection is linked to body color in small rodents (Hoekstra et al., 2004, 2005; Steiner et al., 2007; Mullen et al., 2009). However, the specific effects of natural selection on the genes responsible for geographic variation in house mouse coat color are unclear.

Two genes have been implicated in the evolution of coat color in mammals, particularly rodents: the agouti signaling protein gene (Asip) and the melanocortin 1 receptor gene (Mc1r) (Eizirik et al., 2003; Hoekstra and Nachman, 2003; Nachman et al., 2003; Hoekstra et al., 2004, 2005; Hoekstra, 2006; Fontanesi et al., 2006, 2010; Steiner et al., 2007; Wlasiuk and Nachman, 2007; Kingsley et al., 2009; Suzuki, 2013; Marín et al., 2018). The product of Mc1r is a receptor on the plasma membrane of melanin-producing cells (melanocytes), which penetrates the plasma membrane seven times. Eumelanin (the black pigment) is produced by the activation of melanocytes via the cell membrane receptor Mc1r, while pheomelanin (the reddish-brown pigment) is produced by the inactivation of melanocytes by binding of Asip, antagonist (or inverse agonist), to Mc1r (e.g., Voisey and Van Daal, 2002; Adan and Kas, 2003; Herraiz et al., 2021).

Takeishi et al. (2021) examined sequence variation (approximately 180 kb) in Asip, through the generation of 196 haplotypes by trimming the nuclear whole genome sequences of 98 mice (Fujiwara et al., 2022a). Haplotype groups of Asip were related to the variation of ventral fur color. By examining 500–1,000-bp sequences of the flanking gene at 10-kb intervals, we inferred the haplotype structure of the Mc1r-containing region (200 kb) in the house mouse, revealing an evolutionary episode of the species (Nunome et al., 2010; Kodama et al., 2015; Kuwayama et al., 2017). Furthermore, we examined sequence variation in the Mc1r coding region and revealed the demarcation of CAS into two geographic areas: the homeland with multiple diverse haplotype groups in Pakistan and northern India, and the newly inhabited area with a less-diverse haplotype group in southern India, Southeast Asia and southern China (Kodama et al., 2015). Comparison of the nucleotide sequences of the coding region of Mc1r (948 bp) indicated that one nonsynonymous substitution, c.302C>T, is associated with the newly inhabited geographic area (darker dorsal fur) (Kodama et al., 2015). Haplotype analysis of mice from the Japanese archipelago indicates insertions of short DOM-derived fragments, but their evolutionary significance is unclear (Nunome et al., 2010; Kuwayama et al., 2017; Isobe et al., 2018).

In this study, we extracted a 250-kb sequence containing Mc1r from 98 wild-derived specimens with publicly available genome sequences (Fujiwara et al., 2022a), and then evaluated the spatiotemporal population dynamics of M. musculus by genealogical analysis. We aimed to clarify the lengths and distribution patterns of CAS-, DOM- and MUS-derived fragments in the recombinant haplotypes, as well as the status of inter-subspecific hybridization. We also analyzed nonsynonymous substitutions in the 948-bp coding region of Mc1r and investigated their association with hair color variation. We thus sought to gain insights into the evolution of coat color diversity in the house mouse.

RESULTS

Neighbor-net network (250 kb)

Haplotype sequences of Mc1r chromosome segments (~255 kb) were prepared from the whole genome sequences of the 98 wild-derived house mice from Asia (Fig. 1). A neighbor-net network was constructed based on the 196 haplotypes (Fig. 2). Most haplotypes (n = 173) were assigned to clusters representing CAS and MUS, which comprise the collection sites where haplotype members originated. Eight haplotypes of mice from western Russia (Moscow) and northern Sakhalin (Okha), as well as the Indonesian islands of Java and Bali, were integrated into an independent cluster that was presumed to represent DOM, being consistent with the fact that these mice show a recognizable correlation with DOM in mitochondrial and nuclear genome sequence analyses (Li et al., 2021; Fujiwara et al., 2022a). The CAS and DOM haplotypes showed long branches and were located closer to each other than to the MUS cluster (Fig. 2). In the network, the remaining 15 haplotypes from China (n = 6), Japan (n = 6) and Russia (n = 3) were located at intermediate positions among subspecies-specific clusters, indicative of their recombinant origin.

Fig. 1.

Geographic distribution of haplotype groups of 250-kb sequences of the Mc1r and flanking regions in the house mouse, Mus musculus. Each colored circle represents a haplotype group of an individual mouse of the subspecies M. m. castaneus (CAS), M. m. domesticus (DOM) or M. m. musculus (MUS). Recombinant haplotypes are represented by types specified for a 42-kb sequence region containing Mc1r (partition 4, see text for details). Localities of haplotypes with the allele c.302T of Mc1r are denoted by ‘T’.

Fig. 2.

A neighbor-net network with the genome sequences of the Mc1r locus and its neighbors (~250 kb) trimmed from the whole genome data of 98 wild mice (Fujiwara et al., 2022a). Subspecies assignment based on mtDNA typing (Li et al., 2021) and whole genome sequence analysis (Fujiwara et al., 2022a) revealed three clusters representing the Mus musculus subspecies M. m. castaneus (CAS), M. m. domesticus (DOM) and M. m. musculus (MUS). Haplotypes not integrated into the clusters of the three subspecies were considered recombinants. Each color represents a country or group of countries from which haplotypes were recovered.

The CAS cluster was divided into four geographic groups, represented by the haplotypes from Pakistan/northern India (CAS-A), India (CAS-B), Bangladesh/Nepal/Sri Lanka (CAS-C) and southern India/Nepal/southern China/Vietnam/Indonesia (CAS-D); the first two groups correspond to the home ranges of CAS mice and the last two to the regions where CAS expanded in conjunction with prehistoric humans (Figs. 1 and 2). Two haplotypes from Ukraine (Donetsk) and Russia (Chita) were members of CAS-C; two haplotypes from Russian Far East (Khasan) and central Sakhalin (Poronaysk) were integrated into CAS-D.

We tentatively categorized the MUS cluster into five geographic groups based on geographic affinity, taking into account the clustering method previously reported for MUS mtDNA (Li et al., 2021). These groups are mainly distributed in Russia (MUS-A), western China (MUS-B), northern China (MUS-C), the Korean Peninsula (MUS-D) and the Japanese archipelago (MUS-E) (Figs. 1 and 2). The haplotype groups of MUS-D and MUS-E showed a close relationship, and one Korean haplotype was included in MUS-E.

We constructed neighbor-net networks with six partitioned sequences (~42 kb; partitions 1–6) (data not shown) and identified 24 recombinant haplotypes among the three major subspecies clusters. The haplotype structure (subspecies composition) was inferred for the 39 recombinant haplotypes (Table 1). Long stretches of DOM-derived segments were detected in the CAS and MUS territories (western China (Urumqi) and the Indonesian islands of Java (Lembang) and (Denpasar), indicative of recent hybridization events (Nunome et al., 2010; Kuwayama et al., 2017). Recombinant haplotypes of MUS-background mice from China, Japan and the Russian Far East included CAS-derived segments. Recombinant haplotypes with CAS-derived fragments from Hokkaido (Hakodate, Takikawa and Setana), northern Honshu (Yamagata) and the Russian Far East (Vladivostok) also contained DOM-derived segments.

Table 1. Assessment of subspecies types in six partitions of the recombinant-assigned haplotypes based on phylogenetic tree analysis
Haplotype codeLocality123456
1HS2400_1_TaitungChinaCASCASCASCASCASCAS/MUS
2MG0501_1_GuilinChinaMUSMUSMUSMUS/CASCASCAS
3MG0577_1_UrumqiChinaMUSMUSMUSMUSDOMDOM
4MG0577_2_UrumqiChinaDOMDOMDOMDOMMUSMUS
5MG0709_1_ChongqingChinaMUSMUSCASCAS/MUSMUSMUS
6MG0709_2_ChongqingChinaMUSMUSCASCAS/MUSMUSMUS
7MG0713_1_ZhenjiangChinaCASCAS/MUSMUSMUSMUSMUS
8MG0713_2_ZhenjiangChinaMUSMUS/CASMUSMUSMUSMUS
9MG0795_2_NingboChinaCASCASCASCAS/MUSCAS/MUSMUS
10MG0908_1_WuhanChinaCASCASCASCAS/MUSMUSMUS
11MG3044_1_TallinnEstoniaMUS/DOMMUSMUSMUSMUSMUS
12MG0417_2_MashhadIranMUSMUSMUS/CASMUSMUSMUS
13MG5135_1_NowshahrIranMUSMUSMUS/CASMUSMUSMUS
14MG5135_2_NowshahrIranMUSMUSMUS/CASMUSMUSMUS
15HS2323_1_HakodateJapanMUSMUSCASCASCASDOM/CAS
16HS2323_2_HakodateJapanMUSMUSCASCASCASDOM/CAS
17HS2445_2_TakikawaJapanMUSMUSCASCASCASDOM/CAS
18HS3534_2_YamagataJapanMUSMUSCASCASCASDOM/CAS
19HS4097_1_MisasaJapanMUSMUSCASCASMUSMUS
20HS4097_2_MisasaJapanMUSMUSMUS/CASCASMUSMUS
21HS4271_2_SetanaJapanMUSMUSMUSMUSMUSDOM/CAS
22HS4411_1_OdaJapanMUSMUSMUSMUSCASCAS/MUS
23HS4411_2_OdaJapanMUSCASCASCASMUSMUS
24HI134_1_LembangJava, IndonesiaCASDOMDOMDOMDOMDOM
25HI134_2_LembangJava, IndonesiaCASDOMDOMDOMDOMDOM
26HI115_1_DenpasarBali, IndonesiaDOMCASCASCASCASCAS
27HI115_2_DenpasarBali, IndonesiaCASDOMDOMDOMDOMDOM
28MG3010_1_North_CaucasusRussiaMUS/DOMMUSMUSMUSMUSMUS
29HS1411_1_KhasanRussian Far EastCASCASCASCASCASDOM
30HS1411_2_KhasanRussian Far EastCASCASCASCASMUSMUS
31MG3010_2_North_CaucasusRussiaMUSMUSMUS/CASMUSMUSMUS
32MG3025_1_VladivostokRussian Far EastDOMCASCASCAS/DOMDOM/MUSMUS
33MG3025_2_VladivostokRussian Far EastMUSMUSMUSMUS/CASCASDOM
34MG3026_2_BirakanRussiaMUSMUSCASCASCASCAS
35MG3004_1_ChitaRussiaCASMUSCASCASCASCAS
36MG3004_2_ChitaRussiaMUSCAS/MUSMUSMUSMUSMUS
37MG3045_1_PoronayskSakhalinDOMCASCASCASCASCAS
38MG3048_1_Yuzhno_SakhalinskSakhalinMUSMUSMUSMUSMUSCAS
39MG3048_2_Yuzhno_SakhalinskSakhalinMUS/CASCAS/MUSMUSMUSMUSMUS

We evaluated the four recombinant haplotypes from Hokkaido (Hakodate and Takikawa) and northern Honshu (Yamagata) by comparing nucleotide sequences. The four haplotypes of the MUS genetic background (Table 1) showed a similar patchwork pattern, harboring short stretches of CAS and DOM fragments of 3.4 to 60 kb (Supplementary Fig. S1).

Sliding-window analysis of nucleotide diversity throughout the chromosomal region

We performed a sliding-window analysis of nucleotide diversity (π) at 10-kb intervals throughout the 255-kb sequences (26 segments) in the CAS and MUS haplotype groups (Fig. 3A). The overall patterns of the sequences were similar among subspecies and among geographic groups. The resultant values of π varied considerably among the 26 segments, up to 0.01 in CAS and 0.005 in MUS. A specific chromosomal position in the 160–170-kb segment, a noncoding region located between the tubulin beta gene (Tubb3) and the FDCP 8 gene (Def8) (black arrow in Fig. 3A), showed significantly lower π values (0.002 in CAS and 0.0005 in MUS).

Fig. 3.

Nucleotide diversity (π) across the 250-kb region containing Mc1r and its neighboring genes in the haplotype groups of CAS and MUS (A). Values were determined using a 10-kb window. The approximate position of Mc1r is indicated by an asterisk. Arrows indicate particularly low π values in haplotype group D of CAS, compared with the other haplotype groups. Mean nucleotide diversity (and standard deviation) over the 250-kb region in the haplotype groups (B).

The region of Mc1r and its neighboring genes in the haplotype group CAS-D (Southeast Asia and southern China) exhibited a U-shaped pattern (Fig. 3A). The lowest level of nucleotide diversity (π = 0.0001) was in the transcription factor 25 gene (Tcf25) (red arrow in Fig. 3A), located immediately upstream of Mc1r.

The mean π values of the 26 segments were compared among regional haplotype groups (Fig. 3B). In CAS, the mean π values of the four haplotype groups tended to be higher than the mean π values of the five haplotype groups of MUS. In terms of inter-regional comparison, the mean π values of groups in the two newly inhabited areas, represented by mice from Bangladesh (CAS-C, π = 0.0042) and southern China (CAS-D, π = 0.0032), were lower than the mean π values of groups from the homeland of Pakistan/northern India (CAS-A, π = 0.0073) and India (CAS-B, π = 0.0059). In MUS, the values were lower in the Korean Peninsula (MUS-D, π = 0.0005) and the Japanese archipelago (MUS-E, π = 0.0001) compared with Russia (MUS-A, π = 0.0024), western China (MUS-B, π = 0.0020) and northern China (MUS-C, π = 0.0019).

Nucleotide sequence variation in the Mc1r coding region

The nucleotide sequence variation of the amino acid coding region of Mc1r (948 bp) was examined in 196 sequences by constructing a median-joining network (Fig. 4). We identified 33 haplotype sequences and 32 polymorphic sites, most of which (> 80%) were G to A or C to T mutations. The minor allele frequencies of the 20 nonsynonymous mutations were low (< 0.08) except for c.302C>T (0.3).

Fig. 4.

Median-joining network of the Mc1r coding region (948 bp) with 196 haplotype sequences of wild mice. The Mc1r coding sequences were assigned to the CAS, DOM and MUS subspecies based on the clustering patterns of the 250-kb haplotypes (Fig. 2) in non-recombinants and the clustering patterns of partition 4 in recombinants (Table 1). We further divided the CAS haplotypes into two groups: the “western” group represented by haplotypes from Pakistan and northern India, and the “eastern” group represented by haplotypes from the remaining territories. Bars on branches indicate nucleotide substitutions: black bars and red bars indicate the positions of synonymous substitutions and nonsynonymous substitutions, respectively. Circle sizes correspond to the number of sequences of the same haplotype. Haplotypes H1 and H5 are represented by CAS and DOM mice but also contain DOM (MG3056_2)- and CAS (HI372_1)-assigned sequences, respectively.

We assigned the Mc1r coding sequences to each of the three subspecies based on the clustering patterns of partition 4, which harbors Mc1r (Table 1). Haplotypes assigned to DOM and MUS were each assembled as one group; haplotypes assigned to CAS were geographically separated into western (Pakistan and northern India) and eastern (southern India, Southeast Asia and southern China) groups, approximately represented by the CAS-A/CAS-B and CAS-C/CAS-D haplotype groups, respectively (Fig. 4). Most haplotypes representing the eastern CAS group (H1–4) and DOM (H5–8) consisted of central haplotypes with their satellites, which had specific alleles at polymorphic sites in the Mc1r coding region: 51 (c.51T) and 302 (c.302T). H1 was recovered from multiple localities in the MUS territory, including the Japanese archipelago (e.g., Hakodate and Yamagata) and the Russian Far East (e.g., Vladivostok and Poronaysk) (Fig. 1). Notably, the H1 and H5 haplotypes were shared by DOM (MG3056_2) and CAS (HI372_1) mice, respectively.

We performed a phylogenetic inference of the 20-kb sequence of Mc1r and its neighboring region to assess the evolutionary origin of the c.302C>T mutation, which is shared by CAS and DOM. We tracked SNPs shared by haplotypes assigned to DOM, which revealed regions enriched in the DOM-specific SNPs that were heterogeneously distributed along the examined sequence (Fig. 5A). The SNPs were absent in the region surrounding c.302C>T (~5 kb; Region A in Fig. 5A), which belonged to CAS according to phylogenetic analysis (data not shown); this finding suggested past gene flow from CAS to DOM. We identified two short DOM SNP-enriched fragments in CAS-assigned haplotypes from Pakistan (HI175_2, HI261_1, and HI261_2) and India (HI185_1 and HI187_2) (1.9 kb, Region B in Fig. 5A) and another from northern Japan (HS2323_1, HS2323_2, HS2445_2, HS3534_2; 3.7 kb, Region C in Fig. 5A). Furthermore, median-joining networks based on the sequences of Regions B and C exhibited five and nine DOM-specific SNPs, respectively (Fig. 5B and 5C). Phylogenetic analysis by the neighbor-joining method revealed monophyletic groupings in Regions B and C with moderate bootstrap values of 85% and 98%, respectively (data not shown).

Fig. 5.

Distribution of DOM-specific SNPs on both sides of the 10-kb Mc1r flanking region on chromosome 8 of M. musculus (A). Horizontal box indicates one haplotype, whereas vertical bars above the box indicate DOM-specific SNPs. Two SNPs (c.51, c.302) characterizing the haplotypes of interest, H1 and H5 in Mc1r, are shown. The 20-kb interval shows a biased pattern of high and low frequencies of DOM-specific SNPs. The SNPs were rarely distributed in the chromosomal region immediately adjacent to Mc1r (Region A); instead, they were distributed mostly in adjacent regions. Several CAS sequences share DOM-specific SNPs: five from Pakistan (HI175_2, HI261_1, HI261_2), India (HI185_1, HI187_2; Region B), and four from northern Japan (HS2323_1, HS2323_2, HS2445_2, HS3534_2; Region C) had multiple DOM SNPs in fragments of a few kilobases. Median-joining networks of 196 haplotype sequences of Region B (ca. 1,900 bp, B) and Region C (ca. 3,400 bp, C). Positions of haplotypes related to the SNPs in the Mc1r coding region and introgressed DOM segments are indicated by colored arrowheads.

We estimated the generation time for the start of the hybridization (backcrossing) based on the lengths of fragments observed (Nunome et al., 2010; Kuwayama et al., 2017). The times for the initiation of backcrossing for the long (e.g., 25–40 kb) and short (e.g., 2–3 kb) DOM-derived fragments were 3,500–5,500 and 45,000–70,000 generations ago, respectively. The c.302C>T polymorphism, which may signal genetic interactions among subspecies in the distant past, was shared among some CAS and DOM individuals. RELATE analysis showed that the c.302C>T polymorphism originated in CAS 50,000 to 200,000 years ago (Supplementary Fig. S2). The time estimation for the c.302T allele of DOM was approximately 50,000 years ago. The RELATE analysis showed that CAS mice from Pakistan and northern India have fragments similar to those present in MUS mice.

Hair color lightness

The luminance of the dorsal and ventral fur color of 35 skin specimens, 12 from the CAS region and 23 from the MUS region, was measured using a spectrophotometer (Fig. 6, Supplementary Table S1). The lightness (L*) varied sequentially from 27.5 to 48.5, in which the dorsal color varied from dark brown (e.g., L* 30) to brown (e.g., L* 35), and then to yellowish brown (e.g., L* 40). The L* values of CAS individuals from Pakistan and northern India varied from 32.1 to 40.2. In contrast, L* values of individuals from the newly inhabited territory (southern India, southern China and Southeast Asia) were 28.7 to 33.1. L* values were low in MUS individuals from Russia (L* 29.5–37.0) and high in MUS individuals from western China (L* 42.9–48.4). The values varied substantially among mice from other regions of China (L* 27.5–44.2) and the Japanese archipelago (L* 28.9–38.5).

Fig. 6.

Quantitative measurements of dorsal lightness (L*) in 35 skin specimens from wild mice with available whole genome sequences (Fujiwara et al., 2022a). L* values of geographic groups of CAS (Pakistan/India, southern India/southern China/Vietnam) and MUS (Russia, western China, northern China, the Japanese archipelago). Genotypes (CC, CT, TT) at codon position 302 of the Mc1r coding region are shown. Photographs displaying band patterns of the dorsal hairs in stuffed specimens representing three different levels of lightness (L*: 30, 35 and 40) are presented. Schematic illustrations are used to depict the approximate hair band patterns for each level of lightness.

These variations in lightness reflected variation in the two visible bands of the agouti hair: basal and apical bands were black/gray and brown/yellowish brown, respectively. A hair with relatively short and dark brown and pale brown apical bands had lower lightness (e.g., L* = 30) and moderately high lightness (e.g., L* = 35), respectively. When the basal band was gray and relatively short and the overlay band was relatively long and yellow, lightness was substantially higher (e.g., L* = 40) (Fig. 6).

The associations of dorsal and abdominal hair color lightness (L*) with the c.302C>T nonsynonymous substitution in the Mc1r coding region were examined in the CAS (n = 12) and MUS (n = 23) datasets, respectively (Fig. 6; Supplementary Table S1). CAS mice showed significant differences in terms of correlations with the c.302C>T genotype in both dorsal hair color lightness (P = 0.019) and abdominal hair color lightness (P = 4 × 10−4). Mice with the C and T alleles tended to show lighter and darker hair coloration, respectively (Supplementary Fig. S3). Genotype showed a strong association between the home and new home regions. In MUS mice, the c.302C>T allele was significantly correlated with dorsal fur lightness (P = 0.020) but not with ventral fur color (P = 0.504, Supplementary Fig. S3); mice with c.302T tended to be darker on the ventral side. However, dorsal fur lightness (L*) varied from 27.5 to 48.4 in mice with the C/C homozygous genotype.

DISCUSSION

Geographic structuring and evolutionary episodes of M. musculus

In this study, we evaluated the evolutionary process of M. musculus, which shows substantial hair color variation, by analyzing a high-gene-density region encompassing the hair color-related gene Mc1r. We performed a genealogical analysis and assessed haplotype structure to infer past and present genetic interactions. The two Asian subspecies—CAS and MUS—exhibited several regional haplotype groups in a network analysis (Fig. 2). Haplotypes assigned to CAS were integrated into four haplotype groups with geographic affinity: Pakistan/northern India (CAS-A), India (CAS-B), Bangladesh/Sri Lanka (CAS-C) and Southeast Asia/southern China (CAS-D). The MUS haplotype groups were represented by two geographic groups: one localized in the northern region (MUS-A) covering a broad area of Russia, and the other mainly localized in China and its eastern areas (lineages MUS-I and MUS-II, respectively, in previous nuclear gene sequence analyses; Nunome et al., 2010; Kuwayama et al., 2017; Takeishi et al., 2021). The MUS-II haplotype groups were further divided into four geographic groups: westernmost China (MUS-B), northern and western China (MUS-C), the Korean Peninsula (MUS-D) and the Japanese archipelago (MUS-E) (Fig. 2). These findings support the notion that MUS underwent an eastward migration in prehistory via Russia and China; the migration in China involved a regionally and temporally phased process (Nunome et al., 2010; Suzuki et al., 2013; Kuwayama et al., 2017; Li et al., 2021; Takeishi et al., 2021).

The nucleotide variation profile provides insight into the eastward migration of MUS, revealing a substantial reduction in nucleotide diversity during movement to regions of the Korean Peninsula and Japanese archipelago (Fig. 3). Therefore, the ancestral lineages experienced strong population bottlenecks during their introductions into these regions through movements via sea and land. The Korean population consists of solely MUS-derived genetic elements, whereas the Japanese population is admixed with CAS, as demonstrated by whole genome sequence analysis (Fujiwara et al., 2022a).

Evolutionary implications of CAS and DOM fragments in the Japanese archipelago

CAS and MUS are predicted to have reached the eastern edge of the Eurasian continent in conjunction with the prehistoric development of agriculture. In this context, recombinant haplotypes were generated by secondary contacts between CAS and MUS (Table 1; Nunome et al., 2010; Kuwayama et al., 2017; Isobe et al., 2018). Short fragments of CAS, such as ~20 kb, are observed in the MUS genetic background in southern Japan and localities along the Yangtze River in China (e.g., Chongqing and Zhenjiang); longer fragments of CAS (up to 200 kb) are observed in northern Japan and easternmost Russia, possibly reflecting the timing of secondary contacts.

In addition to CAS fragments, short DOM-derived fragments (e.g., 20 kb) were detected in the genome of Japanese mice (Table 1), as previously reported (Nunome et al., 2010; Kuwayama et al., 2017; Isobe et al., 2018). Furthermore, we found DOM fragments of up to 2–5 Mb, suggesting the present-day introgression of foreign subspecies, as has been reported previously (Nunome et al., 2010; Kuwayama et al., 2017). In the present study, we detected very short DOM fragments (2 and 4 kb) by analyzing DOM-specific SNPs in a 20-kb chromosomal region near Mc1r (Fig. 5). These very short DOM fragments were embedded in CAS-assigned haplotypes: one from India and Pakistan, and the other from northern Japan, implying incomplete lineage sorting. Alternatively, these embedded fragments could be the result of inter-subspecies introgression from DOM to CAS, presumably ≥ 50,000 years ago. Such an ancient genetic interaction between subspecies is also supported by the age estimation of c.302C>T according to RELATE analysis (Supplementary Fig. S2). This suggests that the genome of M. musculus underwent gene flow between subspecies within its native habitat prior to its migration with prehistoric humans, resulting in the establishment of a complex genomic structure.

Factors shaping the geographic variation of the coat color of M. musculus

Although this study is preliminary because of the small number of individuals analyzed, it provides insights into the evolution of dorsal hair coloration in the house mouse. The lightness of dorsal coat coloration varied considerably both between and within subspecies, with L* values ranging from 27.5 to 48.5. This coat color variation was found to be a marked regional population difference within the CAS and MUS subspecies (Figs. 1 and 6). While our focus was on nonsynonymous substitutions in Mc1r, the only widely distributed variation that could potentially explain this interregional coat color variation was c.302C>T. In CAS, this substitution was found to be correlated with both dorsal and ventral coat color (Supplementary Fig. S3) and associated with the geographic boundaries of home and newly inhabited territory (Fig. 1). In MUS, c.302C>T was found to be correlated with dorsal coat color, but C/C homozygotes showed greater variation in lightness. Our results suggest the involvement of this nonsynonymous substitution as a causal mutation for regional dorsal coat color variation, but also indicate the potential influence of a bottleneck effect due to migration. Additionally, exploring the presence of other genes that could contribute to the observed regional differences in coat color may be necessary. Dorsal coat color variation in the house mouse was related to the proportion of bands of the agouti pattern (Fig. 6). Thus, the agouti gene Asip, which has a hair cycle-specific promoter that regulates the hair band pattern (Vrieling et al., 1994), is the most likely candidate. A correlation between evolutionary changes in Asip and the agouti band pattern has been observed in wild rodents (Hoekstra, 2006).

Nonsynonymous substitutions in Mc1r cause extreme variation in hair color, such as melanism (Nachman et al., 2003). In rodents and humans, these substitutions reportedly contribute to evolution by causing moderate variation in hair and skin color (Römpler et al., 2006; Steiner et al., 2007; Mullen et al., 2009; Deng and Xu, 2018). In rodents, the effect of the Mc1r polymorphism on pigmentation is under the control of an allele of Asip (Steiner et al., 2007). In humans, phenotypic variation in skin lightness, which involves Mc1r variants, increased in frequency among Asian populations because of natural selection as prehistoric humans spread across Eurasia (Deng and Xu, 2018). It remains unclear whether Mc1r c.302C>T in M. musculus is fully responsible for the variation in dorsal hair brightness. The amino acid substitution is in the first extracellular loop of the Mc1r receptor; the product of Asip, agouti, acts as an inverse agonist for this receptor (Patel et al., 2010). Furthermore, substitution of valine for alanine at codon 103, corresponding to codon 101 in mice, is responsible for dark red hair color in humans (Valverde et al., 1995). To understand the murine c.302C>T mutation, a more detailed analysis of wild mice is needed, possibly by determining the effect of replacing one amino acid in Mc1r on coat color, as reported in a study of the house mouse (Suzuki et al., 2020).

CONCLUSIONS

Analyses of sequence variation in haplotypes in Mc1r and the neighboring 250 kb enabled the assessment of the evolutionary dynamics of hair color-related genes, compared with neighboring genes. It also allowed genealogical analysis and provided information regarding secondary contacts. CAS and MUS retained several region-specific haplotype groups (Figs. 1 and 2), supporting the notion that M. musculus expanded its territory by stepwise eastward migration events (Li et al., 2021; Takeishi et al., 2021). We identified a short DOM fragment in the CAS genome (Fig. 5). This finding suggests either incomplete lineage sorting or gene flow among subspecies in the distant past (or both), providing insight into the genetic diversity of the house mouse genome. No signature of a selective sweep of sequence diversity at Mc1r was found, whereas the immediately upstream Tcf25 appeared to have undergone a selective sweep (Fig. 3).

This study uncovered a significant correlation between the nonsynonymous substitution c.302C>T in Mc1r and a darker dorsal coat color (Supplementary Fig. S3). The strong association between c.302C>T and both dorsal and ventral hair color in CAS highlights it as a possible responsible mutation, although it is important to consider the potential influence of a bottleneck effect caused by migration. In MUS, c.302C>T is significantly linked to the lightness of the back. However, even among mice with the same C/C homozygous genotype of c.302, marked differences in lightness value exist, suggesting the involvement of other genetic mutation(s) (Fig. 6). The association between the difference in band patterns in a single hair and the lightness of dorsal coat color indicates that Asip is a potential causal gene.

The mutants have a coat coloration adapted to their environment. In southern and southeastern Asia and northern and eastern Eurasia, the dorsal fur is dark; lighter fur is dominant in the arid regions of western China (Fig. 6). These regional populations with substantial morphological differences (including coat color) were once recognized as distinct subspecies (Schwarz and Schwarz, 1943; Marshall, 1998). Indeed, although the regional populations of M. musculus formed on very short evolutionary timescales, natural selection, bottleneck effects caused by migration, and historical genetic admixture have resulted in varying degrees of uniqueness with respect to genome composition.

MATERIALS AND METHODS

Phylogenetic analyses and population genetic assessment of Mc1r genome sequences

We used whole genome sequences deposited in the GenBank/EMBL/DDBJ database under the project name PRJDB11027 (Fujiwara et al., 2022a). The dataset was prepared from the whole genome sequences of 98 wild-derived house mice from Asia (Fig. 1), aligned to the reference mouse genome GRCm38 (mm10). Haplotype inference was performed with SHAPEIT4 (Delaneau et al., 2019) using the recombination map reported by Morgan et al. (2017). The datasets were trimmed to variant call format (vcf) and fasta files corresponding to the chromosome 8 region at 123,268,300–123,515,463, which includes the Mc1r region; we focused on a trimmed sequence of approximately 250 kb. Missing variants were not skipped; they were coded as ‘N.’ We added prefixes (1_ or 2_) to the names to designate haplotypes. The target region is characterized by the conservation of multiple genes around Mc1r.

The 196 sequences were aligned using MAFFT, an online multiple sequence alignment tool (Katoh et al., 2019). The resultant 196 haplotype Mc1r sequences with gaps (~255 kb final sequence length) were used to construct a neighbor-net network in SPLITS TREE, version 5.3.0 (Huson and Bryant, 2006). For the assessment of recombinant haplotypes, the 255-kb sequences were divided into six partitions (~42 kb each, partitions 1–6). MEGAX (Kumar et al., 2018) was used to construct neighbor-joining trees and 1,000 bootstrap replicates were performed to assess robustness at each node. Nucleotide diversity was measured throughout the 255-kb region using sliding windows at 10-kb intervals throughout the haplotype groups, as determined by the neighbor-net network.

For network construction, trimmed sequences of interest were aligned and compared using MEGAX; gaps were removed and median-joining networks were constructed using PopART (Bandelt et al., 1999). Nonsynonymous mutations were plotted on the branches of a network of the coding region of Mc1r (948 bp) by matching the aligned sequences to MEGAX.

Because the length of fragments incorporated by backcrossing decreases with each generation, fragment length can be used to infer the number of generations since hybridization occurred (Stephens et al., 1998). The probability P that a specific haplotype has not changed from its ancestor G generations ago is P = (1–r)G, where r is the recombination/mutation rate (the expression can be converted to G = –ln(P)/r) (Stephens et al., 1998). The overall rate of recombination in M. musculus, 0.52 cM/Mb (Jensen-Seaman et al., 2004), was considered. When introgressed fragments of 2 kb and 20 kb are observed, for example, using these estimates and ignoring mutation rates, its half-life (P = 0.5) is estimated to be 70,000 and 7,000 years, respectively, assuming a generation time of 1 year for wild mice. The time estimate could be underestimated because backcrossing does not necessarily occur in each generation.

We estimated the genealogy and divergence time of the Mc1r region using RELATE v.1.1.9 (Speidel et al., 2019). The input files were converted from vcf to haplotype format (haps) using the RelateFileFormats command with the ‘—mode ConvertFromVcf’ option in the RELATE package. In the process of generating these input files, non-biallelic SNPs were removed. The alleles of Mus spretus were regarded as the ancestral alleles. RELATE analysis was performed only on chromosome 8 containing the Mc1r region; the genome-wide genealogies and mutations were inferred using a germline mutation rate of 5.7 × 10−9 per bp per generation (Milholland et al., 2017) and an effective population size of 120,000. We next implemented the EstimatePopulationSize.sh script in RELATE to re-estimate the effective changes in population size over time. We set the generation time at 1 year and the threshold to remove uninformative trees at 0.5.

We estimated allele age by placing mutation events on the branches of reconstructed trees at variable sites, using the RELATE method to reconstruct local genealogies based on phased haplotype sequences and to reveal ancient contacts among populations (Speidel et al., 2019).

Measurement of coat color variation and statistical testing

An extensive set of wild-derived mouse specimens collected from areas across Eurasia was preserved in the National Institute of Genetics in Japan (now stored at the National Museum of Nature and Science, Tsukuba, Japan). We measured the lightness (L*) of the dorsal and ventral coat color of 35 stuffed specimens from 98 wild house mouse samples using a CM-700d spectrophotometer (Minolta Camera Co., Osaka, Japan) with a 6-mm-diameter window. Three measurement points were taken around the middle part of the dorsal midline. Three measurements were taken from random locations on the ventral side as performed previously (Takeishi et al., 2021).

To test the association between coat color and the c.302T mutation, we performed a linear regression analysis using the ‘lm’ function in R software. The C/C, C/T and T/T genotypes were labeled 0, 1 and 2, respectively; they were used as explanatory variables, assuming an additive genetic effect of the SNP.

ACKNOWLEDGMENTS

We are grateful to Kuniya Abe, Hidetoshi Ikeda, Nobumoto Miyashita, Mitsuo Nunome, Daiki Usuda, Hiromichi Yonekawa, Kazuo Moriwaki, Toshihiko Shiroishi, Kimiyuki Tsuchiya and Shin-ichiro Kawada for their support in conducting this study. We also thank Masafumi Hama, Hideo Ikawa, Hidetoshi Matsuzawa, Kenkichi Sasaki and Keiichi Yokoyama for assistance with collection of wild mice. We would like to express our gratitude to the two anonymous reviewers for their invaluable feedback and comments, which greatly helped to improve the quality and clarity of this paper. This study was supported by JSPS KAKENHI Grant Number JP18H05508.

REFERENCES
 
© 2023 The Author(s).

This is an open access article distributed under the terms of the Creative Commons BY 4.0 International (Attribution) License (https://creativecommons.org/licenses/by/4.0/legalcode), which permits the unrestricted distribution, reproduction and use of the article provided the original source and authors are credited.
https://creativecommons.org/licenses/by/4.0/legalcode
feedback
Top