Whole-genome sequencing analysis of wild house mice ( Mus musculus ) captured in Madagascar

In Madagascar, the house mouse ( Mus musculus ) is widely believed to have colonized with human activities and is now one of the most abundant rodents on the island. However, its genetic background at the genomic level remains unclear, and clarifying this would help us to infer the timing of introduction and route of migration. In this study, we determined the whole-genome sequences of five Madagascar house mice captured from an inland location in Madagascar. We examined the genetic background of samples by analyzing the mitochondrial and autosomal genomes. We confirmed that the mitochondrial genome lineages of collected samples formed a single clade placed at one of the most basal positions in the Mus musculus species. Autosomal genomic sequences revealed that these samples are most closely related to the subspecies M. m. castaneus (CAS), but also contain a genetic component of the subspecies M. m. domesticus (DOM). The signature of a strong population bottleneck 1,000–3,000 years ago was observed in both mitochondrial and autosomal genomic data. In a comparison with global samples of M. musculus , the Madagascar samples showed strong genetic affinity to many CAS samples across a wide range of Indian Ocean coastal and insular regions, with divergence time estimated as around 4,000 years ago. These findings support the proposition that the ancestors of these animals started to colonize the island with human agricultural activity and experienced a complex history during their establishment.


INTRODUCTION
House mice (Mus musculus) are small commensal mammals that spread worldwide along with human activities. They are thought to originate from the northern part of the Indian subcontinent (Boursot et al., 1993Din et al., 1996) and to have experienced explosive population size growth after the initial colonization. Three major subspecies of house mice have been recognized so far (Didion and de Villena, 2013;Fujiwara et al., 2022) . Previous studies analyzing colonization patterns of house mice have provided insights into the history of human activity, such as migration, spread of agriculture and trading (Prager et al., 1998;Duplantier et al., 2002;Britton-Davidian et al., 2007;Searle et al., 2009a;Suzuki et al., 2013Suzuki et al., , 2015Jing et al., 2014;Phifer-Rixey et al., 2018;Li et al., 2021;Fujiwara et al., 2022). Madagascar is the fourth largest island in the world. It separated from the African continent and from the Indian subcontinent around 160 and 80 million years ago, respectively (Torsvik et al., 2000). The island is now about 300 km from the coastal line of the African continent at its shortest distance. Inferences from linguistic (Dahl, 1951(Dahl, , 1988Adelaar, 1995), archaeological (Battistini and Verin, 1972;Dewar and Wright, 1993;Burney et al., 2004) and genetic (Hurles et al., 2005) evidence suggested that Malagasy people originate from both East Africa and Southeast Asia (in particular, Borneo). When humans first arrived on Madagascar is still an open and controversial issue. Although previous archaeological evidence has suggested that the first human settlement on the island occurred 1,500-2,000 years ago (MacPhee and Burney, 1991;Burney et al., 2004;Gommery et al., 2011), recent excavations have provided evidence that humans became active on the island earlier than this (Dewar et al., 2013;Hansford et al., 2018;Douglass et al., 2019). More recently, studies using human genome data showed that ancient Malagasy people migrated from Borneo approximately 1,000-3,000 years ago and from the east coast of Africa around 700-1,500 years ago (Brucato et al., 2016;Pierron et al., 2017). A large commercial network connecting Asia and the Mediterranean Sea along the coastal regions of the Indian Ocean was established at least 1,000 years ago (Vérin and Wright, 1999), and traders from the Arabian Peninsula also sailed to the coast of Africa, including Madagascar (Allibert, 1988;Liszkowski, 2000).
Rodents have successfully colonized Madagascar, and there are at least 23 species of rodents currently inhabiting the island (Duplantier et al., 2002). After Rattus rattus, M. musculus is the second most common rodent found in human settlements throughout the island (Goodman, 1995). Duplantier et al. (2002) have reported that the tail of Madagascar house mice is as long as the head and body, which is a typical trait of DOM (Orsini et al., 1983). The Madagascar house mouse has therefore been thought to be a member of DOM (Boursot et al., 1993). In contrast, previous studies analyzing partial regions of mitochondrial genomes (control regions, the cytochrome b gene and tRNAs) found that Madagascar house mice were genetically close to samples from Yemen, in the southern part of the Arabian Peninsula (Duplantier et al., 2002;Sakuma et al., 2016). The mtDNA analy-sis suggested that the Yemeni house mice form another potential subspecies, M. m. gentilulus (GEN), which is distinct from the other three major subspecies (Prager et al., 1998;Suzuki et al., 2013). The mtDNA sequences of Madagascar samples collected from different localities constituted a single cluster with Yemeni samples (Duplantier et al., 2002;Sakuma et al., 2016). Suzuki et al. (2013) further compared the mitochondrial sequence of Yemeni GEN with those of other wild house mice and found that the mitochondrial lineage of the Yemeni GEN was in the most basal position to M. musculus among the other M. musculus subspecies. However, a study of microsatellite variation in the nuclear genome suggested that Madagascar mice are genetically close to CAS (Hardouin et al., 2015). These incongruences make the origin of Madagascar house mice highly controversial.
Previous studies on the genetic background of the house mouse in Madagascar are primarily based on mtDNA or limited nuclear genome analysis. To increase the resolution of genetic analyses, we performed high-quality whole-genome sequencing of five M. musculus samples captured in Madagascar and combined the data with a previously published worldwide house mouse genomic dataset (Harr et al., 2016;Fujiwara et al., 2022). We revealed the genetic features of these Madagascar wild house mice and estimated the divergence time between the continental and Madagascar samples. Our study provides insights into the prehistoric migration patterns of house mice and humans in Madagascar.

RESULTS
Genetic diversity of Madagascar house mouse We analyzed five whole-genome sequences of captured M. musculus from Madagascar that were partly analyzed in a previous study (Sakuma et al., 2016). These Madagascar samples were jointly genotyped with 128 publicly available samples, including seven M. spretus samples (Harr et al., 2016;Fujiwara et al., 2022 The derived site frequency spectrum (DSFS) obtained from the Madagascar samples is shown in Fig. 1, using M. spretus to polarize the direction of mutations. The sites were classified into synonymous, nonsynonymous and noncoding sites (Fig. 1). The DSFS of Madagascar samples showed a deficiency of sites with low allele frequency, and an excess of sites with intermediate and high allele frequency. This pattern implies that the Madagascar populations underwent severe population decline or recent population bottleneck. We also observed a slight excess of nonsynonymous polymorphisms compared with synonymous polymorphisms in the singleton class, indicating the segregation of slightly deleterious mutations in low frequency polymorphisms.
Population structure To obtain an insight into the genetic features of Madagascar house mice, we first performed a principal component analysis (PCA) using 86,288,314 autosomal SNVs of 126 M. musculus samples (Fig. 2). In the PCA plot, three clusters on the vertices of a triangle correspond to MUS, CAS and DOM genetic components, and individuals lying between the vertices would be inter-subspecies hybrid samples (Fujiwara et al., 2022). Principal component (PC)1 and 2 represent the genetic differentiation between MUS and the others, and between DOM and the others, respectively. The Madagascar samples were positioned near the CAS cluster with a slightly admixed genetic component of DOM. There was little variation in PC scores among the Madagascar samples. The linkage disequilibrium (LD) pruned version of PCA was not considerably different from Fig. 2 (Supplementary Fig. S2). We also performed ADMIXTURE analysis to estimate ancestral genetic components in the Madagascar samples ( Supplementary Fig.  S3), assuming three (K = 3) and four (K = 4) ancestral populations. In both results, the Madagascar samples were found to have a major CAS ancestry and a minor DOM ancestry.
Next, we constructed an autosomal genetic tree using the neighbor-joining method with the M. spretus samples as the outgroup (Fig. 3). Only the samples that had a major subspecies component greater than 80% in the ADMIXTURE result (K = 3) were used in the phylogenetic analysis (see Methods for details). The Madagascar samples formed a single cluster within the CAS cluster. We also constructed a neighbor-net network ( Supplementary  Fig. S4), which had a similar pattern to the neighborjoining tree, showing that the Madagascar samples are within the CAS cluster. To obtain a clearer picture, we used the outgroup f 3 statistics, f 3 (MDG, X; SPR), where MDG, X and SPR represent Madagascar, non-Madagascar M. musculus and M. spretus samples, respectively. The results showed that individuals genetically close to the Madagascar samples (high f 3 values) were located around the coastal side of the South Asian regions and Indonesian islands (Fig. 4).
We further investigated the pattern of genetic admixture in Madagascar samples using f 4 statistics. Following Fujiwara et al. (2022), we used Indian (IND03 and IND04) and German (DEU01-DEU07) samples as the reference populations for CAS and DOM, respectively. We computed f 4 (SPR, CAS; DOM, MDG) and f 4 (SPR, DOM; CAS, MDG), obtaining 0.0054 (Z-score: 52.43) and 0.0066 (Z-score: 26.25), respectively. In both cases, the f 4 statistics were significantly positive values, suggesting that the Madagascar samples were genetically closer to CAS than to DOM, but with a non-negligible amount of gene flow with DOM.
Mitochondrial lineage analysis We investigated the genetic background of Madagascar samples using complete mitochondrial genomes. Phylogenetic trees based on maximum likelihood inference revealed that the five individuals from Madagascar formed a single cluster (Fig.  5A, Supplementary Fig. S5). The Madagascar samples were situated in the most basal lineage of all M. musculus mitochondrial haplotypes, but bootstrap support was not sufficiently high (0.64, Supplementary Fig. S5). The time to the most recent common ancestor (tMRCA) of Madagascar mitochondrial lineages was estimated using the BEAST v1.8.4 software and we obtained the age of 3,100 years ago (Fig. 5B).
Inference of past demographic history Pairwise sequentially Markovian coalescent (PSMC) analysis was performed to infer the demographic history. All Madagascar samples showed almost the same effective population size trajectories over the past 1,000,000 years ( Fig. 6). The ancestral population of Madagascar samples showed a rapid increase from 300,000 to 200,000 years ago, peaked around 100,000 years ago, and has been slowly declining since then over the Last Glacial Period. PSMC plots for representative individuals of CAS and DOM are also shown in Fig. 6. CAS and Madagascar populations showed similar population histories before about 50,000 years ago. The CAS population exhibited a rapid decrease in effective population size after 50,000 years ago. The Madagascar samples also showed a similar decline, but with a slower rate than the CAS population. We also performed multiple sequentially Markovian coalescent (MSMC) analysis using Madagascar samples to analyze the relatively recent effective population size (Fig. 7). The plot exhibits a historical bottleneck event for the Madagascar samples around 1,000-3,000 years ago.
Estimation of divergence time between populations To estimate when the Madagascar samples diverged from the other populations, we performed a cross-population MSMC analysis. We evaluated cross  coalescent rates between Madagascar samples and 13 other samples consisting of German DOM samples, Indian CAS samples and CAS samples from coastal and insular regions of the Indian Ocean. The latter refers to samples from India (Bhubaneswar, Hyderabad, Mysore), Bangladesh (Dhaka, Mymensingh), Sri Lanka (Colombo, Peradeniya) and Indonesia (Bali, Java). The divergence time between German DOM and Madagascar was approximately 225,000 years ago with a 95% confidence interval (CI) of 224,641-226,546. While an Indian CAS, which was sampled at an inland mountainous area, showed a divergence time of approximately 50,000 years ago (95% CI: 48,907-50,011) from Madagascar, the other CAS samples from the coastal and island regions of the Indian Ocean showed a much more recent (~4,000 years ago) divergence time from Madagascar samples (Fig. 8). As a control, we performed the analysis for two samples from Bali and found that they did not show the signature of population divergence ( Supplementary Fig. S6).

DISCUSSION
The house mouse is an animal that has now colonized a wide range of areas including remote islands. This creature has been living commensally with humans for at least 10,000 years (Cucchi et al., 2005) and is thought to have been introduced to islands, including Madagascar, by human activities over thousands or hundreds of years. Other than Madagascar, New Zealand (Searle et al., 2009a), Madeira (Gündüz et al., 2001) and the British Isles (Searle et al., 2009b) are known for their studies of the introduction of house mice to islands. In this study, we present genetic features of Madagascar house mice by comparing them with global samples and provide insights into how they have been introduced into and established on the island. However, we should make the caveat that this study was conducted using a limited number of samples captured in a small region of Madagascar, and these samples do not necessarily represent the entire Madagascar house mouse population. A large-scale sampling across the island would be desirable to draw a clearer picture in the future. We should also note that, although we showed population size trajectories inferred using PSMC and MSMC in Figs. 6 and 7, apparent effective population size may change with many factors, such as population structure, migration and admixture, and may not reflect true effective population sizes (Mazet et al., 2016). Indeed, estimated effective population size (N e ) values should be viewed as the inverse of coalescent rate in piecewise time.
Genetic background of the Madagascar wild house mouse samples Previous studies have shown that wild M. musculus exhibit a very large effective population size and nucleotide diversity compared to humans. The Madagascar house mouse samples analyzed in this study also showed a higher nucleotide diversity (0.31%) than humans (0.08-0.12%) (Perry et al., 2012;Prado-Martinez et al., 2013;Arbiza et al., 2014). However, given that the Madagascar samples were clustered with CAS, whose nucleotide diversity is around 0.74-0.79% (Geraldes et al., 2008;Phifer-Rixey et al., 2014;Harr et al., 2016), the relatively low nucleotide diversity of the Madagascar samples is probably related to the population bottleneck when they were brought to the island. The DSFS of Madagascar samples also agrees with the population bottleneck scenario. However, the ratios of nonsynonymous to synonymous mutations in Madagascar samples were comparable with those in CAS. The nonsynonymous/synonymous ratio is often interpreted as an indicator of the efficacy of natural selection against slightly deleterious mutations, because the effect of genetic drift should become stronger in small populations (Akashi et al., 2012). The observed pattern agrees with cases in humans and primates, where recent population size changes did not strongly affect the genetic load of individuals (Simons et al., 2014;Osada et al., 2015).
Based on the results of the PCA plot, ADMIXTURE, f 4 statistics, phylogenetic tree and phylogenetic network, we concluded that the nuclear genomes of Madagascar samples largely consisted of a CAS genetic component. This result is consistent with a previous microsatellite study (Hardouin et al., 2015). However, the analyses demonstrated that our Madagascar samples have experienced admixture events with DOM. A recent study of quantitative trait loci identified multiple regions causing hybrid male sterility in CAS and DOM hybrids (White et al., 2012). According to that study, a fairly high percentage of male individuals in the second-generation hybrids (F 2 ) exhibited phenotypes associated with infertility, indicating that gene introgression between CAS and DOM is difficult. Nevertheless, the Madagascar samples exhibit the genetic feature of CAS-DOM hybrids (DOM-like CAS), indicating that inter-subspecific hybridization is rare but possible.
Mitochondrial genome analysis indicated that the mitochondrial lineages of the Madagascar samples formed a monophyletic group with low nucleotide diversity and that these lineages have a recent origin, with a tMRCA of approximately 3,000 years ago. This pattern is consistent with the results of Duplantier et al. (2002), who analyzed samples from different locations in Madagascar, Fig. 6. PSMC plot of M. musculus. Green, blue and black lines represent the inferred effective population size transition of CAS, DOM and Madagascar (MDG) mouse populations, respectively. The x-axis represents years before the present, scaled by g = 1 and μ = 5.7 × 10 −9 , and the y-axis represents the number of the inferred effective population size. Lines with lighter colors represent 100 replications of the bootstrapping results. Fig. 7. MSMC plot of five Madagascar samples using 10 haplotypes. The x-axis represents years before the present, scaled by g = 1 and μ = 5.7 × 10 -9 . The y-axis represents the inferred effective population size of Madagascar house mice. except that Yemeni samples were also clustered with the Madagascar samples in Duplantier et al. (2002). The MSMC analysis showed that our Madagascar samples experienced a population bottleneck event about 1,000-3,000 years ago. These results suggest that M. musculus was first brought to the island of Madagascar during this period.
We showed that the mitochondrial lineage of Madagascar samples was placed in one of the most basal positions of the entire M. musculus species. However, the study by Duplantier et al. (2002), which analyzed 539 nucleotide sites in the D-loop of the mitochondrial genome, showed that the MUS clade was located at the basal position of all M. musculus. We created a phylogenetic tree with whole mitochondrial genomes and showed the mitochondrial genealogy with higher resolution than previous analyses, but the bootstrap value supporting the basal position was 0.64, which is still not sufficiently high to conclude with high confidence which mitochondrial lineage is the most basal.
Collectively, our study showed highly complex genetic features of the Madagascar samples. The nuclear genomes are mostly derived from CAS, but with a significant amount of admixture with DOM. In contrast, the mitochondrial lineage of Madagascar samples has diverged highly from both CAS and DOM lineages. These findings indicate that the genetic background of Madagascar house mice is unique among the other worldwide populations.
Divergence of Madagascar population from other CAS populations We examined which CAS samples were closely related to the Madagascar samples. Although the neighbor-joining tree and neighbor-net net-work showed that the Madagascar samples clustered with Indian and Nepalese samples, f 3 statistics and MSMC analysis showed that the CAS samples in a wide range of the coastal and insular regions of the Indian Ocean are genetically closer to the Madagascar samples than the inland Indian samples. Considering that the inclusion of admixed samples to tree and network constructions potentially results in a distorted pattern, we concluded that the results of f 3 statistics and MSMC were likely to be more reliable. In particular, the model of MSMC assumes that the coalescent time of genomic fragments between chromosomes differs across the genome and so the estimated divergence time should be more robust to gene flow and admixture events. The relatively recent divergence time (~4,000 years ago) between the Madagascar and the Indian Ocean coastal and insular populations suggests that the migration of the Madagascar population coincided with human activity in the Neolithic period.
Notably, there was a small but clear increase in the relative cross coalescent rate (rCCR) between the DOM-Madagascar populations around 50,000-70,000 years ago (Fig. 8). This pattern is likely due to an admixture event between the ancestral population of German DOM and the Madagascar population. However, it should be noted that the peak timing (50,000-70,000 years ago) does not necessarily represent the actual timing of the admixture event. Since we used German DOM samples as representatives of DOM, if an unsampled or extinct DOM population was the source population, the peak timing would instead represent the divergence time between the ancestral population of German DOM and the unsampled/extinct DOM. Unfortunately, our dataset did not cover a large number of DOM samples, and we were not able to examine the timing of historical admixture events, if they existed.
Possible history of the Madagascar wild house mouse samples The ancestors of present-day Malagasy people are thought to have dual origins on the Indonesian island of Borneo in Southeast Asia and on the east coast of Africa, and wild house mice are thought to have been introduced commensurately with the migration of these human ancestors who arrived on the island of Madagascar. Here, we would like to discuss the potential migration time and route of Madagascar house mice to the island.
First, we consider when the ancestors of the Madagascar house mouse were introduced onto the island. Although the rate of mitochondrial molecular evolution, nuclear genome mutation rate and generation time in wild house mice are controversial issues, and these assumptions make the estimated timing highly vulnerable, our analyses consistently indicate a strong population bottleneck around 1,000-3,000 years ago. Note that, in principle, population admixture would not create a spurious signal of a population bottleneck. However, the timing of the bottleneck is compatible with the generally accepted date of arrival in Madagascar of Austronesian-speaking peoples from the Indonesian Islands (BCE 300-CE 500) (Dewar and Wright, 1993;Burney et al., 2004;Hurles et al., 2005). Furthermore, cross-population MSMC analyses indicate that the divergence between the Madagascar samples and those from the coastal and insular areas of South and Southeast Asia surrounding the Indian Ocean occurred approximately 4,000 years ago. Since commensal animals, such as house mice, would not suddenly expand their population size without an appropriate type of human activity like agriculture, we suggest that the Madagascar house mice migrated to the island at the same time as or soon after the first Austronesianspeaking farmers arrived on the island.
Second, it is difficult to answer the question of where the ancestors of the Madagascar samples migrated from. Because our f 3 and MSMC analysis indicated that the Madagascar samples have genetic affinities with many samples from coastal and insular regions of the Indian Ocean, it would be difficult to precisely determine their origin. In addition, our samples did not cover Yemen, where previous mitochondrial studies have suggested an origin, or Borneo, where Austronesian-speaking peoples are supposed to have migrated from. The original homeland of the Madagascar samples therefore still remains an open question. Another interesting question is whether there were multiple introductions of house mice to Madagascar. Previous mitochondrial analyses of rats from multiple locations in Madagascar indicated that there is considerable genetic diversity within Madagascar, originating from multiple sites in the Indian Ocean (Tollenaere et al., 2010). In this study, we show that Madagascar house mice possess both CAS and DOM genetic traits, which strongly supports the multiple origin scenario. If samples can be collected from multiple sites throughout Madagascar, a more definitive view of Madagascar house mice may be obtained in the future.
In summary, this study determined the whole-genome sequences of five Madagascar wild house mice and analyzed their genetic backgrounds. Until now, mice from Madagascar have been treated as DOM or GEN, but we showed that their genetic features are mostly derived from CAS with a non-negligible amount of introgression from DOM. In addition, mitochondrial and nuclear genome data indicated that they experienced a strong population bottleneck approximately 1,000-3,000 years ago, coinciding with the arrival of Austronesian farmers in Madagascar. In the future, mouse samples from the Middle East and Borneo, as well as more samples from across Madagascar, should clarify the in-depth genetic background and history of the Madagascar mouse population. Overall, we have revealed unique and highly complex genetic features of Madagascar house mice and provided insights into the evolutionary history of rodents that have successfully migrated to Madagascar, which has only been partially understood.

MATERIALS AND METHODS
Biological samples and genomic data Five specimens of the Madagascar wild house mouse were collected in the Parc Botanique et Zoologique de Tsimbazaza, Madagascar and nearby areas, from which a partial mitochondrial genome and coat color-controlling gene were sequenced in the study of Sakuma et al. (2016). The other mitochondrial and nuclear genomic sequence data of M. musculus as well as the western Mediterranean mouse (M. spretus) were obtained from previous studies (Harr et al., 2016;Li et al., 2021;Fujiwara et al., 2022). These data were downloaded from the DDBJ (PRJDB11027) and the European Nucleotide Archive (PRJEB9450, PRJEB11742, PRJEB14167, PRJEB2176 and PRJEB11897). The reference complete mitochondrial sequence of M. spretus (NC_025952) was also used for the mitochondrial genome analysis. See Supplementary Table S1 for detailed information on the samples used in this study.
Mapping genomic reads and variant calling For the five Madagascar samples, 150-bp-length paired-end reads were sequenced using the DNBSEQ platform. The reads were quality-checked using the FastQC v.0.11.9 tool (https://www.bioinformatics.babraham.ac.uk/projects/ fastqc/). All cleaned reads were mapped to the house mouse reference genome sequence GRCm38 (mm10) using the bwa-mem v.0.7.17-r1188 (Li and Durbin, 2009) algorithm with "-M" option and piped to the Samblaster v.0.1.26 (Faust and Hall, 2014) program with "-M" option to mark PCR-duplicated reads. For the data from Harr et al. (2016), we used samples that had greater than 20-fold median coverage. We used the GATK4 Haplo-typeCaller (McKenna et al., 2010) function with "-ERC GVCF" to perform variant calls. The genomic variant call format (.gvcf) files of each sample were then merged to a jointly called genotype file using GATK4 Genom-icDBImport and GenotypeGVCFs functions. We next conducted the GATK4 Variant Quality Score Recalibration (VQSR) process, which applies the machine learning method to determine whether our raw variants are true or false positives. For VQSR, we downloaded the known single nucleotide variant (SNV) dataset of "mgp. v3.snps.rsIDdbSNPv137.vcf.gz" from the Sanger Institute web server (ftp://ftp-mouse.sanger.ac.uk/REL-1303-SNPs_ Indels-GRCm38/) as a training dataset. Hard-filtered SNV data with parameters "QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, and ReadPosRankSum < -8.0" were also used as a training dataset. After VQSR, we used variants that passed within the 90% tranche (90% acceptance in all reliable training SNV datasets) for downstream analyses. The autosomal SNVs were then filtered using genomic mappability scores to restrict our analyses to non-repetitive regions. To calculate mappability scores of the reference genome sequence, we ran GenMap v.1.3.0 software (Pockrandt et al., 2020) with options "-K 30" and "-E 2" and used positions with the score of 1 (a 30-mer starting from a unique position in the genome, allowing for two mismatches) in downstream analyses. All samples were examined for kinship using KING v.2.2.6 software (Manichaikul et al., 2010) with the option "-kinship". None of the samples from Madagascar showed a detectable kinship relationship. A total of 133 samples of mice, including five novel samples from Madagascar, were used in the following analyses.
Because the phenotypic records for sexes of Madagascar samples were not complete and may not be accurate, we determined the sexes of samples using the genomic sequence data. We calculated the read coverages on the sex chromosomes, and used the "depth" command of the samtools program to count the coverage of each sample in non-pseudoautosomal sites of the X and Y chromosomes that passed the mappability filter. The ratio of X to Y chromosome coverages exhibited a clear bimodal distribution, whose modes were 1.03-1. 09 and 148.24-268.16. Three out of five samples were judged as males (Supplementary Table S3). Synonymous and nonsynonymous SNVs were annotated with the house mouse gene annotation data version GRCm38.101 (ftp://ftp.ensembl. org/pub/release-101/gtf/mus_musculus/) using the SnpEff and SnpSift programs (Cingolani et al., 2012a(Cingolani et al., , 2012b.

Mitochondrial lineage analysis
The mitochondrial genome sequences of Madagascar samples were recon-structed by de-novo assembly software, Novoplasty v.4.2.1 (Dierckxsens et al., 2017), using the house mouse mitochondrial genome NC_005089 as a seed sequence. The mitochondrial sequences of all samples were aligned using MUSCLE (Edgar, 2004a(Edgar, , 2004b implemented in MEGA7 (Kumar et al., 2016). All D-loop regions and gapped sites were removed, resulting in 16,038 bp length. To construct the maximum likelihood (ML) tree of mitochondrial genomes, we used IQ-TREE v.1.6.12 (Nguyen et al., 2015). The best substitution model, "TIM2 + F + R3", was selected using the ModelFinder (Kalyaanamoorthy et al., 2017) function implemented in IQ-TREE. Bootstrapping values were computed with 1,000 replications.
The divergence time among mitochondrial genomes was estimated using BEAST v1.8.4 (Drummond and Rambaut, 2007) with the HKY + G substitution and the strict clock models (Li et al., 2021). The Markov chain Monte Carlo simulation was run for 10 million steps including 10% burn-in steps, and samples were recorded for every 10,000 steps. Tracer v1.6 software (Rambaut et al., 2018) was used to assess the convergence of the chains and ensured effective sample size (ESS) values above 200 for most parameters. The trees were summarized using TreeAnnotator v1.8.4 software (http://beast. community/treeannotator) with the settings "Maximum clade credibility tree" and "Mean heights" and were displayed using FigTree v1.4.3 software (http://tree.bio.ed.ac. uk/software/figtree/). The time-dependent evolutionary rates of mtDNA (Ho et al., 2011) were considered as previously described (Li et al., 2021). Evolutionary rates of 2.4 × 10 -8 substitutions/site/year and 1.1 × 10 -7 substitutions/site/year were used for the older (before 100,000 years ago) and younger divergences (after 20,000 years ago), respectively.
Population structure analysis of the Madagascar house mouse We filtered our variant data to exclude indels and multiallelic SNVs using VCFtools. The filtered autosomal SNVs were then converted to a PLINK v.1.9 (Purcell et al., 2007) format. We polarized a mutation direction of each SNV using M. spretus as the outgroup species and calculated DSFS. In general practice, the SNVs in LD are excluded from population structure analysis. However, because the house mouse samples were genetically highly structured at the subspecies level, and excluding these SNVs would result in a very small number of SNVs for the downstream analysis, we performed PCA with and without LD pruning but presented the results without the LD pruning as primary results. PCA was performed using the smartpca program implemented in Eigensoft v.7.2.1 (Patterson et al., 2006) using the default parameter settings, except that we did not remove outlier samples automatically.
A neighbor-joining tree was reconstructed using a pairwise distance matrix generated from autosomal data (Saitou and Nei, 1987). Our previous study (Fujiwara et al., 2022) showed that samples with a high proportion of admixture could considerably distort the pattern of divergence among subspecies. In this study, we constructed phylogenetic trees using only those samples in which ancestry to a particular subspecies accounted for more than 80% in the ADMIXTURE analysis (K = 3). We computed all pairwise identity-by-state (IBS) distances using PRINK software with the "-distance square 1-ibs" option (Purcell et al., 2007). The tree was visualized using the Ape package in R (Paradis et al., 2004). Using the same samples, a Neighbor-Net tree (Bryant and Moulton, 2004) was computed using the phangorn package (Schliep, 2011) with gdsfmt, SeqArray and SNPRelate packages (Zheng et al., 2012(Zheng et al., , 2017 in R. We computed f 3 and f 4 statistics using AdmixTools v.7.0 (Patterson et al., 2012). The f 4 statistics were computed using the qpDstat program with "f4 mode" and "printsd" options. The Indian CAS (IND03 and IND04) and German DOM (DEU01-DEU07) samples were used as reference populations. These samples showed the least amount of admixture with other subspecies based on the f 4 statistics in the previous study (Fujiwara et al., 2022). The configuration of the f 4 statistics is represented as f 4 (A, B; C, D), where A to D represent each population. The outgroup f 3 statistics were computed by the qp3pop program of AdmixTools using the "outgroupmode" option. In the outgroup f 3 -mode, the statistics of f 3 (A, B; C), where C represents an outgroup population, show the genetic drift from C to the common ancestor of A and B and represent the genetic relatedness between the populations A and B.
Demographic inference Pairwise sequentially Markovian coalescent (PSMC) (Li and Durbin, 2011) analysis was performed for all five samples from Madagascar. To create the input file for PSMC, we used the "mpileup" command of samtools with the "-C" option, and used vcfutils.pl program in the PSMC package with the "-D 60 -d 10" option, to obtain the consensus autosomal genome sequence with GRCm38 (mm10) as the reference genome. The PSMC was conducted using the options "-N25 -t15 -r5 -p '1*4 + 25*2 + 1*4 + 1*6'". We performed 100 bootstrap replicates to show the CIs of piecewise estimated N e . Generation time (g) was assumed to be 1 year (Bronson, 1979;Suzuki et al., 2004;Geraldes et al., 2008), considering mating in the wild environment, and the autosomal mutation rate (μ) was assumed to be 5.7 × 10 -9 per site per generation (Milholland et al., 2017).
MSMC (Schiffels and Durbin, 2014) and its second version (MSMC2: https://github.com/stschiff/msmc2) (Malaspinas et al., 2016;Schiffels and Wang, 2020) were used to estimate N e changes and population separation history. Phased haplotype sequences, which were inferred using ShapeIt4 software (Delaneau et al., 2019), were used as input data for MSMC/MSMC2. We performed phasing of 126 house mouse samples, excluding seven M. spretus samples. The "Mapping Data for G2F1 Based Coordinates" (Liu et al., 2014) from "Mouse Map Converter (http://cgd.jax.org/mousemapconverter/)" was downloaded and the recombination rates in the files were used for ShapeIt4 and MSMC/MSMC2. To estimate the population separation history, we used two haplotypes from each sample to calculate the rCCR for given population pairs (Schiffels and Durbin, 2014). The half value of rCCR (i.e., rCCR = 0.5) is assumed as the time when the two populations separated. For bootstrapping, the original input data were cut into blocks of 5 Mbp each, which were randomly sampled to create a 3-Gbp-length pseudo genome. A total of 20 pseudo genomes were used for the bootstrapping. To visualize the MSMC/MSMC2 results, we used the same g and μ as the PSMC analysis.
Data Archiving The short-read whole-genome sequencing data generated in this study have been submitted to the DDBJ BioProject database (https://www. ddbj.nig.ac.jp/bioproject/) under the accession number PRJDB11969. The complete mitochondrial genome sequences are submitted to the DDBJ database under the accession numbers LC644158-LC644162.
We thank two anonymous reviewers for helpful comments. This work was partly supported by MEXT KAKENHI (grant 18H05511 to N. O. and grant 18H05508 to H. S.). We would like to express our gratitude to Dr. Chihiro Tanaka, who belongs to the Animal Care and Exhibition Section, Yagiyama Zoological Park, Sendai, Japan, for her support of the sample collection project. We would also like to express our deepest gratitude to Drs. Kimiyuki Tsuchiya and Hajanirina Ramino for sample collection in Madagascar.