Population structure and association analyses of the core collection of hexaploid accessions conserved ex situ in the Japanese gene bank NBRP-Wheat

In this study, we investigated the genetic diversity and population structure of the core collection of hexaploid wheat accessions in the Japanese wheat gene bank NBRP-Wheat. The core collection, consisting of 188 accessions of Triticum aestivum , T. spelta , T. compactum , T. sphaerococcum , T. macha and T. vavilovii , was intensively genotyped by DArTseq markers and consisted of 20,186 SNPs and 60,077 present and absent variations (PAVs). Polymorphic markers were distributed in all chromosomes, with a tendency for smaller numbers on the D-genome chromosomes. We examined the population structure by Bayesian clustering and principal component analysis with a general linear model. Overall, the core collection was divided into seven clusters. Non-admixture accessions in each cluster indicated that the clusters reﬂect the geographic distribution of the accessions. Both structure analyses strongly suggested that the cluster consisting of T. spelta and T. macha is out-grouped from other hexaploid wheat accessions. We performed genome-wide association analysis pilot studies for nine quantitative and seven qualitative traits and found marker-trait associations for all traits but one, indicating that the current core collection will be useful for detecting uncharacter-ized QTLs associated with phenotypes of interest.


INTRODUCTION
Along with maize and rice, wheat (Triticum spp.) is one of the most important staple crops that have supported human civilization. Hexaploid wheat (T. aestivum L.) originated from hybridization between domesticated tetraploid wheat (T. turgidum L.) and Aegilops tauschii Coss (Kihara, 1944;McFadden and Sears, 1944). Accumulated archeological and paleontological evidence indicates that the hybridization event occurred either south or west of the Caspian Sea approximately 8,000 years ago (Nesbitt and Samuel, 1996;Salamini et al., 2002;Giles and Brown, 2006). Today, hexaploid wheat is cultivated widely in subarctic, temperate and subtropical regions of both hemispheres, from 67°N in Norway, Finland and Russia to 45°S in Argentina, and accounts for more than 30% of the global cereal cultivation area (220 million ha in 2016, http://faostat.fao.org/). The production of wheat has increased since the release of modern high-yielding varieties and application of chemical fertilizers in the 1950s. The great success in raising wheat production has been referred to as a "green revolution". Today, more than 700 million tonnes of wheat is harvested annually (749 million tonnes in 2016, http://faostat.fao. org/). However, the increase in wheat production has plateaued in recent decades and there is an urgent need to promote sustainable wheat production to meet the demands of the increasing world population amid global climate change (McCouch et al., 2013). Diverse genetic resources are essential for breeding new cultivars. The landraces cultivated globally are especially important as a genetic resource because they have adapted to various environments (Harlan, 1975;McCouch et al., 2013). The National Bioresources Project-Wheat (NBRP-Wheat), launched by the Japanese government in 2002, collects, stores and supplies wild species, landraces and experi-mental strains of wheat and related species (currently providing ca. 12,000 accessions). The collections of wild species and landraces derived from several expeditions since Dr. Hitoshi Kihara's first expedition in 1955 are currently available rarely at the collection sites.
Association mapping is a method of detecting quantitative trait loci (QTLs). This approach is based on correlations between genotypes and phenotypes, and is additionally affected by the structure of linkage disequilibrium (LD) within a test population (Buckler and Thornsberry, 2002;Zondervan and Cardon, 2004). In addition to QTL analysis, which is based on phenotypic and genotypic polymorphisms within a bi-parental mapping population, association mapping is also applicable to a much larger gene pool and a more diverse genetic structure (Remington et al., 2001). Whole-genome association mapping has been applied to common wheat for several agronomic traits (Breseghello and Sorrells, 2006;Crossa et al., 2007;Neumann et al., 2011;Le Gouis et al., 2012;Edae et al., 2014;Wang et al., 2014;Zanke et al., 2014Zanke et al., , 2015Zanke et al., , 2017Li et al., 2016). However, these studies focused on a common wheat (T. aestivum) population but not on other hexaploid Triticum species.
In this study, we provide intensive genotype information for the NBRP hexaploid wheat core collection to elucidate its genetic diversity and population structure, which is a prerequisite for association studies. We performed genotyping on the core collection lines using a DArT marker system (Jaccoud et al., 2001). The core collection includes modern cultivars frequently used for breeding in Japan, and landraces cover the geographical distribution and taxonomic groups of hexaploid wheat. Therefore, the core collection is a comprehensive hexaploid wheat collection in terms of systematics of the genus Triticum, i.e., it represents not only common wheat but also other hexaploid wheat species. We aimed to test the applicability of the core collection for association analyses of quantitative and qualitative traits.

MATERIALS AND METHODS
Plant materials and growth conditions Plant materials used in this study are the collection of 188 accessions of hexaploid wheat (Table 1), comprised of T. aestivum (T. aestivum ssp. aestivum L., 160 accessions), T. spelta (T. aestivum ssp. spelta (L.) Thell., 12 acces- vavilovii Jakubz., two accessions) and synthetic hexaploid wheat (one accession, W7984). The synthetic wheat W7984 (Altar84/Aegilops tauschii (219) CIGM86.940) is the parental accession of the mapping population distributed by the International Triticeae Mapping Initiative (ITMI). In this report, we used nomenclature according to the National BioResource Project (NBRP)-Wheat catalogue. Detailed information on accessions is available on the NBRP-Wheat database KOMUGI (http:// www.shigen.nig.ac.jp/wheat/komugi/). The core collection includes both landraces and modern cultivars, and they were selected from the NBRP-Wheat hexaploid wheat collection based on their taxonomic and geographical distribution. In this paper, we refer to the accessions with evident breeding pedigrees as cultivars. Seeds were sown in January, 2011 and the seedlings were transplanted to pots (one plant per pot). The pots were placed in a glasshouse without air conditioning at Yoshida Campus, Kyoto University, Japan (N35.031, E135.79).
Phenotyping We scored nine quantitative traits for every accession: heading time (HT), flowering time (FT), maturation time (MT), number of spikelets per spike (SN), spike length (SL), spike density (SD), culm length (CL), 100-grain weight (HGW) and number of tillers (TN); and seven qualitative traits: wax-like deposits on culm (WC), ochrea (WO) and spike (WS), anthocyanin pigmentation on coleoptile (AC) and seedling (AS), glume hair (GH), and awnedness (AWN). HT, FT and MT are expressed as the number of days after November 1 st . SD is obtained by dividing SN by SL. CL is the value of the longest culm of each individual. Every quantitative trait was measured for three individuals per accession and their means were used for analyses. The quantitative traits were rated on a three-point scale. Wax-like deposits and anthocyanin pigmentation on each part were evaluated by naked-eye observations. We also scored glume hair and awnedness on three-point scales: hairless, short-hairy or hairy; and awnless, short-awn or long-awn, respectively.
DNA extraction and genotyping Total DNA was extracted from young leaves of individual plants using a DNeasy Plant Maxi Kit (Qiagen Tokyo, Japan) according to the manufacturer's protocol. DArTseq marker genotyping, a genotype-by-sequencing method, was carried out by Diversity Arrays Technology, Yarralumla, Australia. The DArTseq marker system generates two types of data: single-nucleotide polymorphisms (SNPs) and presence and absence variations (PAVs). We removed SNPs with call rates less than 0.8 or with hetero fractions less than 0.05 in the ensuing analyses. The marker system provides a pair of 69-bp sequence (tag) information; DNA polymorphisms therein were recognized as alleles. To select high-confidence markers and to reveal on which chromosome the markers were located, we performed a BLAST search (Zhang et al., 2000) for tag sequences on the IWGSC RefSeq ver. 1.0 (https://wheat-urgi.versailles. inra.fr/Seq-Repository) (International Wheat Genome Sequencing Consortium [IWGSC], 2018). We adopted only polymorphic markers with one of the tag sequences matched perfectly to a single position on a chromosome.
Statistical analyses Polymorphism information content (PIC) values were calculated for each marker according to Roldán-Ruis et al. (2000), as follows: where PIC i is the PIC of marker 'i', f i is the frequency of an allele and 1 -f i is the frequency of another allele. Missing data were excluded from calculations. We estimated the population structure by Bayesian clustering using the program ADMIXTURE 1.3 (Alexander et al., 2009). To estimate the optimal value of K, which describes the number of subpopulations making up the total population, we calculated a 10-fold cross-validation procedure for each K value from 1 to 15. Principal component analysis (PCA) was also performed to estimate population structure using the pipeline implemented in the software TASSEL 5.0 (Bradbury et al., 2007). Missing data were imputed to the mean score for each locus (Jolliffe, 2002;Reich et al., 2008).
Inference on associations between genotypes and phenotypes was made using the pipeline implemented in TASSEL 5.0. We removed rare polymorphic markers that appeared in less than 5% of the accessions (PIC < 0.095) from analyses and used principal components (PC1 to 5) obtained from PCA to infer population structures. We performed analyses with an eigenvalue decomposition of the covariance matrix. We used a general linear model (GLM) for estimation of regression between genotypes and phenotypes. The permutation analysis was run 1,000 times.

RESULTS
Genotyping of NBRP-Wheat hexaploid wheat core collection by the DArT marker system The DArTseq genotyping system was applied for 188 accessions. We could not obtain genotyping data from one common wheat accession (6 × 164: T. aestivum 'Nanbukomugi'), and obtained less than 80% data from one T. compactum accession (6 × 141: KU-9873). We removed these two accessions from the ensuing analyses. We obtained 34,234 SNP data sets by DArTseq genotyping. We removed lowconfidence SNPs with call rates less than 0.8 or with heterozygous fractions greater than 0.05 from the SNP data sets. The mode of the frequency of missing data was 0.00-0.05 ( Supplementary Fig. S1A). A total of 20,186 SNPs were selected and subjected to BLAST searches against IWGSC RefSeq ver. 1.0. A total of 13,394 tag sequences matched to single regions on the reference  Fig. S1B). BLAST searches against IWGSC RefSeq ver. 1.0 resulted in the matching of a total of 20,523 tag sequences to unique reference sequences. We obtained 6,687, 7,484, 5,899 and 453 PAV tags located on genomes A, B, D and one pseudomolecule composed of unanchored scaffolds, respectively ( Table  2). The PAVs assigned to each chromosome ranged from 473 on chromosome 4D to 1,304 on chromosome 2B ( Table  2). The average PIC values of PAVs for each chromosome ranged from 0.116 on chromosome 4D to 0.266 on chromosome 6B (Table 2). A total of 2,326 SNPs and 3,084 PAVs matched to at least one gene model sequence (Table 3) (IWGSC RefSeq annotation ver. 1.1 https://wheat-urgi.versailles.inra.fr/ Seq-Repository/Annotations). Due to the presence of multiple tag sequences in a gene model, we found a final sequences. We obtained 5,596 SNPs located on the A genome, 5,820 on the B genome, 1,768 on the D genome, and 210 on one pseudomolecule composed of unanchored scaffolds (chrUn, Table 2). The number of SNPs located on each chromosome ranged from 91 on chromosome 4D to 1,198 on chromosome 7A (Table 2). There were fewer SNPs located on homoeologous group-4 chromosomes than on other chromosomes in each genome ( Table  2). The average PIC values for each chromosome ranged from 0.220 (chromosome 4D) to 0.252 (chromosome 3A) ( Table 2).
A data set of 60,077 PAVs was obtained by DArTseq genotyping and the data call rates for every PAV were more than 0.8. Overall, the mode of frequency of missing data was 0.000-0.025. For the genomically tagged PAVs, the mode of frequency of missing data was 0.125- total of 4,524 gene models that contained at least one SNP or PAV (Table 3).
Population structure analysis of the core collection To investigate the population structure of the hexaploid wheat core collection, ADMIXTURE software was used for all the marker data. The optimal cluster number based on the cross-validation value is seven ( Supplementary Fig. S2). Here, we name the seven clusters Pop-A to -G in the order of largest to smallest Q values (the ancestry fractions, Fig. 1 and Table 1).
Pop-A consisted of 28 Eastern Asian T. aestivum (27 Japanese and one Chinese) accessions and one T. compactum accession (KU-1208) ( Table 1). The 25 Pop-B accessions originated from southwest China, Himalayan regions, and Southern Asian countries. The Chinese Spring standard cultivar was classified into this cluster. Notably, all T. sphaerococcum accessions belong to this cluster (Table 1). Pop-C contains 30 accessions including all T. vavilovii , two T. compactum (KU-3063 and KU-7350), one T. spelta (KU-3377) and 25 T. aestivum. They are derived from Western Asian regions (Pakistan, Afghanistan, Uzbekistan, Iran and Turkey) ( Table 1). Pop-D, consisting of 28 T. aestivum and one T. compactum (KU-3242 from Iran) accessions, is character-ized by richness of admixture accessions, with more than half being admixture accessions with Pop-C or Pop-F (Fig.  1). These admixture accessions are derived from Western Asian and European regions, respectively. The accessions with higher ancestry fractions in this cluster (more than 0.8) are from Turkey, Iran and Ethiopia. Pop-E contains 22 accessions, the majority of which are also admixture accessions. We noted that they are modern cultivars from various regions ( Fig. 1 and Table 1). The accessions with higher ancestry fractions in this cluster (more than 0.8) are from Turkey, Syria and Lebanon. All 1). Both T. macha and synthetic wheat accessions were admixtures with Pop-D (Fig. 1).
Principal component analysis of the core collection We used PCA as an alternative method for investigating the population structure of the hexaploid wheat core collection. The first, second and third principal components (PC1 to PC3) explained respectively 8.1, 5.6 and 4.1% of their variation. Accessions plotted on the PC1-PC2 plane formed a triangular shape ( Fig. 2A). Accessions classified as Pop-A and Pop-C clustered separately at two vertices (Pop-A at the bottom and Pop-C at the top right in Fig. 2A) of the triangle. Accessions identified as Pop-B were plotted between Pop-A and Pop-C ( Fig. 2A). Accessions classified as Pop-F and Pop-G clustered together at the third vertex of the triangle ( Fig.  2A). Accessions of Pop-D and Pop-E were plotted along the line between Pop-C and Pop-F ( Fig. 2A).
Accessions identified as Pop-G were divided into two sub-clusters according to PC1 and PC3 (Fig. 2B). These two sub-clusters correspond to a taxonomical discrimination between T. spelta and T. macha. Synthetic hexaploid wheat accession W7984 belongs to the sub-cluster with T. macha. On the other hand, scatter plots on the PC1-PC3 as well as on the PC2-PC3 planes show that Pop-G accessions are distinguishable from other accessions by PC3 (Figs. 2B, 2C).
Phenotype diversity and association mapping of the hexaploid wheat core collection We measured nine quantitative traits, namely heading time (HT), flowering time (FT), maturation time (MT), number of spikelets per spike (SN), spike length (SL), spike density (SD), culm length (CL), 100-grain weight (HGW) and number of tillers (TN); and seven qualitative traits, namely wax-like deposits on culm (WC), ochrea (WO) and spike (WS), anthocyanin pigmentation on coleoptile (AC) and seedling (AS), glume hair (GH), and awnedness (AWN) (Tables 4 and 5). We performed marker-trait association analyses for these traits using both SNP and PAV data sets. We detected a total of 183 significant marker-trait associations (MTAs) for 127 SNPs and 140 for 96 PAVs at a significance level of 1% after Bonferroni multiple test correction. Fourteen SNP and eight PAV tag sequences found were within gene model sequences (Supplementary  Table S1).
The 68 MTAs (38 SNPs and 30 PAVs) with HT were distributed on 20 chromosomes except for 4D ( Fig. 3A and Table 6). Five SNPs and five PAVs are within gene model sequences (Supplementary Table S1). We detected 66 significant MTAs (37 SNPs and 29 PAVs) with FT on all chromosomes except 4D and 6A (Supplementary Fig. S3A and Table 6). Four SNPs and three PAVs reside in gene model sequences (Supplementary  Table S1). We detected 50 MTAs (32 SNPs and 18 PAVs) with MT on 17 chromosomes ( Supplementary Fig. S3B and Table 6). Six SNPs and one PAV were within gene model sequences (Supplementary Table S1). MTA analyses with the three flowering-related traits (HT, FT and MT) resulted in detection of a similar set of markers: 26 markers (17 SNPs and nine PAVs) were commonly associated with all three flowering-related traits, 29 markers (13 SNPs and 16 PAVs) with the HT and FT traits, one SNP with the HT and MT traits, and three markers (two SNPs and one PAV) with the FT and MT traits (Supplementary Table S1). As expected, the phenotypic scores of the three flowering-related traits are highly correlated with each other (Supplementary Fig. S4).
A total of 47 MTAs with CL were found on all the chromosomes expect 3D, 4D, 5D, 6D, 7A and 7D ( Fig.  3B and Table 6). Three markers (two SNPs and one PAV) were within gene model sequences (Supplementary HT: heading time; FT: flowering time; MT: maturation time; CL: culm length; TN: number of tillers; SL: spike length; SN: number of spikelets per spike; HGW: 100-grain weight; SD: spike density; WC, WO and WS: wax-like deposits on culm, ochrea and spike, respectively; AC and AS: anthocyanin pigmentation on coleoptile and seedling, respectively; GH: glume hair; AWN: awnedness. Number of (SNPs) | (PAVs) detected with significant MTAs (a significance level of 1% after Bonferroni multiple test correction). * A pseudomolecule composed of unanchored scaffolds (IWGSC RefSeq ver. 1.0). Table S1). Notably, three markers significantly associated with CL were also associated with HT and FT, and two with HT (Supplementary Table S1). We detected MTAs with SN with sixteen markers (six SNPs and 10 PAVs) on chromosomes 2D, 3B, 4A, 4B, 5A, 5B, 7A and 7B ( Supplementary Fig. S3E and Table 6). Four markers (one SNP and three PAVs) were commonly associated with SN as well as HT, FT and MT (Supplementary Table  S1). Moreover, one PAV marker associated with SN on chromosome 4A was also detected in the MTA analysis of HT and FT (Supplementary Table S1). The results of other MTAs for the traits TN, SL, SD, AWN, AC, AS, WO, WC, WS and GH are depicted in Supplementary Figs. S3 and S5, and summarized in Table   6. We found significant MTAs for the traits TN, SL, SD, AWN, AC, AS, WO and GH, but not for WC or WS. The number of significant MTAs for these traits varied from zero (WC and WS) to 26 (AS, Supplementary Table S1).

DISCUSSION
Diversity of the NBRP-Wheat hexaploid wheat core collection as revealed by DArT markers The 187 accessions collected from various geographical regions were genotyped by the DArTseq marker system (34,234 SNPs and 60,077 PAVs). For analyses, we selected 13,394 SNPs and 20,523 PAVs as the data sets that could be anchored to the IWGSC RefSeq ver. 1.0 by BLAST searches. The number of markers was highest in the B genome (5,820 SNPs and 7,484 PAVs) and lowest in the D genome (1,768 SNPs and 5,899 PAVs) ( Table 2). These differences in the number of markers among genomes are coincident with those observed in previous studies (Akbari et al., 2006;Semagn et al., 2006;Zhang et al., 2011;Nielsen et al., 2014). The mean PIC value was also lowest in the D genome (SNPs: 0.230; PAVs: 0.159) ( Table  2). These results show that the diversity is lower in the D genome than in the A and B genomes. The fewer polymorphisms present in the D genome may reflect the genetic bottleneck that occurred through hexaploidization and that brought only a limited amount of genetic diversity to Ae. tauschii (Caldwell et al., 2004;Chao et al., 2009;Mizuno et al., 2010).
Population structure and genetic relationships among common wheat Bayesian clustering analysis divided common wheat accessions into six clusters (Pop-A, -B, -C, -D, -E and -F) ( Table 1) and most of them were admixtures (Fig. 1). However, there are some non-admixture accessions that help to infer the characteristics of each cluster (Fig. 1). Accessions classified as Pop-A belong to Eastern Asian lines present at the eastern limit of wheat distribution (mostly Japanese cultivars; Table 1). Landraces derived from the Himalayan region to Southern Asia are categorized into Pop-B (Table  1). Accessions included in Pop-C are mostly Western Asian lines (Afghanistan and Iran; Table 1). Landraces derived from the Middle East are categorized into Pop-D and Pop-E (Iran, Turkey, Lebanon and Syria) ( Table  1). Most European landraces are classified as Pop-F (Table 1). Overall, the common wheat landraces of the NBRP-Wheat hexaploid wheat core collection are genetically divergent and form at least six groups each consisting of accessions from close geographical regions.
The accessions analyzed in this study form a triangular shape on the PC1-PC2 plane (Fig. 2). Accessions classified as Pop-A and Pop-C form two vertices of the triangle, and accessions of Pop-B are plotted on the side of the triangle between Pop-A and Pop-C (Fig. 2). Accessions classified as Pop-F form the third vertex of the triangle, and accessions of Pop-D and Pop-E were plotted on the side of the triangle between Pop-C and Pop-F (Fig. 2). The configurations indicate that the Himalayan region group (Pop-B) is situated between Eastern Asian (Pop-A) and Western Asian (Pop-C) groups, which may represent the dispersal pathway of hexaploid wheat through the Silk Road. Middle Eastern groups (Pop-D and Pop-E) are situated between Western Asian and European groups (Pop-F, Fig. 2), likely indicating that the latter two groups originated from the Middle Eastern groups. The finding of accessions that connect the vertices of the triangle ( Fig.  2A) suggests that the differentiation of the six groups among common wheat landraces was genetically and geo-graphically not completely punctuated.
Most modern cultivars are categorized into Pop-A, Pop-E and Pop-F (Table 1). Non-admixture accessions in Pop-A are Japanese cultivars and most are soft wheat cultivars used for Japanese noodles. Non-admixture accessions in Pop-F are early British cultivars developed before the 1920s. These results suggest that they developed from regional landraces. Many modern cultivars categorized into Pop-E are admixtures (Fig. 1) and most are cultivars developed after the 1930s.
Genetic relationships and origin of exotic hexaploid wheat species The NBRP-Wheat hexaploid wheat core collection contains five exotic non-aestivum hexaploid wheat species (T. spelta, T. compactum, T. sphaerococcum, T. macha and T. vavilovii). Population structure analysis showed that these accessions were divided into seven clusters (Fig. 1). Six of them were composed mainly of common wheat accessions and only one cluster (Pop-G) contained T. spelta, T. macha and synthetic hexaploid wheat (Fig. 1). Accessions categorized as Pop-G were sharply distinguished from other accessions by PCA (Fig. 2). These results indicate that accessions included in Pop-G are genetically different to common wheat accessions. Spelt wheat (T. spelta) accessions used in this study were derived from Germany, Spain and Iran (Table 1). All European spelt wheat accessions were categorized as Pop-G, whereas Iranian spelt wheat was categorized as Pop-C (Fig. 1), in agreement with a previous study suggesting that European and Asian spelt wheats originated independently (Dvorak et al., 2012). Macha wheat (T. macha) is endemic to the Caucasus (Cao et al., 1998;Zohary and Hopf, 2000;Matsuoka, 2011). Previous work has suggested that this hexaploid wheat originated from a hybrid cross between wild emmer wheat (T. turgidum ssp. dicoccoides) and T. aestivum (Matsuoka, 2011). All macha wheat accessions are admixtures with genetic exchange from the Pop-D accessions including Caucasian common wheat; in contrast, all European spelt wheats are non-admixture accessions (Table 1 and Fig. 1). This result suggests that T. aestivum lines genetically close to Pop-D accessions contributed to the origin of macha wheat. Comparative genetic studies indicated that European spelt wheat originated from the introgression of cultivated emmer wheat (tetraploid, 2n = 4 × = 28) into free-threshing hexaploid wheat (Matsuoka, 2011). Furthermore, the synthetic hexaploid wheat accession, W7984, was classified into Pop-G and its genomic composition was very similar to macha wheat accessions (Fig. 1). Moreover, PCA showed that W7984 clustered with macha wheat accessions (Fig.  2). These results indicate that the accessions in Pop-G have emmer wheat genetic influences which distinguish this group from others.
Triticum sphaerococcum is endemic to southern Pakistan and northwestern India (Percival, 1921;Ellerton, 1939;Mori et al., 2013). Together with Pakistani common wheat, all T. sphaerococcum accessions were classified as Pop-B according to population structure analysis (Fig. 1). This result suggests a single origin for T. sphaerococcum in Southern Asia. Triticum vavilovii is endemic to the southern Caucasus. All T. vavilovii accessions were classified into Pop-C, together with T. aestivum accessions. They are admixture accessions, being genetically related to Pop-D; in addition, their genomic structure was similar to common wheat accessions from Afghanistan (Fig. 1). This result suggests that T. vavilovii is genetically close to common wheat derived from Western Asia: whereas common wheat is free-threshing, T. vavilovii is non-free-threshing. The Iranian T. spelta accession in Pop-C is non-free-threshing ( Fig. 1), which may indicate that Asian spelt wheat contributed to the origin of T. vavilovii.
Club wheat (T. compactum) differs from other hexaploid varieties principally due to its compact spike, which results from the activity of the dominant compactum (C) allele on chromosome 2D (Rao, 1972;Johnson et al., 2008). Six club wheat accessions derived from various regions (Japan, China, Afghanistan, Iran, Turkey and Ethiopia) were selected from the NBRP-Wheat hexaploid wheat core collection. Population structure analyses classified five of the accessions, with the exception of the Ethiopian club wheat accession, into four clusters (Pop-A, -C, -D and -F, Fig. 1). This result suggests that club wheat accessions lacked genetic homogeneity as a taxonomic group, in contrast to other exotic hexaploid wheat. Kosuge et al. (2012) showed that genes other than C affect compact spike size in hexaploid wheat; therefore, we propose that the club wheat accessions originated independently in various regions through mutations in different loci.

Association analyses
We performed association analyses for nine quantitative traits and seven qualitative traits and found a total of 322 significant marker-trait associations (MTAs , Table 6). Here, we take MTAs for heading time (HT), culm length (CL), awnedness (AWN) and anthocyanin pigmentation on coleoptile (AC) as examples to compare our results to previous studies and test the potential usefulness of the NBRP-Wheat hexaploid wheat core collection for association mapping studies.
Eight QTLs for ear emergence are located on chromosomes 2B, 2D, 4A, 5A, 5D and 7D (for a summary of the wheat gene catalogue, see Gene Catalogue 2013, https://shigen.nig.ac.jp/wheat/komugi/genes/download. jsp). Meta-QTL analysis in elite European winter wheat germplasms found QTLs for ear emergence on chromosomes 1B, 1D, 2A, 3A, 3B, 4B, 4D, 5A, 6A, 6B, 7A, 7B and 7D (Griffiths et al., 2009). Neumann et al. (2011) detected MTAs for HT on chromosomes 1B, 1D, 2D, 4A, 4B, 5D, 6A, 6B and 7A by genome-wide association analysis of 96 winter wheats with 874 DArT markers. Edae et al. (2014) reported MTAs for HT on chromosomes 1D, 2A, 2B, 3A, 3B, 4B and 7D by genome-wide association analyses in 285-294 spring wheats grown in five different environments with 1,863 DArT markers. Genome-wide association analysis of a core collection of 227 bread wheats with 760 DArT markers revealed 62 markers corresponding to 33 chromosomal regions individually associated with earliness components (Le Gouis et al., 2012). These studies suggest that the HT trait is strongly related to the photoperiod response gene (Ppd-1) on the short arms of homoeologous group-2 chromosomes and the vernalization response gene (Vrn-1) on the long arms of homoeologous group-5 chromosomes. We found 68 MTAs (30 SNPs and 28 PAVs) for HT on 20 chromosomes, except for 4D ( Fig.  3A and Table 6). The significant markers were located on the short arms of chromosomes 2B and 2D and on the long arms of chromosomes 5A and 5D (Supplementary  Table S1), which may indicate the detection of the Ppd-1 and Vrn-1 homoeoloci in the association study. The significant markers on chromosome 2A were all located on the long arms (Supplementary Table S1), indicating that HT was not associated with the Ppd-1A locus on chromosome arm 2AL in our conditions. Association of HT with markers on 2AL was also reported by Edae et al. (2014).
Awning of wheat is controlled by three genes, Hd, B1 and B2 (Gene Catalogue 2013), which are located on the short arm of chromosome 4A (Sears, 1954;Rao, 1981;Sourdille et al., 2002), the long arm of chromosome 5A (Sears, 1954;Sourdille et al., 2002) and the long arm of chromosome 6B (Sears, 1954;Sourdille et al., 2002), respectively. Wheat varieties homozygous recessive for hd, b1 and b2 alleles are fully awned, while those homozygous dominant for either Hd B2 or B1 B2 are awnless; those with Hd B2 are expected to be awnless (Sourdille et al., 2002). We detected significant MTAs on chromosomes 2A, 4D and 5A ( Fig. 3C and Table 6), and MTAs on chromosome 5A likely correspond to B1. However, we could not find MTAs corresponding to Hd or B2. QTL analysis for three traits related to awning (awn length at the base, middle and top of spike) in a doubled-haploid population derived from the cross between T. aestivum 'Courtot' (awned) and T. aestivum 'Chinese Spring' (awnless) revealed two QTLs on chromosomes 4A and 6B that cosegregated with Hd and B2, respectively. The QTL on chromosome 4A (Hd) had a stronger effect on the awns at both ends of the spike and that on chromosome 6B (B2) on the awns in the middle of the spike (Sourdille et al., 2002). Recently, Yoshioka et al. (2017) reported finding a variety of awned and awnless spikes in a segregating population between T. aestivum 'Mironovskaya 808' (awnless) and T. aestivum 'Chinese Spring' (awnless). Additionally, they found that all phenotypic variation could be explained by the three loci Hd, B1 and B2 by a combination of close observation and genotype prediction. In this study, however, we rated awnedness as a qualitative trait with a three-point scale, which would have resulted in poor phenotypic resolution in detecting significant MTAs in other predicted genomic regions.
Anthocyanin pigments in wheat are found in the anthers (gene symbol Pan), auricle (Ra), leaf blade (Plb), leaf sheath (Pls), coleoptile (Rc), culm (Pc) or pericarp (Pp) (Gene Catalogue 2013). The Pan, Plb, Pls, Rc and Pc genes were closely linked on the short arms of homoeologous group-7 (Khlestkina et al., 2002(Khlestkina et al., , 2009). Ahmed et al. (2006) noted that the Rc gene likely encodes a transcriptional activator of late biosynthesis genes involved in light-regulated anthocyanin synthesis. We detected significant MTAs with AC on the short arm of chromosome 7A, as expected for the Rc gene ( Fig. 3D and Table 6).
The measurements for each trait in this study were obtained from a single cultivation season at one location. Therefore, we are not taking account of phenotypic variance due to environmental effects. However, as discussed above, some MTAs detected in this study are consistent with genes, QTLs and MTAs that have been reported previously. The genotype information acquired from the NBRP-Wheat hexaploid wheat core collection in this study provides suitable research tools to identify novel QTLs by marker-trait association analyses. The collection has previously been used in forward genetics studies (Yokota et al., 2016;Yoshioka et al., in press). The public release of the core collection with intensive genotype information should promote gene dis-covery in hexaploid wheat.