2025 Volume 75 Issue 1 Pages 61-66
Individuals within a species exhibit substantial presence-absence variation, to the extent that a reference genome assembled from a single individual contains only a subset of the species' genome. Cataloguing genome regions absent from a reference genome can therefore reveal novel sequence, some of which may be adaptive. In this work, existing short sequencing reads for the underutilised crop lablab (Lablab purpureus (L.) Sweet) were used to identify genome regions absent from the reference genome. Lablab comprises two distinct gene pools, each containing wild and domesticated types, and therefore offers an opportunity to identify gene pool-specific variation. Approximately 7.7% of the reads from eight accessions failed to map to the lablab reference genome (cv. Highworth) and were therefore putatively novel; these reads were assembled and collapsed into between 735 and 12,304 contigs per accession. Four samples (one wild and one domesticated from each gene pool) were then selected, and their novel contigs were compared to identify those present in only a subset of the samples. Although few contigs contained sequences with similarity to known genes in other legumes, some gene ontology (GO) terms were enriched that could relate to adaptive differences between the groups, so these contigs may contain novel genes for future lablab breeding. The approach used here could be applied to other species.
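The comparison step, in which novel contigs are sorted by which of the four samples contain them, can be pictured as set logic. The abstract does not say how contigs were matched between samples, so the minimal sketch below assumes that contigs have already been clustered into shared cluster IDs across samples (for example by sequence similarity). The sample labels, cluster IDs and the functions `presence_patterns` and `subset_specific` are hypothetical and for illustration only; none of the data are results from the study.

```python
from collections import defaultdict

# Hypothetical example data: each sample maps to the set of novel-contig
# cluster IDs detected in it. Labels and IDs are illustrative only and
# assume contigs were already clustered into shared IDs across samples.
SAMPLES = {
    "gp1_wild": {"c1", "c2", "c3", "c5"},
    "gp1_dom": {"c1", "c2", "c4"},
    "gp2_wild": {"c1", "c3", "c6"},
    "gp2_dom": {"c1", "c6", "c7"},
}


def presence_patterns(samples):
    """Group cluster IDs by the exact set of samples that contain them."""
    membership = defaultdict(set)  # cluster ID -> samples containing it
    for sample, clusters in samples.items():
        for cluster in clusters:
            membership[cluster].add(sample)
    patterns = defaultdict(set)  # frozenset of samples -> cluster IDs
    for cluster, present_in in membership.items():
        patterns[frozenset(present_in)].add(cluster)
    return dict(patterns)


def subset_specific(samples):
    """Return only the patterns shared by some, but not all, samples."""
    all_samples = frozenset(samples)
    return {
        pattern: clusters
        for pattern, clusters in presence_patterns(samples).items()
        if pattern != all_samples
    }


if __name__ == "__main__":
    for pattern, clusters in subset_specific(SAMPLES).items():
        print(sorted(pattern), sorted(clusters))
```

With this toy input, cluster `c2` falls in the pattern containing only the two gene pool 1 samples and `c6` in the pattern containing only the two gene pool 2 samples. These are the kinds of gene pool-specific contigs the comparison is meant to surface, whereas `c1`, present in all four samples, is excluded.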