2012 Volume 120 Issue 3 Pages 227-234
The present study examines inter- and intra-population affinities and variation among the diverse population groups of India. Its major goal was to understand the peopling of India and its role in the peopling of Southeast Asia using 11 restriction fragment length polymorphism (RFLP) mitochondrial DNA (mtDNA) markers. A total of 950 unrelated individuals belonging to 19 population groups with varied ethnic, linguistic, and geographic backgrounds were chosen for the study. All the studied sites, except HpaI 3592, are polymorphic in the data set. High frequencies of haplogroup M are found among the South Indian populations, whereas haplogroup N is frequent among the North Indian populations. Sub-haplogroups C and D of M are found only in the Tibeto-Burman-speaking Northeast population groups, suggesting their probable migration from Central Asia. Sub-haplogroups A and B of N are shared by both Northeast and North Indian population groups. The sub-haplogroups of M and N are absent among the South Indian and East Indian populations, except the Thoti, a Dravidian tribe of South India. The Northeast Indian populations exhibit the highest haplotypic diversity, whereas the South and East Indian populations have the lowest. The study provides evidence for a common maternal genetic substratum of Indian populations with probable differential admixture from Eurasia, i.e. Europe and Asia, with a decreasing trend from North to South India. The frequency patterns of the M and N sub-haplogroups and of the 9 bp deletion suggest that gene flow from East Asia to India was restricted to Northeast India, and do not suggest significant movement of people from India to East Asia through Northeast India.
India, the major land bridge between Africa and Southeast Asia, plays a key role in studies of human evolution and migration. Of the various theories proposed regarding human migration, the southern coastal route has gained increasing acceptance. The major finding of the HUGO Pan Asia SNP Consortium (2009) was that the peopling of Southeast Asia occurred via the northeastern part of India. However, Cordaux et al. (2004) and Saraswathy et al. (2009) opined that Northeast India was a barrier rather than a corridor. This issue is still debated, and the role of India in the peopling of East Asia remains controversial. Further, settlement in various parts of the world has never been continuous and systematic. This is evidenced by the geography-specific distribution of pre-human forms, e.g. Australopithecus in Africa (Dart, 1925; White et al., 1994). Moreover, geographic and climatic catastrophes, such as the glaciations in Eurasia, led to major migrations towards the southern part of the globe. Prehistoric and historic human movements, along with present-day globalization, have led to homogenization of population groups, thus making human migratory history obscure and difficult to study. Anthropologists have no option other than to look for genetic signatures among the extant population groups by means of sophisticated technologies. However, such huge investments simply to understand migratory histories may not be feasible for developing countries such as India. Thus an attempt is made in the present study to understand the peopling of India, and its role in the peopling of Southeast Asia, using restriction fragment length polymorphism (RFLP) mitochondrial DNA (mtDNA) markers. The 19 populations selected for the present study are from various geographical regions of India with unique cultural, ethnic, and linguistic affiliations.
The details of the studied population groups are given in Table 1. Genomic DNA was extracted from 5 ml of intravenous blood samples collected from 950 unrelated individuals, with prior informed written consent, using the salting-out technique (Miller et al., 1988). Ten mtDNA restriction site polymorphisms (HaeIII np 663, HpaI np 3592, AluI np 5176, AluI np 7025, DdeI np 10394, AluI np 10397, MnlI np 10871, HincII np 13259, AluI np 13262, HaeIII np 16517) and one insertion/deletion polymorphism (IDP; COII/tRNALys intergenic 9 bp deletion) were screened through polymerase chain reaction using standard primers and protocols (Torroni et al., 1993, 1996; Santos et al., 2004). Of the total collected samples, mitochondrial haplogroups could be assigned to 780 samples. The geographical location of the populations studied is shown in Figure 1.
Figure 1. Map showing the geographical distribution of the populations studied in this work.
Allele and haplotype frequencies were calculated. Haplotype diversities were estimated as described by Nei (1987) using the formula $h = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, where $p_i$ is the sample frequency of the $i$th haplotype and $n$ is the sample size. Haplogroups were identified on the basis of analyzed sites (MITOMAP: www.mitomap.org). AMOVA and pairwise Fst genetic distances were calculated using ARLEQUIN 3.0 software (Excoffier et al., 2005).
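As an illustration of the haplotype diversity formula of Nei (1987) described above, the following minimal Python sketch computes h from a list of haplotype strings. The function name `haplotype_diversity` and the four-individual sample are hypothetical and are not taken from the study; the study's own values were not necessarily computed this way.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's (1987) haplotype diversity: h = n/(n-1) * (1 - sum(p_i^2)).

    `haplotypes` is a list with one haplotype label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two samples are needed")
    counts = Counter(haplotypes)
    # p_i is the sample frequency of the i-th distinct haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical sample (illustration only): two haplotypes at equal frequency
sample = ["00111101010", "00111101010", "00110011010", "00110011010"]
h = haplotype_diversity(sample)
```

The factor n/(n − 1) corrects for sample size, so a sample in which every individual carries a different haplotype gives h = 1, and a sample with a single haplotype gives h = 0.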
All the studied sites, except HpaI 3592, are polymorphic in the data set. Only three sites (DdeI 10394, AluI 10397, and MnlI 10871) are found to be polymorphic in all the populations (Table 2). These three sites are important for the identification of the major mitochondrial haplogroups M and N. The 9 bp deletion is found in the Northeast Indian populations, except for three tribal groups (Paite, Thadou, and Koms), and also in three North Indian populations, especially from Himachal Pradesh (Brahmins, Rajputs, and Jats). The frequency of the 9 bp deletion varies among the populations studied here, ranging from 0.023 among the Meiteis of Northeast India and the Brahmins and Jats of North India to 0.06 among the Aimols.
The populations studied here also show variation in the distribution of the major haplogroups M and N and their sub-haplogroups (Table 3). The frequency of haplogroup M decreases as one moves from South India to North India, and the reverse is the case for haplogroup N. The Dravidian speakers of East and South India possess high frequencies of haplogroup M, ranging from 76.32% among the Nayakpod tribe (South India) to 81.58% among the Oraon tribe of East India. The Munda tribe, which speaks the Mundari group of the Austro-Asiatic linguistic family, also exhibits a high percentage of haplogroup M, i.e. 82.85%. Five North Indian caste populations have M frequencies ranging from 31.25% in Sindhi to 68.18% in Jats. The Northeastern tribal and non-tribal populations are intermediate in the distribution of haplogroup M, ranging from 24.33% in Koms to 74.42% in Meiteis. Sub-haplogroups C and D of haplogroup M are found only in the Tibeto-Burman-speaking Northeast population groups. Sub-haplogroups A and B of N are shared by both Northeast and North Indian population groups, whereas these are absent among the Austro-Asiatic- and Dravidian-speaking populations of India.
Analysis of the haplotypic structure within each haplogroup shows that the M haplogroup has five haplotypes (Table 4); of these, the 00111101010 haplotype is shared by all the studied populations, with frequencies ranging from 24.32% among the Koms to 79.55% among the Thotis. The N haplogroup has nine haplotypes, of which the 00110011010 haplotype is shared by all the studied populations, with frequencies ranging from 4.65% among the Brahmins of Andhra Pradesh to 38.78% among the Aimol tribe. Population-specific haplotypic diversity reveals that the South Indian and East Indian caste and tribal populations have the lowest haplotypic diversity, i.e. below 50%. The Northeast Indian tribal and non-tribal populations exhibit the highest haplotypic diversity, reaching as high as 85% among the Rongmei tribe. The North Indian populations have diversity values similar to those of the Northeastern populations, except the Jats, who have 50% haplotypic diversity.
Haplotypes are written in the order of the restriction site positions, i.e. mt663, mt3592, mt5176, mt7025, mt10394, mt10397, mt10871, mt13259, mt13262, and mt16517, followed by the 9 bp indel. A ‘1’ denotes the presence and a ‘0’ the absence of the restriction site (for the RFLP markers) or of the deletion (for the indel marker).
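The haplotype coding described above can be illustrated with a short Python sketch that decodes an 11-character haplotype string into named markers. The names `MARKERS`, `decode_haplotype`, and `major_haplogroup` are hypothetical. The M/N assignment rule is an assumption inferred from the two shared haplotypes reported in the text (00111101010 for M and 00110011010 for N); it is not the study's full haplogroup assignment procedure.

```python
# Marker order follows the haplotype note: ten restriction sites, then the indel.
MARKERS = [
    "HaeIII 663", "HpaI 3592", "AluI 5176", "AluI 7025", "DdeI 10394",
    "AluI 10397", "MnlI 10871", "HincII 13259", "AluI 13262",
    "HaeIII 16517", "9 bp deletion",
]


def decode_haplotype(code):
    """Map each marker to True ('1', present) or False ('0', absent)."""
    if len(code) != len(MARKERS) or set(code) - {"0", "1"}:
        raise ValueError("expected a string of 11 characters, each 0 or 1")
    return {marker: char == "1" for marker, char in zip(MARKERS, code)}


def major_haplogroup(code):
    """Assign M or N from the three diagnostic sites.

    ASSUMPTION (illustrative rule, inferred from the example haplotypes):
    M = DdeI 10394 present, AluI 10397 present, MnlI 10871 absent;
    N = MnlI 10871 present, AluI 10397 absent.
    """
    sites = decode_haplotype(code)
    ddei = sites["DdeI 10394"]
    alui = sites["AluI 10397"]
    mnli = sites["MnlI 10871"]
    if ddei and alui and not mnli:
        return "M"
    if mnli and not alui:
        return "N"
    return "unassigned"


shared_m = major_haplogroup("00111101010")
shared_n = major_haplogroup("00110011010")
```

Keeping the decoding step separate from the assignment step makes it straightforward to replace the illustrative rule with a fuller set of diagnostic sites.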
The absence of the HpaI 3592 restriction site in the populations studied here reveals that all the mitochondrial genomes investigated belong to the L3 lineage of mtDNA, which further branches into the M, N and R haplogroups. Sub-grouping of M, N and R occurred through mutation and drift; these haplogroups spread across Asia and Europe, while at the same time L3 was lost (Macaulay et al., 2005). The highly polymorphic nature of DdeI 10394, AluI 10397 and MnlI 10871 in the present study indicates that all the studied populations belong to either the M or the N haplogroup, as these sites are used as diagnostic restriction sites for these haplogroups (MITOMAP: www.mitomap.org). Haplogroup M is thought to have originated in Eastern Africa ~60000 years ago and migrated towards Asia (Quintana-Murci et al., 1999), and hence is regarded as an Asia-specific haplogroup.
The presence of fewer M haplotypes is indicative of lower heterogeneity among the studied populations. Furthermore, sharing of these M haplogroups by all the populations studied here is suggestive of a common maternal genetic contribution in the Indian population. The higher frequencies of the M haplogroup among the South Indian (Dravidian speakers) and East Indian (Proto-Australian: Dravidian and Austro-Asiatic speakers) tribes support the hypothesis that these populations could be the earliest settlers of India (Guha, 1935; Keith, 1936; Thapar, 1966; Pattanayak, 1998; Basu et al., 2003).
Regarding the sub-haplogroups of M, almost all populations of Northeast India with an East Asian ethnic background (Meitei, Thadou, Rongmei, Aimol, Manipur Muslim, and Manipur Bamon) exhibit C, ranging from 2.33% among the Meitei to 5% among the Rongmei, and D, ranging from 7.89% among the Manipur Bamon to 20% among the Rongmei. If the southern coastal route of human migration is to stand, this is not expected, because these sub-haplogroups are absent among the Dravidian- and Austro-Asiatic-speaking tribes and among the North Indian caste populations, except for C and D among the Himachal Brahmins and D among the Thoti, a Dravidian tribe of South India, at relatively lower frequencies. Thus, one cannot presume that these sub-haplogroups of M are of Indian origin. In other words, their presence in Northeast India could mainly be due to gene flow from Southeast Asian countries, where their frequencies are reported to be higher (Ballinger et al., 1992; Kolman et al., 1996; Maca-Meyer et al., 2001; Wen et al., 2004). Further, gene flow into Northeast India seems to have come mainly from Central Asia and Siberia, as most of the populations of these regions have high frequencies of C and D (Kolman et al., 1996). This further suggests that gene flow was confined to the Northeast Indian populations, with no further advance towards southern or northern India. This is also supported by the studies of Cordaux et al. (2003, 2004) and Saraswathy et al. (2009), who suggested that Northeast India was a major barrier rather than the corridor proposed by Reddy et al. (2007).
Like the M sub-haplogroups, the N sub-haplogroups are documented only in the Northeast Indian populations, suggesting gene flow into Northeast India from Eastern Eurasia leading to heterogeneity of the Northeast Indian populations, as was also reported by Saraswathy et al. (2009). This contradicts the mythological history, according to which Manipur was peopled from Southeast Asia (Shakespear, 1912). However, the present findings are in agreement with the proposal that Manipur was peopled through China (Pemberton, 1835). The North to South decrease in N haplogroup frequencies in the populations studied herein, with the lowest frequency being documented among Dravidian and Austro-Asiatic populations, reflects the extent of admixture, which decreases from North India to South India, as was also reported by previous studies (Reich et al., 2009; Saraswathy et al., 2010). The B haplogroup is absent among the Siberian populations (Kolman et al., 1996). Therefore, Central Asia seems to be the most probable region that could have contributed to the maternal gene pool of the Northeast Indian populations, and the presence of D among the Thotis (a South Indian Dravidian population) seems inexplicable. A genetic contribution of Southeast Asia to the Northeast Indian populations does not seem feasible because of the zero or low frequency of the A and B haplogroups among the Southeast Asian populations (Kolman et al., 1996).
There are different opinions regarding the origin of the 9 bp deletion (Wrischnik et al., 1987; Passarino et al., 1993; Merriwether et al., 1994; Graven et al., 1995; Soodyall et al., 1996; Alves-Silva et al., 1999), and it is assumed that it might have multiple origins (Thangaraj et al., 2008). Despite its possible multiple origins, the deletion motif remains a useful marker for tracing population affinities and migration patterns, as it varies from one geographical region to another (Yao et al., 2000). The East Asian, Southeast Asian, and Polynesian 9 bp deletion has been claimed to have arisen in China about 60000 BP (Redd et al., 1995; Yao et al., 2000). The 9 bp deletion found among three North Indian and five Northeast Indian populations could be of Chinese origin, as these populations have high frequencies of East Asian-specific haplogroups. The absence of this deletion among the South and East Indian populations in the present study supports the genetic discontinuity of these Proto-Australian populations with the Northeast Indian populations.
Further, the Tibeto-Burman-speaking tribes and non-tribes of Northeast India are generally believed to have migrated from Tibet and the Yunnan province of China (Ansari, 1991). It is possible that these East Asian population groups, who entered the Indian mainland through Tibet and the Himalayan tracts, carried this deletion from central China, left their genetic imprints on the North Indian Himalayan foothills, and moved further into Northeast India. This is also supported by the present study, in which the 9 bp deletion, though at low frequency, was found only among three populations of North India and five populations of Northeast India. This is in accordance with previous mtDNA studies, in which a high degree of genetic homogeneity among Himalayan and Northeast Indian Tibeto-Burman groups was observed (Su et al., 1999, 2000; Cordaux et al., 2004).
The high diversity of the Northeast Indian populations is also depicted in the population-specific haplotype diversity graph (Figure 2), where these populations have the highest haplotype diversity, followed by the North Indian populations. Later gene flow from Eurasian populations at different periods might have enhanced the diversity of the North Indian populations. On the other hand, the lower haplotypic diversity among the South and East Indian populations (<50%) indicates greater commonality and homogeneity among these population groups, even though they belong to different linguistic groups, namely Dravidian and Austro-Asiatic.
Figure 2. Graph depicting the haplotypic diversity among the populations studied.
Further, AMOVA was performed to understand the molecular variance among various categories of the studied populations (Table 5). As expected, intra-population variation is highly significant in all the studied populations. Inter-population variation in the South Indian versus East Indian and Dravidian versus Austro-Asiatic categories is found to be non-significant, as both include population groups of similar type. This indicates that the Austro-Asiatic speakers of East India and the Dravidian speakers of South India are genetically similar with respect to the studied mtDNA markers, suggesting that they have an autochthonous common genetic background that may be Proto-Australian. This finding is consistent with the views of Thom (2007) and Kumar et al. (2008), who traced these two linguistically diverse populations to a common archaic hunter-gatherer stock of the Pleistocene era. Furthermore, the rest of the categorized populations show significant differences, indicating diversified gene pools due to differential genomic contributions followed by cultural and geographical isolation. Among-group variation based on geographical distribution shows significant differences in three comparisons, i.e. South Indian versus Northeast Indian, South Indian versus North Indian, and East Indian versus North Indian. In contrast, South Indian versus East Indian and Northeast Indian versus North Indian populations are found to be genetically close to each other. Variation based on linguistic category, i.e. Dravidian versus Austro-Asiatic groups, is found to be non-significant. Further, variation based on linguistic and geographical groups also shows significant differences. The AMOVA results suggest that language and geography are two important factors influencing the Indian population structure.
Further, the castes versus tribes category of the studied populations shows non-significant differences, suggesting maternal genetic unity among the populations studied in this work.
Although the present study is limited by its small number of markers and does not involve advanced technologies, easy-to-type RFLP markers quite convincingly reflect the migratory histories of the populations of India. The distribution patterns of the M and N haplogroups and their sub-haplogroups, along with the allelic distribution of the selected mtDNA markers among the populations studied here, indicate common maternal genetic lineages in Indian populations. The study also reveals admixture among Indian populations, with a decreasing trend from North to South India, and higher heterogeneity among Northeast Indian populations. Moreover, the study raises doubts about the peopling of Southeast Asia through India: the genetic discontinuity observed between the populations of Northeast India and those of both South and East India indicates that gene flow into Northeast India occurred from East Asia and not from mainland India. The present study was restricted to the major haplogroups that could be typed using less expensive RFLP technology. Further studies on sub-haplogroups, which can be typed only by sequencing, are likely to provide better insights into the peopling of India.
We are thankful to the University Grants Commission SAP (Special Assistance Program), the Department of Biotechnology, and the University Grants Commission for providing financial support for the present study. We thank all the participants for providing their blood samples. We are also grateful to the Department of Anthropology, University of Delhi, for providing the infrastructure to carry out the study.