Conventional wisdom has long held that the principal genetic divisions in South Asia are generally tied to linguistic differences. Indo-Aryan speakers (North Indians) form one broad cluster, Dravidian speakers (South Indians) form a second cluster, Iranic speakers (Western Pakistan) form a third, etc. While allowances are made for caste differences and geography, the above formulation is more or less accepted by the casual observer. It's often wrong, however, particularly at the margins.
One such case I want to highlight is the genetic gap between two neighboring Indo-Aryan regions: the Indus Valley (specifically Punjab and Sindh) and Gangetic North India (Uttar Pradesh, Rajasthan, Gujarat). Below is a PCA chart constructed from Harappa Ancestry Project samples, which will help us visualize the genetic distance between these ethnic groups.
As can be seen from the figure above, there is little genetic overlap between the Indus Valley and Gangetic North India. In contrast, despite both a linguistic and a geographic divide, Gangetic North India does exhibit significant overlap with Dravidian South India. Indus Punjabis show a similar relationship with Iranic Pathans; however, Indus Sindhis and Iranic Balochis do not seem to overlap.
Hopefully this will caution readers against lumping all Indo-Aryan ethnic groups together. These populations not only have significant cultural differences; they can often diverge considerably from a genetic standpoint as well.
- The Dravidian sample includes individuals from Kerala, Tamil Nadu, and Andhra Pradesh. The Gangetic sample includes individuals from Uttar Pradesh, Rajasthan, and Gujarat. The Punjabi sample includes both Pakistani and Indian Punjabis. The Pathan sample includes individuals from the Eastern Pashtun grouping.
- It's possible that many of these groups will demonstrate greater or lesser genetic overlap once more South Asian genetic samples become available. This post is not the last word on the topic, simply an observation.
- The phenomenon of linguistically similar Indian groups diverging significantly from a genetic standpoint (even after adjusting for caste) is also seen with Malayalis vis-à-vis other Dravidians, and Marathis vis-à-vis other Indo-Aryans. Time permitting, I hope to write about both of these cases.
- The data come from the Harappa Ancestry Project, with individual scores largely collated from forums like Anthrogenica. This info is publicly available, and the analysis can be reproduced by anyone willing to collate the scores and run them through a PCA program. I used BioVinci, but there are free programs available as well.
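
For readers who would rather script the PCA step than use a GUI program, here is a minimal sketch in Python with NumPy. It is not the BioVinci workflow I used, just one common way to do PCA (centering the columns and taking an SVD). The function name `pca_project`, the group labels, and the score values are all hypothetical placeholders, not real Harappa results; in practice each row would be one individual's collated admixture percentages.

```python
import numpy as np


def pca_project(scores, n_components=2):
    """Project each row of `scores` onto its first principal components.

    `scores` is an (individuals x admixture components) table.
    Hypothetical helper for illustration, not part of any library.
    """
    X = np.asarray(scores, dtype=float)
    # Center each admixture component (column) on its mean.
    X = X - X.mean(axis=0)
    # SVD of the centered matrix; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ Vt[:n_components].T
    # Share of total variance captured by each axis.
    explained = (S ** 2) / np.sum(S ** 2)
    return coords, explained[:n_components]


# Placeholder values only (each row sums to 100), NOT real admixture scores.
labels = ["group_a", "group_a", "group_b", "group_b"]
scores = [
    [50.0, 40.0, 10.0],
    [48.0, 41.0, 11.0],
    [30.0, 45.0, 25.0],
    [31.0, 43.0, 26.0],
]

coords, explained = pca_project(scores)
for label, (pc1, pc2) in zip(labels, coords):
    print(f"{label}: PC1={pc1:.2f}, PC2={pc2:.2f}")
```

Plotting `coords` as a scatter, colored by group label, gives the same kind of chart as the one in this post. Note that the sign of each axis is arbitrary, so a plot may come out mirrored relative to another program's output without the clusters themselves changing.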