South Asia Ancestry Map

Jul 6, 2022



There aren’t any good ancestry maps for the Indian Subcontinent available online, so I decided to make one. Unfortunately, the high levels of endogamy across the region have resulted in various tribes and castes that are genetically distinct from each other, which makes fitting every representative population onto a single graphic quite difficult. Notes on how I constructed the map are below:

  • Inclusion preference favored the tribes that traditionally make up the bulk of landowners and peasant farmers in each region. Traditional varna castes (Brahmin, Kshatriya, Dalit) are also represented. Locality and demographics were tabulated from British Indian Census results and the Joshua Project.
  • Locations of tribes/castes are generally accurate on the map, though a couple of exceptions had to be made for space considerations. Specifically, the representative Gangetic Brahmin and Dalit samples were placed in a generic location in the Gangetic North. The same was done with the Dravidian Brahmin and Dalit samples vis-à-vis the South.
  • Samples were mostly drawn from the online database at Genoplot, using G25 coordinates. A few came from Harappa averages I collated from Anthrogenica and converted into G25 format using Genoplot. I tested original G25 coordinates against Harappa-converted coordinates and found them nearly identical, so the converted samples are reliable enough for this exercise.
  • Each population was run 3 times, and the median value across all 3 runs for each reference population was used. Results of 0.5% admixture were generally rounded down to 0% due to concerns about random error/noise in the model.
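The last step above (median of three runs, then zeroing out noise-level results) can be sketched in a few lines of Python. This is only an illustration, not the actual workflow used for the map: the function name, the dictionary layout, and the sample percentages are all made up, and I am assuming the cutoff applies to any median at or below 0.5%.

```python
from statistics import median

# Assumed cutoff: medians at or below this percentage are treated as noise.
NOISE_THRESHOLD = 0.5


def summarize_runs(runs, noise_threshold=NOISE_THRESHOLD):
    """Return the per-reference median across runs, zeroing noise-level values.

    `runs` is a list of dicts mapping reference population name -> percent.
    """
    summary = {}
    for ref in runs[0]:
        value = median(run[ref] for run in runs)
        # Round tiny signals down to 0%, since they may be random error.
        summary[ref] = 0.0 if value <= noise_threshold else value
    return summary


# Hypothetical output of three runs for a single population.
example_runs = [
    {"Ancestral Indian": 30.0, "Ancient Iranian": 40.0, "Steppe Aryan": 29.5, "East Asian": 0.5},
    {"Ancestral Indian": 31.0, "Ancient Iranian": 39.5, "Steppe Aryan": 29.5, "East Asian": 0.0},
    {"Ancestral Indian": 30.5, "Ancient Iranian": 40.0, "Steppe Aryan": 29.0, "East Asian": 0.5},
]
example_summary = summarize_runs(example_runs)
```

Note that per-reference medians are not guaranteed to sum to exactly 100%, so a final rescaling step may be needed before plotting.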

Notes on reference populations:

“Ancestral Indian” represents the original inhabitants of India, who preceded both the Indus Valley Civilization and the Aryan Invasion. It was derived by subtracting East Asian ancestry from an Andamanese Islander reference.
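One common way to "subtract" a component from a set of coordinates is to treat the reference as a linear mix and solve for the remainder. The sketch below assumes that model (mixed = (1 - f) × remainder + f × component) and is not necessarily the procedure used here; the function name, the two-dimensional toy coordinates, and the 20% fraction are purely hypothetical.

```python
def subtract_component(mixed, component, fraction):
    """Estimate the remainder after removing `fraction` of `component` from `mixed`.

    Assumes a simple linear mixture in coordinate space:
        mixed = (1 - fraction) * remainder + fraction * component
    """
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    if len(mixed) != len(component):
        raise ValueError("coordinate vectors must have the same length")
    return [(m - fraction * c) / (1 - fraction) for m, c in zip(mixed, component)]


# Toy two-dimensional coordinates (real G25 samples have 25 dimensions).
east_asian = [0.5, 0.3]
islander = [0.18, -0.10]  # hypothetical reference carrying 20% East Asian
remainder = subtract_component(islander, east_asian, fraction=0.2)
```

The result is only as good as the assumed fraction, so it is worth trying a few plausible values and checking that the output stays stable.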

“Ancient Iranian” represents a hunter-gatherer population from Iran that arrived in India several thousand years ago. They mixed with the Ancestral Indians, and the resulting progeny went on to found the Harappan Civilization in the Indus Valley. It was derived from an Iranian Hotu reference.

“Middle Eastern” represents genetic input from West Asian populations into the Indian Subcontinent that occurred long after the earlier migration of Iranian hunter-gatherers into the region, likely after the collapse of the Indus Valley Civilization. It is derived from a Hajji Firuz C reference.

“Steppe Aryan” represents the ancient Indo-Europeans who migrated from Eastern Europe into Central Asia, a branch of which would go on to conquer much of India, spreading their Sanskrit-Vedic culture as they went. Their reference population is the Sintashta MLBA.

“East Asian” represents ancestry from East Asia, which mostly arrived in South Asia via Tibetans, Turks, Burmese, and Austronesians. This was the only admixture signal derived from calculations rather than a single reference population. I used the East Asian-related scores for each population in Harappa as a baseline, then cross-checked them against separate Japanese, Yakut, and ancient Vietnamese runs to arrive at a combined East Asian value.



