South Asian Genetic Plot

South Asia PCA

Above is a PCA plot constructed with Harappa Gedmatch data for various South Asian ethnic groups. Below are some observations.

  • Bangladeshis form their own interesting cluster, somewhat removed from the South-Asian cline most of these samples fall on. This is due to significant East-Asian admixture, likely from a population somewhere between Tibetans and Burmese. Though not pictured, there are Bangladeshi samples from areas like Chittagong that have so much East-Asian admixture that they are pulled entirely away from the main South-Asian grouping.
  • The difference between Tamils and Malayalis is overstated by this plot, as most of the samples from the former come from low-to-mid castes, while most of the latter come from mid-to-high castes.
  • Mid-caste Gujaratis cluster near high-caste Malayalis. Kshatriyas and certain commercial castes like the Ghanchi (of which Narendra Modi is a member) bridge the gap from the South to North-West, while the Brahmans cluster right with the main Punjabi group.
  • It is also interesting to note that Gujarati upper castes are the most Western-shifted of all North-Indian groups, likely due to their comparatively high Iranian Hunter-Gatherer (HG) admixture from the Indus Valley Civilization (IVC) days. As an example, the main UP Brahman cluster is not with the Gujarati Brahman group, but near the Gujarati Kshatriya/Commercial group mentioned above.
  • The Pakistani Punjabi group is a combination of Jatt and Arain individuals, and is quite representative of where the bulk (about two-thirds) of the Muslim Punjabi population clusters. The most Western-shifted samples are near Pashtuns, while the Southern-shifted samples are adjacent to where Punjabi Chamars are found.
  • The Pakistani Pashtun samples are from Kurram and Swat, and look pretty much as we would expect, connecting Punjabis with Afghan Pashtuns. I think it's likely that the Pak-Pashtun samples clustering with Punjabis are from the KPK lowlands adjacent to the Indus, while those closer to certain Afghan Pashtuns are more representative of the highlands and tribal areas.
  • Afghan Pashtun samples were screened out if they had too much East-Asian admixture pulling them away from the plot, but we can still observe the effect of the Turkic migrations on the region, as some of the samples are shifted a bit out and to the left (towards Uzbeks and, to a lesser extent, Mongols).
  • The Balochis seem the most genetically distinct of these populations, though you do get some overlap when plotting Sindhis with them. They are generally characterized by very high Iranian HG, without the later Aryan admixture seen in Pashtuns and Punjabis. They do seem to have significant levels of Arab ancestry, and a few samples even had to be screened out for being quite high in African admixture. I might make a plot later to see how they differ from Brahui and Makrani samples.
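
For readers who want to try something similar, here is a minimal sketch of the two steps mentioned above: screening out samples with too much of one admixture component, then running a PCA on what remains. It is not necessarily how the plot in this post was produced. The column names, the sample rows, and the 20% cutoff are all invented placeholders; the post does not state the actual threshold or the exact components used.

```python
import numpy as np

# Hypothetical column names for admixture percentages (assumption, not the
# real calculator output).
COMPONENTS = ["south_indian", "baloch", "ne_euro", "east_asian"]

# Invented placeholder rows for illustration only; these are NOT real results.
SAMPLES = [
    ("sample_a", [50.0, 35.0, 10.0, 5.0]),
    ("sample_b", [30.0, 40.0, 25.0, 5.0]),
    ("sample_c", [45.0, 25.0, 5.0, 25.0]),
    ("sample_d", [20.0, 45.0, 30.0, 5.0]),
]


def screen(samples, column, max_pct):
    """Keep only samples whose value in `column` is at most max_pct."""
    idx = COMPONENTS.index(column)
    return [(name, row) for name, row in samples if row[idx] <= max_pct]


def pca_2d(rows):
    """Project rows onto their first two principal components via SVD."""
    x = np.asarray(rows, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


# The 20% cutoff is an arbitrary assumption chosen for this example.
kept = screen(SAMPLES, "east_asian", max_pct=20.0)
coords = pca_2d([row for _, row in kept])

for (name, _), (pc1, pc2) in zip(kept, coords):
    print(f"{name}: PC1={pc1:.2f}, PC2={pc2:.2f}")
```

Each kept sample ends up with a pair of coordinates that can be scattered on a 2D plot. Screening before the PCA matters because a few heavily admixed outliers would otherwise dominate the first components and compress everyone else into a corner.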

That’s all for now. My next post will probably be history/cartography related rather than genetics, though I have plenty of both in the pipeline. Questions and comments welcome as always.
