The Indo-European Question

Origins of the Languages of Half the World

Featured Work ...

Indo-European Origins:  Language, Family Trees, Archaeology and Genetics 

Half the world today speaks native languages that all belong to the same family:  Indo-European.  You are reading one of these languages now:  English is Indo-European.  In other words, it is related, either closely or more distantly, to hundreds more languages from Iceland to Bengal.  Over many thousands of years, these languages have grown far apart from each other, but all started out as the same single ancestor language, ‘Proto-Indo‑European’.  By today, its panoply of daughter languages has changed and diverged so radically that little obvious is left of their once common origin, although surviving traces can still be heard in the numbers up to ten, for example, and in some other basic words.  To listen to and compare these words across languages of the four main branches of Indo-European within Europe (i.e. Celtic, Romance, Germanic and Slavic), visit:  soundcomparisons.com/Europe.
Linguistics was born with this realisation that all these languages, including also ancient Greek, Latin, Sanskrit and Avestan, are ultimately “sprung from some common source”.  Over two centuries of research, linguists have been able to ‘reconstruct’ a great deal about the original Proto-Indo‑European language itself.  But trying to pin down where and when it was spoken, deep in prehistory, has relied on inferences that are far less reliable and conclusive, and more subjective.  In the 19th and early 20th centuries, ‘Aryan’ studies became mired in racist and supremacist ideologies, and fanciful, anachronistic visions of prehistory.  The leading serious, realistic modern hypotheses are rooted above all in archaeology, and since 2015 ancient DNA has revolutionised our understanding of population expansions across Europe and parts of Asia, bringing us ever closer to a resolution of the Indo-European origins question.
Indo-European is still essentially a linguistic entity, however, so the direct data lie in the languages themselves.  Can more be made of them, though, with new computational tools?  Powerful software now exists for modelling the processes by which family tree structures can arise, and even for estimating a timeframe for how they diverged.  These tools were devised originally for modelling such ‘evolutionary’ processes in biology, but language families also come about by ‘descent with modification’, so in principle the same general approach should be viable.
Nonetheless, when the latest ‘Bayesian phylogenetic modelling’ was first applied to Indo-European, different studies threw up conflicting results.  On closer inspection (Heggarty 2021), those first results included obvious artefacts, and the cause could be traced back to how the basic language data used had been handled inconsistently, when they were encoded in binary terms as input to the phylogenetic modelling.  A new methodology was needed, to create a completely new language database. One type of language data is particularly viable for these Bayesian phylogenetic analyses, and especially for estimating chronology:  the patterns in how some ancient word roots (‘cognates’) survive or have shifted meaning, across the different branches of the Indo-European family.  Over five years I led a team of over 90 language experts, with my co-editors Dr Cormac Anderson and Dr Matthew Scarborough, to create IE‑CoR, the new Indo-European Cognate Relationships database, published as nature.com/articles/s41597-025-05445-3 and free to explore online at iecor.clld.org.
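To make the idea of binary encoding concrete, here is a minimal sketch in Python of one common scheme: each cognate set within each meaning becomes a column, and a language is coded 1 if its word for that meaning belongs to that set, 0 if it does not.  This is not the IE‑CoR encoding procedure itself.  The function name `encode_binary`, the cognate-set labels, the toy judgements and the use of `?` for missing data are all assumptions made for illustration, and the toy data are simplified and not taken from IE‑CoR.

```python
from typing import Dict, List, Tuple

# Toy cognate judgements (illustrative only, NOT taken from IE-CoR):
# meaning -> language -> hypothetical cognate-set label.
COGNATES: Dict[str, Dict[str, str]] = {
    "two": {"English": "two-1", "German": "two-1", "French": "two-1"},
    "dog": {"English": "dog-1", "German": "dog-2", "French": "dog-2"},
}


def encode_binary(
    cognates: Dict[str, Dict[str, str]],
) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Encode cognate judgements as one 0/1 string per language.

    Returns the ordered (meaning, cognate set) column labels and a
    mapping from language name to its row of '0', '1' or '?' characters.
    """
    # One column per cognate set attested for each meaning.
    columns = sorted(
        {(meaning, cset)
         for meaning, by_lang in cognates.items()
         for cset in by_lang.values()}
    )
    languages = sorted({lang for by_lang in cognates.values() for lang in by_lang})

    matrix: Dict[str, str] = {}
    for lang in languages:
        row = []
        for meaning, cset in columns:
            assigned = cognates[meaning].get(lang)
            if assigned is None:
                # Assumption: no word recorded for this meaning is coded
                # as missing ('?') in every column of that meaning,
                # rather than as absent ('0').
                row.append("?")
            else:
                row.append("1" if assigned == cset else "0")
        matrix[lang] = "".join(row)
    return columns, matrix


if __name__ == "__main__":
    cols, rows = encode_binary(COGNATES)
    print(cols)
    for language, bits in rows.items():
        print(f"{language:8s} {bits}")
```

On the toy data, the columns are (dog, dog-1), (dog, dog-2), (two, two-1), and the rows come out as English `101`, German `011` and French `011`.  Even in this small case the coding decisions matter.  Treating a missing word as `0` rather than `?`, for instance, would wrongly assert that a language lacks every cognate set for that meaning, which is the kind of inconsistency that can produce artefacts downstream.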
IE‑CoR was the input data to a new Bayesian phylogenetic analysis of the Indo-European family in Science (Heggarty et al. 2023).  This returned a time estimate that Indo-European first began (spreading and) diverging just over 8000 years ago, and seven major branches had already emerged by c. 7000 years ago.  This chronology, and the branching sequence of the tree, are not compatible with the predominant ‘Steppe hypothesis’ of Indo-European origins, but fit with archaeological and ancient DNA evidence into a more plausible overall scenario.  The ultimate origins of all Indo-European lay south of the Caucasus, in the northern arc of the Fertile Crescent.  The Steppe north of the Black Sea was a plausible later staging post, but only for some of the European branches of the family, and not for Indo-Iranic.
This Science paper fits into my long-standing focus on all aspects of the Indo-European origins question, and on research methods both within historical linguistics and in concert with archaeology and genetics, including:
  • Bayesian phylogenetic analysis, and how to apply it specifically to language.
  • Pitfalls and solutions for encoding comparative language data consistently, and in binary form, as input for quantitative and modelling applications.
  • How historical linguistics needs to engage with an up-to-date understanding of actual prehistory. Early processes of domestication and technologies, in particular, serve as a reality check on outdated, superficial inferences from traditional ‘linguistic palaeontology’ about the presumed roles of the horse, wheel, wagons, metals, crops, etc.
  • The farming/language dispersals hypothesis, often misunderstood and overstated, which underlay one of two major hypotheses on Indo-European origins.
  • The flaws in forcing interpretations of genetic data into a presumption of the Steppe hypothesis for Indo-European origins. In fact, ancient DNA results support a steppe origin as only secondary, and only for some branches in Europe, while for other crucial branches, not least Anatolian, Greek and Indo-Iranic, ancient DNA undermines a steppe homeland, and points elsewhere.
Publications

Anderson, C., Scarborough, M., Jocz, L., Kümmel, M.J., Jügel, T., Irslinger, B., Pooth, R., Liljegren, H., Strand, R.F., Haig, G., Geupel, U., Macak, M., Kim, R.I., Anonby, E., Pronk, T., Belyaev, O., Dewey-Findell, T.K., Boutilier, M., Freiberg, C., Tegethoff, R., Serangeli, M., Stronski, K., Falileyev, A., Liosis, N., Schulte, K., Gupta, G.K., Izadifar, R., Markus, P., Williams, N., Loi, S., Sims-Williams, N., Findell, M., Adibifar, S., Abete, G., Atanasov, P., Baiwir, E., Bastardas, M.-R., Benkato, A., Bevevino, L.S., Buchi, É., Cadorini, G., Cathcart, C., Cheveau, L., Christodoulou, C., Delorme, J., Dworkin, S.N., Ekici, D., Farridnejad, S., Gheitasi, M., Hammarström, H., Hewitt, S., Khan, A.A., Khan, M.K., Khokhlova, L., Kim, D., Lewin, C., Lushaj, B., Mahmoudveysi, P., Mahommadirad, M., Mersch, S., Mustafa, B., Nemati, F., Nourzaei, M., Muircheartaigh, P.Ó., Oogjen, V., Ourang, M., Pagan, H., Palmer, T.S., Pepper, S., Purandare, M., Rehman, K., Rhys, G., Røyneland, U., Sagar, M.Z., Sandstedt, J.J., Steensland, L., Taheri-Ardali, M., Talebi-Dastenaei, M., Tittel, S., Tresoldi, T., de Vaan, M., Verkerk, A., Versloot, A., Videsott, P., Vuletic, N., Widmer, M., Zeini, A., Bibiko, H.-J., Runge, F., Gray, R.D., & Heggarty, P. 2025.
The Indo-European Cognate Relationships dataset.
Scientific Data 12 (1): p.1541.
https://www.nature.com/articles/s41597-025-05445-3

Journal Article

Heggarty, P., Anderson, C., Scarborough, M., King, B., Bouckaert, R., Jocz, L., Kümmel, M.J., Jügel, T., Irslinger, B., Pooth, R., Liljegren, H., Strand, R.F., Haig, G., Macák, M., Kim, R.I., Anonby, E., Pronk, T., Belyaev, O., Dewey-Findell, T.K., Boutilier, M., Freiberg, C., Tegethoff, R., Serangeli, M., Liosis, N., Stronski, K., Schulte, K., Gupta, G.K., Haak, W., Krause, J., Atkinson, Q.D., Greenhill, S.J., Kühnert, D., & Gray, R.D. 2023.
Language trees with sampled ancestors support a hybrid model for the origin of Indo-European languages.
Science 381 (6656): p.414, eabg0818.
https://www.science.org/doi/10.1126/science.abg0818

Journal Article

Heggarty, P. 2022.
Redefining Indo-European origins? eLetter commentary on: Lazaridis et al. 2022 ‘The genetic history of the Southern Arc: A bridge between West Asia and Europe’.
Science 377 (6609).
https://doi.org/10.1126/science.abm4247#elettersSection

Journal Article

Heggarty, P. 2021.
Cognacy databases and phylogenetic research on Indo-European.
Annual Review of Linguistics 7: p.371–94.
https://doi.org/10.1146/annurev-linguistics-011619-030507

Journal Article

Heggarty, P. 2018a.
Why Indo-European? Clarifying cross-disciplinary misconceptions on farming vs. pastoralism.
in: Kroonen, G., J.P. Mallory, & B. Comrie (eds)
Talking Neolithic: Proceedings of the workshop on Indo-European origins held at the Max Planck Institute for Evolutionary Anthropology, Leipzig, December 2-3, 2013, Journal of Indo-European Studies Monograph Series, p.69–119.
Journal of Indo-European Studies.
http://jies.org/DOCS/monojpgs/Mon65.html

Book Section

Heggarty, P. 2018b.
Indo-European and the Ancient DNA Revolution.
in: Kroonen, G., J.P. Mallory, & B. Comrie (eds)
Talking Neolithic: Proceedings of the workshop on Indo-European origins held at the Max Planck Institute for Evolutionary Anthropology, Leipzig, December 2-3, 2013, Journal of Indo-European Studies Monograph Series, p.120–73.
Journal of Indo-European Studies.
http://jies.org/DOCS/monojpgs/Mon65.html

Book Section

Heggarty, P. 2018c.
Wer waren die Ur-Indoeuropäer? [= Who Were the Indo-Europeans?].
Spektrum der Wissenschaft 2018 (Spezial Archäologie-Geschichte-Kultur 4/2018): p.42–7.
https://spektrum.de/alias/1611302

Journal Article

Heggarty, P., & Renfrew, C. 2014g.
Western and Central Asia: Languages.
in: Renfrew, C. & P. Bahn (eds)
The Cambridge World Prehistory, p.1678–99.
Cambridge: Cambridge University Press.
https://doi.org/10.1017/CHO9781139017831.101

Book Section

Heggarty, P., & Renfrew, C. 2014h.
Europe and the Mediterranean: Languages.
in: Renfrew, C. & P. Bahn (eds)
The Cambridge World Prehistory, p.1977–93.
Cambridge: Cambridge University Press.
https://doi.org/10.1017/CHO9781139017831.115

Book Section

Heggarty, P. 2014a.
Prehistory by Bayesian phylogenetics? The state of the art on Indo-European origins.
Antiquity 88 (340): p.566–77.
https://doi.org/10.1017/S0003598X00101188

Journal Article

Heggarty, P. 2014c.
Das Rätsel der großen Sprachfamilien [= The Puzzle of the great language families].
Spektrum der Wissenschaft 2014 (08): p.70–6.
https://spektrum.de/alias/1298019

Journal Article

Heggarty, P. 2013a.
Europe and Western Asia: Indo-European linguistic history.
in: Ness, I. & P. Bellwood (eds)
The Encyclopedia of Global Human Migration, p.157–67.
Oxford: Wiley-Blackwell.
https://doi.org/10.1002/9781444351071.wbeghm819

Book Section

Heggarty, P. 1994.
A Framework for the Systematic Comparison of Indo-European Languages.
M.Phil. Dissertation. University of Cambridge.

Thesis