'Our Goal Is Not to Determine Which Version Is Correct but to Explore the Variability'

Photo by Maksim Melenchenko

The International Linguistic Convergence Laboratory at the HSE Faculty of Humanities studies the processes of convergence among languages spoken in regions with mixed, multiethnic populations. Research conducted by linguists at HSE University contributes to understanding the history of language development and explores how languages are perceived and used in multilingual environments. George Moroz, head of the laboratory, shares more details in an interview with the HSE News Service.

— How did the laboratory's work begin?

— The laboratory was established in 2017, with Nina Dobrushina as its head and Johanna Nichols, Professor at the University of California, Berkeley, serving as Academic Supervisor. Prof. Nichols continues in this role remotely. Most of the researchers at the laboratory studied the languages of the Caucasus and the processes of convergence among them. For example, Nina Dobrushina, Michael Daniel, and Timur Maisak focused primarily on Dagestan, while Yury Lander and Anastasia Panova studied the Abkhaz-Adyghe languages.

One of the laboratory’s core research areas is linguistic typology, which involves classifying languages based on various criteria, such as the number of vowels and consonants. To this end, samples are collected that may include dozens of languages. Our laboratory is one of the few research centres in Russia conducting such studies—and perhaps the only one that focuses specifically on linguistic convergence. The laboratory also continues its research on the languages spoken in the Caucasus and the development of linguistic resources for them.

George Moroz
Photo by HSE University

In the Caucasus, Russian comes into contact with languages spoken by a diverse range of groups. These include the Nakh-Dagestanian languages; Turkic languages spoken in Dagestan such as Kumyk and Azerbaijani; as well as Abkhaz-Adyghe languages (Abkhazian, Abaza, Adyghe, and Kabardian); Kartvelian languages (Georgian, Megrelian, Svan, and Laz); and Indo-European languages (Armenian, Ossetian, and Tat).

The primary purpose of establishing the laboratory has been to study the mutual influence of languages on one another. A striking example is the Ossetian language, which, although Indo-European, features ejective consonants, unlike most other Indo-European languages. These are consonants produced by closing the vocal cords to build up pressure, which is then released abruptly, for example [k'], [p'], [t'], [ts'], and [tʃ']. Additionally, during an expedition to Azerbaijan, the laboratory staff studied dialects in the regions bordering Dagestan, and Michael Daniel discovered a dialect of Azerbaijani featuring ejective sounds (admittedly, this had also been mentioned in earlier reports). This is likely explained by the fact that the ancestors of people now living in the village of Ilisu once spoke a Nakh-Dagestanian language, presumably Tsakhur, and later shifted to Azerbaijani, retaining traces of their original language in the form of ejective consonants. Most likely, this occurred as a result of language contact.

A similar hypothesis was put forward by our academic supervisor, Johanna Nichols, about the inhabitants of some villages in Dagestan. While the Avar language is widespread in the north of Dagestan, mainly in the lowlands, native speakers of Avar can still be found in mountainous villages surrounded by non-Avar communities. It is possible that these Avar speakers originally used other languages and later shifted to Avar due to its social prestige.

The process by which languages borrow from one another or even shift entirely, leading to the blending of languages or dialects, is known as linguistic convergence. Although this process is easier to observe in genetically unrelated languages, a similar phenomenon can also occur among related languages or dialects.

— Does convergence always occur between neighbouring languages?

— It occurs in most cases, but there are also instances where languages and their speakers 'seek' to differentiate themselves from one another. This is called linguistic divergence. Last year, we invited John Mansfield to speak at our seminar. Together with his colleagues, he published a typological study on linguistic divergence processes, drawing on data from 42 languages worldwide.

— You mentioned Dagestan, a region where many languages are spoken. Could you tell us more about this area and your research related to it?

— Dagestan is remarkable for its multilingualism and the mutual influence of its local languages. At the same time, these languages have also begun to change under the influence of Russian, which has increasingly penetrated the local linguistic environment.

Together with our research assistant Victoria Zubkova and research fellow Chiara Naccarato, I recently submitted a paper to a leading international linguistics journal on the adaptation of Russian borrowings in the languages of the Andic branch of the Nakh-Dagestanian language family. Earlier, most loanwords entered these languages through Avar, which served as an intermediary. Now, borrowings often come directly from Russian. We are currently developing models to identify which languages are more influenced by Russian and what factors determine the extent of this influence.

In the course of this research, we found that recent Russian loanwords in Avar and Botlikh undergo fewer phonetic changes than in other Andic languages. In Akhvakh, by contrast, the word for kopeika (kopeck) is кIебекIиi. The primary reason is that Avar and Botlikh have already been significantly influenced by Russian. Avar historically played an important role in northern Dagestan and continues to serve as a regional lingua franca. The findings of our study indicate that the adaptation of Russian loanwords in other Andic languages occurred more slowly than in Avar, but this process has clearly accelerated over time. Nowadays, borrowings are likely to enter all these languages without any phonetic adaptation.

— How do you acquire the materials needed for your research?

— We regularly conduct field expeditions to collect data, which is our most important source of material. One team of colleagues recently returned from Armenia, and another from Adygea. Lately, we have also begun to make more active use of data collected by researchers outside our laboratory.

Our laboratory has compiled 10 spoken corpora of bilinguals—people whose native language is not Russian but who have learned and regularly use it in daily life. Their speech—both in pronunciation and grammar—differs from that of monolingual speakers.

Corpora of individual Russian dialects are also being developed. The main challenge in collecting such material has been that Russian dialectologists were traditionally reluctant to share their data with us. Thanks to Nina Dobrushina, this has changed, and it is now standard practice for them to publish their dialect corpora with us. A total of 26 dialect corpora have been created in our laboratory.

We have also been compiling corpora of minority languages spoken in Russia and have already created 14 of them.

— Could you explain what the term corpus means to linguists? How and why do you create new corpora?

— Corpora are collections of written records of various types of speech, or simply annotated text collections. What distinguishes a corpus from a mere collection of texts is the presence of morphological or other linguistic markup. In particular, a corpus can be searched by category, such as finding which nouns precede infinitives. The Russian National Corpus, for example, is an extensive collection of texts that supports morphological searches. When preparing spoken corpora—whether bilingual or dialectal—we use transcriptions in standard Russian, which enables automatic morphological searches. In addition, our corpora include audio recordings that allow users to explore the unique features of dialects. Sometimes, it is necessary to listen to a recording several times to accurately determine whether specific sounds are present.
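The category search described above (finding which nouns precede infinitives) can be illustrated with a minimal sketch. It assumes a toy in-memory representation of an annotated sentence. The `Token` type, the tag names `NOUN` and `VERB`, the `inf` feature, and the transliterated sample sentence are all hypothetical choices made for this illustration, not the markup used by the laboratory's corpora or by the Russian National Corpus.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    form: str   # word form as transcribed
    pos: str    # part-of-speech tag (hypothetical tagset)
    feats: str  # comma-separated morphological features (hypothetical)


def find_noun_before_infinitive(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Return (noun, infinitive) pairs where a noun directly precedes an infinitive."""
    hits = []
    # Walk over adjacent token pairs and test the markup, not the word forms.
    for left, right in zip(sentence, sentence[1:]):
        is_noun = left.pos == "NOUN"
        is_infinitive = right.pos == "VERB" and "inf" in right.feats.split(",")
        if is_noun and is_infinitive:
            hits.append((left.form, right.form))
    return hits


# Toy annotated sentence (transliterated Russian, invented for illustration).
sentence = [
    Token("zhelanie", "NOUN", "nom,sg"),
    Token("uchitsya", "VERB", "inf"),
    Token("ochen", "ADV", ""),
    Token("vazhno", "ADV", ""),
]

hits = find_noun_before_infinitive(sentence)
print(hits)
```

The point of the sketch is that the query runs over the annotation layer, which is exactly what a plain collection of texts lacks.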

Corpora are central tools in modern linguistics. By analysing the frequency of various constructions within them, we can draw generalisations that form the basis of our published research.

One way to use corpora is to compare dialects or minority languages; using vector models, we can observe the overlap between their corpora and thereby determine which dialects and languages are more closely related and which are more distant.
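The interview does not say which vector models are used. As a rough stand-in, the sketch below represents each corpus as a vector of relative word frequencies and compares corpora by cosine similarity. The function names and the three tiny token lists are invented for illustration and carry no linguistic claims.

```python
import math
from collections import Counter
from typing import Dict, List


def frequency_vector(tokens: List[str]) -> Dict[str, float]:
    """Map each word to its relative frequency in the corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse frequency vectors."""
    shared = a.keys() & b.keys()
    dot = sum(a[word] * b[word] for word in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty corpus is treated as unrelated to everything
    return dot / (norm_a * norm_b)


# Toy corpora (arbitrary tokens, not real data).
standard = frequency_vector(["v", "dome", "v", "lesu"])
variety_a = frequency_vector(["v", "dome", "lesu"])
variety_b = frequency_vector(["dome", "lesu", "gora"])

sim_a = cosine_similarity(standard, variety_a)
sim_b = cosine_similarity(standard, variety_b)
print(sim_a > sim_b)
```

A higher score means greater overlap in vocabulary use. Real studies would likely rely on richer features than raw word frequencies.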

Thus, based on our observations of bilingual corpora, the Russian spoken by Karelians is closer to standard Russian than the Russian spoken by Dagestanis. In Dagestan, local languages are influenced both by standard literary Russian and by a regional Dagestani variety of Russian, which originated in the republic and is evolving in a distinctive way. What matters for children is how much they use each language. For example, if Lezgins speak Lezgian and Adygs speak Adyghe or Kabardian before switching to Russian, one might wonder which variety of Russian they speak: standard literary Russian or a local version influenced by features of their native languages. Our corpora make it possible to compare these specifics.

— What other resources have you been creating?

— As I mentioned earlier, the laboratory’s key resources include linguistic atlases for minority languages in Russia.

We also compile dictionaries for such languages. For example, we recently published a dictionary of Kina Rutul, spoken in Dagestan and Azerbaijan, featuring approximately 1,200 words. I have analysed the Zilo dialect of Andi, which lacked a written form for a long time, and uploaded a dictionary of about 1,500 of its words to the laboratory's webpage. However, this dictionary is much smaller than those produced by linguists native to the regions where the languages are spoken, as they often have a stronger command of the local language and more time to dedicate to this task.

Dictionaries published in Dagestan typically contain at least 5,000 to 6,000 entries. Recently, our colleague Prof. Madzhid Khalilov published an 11,000-word dictionary of Tsez (Dido), a language spoken in Dagestan—a phenomenal achievement for an unwritten language.

— What are the key focus areas of the laboratory's current work?

— Our core research area is linguistic typology, based on analysing samples of unrelated languages from around the world.

Another ongoing long-term project is the Typological Atlas of the Languages of Dagestan, which currently includes 58 chapters, each dedicated to a specific linguistic feature—such as the presence or absence of certain ejective consonants. Samira Verhees and Chiara Naccarato, research fellows at our laboratory, studied how speakers of different languages greet each other in the morning, and contributed a chapter on this topic. They found that in 17 languages, alongside 'Good morning!', the rhetorical question 'Did you wake up?' is also common, and in the Lak language, both greetings are used.

© HSE University

A key area of focus is the development of electronic dictionaries for the languages spoken in Dagestan. We are creating a unified database that will contain the lexical material of the Nakh-Dagestanian language family. The database originated from a series of term papers by students in the Fundamental and Computational Linguistics programme, who digitised and cleaned the data and developed a transliteration tool. Their papers include phonetic and morphological markups, as well as markups of borrowings from Russian, Arabic, Persian and Turkic languages. At present, we have unified materials on the Andic languages and Avar.

This greatly facilitates studies that require the use of multiple dictionaries. The previously mentioned paper by Victoria Zubkova and Chiara Naccarato was made possible thanks to this database, which also creates new avenues for research. This project holds great potential, and I hope it will continue.

Another important research area is the study of non-standard Russian, which includes both regional dialects and the characteristics of Russian as spoken by non-native speakers. We refer to this group as DiaL2, where 'Dia' means dialects and 'L2' denotes second language. We are interested in all variations that differ from the standard forms. Our goal is not to determine which version is 'correct', but rather to explore the variability we observe. Our group includes both laboratory researchers and students. Recently, our research assistant Anna Grishanova had a paper on preposition drop in the Russian speech of Chuvash bilinguals accepted for publication.

There is a separate Rutul project. As part of the Rediscovering Russia grant, we visited 12 Rutul villages and produced an atlas similar to the Typological Atlas of the Languages of Dagestan I mentioned earlier. The Atlas of Rutul Dialects contains 425 chapters, each dedicated to different aspects of dialectal variation, including lexicon, phonetics, morphology, and syntax. For example, one chapter focuses on the lexical item for hedgehog, which has different variants in Rutul dialects, including a Russian loanword alongside native terms such as ɢɨˤllɨnc’, kʲirpikʲ, žüže, and k’ɨˤnʁɨˤr.

There are two smaller projects: one on the Aramaic languages spoken in Russia, supported by a Russian Science Foundation grant (24-28-01009) and titled 'Areal and Typological Description of the Neo-Aramaic Varieties in Armenia,' led by Yuri Koryakov; and a second project focusing on the Abkhaz-Adyghe languages.

Generally speaking, language documentation is crucial for the cultures we work with. Some unwritten languages may disappear, but if we document them in time, future generations will have the opportunity to learn how their grandparents spoke, even if they no longer understand their native language.

— How is the laboratory's work organised?

— I believe that the weekly seminars held every Tuesday at 4 pm are one of the pillars of our laboratory. Since its inception, the laboratory has hosted over 230 seminars, featuring nearly 300 presentations. Almost all seminars are conducted in English, which gives us more opportunities to involve international colleagues and maintain academic connections.

We have been visited by renowned international linguists, such as Martin Haspelmath, a leading expert in linguistic typology. During his trip to Moscow last December, he gave a talk at HSE University that generated great interest. The seminars also show our research assistants how to give presentations, ask questions, and generally conduct themselves during discussions in English.

Since I became head of the laboratory, we have increasingly used seminars as a platform to discuss new academic papers. This stems from my strong belief that it is all too easy to stop reading papers altogether, or to restrict ourselves to a narrow specialisation and produce new papers mechanically. Reading and discussing others' papers, even those far from one's own research, helps us maintain a broad perspective on linguistics rather than getting lost in details, as happens in the parable of the blind sages and the elephant.

— How actively do you collaborate with other universities and HSE campuses?

— As part of the Mirror Laboratories project, we collaborated with Southern Federal University from 2022 to 2024. Our joint efforts included three sub-projects: one focused on studying Russian as a foreign language; a dialectology project that allowed us to compile and maintain a corpus of dialects spoken in the villages along the Don River, which we might continue; and a Digital Humanities (DH) project.

Digital Humanities are at the core of a current inter-campus project with HSE University in St Petersburg. Together with colleagues, we focus on applied computational linguistics. Specifically, in St Petersburg, they created a corpus of Russian short stories from 1930 to 2000, compiled a corpus of Soviet songs, and even developed a chatbot for the Hermitage Museum.

— What is the operating principle behind this chatbot?

— For example, a visitor might ask to see a painting depicting a woman holding a plate with a head on it, referring to Judith with the head of Holofernes; the chatbot is designed to retrieve the corresponding image. However, hardly anyone would be surprised if the result turned out to be Herodias with the head of John the Baptist.

— Are there other examples of applied work you could highlight?

— We have several ongoing applied research projects. For example, we have begun developing transliterators for the Nakh-Dagestanian languages. Our goal is to create a hub offering transliteration tools for texts in various languages, which would be highly valuable for linguists.
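To give a sense of what a transliteration tool involves, here is a minimal rule-based sketch using longest-match-first replacement. The rule table is a toy assumption: it maps a few Cyrillic letters, and digraphs written with the palochka (typed here as a Latin capital `I`), to IPA-like symbols. It is not the laboratory's actual tool or rule set, and the test strings are arbitrary letter sequences rather than real words.

```python
from typing import Dict

# Toy rule table (illustrative only): consonant + palochka marks an ejective.
RULES: Dict[str, str] = {
    "кI": "k'",
    "пI": "p'",
    "тI": "t'",
    "цI": "ts'",
    "чI": "tʃ'",
    "к": "k",
    "п": "p",
    "т": "t",
    "ц": "ts",
    "ч": "tʃ",
    "а": "a",
    "и": "i",
    "у": "u",
}


def transliterate(text: str, rules: Dict[str, str] = RULES) -> str:
    """Replace source sequences with their targets, trying longer keys first."""
    keys = sorted(rules, key=len, reverse=True)
    output = []
    position = 0
    while position < len(text):
        for key in keys:
            if text.startswith(key, position):
                output.append(rules[key])
                position += len(key)
                break
        else:
            # No rule applies: copy the character through unchanged.
            output.append(text[position])
            position += 1
    return "".join(output)


print(transliterate("кIа"))
```

Trying longer keys first matters because a digraph such as `кI` must be consumed as a unit before the single-letter rule for `к` gets a chance to apply.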

Additionally, we are developing morphological analysers for minority languages, as well as compiling corpora and dictionaries. Ultimately, all of this provides rich material for validating machine learning models across different modalities, including both audio and text. Such models often suffer from a shortage of expert data markup.
