| Evaluating the reliability of source data in lexicostatistical analysis |
| |
| Mikhail Vasilyev (Institute of Oriental Studies of the Russian Academy of Sciences (Moscow); mvhumanity@gmail.com) |
| |
| Journal of Language Relationship, № 23/3-4, 2025 - p.389-403 |
| |
| Abstract: The article addresses the reliability of the source data used in comparative-historical linguistics for lexicostatistical purposes (to obtain genealogical classifications and linguistic dating). Such data, usually presented as a table of cognacy percentages between the basic wordlists of languages, may contain errors or inaccuracies stemming from the complexity and subjectivity of etymological analysis, and these can significantly reduce the reliability of lexicostatistical calculations. To address this problem, a formal methodology is proposed based on the criterion of consistency (or transitivity) of percentage values in the initial lexicostatistical table. Applying this criterion makes it possible to identify unreliable values in the source data and to estimate numerically the degree of inconsistency in each case. The advantages of the proposed approach include its simplicity and versatility, the objectivity of its results, and the ease of implementing it as a computer application. Testing the methodology on lexicostatistical data for the Romance and Turkic languages demonstrates its applicability and practical efficiency for both small and large language groups. |
| |
| Keywords: lexicostatistics, glottochronology, distance matrix, consistency criterion |
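As a rough illustration of what a transitivity check on a table of cognacy percentages could look like, the sketch below tests every triple of languages against a simple lower bound. This is not necessarily the criterion used in the article. It assumes that all three pairwise percentages in a triple are computed over the same meaning slots, that each slot holds one word per language, and that cognacy within a slot is transitive. Under those assumptions, if A and B share p(A,B) percent and B and C share p(B,C) percent, then A and C must share at least p(A,B) + p(B,C) - 100 percent. The function name, the `tol` parameter, and the example percentages are hypothetical.

```python
from itertools import combinations


def find_inconsistent_triples(langs, pct, tol=0.0):
    """Return (pair, via, gap) for every pair whose percentage falls below
    the bound p(x, z) >= p(x, y) + p(y, z) - 100 by more than `tol`.

    `pct` maps frozenset({lang1, lang2}) to a cognacy percentage (0..100).
    """
    def p(x, y):
        return pct[frozenset((x, y))]

    violations = []
    for a, b, c in combinations(langs, 3):
        # Test each of the three pairs against the path through the third language.
        for x, y, z in ((a, b, c), (b, c, a), (c, a, b)):
            gap = p(x, y) + p(y, z) - 100.0 - p(x, z)
            if gap > tol:
                violations.append(((x, z), y, gap))
    return violations


# Hypothetical percentages, invented purely for illustration.
langs = ["A", "B", "C"]
pct = {
    frozenset(("A", "B")): 90.0,
    frozenset(("B", "C")): 90.0,
    frozenset(("A", "C")): 70.0,  # below the bound 90 + 90 - 100 = 80
}
violations = find_inconsistent_triples(langs, pct)
```

In real wordlists, synonyms, gaps, and excluded borrowings break the stated assumptions, so a non-zero `tol` would likely be needed. A flagged triple only indicates that at least one of its three values deserves a second look, not which one is wrong.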