Lexical Semantics in Text Processing: Contrastive Diachronic Studies on Romanian Language


Publication

Authors: Daniela Gîfu

Year: 2026

Journal: SSRN Electronic Journal

Abstract

In this PhD thesis, we propose a diachronic model derived from text mining. As a personal target, we also sought a balance between theory and applications in the internal structure of each chapter. Thus, the theoretical and practical components together complete the picture of the journalistic context (analysed in Chapter 6) and of text mining techniques and models (Chapter 5), in conjunction with the language similarity analysis (Chapter 7). Furthermore, we opted for six theoretical chapters, each including at least 3 sections that, we believe, offer stability to the theoretical framework of this thesis. Even though each chapter has a well-defined role, we were equally interested in the links between chapters. These have been established to highlight the interdependencies of the major research directions in relation to the main goals of the PhD research, as follows: 2.1 Historical corpora is connected to 6.1.1 A brief history of Romanian newspapers. Romanian corpus description and 6.2.2 A brief history of Bessarabian newspapers. Bessarabian corpus description; 2.2 Diachronic text classification is echoed in 5.2.1 Naïve Bayes – a generative model; 2.3 Languages similarities using machine learning is reflected in 5.2.2 Maximum Entropy Modelling, 5.3.1 Support Vector Machines (SVMs), 5.4.1 LDA – a probabilistic topic model and 5.4.2 LSA – a vector space model, all of these methods being used in our empirical studies described in Chapter 7. Therefore, the entire theoretical content of this PhD thesis is oriented towards understanding the role of these tools, which is necessary for a thorough analytical approach. In addition, Chapter 8 records, point by point, final remarks on the contributions of this work.

Language has never been and will never be static; its main characteristic is its dynamism, which includes both internal word-formation processes and the borrowing of words. In this view, this PhD thesis discusses the theoretical background that allows a computational investigation of three types of “deviations” of a language from the current norm, seen as a system of linguistic conventions [Șuteu, 1976]: phonetic (the writing of sounds), grammatical (inflection), and lexical (the inventory of words in use over a territory), and offers practical means to perform such an investigation.