A combined approach to part-of-speech homonymy resolution

Author(s)

Abstract

Russian is an inflected language without a strict word order, which gives rise to processing problems such as part-of-speech homonymy. This paper addresses that problem. Existing approaches to resolving morphological homonymy fall into four groups: rule-based approaches, statistical approaches, machine learning approaches, and combined methods. We show that each approach has its advantages and disadvantages, and that combining several approaches yields much higher precision. The combined method based on neural networks gives better results than the others (98% precision). We used the following features: normalizing substitutions, grammatical and syntactic characteristics, the vector representation of the word, and word forms. All experiments were performed on the part of the National Corpus of the Russian Language in which homonymy has been resolved. Analysis of the corpus revealed that the most frequent types of homonymy occur between function words: a particle vs. an interjection (14%) and a preposition vs. an interjection (13.2%).
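To illustrate what "combining approaches" can mean in practice, here is a minimal sketch in which a hand-written rule is tried first and a count-based statistical back-off handles the remaining cases. It is not the method of the paper, which uses a neural network and a richer feature set. The training triples, the tag names, the rule, and the function names (`train_counts`, `rule_stage`, `disambiguate`) are all invented for illustration.

```python
from collections import Counter, defaultdict

# Toy, invented training triples: (previous tag, word form, gold tag).
TRAIN = [
    ("VERB", "о", "PREP"),
    ("VERB", "о", "PREP"),
    ("NOUN", "о", "PREP"),
    ("<S>", "о", "INTJ"),
]


def train_counts(samples):
    """Count tags per (previous tag, word) context and per word alone."""
    ctx_counts = defaultdict(Counter)
    word_counts = defaultdict(Counter)
    for prev_tag, word, tag in samples:
        ctx_counts[(prev_tag, word)][tag] += 1
        word_counts[word][tag] += 1
    return ctx_counts, word_counts


def rule_stage(candidates, next_token):
    """Hypothetical rule: a possible interjection followed by '!' is one."""
    if "INTJ" in candidates and next_token == "!":
        return "INTJ"
    return None  # the rule does not apply


def disambiguate(word, candidates, prev_tag, next_token, ctx_counts, word_counts):
    """Pick one tag from `candidates`: rule first, then statistics."""
    tag = rule_stage(candidates, next_token)
    if tag is not None:
        return tag
    # Statistical stage: context counts first, then back off to word counts.
    for counter in (ctx_counts.get((prev_tag, word)), word_counts.get(word)):
        if counter:
            ranked = [t for t, _ in counter.most_common() if t in candidates]
            if ranked:
                return ranked[0]
    # Nothing is known about this word: fall back to the first candidate.
    return candidates[0]


CTX, UNI = train_counts(TRAIN)
print(disambiguate("о", ["PREP", "INTJ"], "VERB", "погоде", CTX, UNI))
print(disambiguate("о", ["PREP", "INTJ"], "<S>", "!", CTX, UNI))
```

In this sketch the rule stage takes priority because a hand-written rule is assumed to be precise where it fires, while the statistical stage supplies coverage elsewhere. A learned classifier could replace the count-based stage without changing the overall structure.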

Keywords

text processing, part-of-speech homonymy, combined approach, machine learning, homonymy resolution

DOI

10.31144/bncc.cs.2542-1972.2017.n41.p13-25

File

Bruches_Batura_Bull_41.pdf (892.56 KB)

Issue

Computer Science, № 41, 2017

Pages

13-25