Abstract

The paper describes a method for constructing and annotating a corpus of short texts compiled from Russian-language Twitter posts. The corpus is intended for training a sentiment classifier that sorts general-topic texts into three classes: "positive", "negative", and "neutral". The corpus is morphologically tagged in order to identify the characteristic features of each of the three classes of short texts; parts of speech and unigrams and bigrams of terms were selected as these features. A vocabulary of emotional words was constructed from the corpus, and the weight of each term in the corpus was calculated.
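The abstract does not specify how features are extracted or how term weights are computed. The sketch below only illustrates the general idea of collecting unigram and bigram features from labelled short texts and assigning each term a per-class weight. Several assumptions apply. Tokenization is a plain lowercase `\w+` regex. Morphological (part-of-speech) tagging is omitted. The weight is taken to be the hypothetical "class share" of a term, meaning the fraction of its occurrences that fall in each class. The names `features` and `class_weights` are illustrative and are not taken from the paper.

```python
import re
from collections import Counter, defaultdict


def features(text):
    """Return lowercase unigrams plus adjacent-word bigrams of a short text.

    Assumption: a simple \\w+ regex tokenizer (Unicode-aware, so it handles
    Cyrillic); the paper's own tokenization and POS tagging are not shown.
    """
    tokens = re.findall(r"\w+", text.lower())
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


def class_weights(corpus):
    """Compute a hypothetical per-class weight for every term.

    corpus: iterable of (text, label) pairs, e.g. label in
    {"positive", "negative", "neutral"}.
    Returns {term: {label: share}}, where share is the fraction of the
    term's occurrences observed in texts of that class.
    """
    counts = defaultdict(Counter)  # term -> Counter(label -> occurrences)
    for text, label in corpus:
        for term in features(text):
            counts[term][label] += 1
    weights = {}
    for term, by_label in counts.items():
        total = sum(by_label.values())
        weights[term] = {label: n / total for label, n in by_label.items()}
    return weights


# Usage with a toy, made-up corpus (not data from the paper).
corpus = [
    ("Отличный день", "positive"),
    ("Ужасный день", "negative"),
    ("Сегодня понедельник", "neutral"),
]
w = class_weights(corpus)
print(w["день"])           # {'positive': 0.5, 'negative': 0.5}
print(w["отличный день"])  # {'positive': 1.0}
```

A term that occurs almost exclusively in one class, such as the bigram "отличный день" here, would be a natural candidate for an emotional-word vocabulary. A term split evenly across classes carries little sentiment signal under this assumed weighting.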

DOI
10.31144/bncc.cs.2542-1972.2014.n37.p107-116
File
rubtsova.pdf (266.63 KB)
Pages
107-116