Novel methods for text preprocessing and classification [thesis]

Tatiana Gasanova, Universität Ulm
2016
Written text is a form of communication that represents language (speech) using signs and symbols. For a given language, text depends on the same structures as speech (vocabulary, grammar and semantics) and on a structured system of signs and symbols (a formal alphabet). Written text has always been an instrument for exchanging information, recording history, spreading knowledge, maintaining financial accounts and forming legal systems. With the development of computers and the Internet, the amount of textual information in digital form has grown dramatically. There is an increasing need to process this information automatically for a variety of text processing tasks, such as information retrieval, machine translation, question answering, topic categorization and topic segmentation, and sentiment analysis. Many important text processing tasks fall into the field of text classification. This thesis addresses the development and evaluation of novel text preprocessing methods, which combine supervised and unsupervised learning models in order to reduce the dimensionality of the feature space and improve classification performance. Metaheuristic approaches for generating Support Vector Machines and Artificial Neural Networks and optimizing their parameters are modified, applied to text classification, and compared with other state-of-the-art methods using different text representations.
doi:10.18725/oparu-3242 fatcat:oybptqnr6rgrja3ufn73k4gzyq