Contextual Text Analytics Framework for Citizen Report Classification: A Case Study Using the Indonesian Language

Evaristus Didik Madyatmadja, Bernardo Nugroho Yahya, Cristofer Wijaya
2022 IEEE Access  
Citizen science has emerged in many countries as a way to contribute to the prompt resolution of individual field problems, and it has since made its way into Information System (IS) research. In the IS domain, many local governments have introduced a citizen report mechanism to understand regional problems through public participation. The rise of social media has pushed many organizations, including local governments, to make use of information from citizens, including text. Text mining has been applied in various types of analyses, such as sentiment analysis. However, it faces many challenges when applied to a local context: the local context of words can cause various conflation errors that strongly affect learning tasks such as classification. This study proposes a context-based text preprocessing approach and feeds it into a machine learning framework to classify citizen report data. The context-based text preprocessing uses statistical- and semantic-based measurements to extract the local context and draws on domain expertise to verify potential misinterpretations before further text processing such as feature extraction. Subsequently, n-gram language models together with the Term Frequency-Inverse Document Frequency (TF-IDF) scheme were used to build the features. The results showed that context-based text preprocessing improved classification performance by about 3% for the majority of classifiers when combined with n-gram features.
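To make the feature-building step concrete, here is a minimal sketch of n-gram TF-IDF extraction applied after a context-aware merging step. It is not the authors' implementation. The paper derives local-context terms with statistical and semantic measures plus expert verification, which this sketch does not reproduce. Instead, a small hand-written phrase table (`LOCAL_PHRASES`, hypothetical, with illustrative Indonesian entries) stands in for those verified terms. The helper names `merge_phrases`, `ngrams`, and `tfidf` are likewise hypothetical. TF is taken as count divided by document length, and IDF uses the smoothed form log((1 + N) / (1 + df)) + 1. The classifier stage is omitted.

```python
import math
from collections import Counter

# Hypothetical stand-in for domain-verified local-context phrases.
# The entries are illustrative assumptions, not taken from the paper.
LOCAL_PHRASES = {("jalan", "rusak"): "jalan_rusak", ("lampu", "mati"): "lampu_mati"}


def merge_phrases(tokens, phrases=LOCAL_PHRASES):
    """Greedily join known two-word phrases into single tokens so that
    later n-gram extraction treats them as one unit."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in phrases:
            out.append(phrases[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def ngrams(tokens, n_max=2):
    """Return every 1..n_max-gram as a space-joined string."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_max=2):
    """Return one {n-gram: weight} dict per document.

    TF = count / number of n-grams in the document;
    IDF = log((1 + N) / (1 + df)) + 1 (smoothed variant).
    """
    grams_per_doc = [ngrams(merge_phrases(d.lower().split()), n_max) for d in docs]
    n_docs = len(docs)
    # Document frequency: count each n-gram at most once per document.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    idf = {g: math.log((1 + n_docs) / (1 + c)) + 1 for g, c in df.items()}
    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        vectors.append({g: (c / total) * idf[g] for g, c in counts.items()})
    return vectors


if __name__ == "__main__":
    reports = [
        "Jalan rusak di depan sekolah",
        "Lampu mati di jalan utama",
        "Jalan rusak dan lampu mati",
    ]
    for vec in tfidf(reports):
        # Show the three highest-weighted n-grams per report.
        print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
```

Merging phrases before extracting n-grams means "jalan rusak" contributes one feature rather than two loosely related unigrams. This is one simple way a context-based preprocessing step can change the downstream feature space.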
doi:10.1109/access.2022.3158940 fatcat:etyz6iaiongjpn7o4k6j2isgbi