Early Detection of Signs of Anorexia and Depression Over Social Media using Effective Machine Learning Frameworks

Sayanta Paul, Sree Kalyani Jandhyala, Tanmay Basu
2018 Conference and Labs of the Evaluation Forum  
The CLEF eRisk 2018 challenge focuses on early detection of signs of depression or anorexia using posts or comments over social media. The eRisk lab organized two tasks this year and released a separate corpus for each task. The corpora were developed from posts and comments on Reddit, a popular social media platform. The machine learning group at Ramakrishna Mission Vivekananda Educational and Research Institute (RKMVERI), India participated in this challenge and [...] submitted five results to accomplish the objectives of these two tasks. The paper presents different machine learning techniques and analyzes their performance for early risk prediction of anorexia or depression. The techniques involve various classifiers and feature engineering schemes. The simple bag-of-words model has been used with AdaBoost, random forest, logistic regression and support vector machine classifiers to identify documents related to anorexia or depression in the individual corpora. We have also extracted the terms related to anorexia or depression using MetaMap, a tool to extract biomedical concepts. Therefore, the classifiers have been implemented using bag-of-words features and MetaMap features individually, and subsequently using the combination of these features. The performance of the recurrent neural network is also reported using GloVe and fastText word embeddings. GloVe and fastText are pre-trained word vectors developed using specific corpora, e.g., Wikipedia. The experimental analysis on the training set shows that the AdaBoost classifier using the bag-of-words model outperforms the other methods for Task 1, and it achieves the best score on the test set in terms of precision over all the runs in the challenge. The support vector machine classifier using the bag-of-words model outperforms the other methods in terms of F-measure for Task 2. The results on the test set submitted to the challenge suggest that these frameworks achieve reasonably good performance.
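As an illustration of the feature schemes described in the abstract, the following is a minimal sketch of how bag-of-words counts and concept-based features might be built individually and then combined into one vector per document. It is not the authors' implementation: the function names, the toy posts, and the short concept list are all hypothetical, and the concept list merely stands in for terms that a tool such as MetaMap would extract.

```python
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also
    # handle punctuation, stop words, and similar preprocessing.
    return text.lower().split()


def build_vocabulary(documents):
    # Sorted list of every distinct token seen in the corpus.
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def bag_of_words(text, vocabulary):
    # One count per vocabulary term, in vocabulary order.
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def concept_features(text, concepts):
    # Binary indicator per concept term. The concept list is a
    # hypothetical stand-in for output of a concept extraction tool.
    tokens = set(tokenize(text))
    return [1 if concept in tokens else 0 for concept in concepts]


def combined_features(text, vocabulary, concepts):
    # Concatenate the two feature groups into a single vector.
    return bag_of_words(text, vocabulary) + concept_features(text, concepts)


# Toy data invented for this sketch; not taken from the eRisk corpora.
posts = ["i feel tired and sad today", "great hike with friends today"]
concepts = ["sad", "tired"]

vocab = build_vocabulary(posts)
features = [combined_features(post, vocab, concepts) for post in posts]
```

Each resulting vector could then be passed to any standard classifier, such as those named in the abstract. Keeping the two feature groups as separate functions makes it straightforward to evaluate them individually or in combination.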
dblp:conf/clef/PaulJB18 fatcat:qqvjzbvvefhpfhfvzxvx2clsza