Tweedle: Sensitivity Check in Health-related Social Short Texts based on Regret Theory

R Geetha, S Karthika, N Pavithra, V Preethi
<span title="">2019</span> <i title="Elsevier BV"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/cx3f4s3qmfe6bg4qvuy2cxezyu" style="color: black;">Procedia Computer Science</a> </i> &nbsp;
Twitter helps us to know what is happening in the world and what people are talking about right now. Every day, millions of Twitterati tweet something personal or impersonal to express their emotions and share valuable knowledge. In the health domain, disclosure of personal health information can have a long-term effect on individuals, either directly or indirectly, which emphasizes the presence of unrealistic social boundaries and the need for sensitivity analysis in social media. The proposed Tweedle framework was built with 100K tweets extracted using a set of 20 health-related cyber-keywords. The framework combines Regret Theory for tweet annotation, content and contextual feature scores for feature selection, and various machine learning algorithms for sensitivity classification. Annotation in accordance with Regret Theory by domain experts on Amazon Mechanical Turk labeled 61.5% of the tweets as sensitive tweets containing health data. The context- and content-oriented feature scores are introduced in terms of the Primary / Secondary tweet score, the Named Entity Recognition score of tweets, Term Frequency-Inverse Document Frequency (TF-IDF), the Cyber-Keyword Ratio in tweets, hashtag mentions, and user mentions as features for classification. Tweedle was evaluated with Regret Theory in combination with various classifiers, namely Support Vector Machine, Naïve Bayes, Random Forest, Decision Tree, Logistic Regression, and Recurrent Neural Network + Long Short-Term Memory (RNN + LSTM), for sensitivity classification of health-domain tweets. The training and testing results showed RNN + LSTM to be the better-performing model for identifying tweets with sensitive health data.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1016/j.procs.2020.01.062">doi:10.1016/j.procs.2020.01.062</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/7lov77shqze63jdrzpv5jsvgoy">fatcat:7lov77shqze63jdrzpv5jsvgoy</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200301170217/https://pdf.sciencedirectassets.com/280203/1-s2.0-S1877050920X00032/1-s2.0-S1877050920300703/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEEEaCXVzLWVhc3QtMSJHMEUCICKc6YikCMzDE1c81qmjt979oUnby%2Fr6aPirbO6AK9h0AiEAtGcPx1kgit9opjOnrdz%2BjLPN0BtACiHEy0g2NKS1QMYqtAMIGRACGgwwNTkwMDM1NDY4NjUiDFsYQIdyqpR33ojariqRA0C1AhAYrAc9AkfF6sZWWnH8eML5BU6EbLbXDYax7RhuTZqV6WnBSL%2B8A1ZnRfbNlL%2BcOeqDFuYaDZrQqxiq169Vwo9rxmeTidG6Ru33DM49oFKJX32nriqny6%2F%2BXnnUhotDq1%2BSV%2BUB1ZWG57yj%2FjroxTULBtHLX5gOCkMd355X6yc%2FflkQ%2FT0FAHUkb6ojeYcfe7A8Tx6XjNHH%2B7%2FiKDfViKCZx4UPCyzPrRvH18MhEDFStmpeTtoUnFzBn69QnsystcGHabkIsInKFAS%2BGSBO9VKcHtWJZn9aaO3OqtuI%2F5VgZn5spwiF2bOy3E307cKTdg5f33Rk%2FY3xoO%2FEC7H4MA%2FMEQBFgIBh5vCbK492NGnYtVhZjWGGc7eYyMUYWddxYtPOo3wOdTNt5l0daCzajzEfQE6ew7%2Fgvwdluv0ITcELixKIl17UzHWQyTJiDCCv50WvCc9CrcxVJYZ1AjehJ%2FbAn06i%2F1VzSSRb8NGJnKyswLo%2BVFZK2XjlECeGle2Wdgv%2FZzNXZPf4xHw%2F3U0bMO6%2F7%2FIFOusBnQc9oVOx8aHDslCJ2B9jwlEKsrJZR685dWQ%2BBkxWXqQekLRYtYlM0pRMTTr5PHBa3Njlf%2F0fRwKtlBsFYlATf8VkWziYxcLruQQM%2FV61uDNmvah452ub5w2bsEOdDQn8aTQcX9gn94VaWwuBaA7Gyg%2F1xwnUPn%2B9iTzBLIhDItZKcWqVOAUNWlpsTBcBiFe2godY9qvNxYqj22erNSD35B9CVxRvz3g7qgy%2BsB8yeTFc4cXvoQkbfZb8VIOMQrDcabUk0P1Br4aT6oPGiZDBBdF4eq1Gw9bBpOj5EGQoFR9P5ommfLX2lDr1xQ%3D%3D&amp;X-Amz-Algorithm=AWS4-HMAC-SHA256&amp;X-Amz-Date=20200301T170211Z&amp;X-Amz-SignedHeaders=host&amp;X-Amz-Expires=300&amp;X-Amz-Credential=ASIAQ3PHCVTYZUI5NV2F%2F20200301%2Fus-east-1%2Fs3%2Faws4_request&amp;X-Amz-Signature=a0c05d505fcbfc328073a350843114c2368ee943f4436c5ff03b92c9073dc04c&amp;hash=9184b13f00af26d860ab8c36ed8bce75fca3dbc19f80e7aaafe48acc8709faa6&amp;host=68042c943591013ac2b2430a89b270f6af2c76d8dfd086a07176afe7c76c2c61&amp;pii=S1877050920300703&amp;tid=spdf-4e3a36d1-6315-4576-b6fb-b4403e6eaff9&amp;sid=fd0c3e6735b0234e22884f43e5e8921c464fgxrqa&amp;type=client" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" 
data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1016/j.procs.2020.01.062"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="unlock alternate icon" style="background-color: #fb971f;"></i> elsevier.com </button> </a>