A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2018; you can also visit <a rel="external noopener" href="http://www.projectsgoal.com:80/download_projects/data-mining/data-mining-projects-GDM00069.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="Institute of Electrical and Electronics Engineers (IEEE)">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ht3yl6qfebhwrg7vrxkz4gxv3q" style="color: black;">IEEE Transactions on Knowledge and Data Engineering</a>
In this paper, we introduce a new topic model to understand the chaotic microblogging environment by using hashtag graphs. Inferring topics on Twitter becomes a vital but challenging task in many important applications. The shortness and informality of tweets leads to extreme sparse vector representations with a large vocabulary. This makes the conventional topic models (e.g., Latent Dirichlet Allocation  and Latent Semantic Analysis ) fail to learn high quality topic structures. Tweets<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/tkde.2016.2531661">doi:10.1109/tkde.2016.2531661</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/h4mtgomevbewfnr4c3kyfbblsq">fatcat:h4mtgomevbewfnr4c3kyfbblsq</a> </span>
more »... e always showing up with rich user-generated hashtags. The hashtags make tweets semi-structured inside and semantically related to each other. Since hashtags are utilized as keywords in tweets to mark messages or to form conversations, they provide an additional path to connect semantically related words. In this paper, treating tweets as semi-structured texts, we propose a novel topic model, denoted as Hashtag Graph-based Topic Model (HGTM) to discover topics of tweets. By utilizing hashtag relation information in hashtag graphs, HGTM is able to discover word semantic relations even if words are not co-occurred within a specific tweet. With this method, HGTM successfully alleviates the sparsity problem. Our investigation illustrates that the user-contributed hashtags could serve as weakly-supervised information for topic modeling, and the relation between hashtags could reveal latent semantic relation between words. We evaluate the effectiveness of HGTM on tweet (hashtag) clustering and hashtag classification problems. Experiments on two real-world tweet data sets show that HGTM has strong capability to handle sparseness and noise problem in tweets. Furthermore, HGTM can discover more distinct and coherent topics than the state-of-the-art baselines.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20181008193042/http://www.projectsgoal.com:80/download_projects/data-mining/data-mining-projects-GDM00069.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/16/65/16654369a1e787fcebdee9a2b868feae5e9b5cd7.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/tkde.2016.2531661"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>