A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit <a rel="external noopener" href="https://ijarcce.com/wp-content/uploads/2019/02/IJARCCE.2018.71102.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="Tejass Publisheers">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ezm4lr6uezhi5pfgr6xkdexmgy" style="color: black;">IJARCCE</a>
Text mining has gained considerable momentum in recent years, as user-generated content has become widely available. One key use is comment mining, with much attention given to sentiment analysis and opinion mining. An essential step in comment mining is text pre-processing, in which each linguistic term is assigned a weight that typically increases with its frequency in the studied text, yet is offset by the frequency of the term in the domain of interest. A common practice is to use the well-known tf-idf formula to calculate these weights. This paper reveals the bias that between-participants discourse introduces into the study of comments in social media, and proposes an adjustment. We find that content extracted from discourse is often highly correlated, resulting in dependence structures between observations in the study and thus introducing a statistical bias. Ignoring this bias can result in a nonrobust analysis at best and can lead to an entirely wrong conclusion at worst. We propose a change to tf-idf that accounts for this bias. We show the effects of both the bias and the correction using data from seven Facebook fan pages covering different domains, including news, finance, politics, sport, shopping, and entertainment.<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.17148/ijarcce.2018.7112">doi:10.17148/ijarcce.2018.7112</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/uzfh4nmdjff6dpcm6zhy3ybtcy">fatcat:uzfh4nmdjff6dpcm6zhy3ybtcy</a> </span>
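For readers unfamiliar with the weighting scheme the abstract refers to, the following is a minimal sketch of the textbook tf-idf computation, assuming each comment is already tokenized into a list of words. It illustrates only the standard formula (term frequency multiplied by the log of inverse document frequency); the bias adjustment proposed in the paper is not specified in the abstract and is not reproduced here. The function name `tf_idf` and the sample comments are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute textbook tf-idf weights.

    docs: list of documents, each a list of tokens (assumed pre-tokenized).
    Returns one dict per document mapping term -> weight.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # tf = relative frequency in this document,
            # idf = log(N / df); a term present in every document gets 0.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample comments, for illustration only.
comments = [
    ["great", "game", "great", "team"],
    ["bad", "game"],
    ["great", "deal", "on", "shoes"],
]
weights = tf_idf(comments)
```

In this sketch a term that occurs in every comment receives a weight of zero, which is the offsetting effect the abstract describes. Note that the formula treats comments as independent observations, which is the assumption the paper argues is violated when comments belong to the same discourse.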
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20220303214208/https://ijarcce.com/wp-content/uploads/2019/02/IJARCCE.2018.71102.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/f9/21/f9218df89a18b74565b55f12d294badd2437e250.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.17148/ijarcce.2018.7112"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> Publisher / doi.org </button> </a>