A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit <a rel="external noopener" href="https://arxiv.org/pdf/2002.09202v1.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
Curating Social Media Data
[article]
<span title="2020-02-21">2020</span>
<i >
arXiv
</i>
<span class="release-stage" >pre-print</span>
Social media platforms have democratized how people voice their opinions in the modern era. Owing to the immense popularity and heavy use of social media sites (e.g., Twitter, Facebook and Tumblr), the data published on them is a rich ocean of information. Therefore, data-driven analytics of social imprints has become a vital asset for organisations and governments seeking to improve their products and services. However, due to the dynamic and noisy nature of social media data, performing accurate analysis on raw data is a challenging task. A key requirement is to curate the raw data before it is fed into analytics pipelines. This curation process transforms the raw data into contextualized data and knowledge. We propose a data curation pipeline, namely CrowdCorrect, that enables analysts to cleanse and curate social data and prepare it for reliable analytics. Our pipeline provides automatic feature extraction from a corpus of social media data using existing in-house tools. Further, we offer a dual-correction mechanism using both automated and crowd-sourced approaches. The implementation of this pipeline also includes a set of tools for automatically creating micro-tasks that help crowd users contribute to curating the raw data. For the purposes of this research, we use Twitter as our motivating social media platform due to its popularity.
<span class="external-identifiers">
<a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2002.09202v1">arXiv:2002.09202v1</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/5w2coglezfc4jlnu4h6oh23oqm">fatcat:5w2coglezfc4jlnu4h6oh23oqm</a>
</span>
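The abstract describes the pipeline only at a high level. The following is a minimal Python sketch, not the authors' CrowdCorrect implementation, of how feature extraction, automatic correction and crowd micro-task generation might fit together for a single tweet. Everything named here is a hypothetical assumption for illustration: the correction dictionary `AUTO_CORRECTIONS`, the clean-word list `VOCABULARY`, the regular expressions, the functions `curate` and `apply_crowd_answers`, and the choice of majority voting to aggregate crowd answers.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical normalisation dictionary for the automatic correction step.
AUTO_CORRECTIONS = {"u": "you", "gr8": "great", "pls": "please", "2day": "today"}

# Hypothetical list of tokens treated as already clean.
VOCABULARY = {"you", "great", "please", "today", "the", "service", "was",
              "thanks", "fixed", "my", "bill"}

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


@dataclass
class CuratedTweet:
    text: str
    urls: list[str] = field(default_factory=list)
    hashtags: list[str] = field(default_factory=list)
    mentions: list[str] = field(default_factory=list)
    corrected_text: str = ""
    micro_tasks: list[dict] = field(default_factory=list)


def curate(text: str) -> CuratedTweet:
    """Extract features, auto-correct known tokens, and queue the rest for the crowd."""
    urls = URL_RE.findall(text)
    no_urls = URL_RE.sub(" ", text)  # strip URLs so '#' or '@' inside them is ignored
    hashtags = HASHTAG_RE.findall(no_urls)
    mentions = MENTION_RE.findall(no_urls)
    # Remove entities already captured as features before correcting words.
    body = MENTION_RE.sub("", HASHTAG_RE.sub("", no_urls)).lower()

    corrected, tasks = [], []
    for token in re.findall(r"[\w']+", body):
        if token in VOCABULARY:
            corrected.append(token)
        elif token in AUTO_CORRECTIONS:
            corrected.append(AUTO_CORRECTIONS[token])  # automatic correction
        else:
            # Unknown token: keep it for now and create a crowd micro-task.
            corrected.append(token)
            tasks.append({"tweet": text, "token": token,
                          "question": f"What is the correct form of '{token}'?"})
    return CuratedTweet(text=text, urls=urls, hashtags=hashtags, mentions=mentions,
                        corrected_text=" ".join(corrected), micro_tasks=tasks)


def apply_crowd_answers(curated: CuratedTweet, answers: dict[str, list[str]]) -> str:
    """Replace queued tokens with the majority crowd answer (assumed aggregation rule)."""
    resolved = {tok: Counter(votes).most_common(1)[0][0]
                for tok, votes in answers.items() if votes}
    return " ".join(resolved.get(w, w) for w in curated.corrected_text.split())
```

For example, `curate("@Telco u fixed my bill gr8 thx #happy https://t.co/abc")` extracts the URL, hashtag and mention, rewrites `u` and `gr8` automatically, and emits one micro-task for `thx`; passing three worker answers for `thx` to `apply_crowd_answers` then resolves it by majority vote. Routing only the tokens the automatic step cannot handle to the crowd keeps the number of human micro-tasks small, which is one plausible reading of the "dual-correction" idea.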
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200322145907/https://arxiv.org/pdf/2002.09202v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
Web Archive [PDF]
</a>