Algorithmic Fairness Datasets: the Story so Far
[article]
2022
arXiv
pre-print
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information ...
Finally, we analyze these datasets from the perspective of five important data curation topics: anonymization, consent, inclusivity, sensitive attributes, and transparency. ...
Acknowledgements The authors would like to thank the following researchers and dataset creators for the useful feedback on the data briefs: Alain Barrat, Luc Behaghel, Asia Biega, Marko Bohanec, Chris ...
arXiv:2202.01711v2
fatcat:5hf4a42pubc5vnt7tw3al4m5bq
A Systematic Survey of Online Data Mining Technology Intended for Law Enforcement
2015
ACM Computing Surveys
As more and more crime takes on a digital aspect, law enforcement bodies must tackle an online environment generating huge volumes of data. ...
Such technologies must be well-designed and rigorously grounded, yet no survey of the online data-mining literature exists which examines their techniques, applications and rigour. ...
[Wang et al. 2012c ] use Twitter as a source of general crime prediction, drawing on automatic semantic analysis, event extraction and geographical information systems to map crime hotspots. ...
doi:10.1145/2811403
fatcat:qpvfebejpfgp5bh3cgykaoumze
Table of Contents
2019
2019 International Conference on Advancements in Computing (ICAC)
The knowledge store gains information and expands its knowledge from the internet by crawling websites. ...
The 'User Report Information' is included conditions of the related train and can be shared among the other interested parties through our System. ...
doi:10.1109/icac49085.2019.9129879
fatcat:gm5mw7qn5ncoxjr6klcl3x4uju
Open challenges for data stream mining research
2014
SIGKDD Explorations
ABSTRACT We discuss the most important database research advances, industry developments, role of relational and NoSQL databases, Computing Reality, Data Curation, Cloud Computing, Tamr and Jisto startups ...
Streaming data can be considered as one of the main sources of what is called big data. ...
Part of this work was funded by the German Research Foundation, projects SP 572/11-1 (IMPRINT) and HU 1284/5-1, the Academy of Finland grant 118653 (ALGODAN), and the Polish National Science Center grants ...
doi:10.1145/2674026.2674028
fatcat:y3bozzeohveibgxb5wmiwfcogm
Processing Social Media Messages in Mass Emergency: A Survey
[article]
2015
arXiv
pre-print
We examine the particularities of this setting, and then methodically examine a series of key sub-problems ranging from the detection of events to the creation of actionable and useful summaries. ...
Processing social media messages to obtain such information, however, involves solving multiple challenges including: handling information overload, filtering credible information, and prioritizing different ...
People post situation-sensitive information on social media related to what they experience, witness, and/or hear from other sources [Hughes and Palen 2009]. ...
arXiv:1407.7071v3
fatcat:e7mcvae5freddaus7ndolygeti
Understanding And Mapping Big Data
2015
Zenodo
Understanding and mapping big data. Deliverable D1.1 BYTE Project. ...
The technical challenges arise from data acquisition and data curation to data analysis and data visualization. ...
Predictive policing uses historical crime data to automatically discover trends and patterns in the data. ...
doi:10.5281/zenodo.49161
fatcat:wz3cwet3wfbmvfzucivu3t64eq
A Review of Computer Vision Methods in Network Security
[article]
2020
arXiv
pre-print
However, such methods are more based on statistical features extracted from sources such as binaries, emails, and packet flows. ...
Next, we review a set of such commercial products for which public information is available and explore how computer vision methods are effectively used in those products. ...
. random attacks, targeted attacks, multi-source attack, and port scans). ...
arXiv:2005.03318v1
fatcat:pcng7535obec3l6fejkllbi3ii
Journalism as usual: The use of social media as a newsgathering tool in the coverage of the Iranian elections in 2009
2012
Journal of Media Practice
Frenemy Google is often dubbed the frenemy of news organisations: half friend and half enemy. ...
Many media organisations are uncomfortable that Google can index and link with impunity yet they value the traffic it creates. (Chapter 13) ...
linked to the original table of crime reports. ...
doi:10.1386/jmpr.13.1.61_1
fatcat:abhz6rqlffdzdklr25gm5d4anq
Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases
[article]
2021
arXiv
pre-print
Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines ...
On top of this, the article discusses the automatic extraction of entity-centric properties. ...
We also appreciate the sustained encouragement and support by our editors Surajit Chaudhuri, Joe Hellerstein and Ihab Ilyas. ...
arXiv:2009.11564v2
fatcat:vh2lqfmhhbcwpf6dcsej3hhvgy
Linked Data: Evolving the Web into a Global Data Space
2011
Synthesis Lectures on the Semantic Web Theory and Technology
The book discusses patterns for publishing Linked Data, describes deployed Linked Data applications and examines their architecture. ...
This book gives an overview of the principles of Linked Data as well as the Web of Data that has emerged through the application of these principles. ...
This enables applications to automatically take advantage of new data sources as they become available on the Web of Data.
doi:10.2200/s00334ed1v01y201102wbe001
fatcat:y5qflrlwqzd5jhryuazzcoyfdu
By Hook or by Crook: Exposing the Diverse Abuse Tactics of Technical Support Scammers
[article]
2017
arXiv
pre-print
Thus, investigation of search-and-ad abuse provides new insights into TSS tactics and helps detect previously unknown abuse infrastructure that facilitates these scams. ...
Our study period of 8 months uncovered over 9,000 TSS domains, of both passive and aggressive types, with minimal overlap between sets that are reached via organic search results and sponsored ads. ...
The URI component of the ADs and SRs are then inserted into the ADC (AD crawling) and SRC (SR crawling) queues respectively, which then coordinate with the ACM to gather more information about them, as ...
arXiv:1709.08331v1
fatcat:tvxw4xq5sja3hksnagwhpucdde
A First Look at the Crypto-Mining Malware Ecosystem
2019
Proceedings of the Internet Measurement Conference on - IMC '19
Our analysis pipeline applies both static and dynamic analysis to extract information from the samples, such as wallet identifiers and mining pools. ...
CCS CONCEPTS • Security and privacy → Malware and its mitigation; • Social and professional topics → Malware / spyware crime; • General and reference → Measurement. ...
The opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect those of any of the funders. ...
doi:10.1145/3355369.3355576
dblp:conf/imc/PastranaS19
fatcat:4hdozcislrgipa4pkeptlotj5u
From social data mining to forecasting socio-economic crises
2011
The European Physical Journal Special Topics
and economic systems. Describe requirements for efficient large-scale scientific data mining of anonymized social and economic data. Formulate strategies how to collect stylized facts extracted from large ...
the storage, processing, evaluation, and publication of social and economic data. ...
The authors are grateful for financial support by the Future and Emerging Technologies programme FP7-COSI-ICT of the European Commission through the project Visioneer (grant no.: 248438). ...
doi:10.1140/epjst/e2011-01401-8
pmid:32215190
pmcid:PMC7088654
fatcat:qgixn26btng2flz4bqplqqgede
On the Opportunities and Risks of Foundation Models
[article]
2021
arXiv
pre-print
This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical ...
Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization ...
Fernando Pereira, Vinodkumar Prabhakaran, Colin Raffel, Marten van Schijndel, Ludwig Schmidt, Yoav Shoham, Madalsa Singh, Megha Srivastava, Jacob Steinhardt, Emma Strubell, Qian Yang, Luke Zettlemoyer, and ...
arXiv:2108.07258v2
fatcat:yktkv4diyrgzzfzqlpvaiabc2m
A Similarity-based Machine Learning Approach for Detection of Software Clones
2021
Expert systems with applications
As a result, an enormous amount of unstructured data is created that demands much time and effort to organize, search or manipulate. ...
Intelligent classification of text document in a resource-constrained language (like Bengali) is challenging due to unavailability of linguistic resources, intelligent NLP tools, and larger text corpora ...
Acknowledgements This work was supported by the Establishment of CUET IT Business Incubator Project, BHTPA, ICT Division, Bangladesh for the research on "Automatic Bengali Document Categorization based ...
doi:10.1016/j.eswa.2021.115394
fatcat:44sqcdpj7nfvjoa33u4dbjcpmi