Dynamic Spatial Verification for Large-Scale Object-Level Image Retrieval
[article]
2019
arXiv
pre-print
We propose a new approach for spatial verification that aims at modeling object-level regions by dynamically clustering keypoints in a 2D Hough space, which are then used to accurately weight small contributing ...
For instance, hidden people, spliced objects, and subtly altered scenes can be difficult for a user to detect initially in a meme image, but may contribute significantly to its composition. ...
In this paper we propose a new solution for spatial verification in image retrieval that supports these tasks by modeling object-level regions with the goal of retrieving the sources of small regions in ...
arXiv:1903.10019v4
fatcat:ndke65wvkrftjn5wlneb3ea3um
Finding the needle in high-dimensional haystack: A tutorial on canonical correlation analysis
[article]
2018
arXiv
pre-print
Canonical correlation analysis (CCA) is a prototypical family of methods for wrestling with and harvesting insight from such rich datasets. ...
The complexity of such big data repositories offers new opportunities and poses new challenges to investigate brain, cognition, and disease. ...
for a small fraction of the variance (5, 12). ...
arXiv:1812.02598v1
fatcat:wa2urvsku5fqbdi7bkzjg3mzxi
Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists
2020
NeuroImage
The 21st century marks the emergence of "big data" with a rapid increase in the availability of data sets with multiple measurements. ...
The complexity of such "big data" repositories offers new opportunities and poses new challenges for systems neuroscience. ...
for a small fraction of the variation in the data. ...
doi:10.1016/j.neuroimage.2020.116745
pmid:32278095
fatcat:7samuqfqwvdvdnhafps3zslmjy
Looking for Razors and Needles in a Haystack: Multifaceted Analysis of Suicidal Declarations on Social Media—A Pragmalinguistic Approach
2021
International Journal of Environmental Research and Public Health
makes it unsuitable for application in psychological and linguistic studies. ...
To do that, we first collect a large-scale dataset of Reddit posts and have it annotated by highly trained expert annotators under a rigorous annotation scheme. ...
The core of the dataset remained mostly unchanged, with the exception of correction of a small number of annotation errors caught in the process of constant revision. ...
doi:10.3390/ijerph182211759
pmid:34831513
pmcid:PMC8624334
fatcat:hikz6ggmbfe37o72xdvgpc3zfy
Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists
2020
ABSTRACT The 21st century marks the emergence of "big data" with a rapid increase in the availability of datasets with multiple measurements. ...
Importantly, CCA is well suited to describing relationships across multiple sets of data, such as in recently available big biomedical datasets. ...
to big datasets. ...
doi:10.18154/rwth-conv-244184
fatcat:ffaeozwrg5e5zliofielrucp3u
Small Data, Data Infrastructures and Big Data
2014
Social Science Research Network
It is a strategy that has been remarkably successful, enabling the sciences, social sciences and humanities to advance in leaps and bounds. ...
The production of academic knowledge has progressed for the past few centuries using small data studies characterized by sampled data generated to answer specific questions. ...
Acknowledgements The research for this paper was funded by a European Research Council Advanced Investigator award (ERC-2012-AdG-323636-SOFTCITY). ...
doi:10.2139/ssrn.2376148
fatcat:2kyeh3l2vvdirakhofmradlnna
Big Data: Issues and Challenges Moving Forward
2013
2013 46th Hawaii International Conference on System Sciences
We analyze the issues and challenges as we begin a collaborative research program into methodologies for big data analysis and design. ...
Big data refers to data volumes in the range of exabytes (10^18) and beyond. Such volumes exceed the capacity of current on-line storage systems and processing systems. ...
Finding the needle in the haystack ...
doi:10.1109/hicss.2013.645
dblp:conf/hicss/KaislerAEM13
fatcat:a4pfro3bi5afzigzbzdls5p76e
The Image of the City out of the Underlying Scaling of City Artifacts or Locations
2013
Annals of the Association of American Geographers
This scaling refers to the fact that, in an imageable city (a city that can easily be imaged in human minds), small city artifacts are far more common than large ones; or alternatively low-density locations ...
mapping in general, into a quantitative manner. ...
Acknowledgement The author would like to thank XXX for his constructive comments that significantly help improve the quality of this paper. ...
doi:10.1080/00045608.2013.779503
fatcat:gpcpdkxdajfjbd2nucseozw7h4
Crowdsourcing the Unknown: The Satellite Search for Genghis Khan
2014
PLoS ONE
An increase in ground-truthed accuracy was observed in participants exposed to the peer feedback loop over those who worked in isolation, suggesting collective reasoning can emerge within networked ...
Without a pre-existing reference for validation we turn towards consensus, defined by kernel density estimation, to pool human perception for "out of the ordinary" features across a vast landscape. ...
Independent Reasoning: This effort seeks "a needle in a haystack", but in a scenario where the appearance of the needle is undefined. ...
doi:10.1371/journal.pone.0114046
pmid:25549335
pmcid:PMC4280225
fatcat:mv3whrwm5bgqzesy7d6mh5ciay
Use cases and challenges in telecom big data analytics
2016
APSIPA Transactions on Signal and Information Processing
This paper examines the driving forces of big data analytics in the telecom domain and the benefits it offers. ...
We provide example use cases of big data analytics and the associated challenges, with the hope to inspire new research ideas that can eventually benefit the practice of the telecommunication industry. ...
From the perspectives of variety, volume, and velocity, telecom operators, like enterprises in all other verticals, face the following two major challenges: • Needle in a haystack: how to uncover correlation ...
doi:10.1017/atsip.2016.20
fatcat:vaocc7lozjhtbdbxlln74yvj4a
Identifying AI talents in LinkedIn database, A machine learning approach
2019
Zenodo
Searching for keywords in profile sections can lead to misidentification of certain profiles, especially those related to a field rather than an occupation. ...
We suggest this approach allows us to avoid manually labeling the training dataset, granted the assumption that job profiles posted by recruiters are more "ideal-typical" or simply provide a more consistent ...
Machine learning is usually a good approach when confronted with a "needle in a haystack" type of problem. ...
doi:10.5281/zenodo.2649208
fatcat:ygh2pfxaunhnxguhqnvmkie5ou
Identifying AI talents among LinkedIn members, A machine learning approach
2019
Zenodo
Searching for keywords in profile sections can lead to misidentification of certain profiles, especially those related to a field rather than an occupation. ...
We suggest this approach allows us to avoid manually labeling the training dataset, granted the assumption that job profiles posted by recruiters are more "ideal-typical" or simply provide a more consistent ...
Machine learning is usually a good approach when confronted with a "needle in a haystack" type of problem. ...
doi:10.5281/zenodo.3240963
fatcat:etngfwtyszh67chl65gnsyee7i
A Mathematical Approach to Healthcare Insurance Data Analytics
2021
UPI YPTK Journal of Computer Science and Information Technology
However, the near absence of a mathematical foundation for analytics has become a real challenge amidst the flock of big data marketing activities, especially in healthcare insurance. ...
A prototype for the model was implemented using the Java programming language, the MapReduce framework, association rule mining, and MongoDB. ...
The concept of big data analytics involves the collection of large datasets similar to the haystack; there also exist some small datasets within the given large datasets, similar to the needle, existing ...
doi:10.35134/jcsitech.v7i4.18
fatcat:yvv3e4mv6jdilfpgcmfktzymeu
Storage and Database Management for Big Data
[chapter]
2016
Big Data
Acknowledgements The authors wish to thank the LLGrid team at MIT Lincoln Laboratory for their support in setting up the computational environment used to test the performance of Apache Accumulo and SciDB ...
Databases are designed to pull small chunks of information out (finding a needle in a haystack) and not for sequential access (the forte of distributed filesystems). ...
For a larger dataset (100s of GB to 10 TB in 2015), the most efficient solution for small requests (less than 5% of the entire dataset) is to use a database. ...
doi:10.1201/b19694-4
fatcat:jkayqvvhbzatjnz4kqyi2f5vbq
Semi-supervised teacher-student deep neural network for materials discovery
[article]
2021
arXiv
pre-print
At the same time, there is a significant amount of unlabelled data available in these databases. ...
Here we propose a semi-supervised deep neural network (TSDNN) model for high-performance formation energy and synthesizability prediction, which is achieved via its unique teacher-student dual network ...
This work was supported in part by the South Carolina Honors College Research Program. This work is partially supported by a grant from the University of South Carolina Magellan Scholar Program. ...
arXiv:2112.06142v1
fatcat:o7volcq7enb53ka2asv7y5swpm