
Dynamic Spatial Verification for Large-Scale Object-Level Image Retrieval [article]

Joel Brogan, Aparna Bharati, Daniel Moreira, Kevin Bowyer, Patrick Flynn, Anderson Rocha, Walter Scheirer
2019 arXiv   pre-print
We propose a new approach for spatial verification that aims at modeling object-level regions dynamically clustering keypoints in a 2D Hough space, which are then used to accurately weight small contributing  ...  For instance, hidden people, spliced objects, and subtly altered scenes can be difficult for a user to detect initially in a meme image, but may contribute significantly to its composition.  ...  In this paper we propose a new solution for spatial verification in image retrieval that supports these tasks by modeling object-level regions with the goal of retrieving the sources of small regions in  ... 
arXiv:1903.10019v4 fatcat:ndke65wvkrftjn5wlneb3ea3um

Finding the needle in high-dimensional haystack: A tutorial on canonical correlation analysis [article]

Hao-Ting Wang, Jonathan Smallwood, Janaina Mourao-Miranda, Cedric Huchuan Xia, Theodore D. Satterthwaite, Danielle S. Bassett, Danilo Bzdok
2018 arXiv   pre-print
Canonical correlation analysis (CCA) is a prototypical family of methods for wrestling with and harvesting insight from such rich datasets.  ...  The complexity of such big data repositories offer new opportunities and pose new challenges to investigate brain, cognition, and disease.  ...  for a small fraction of the variance (5, 12).  ... 
arXiv:1812.02598v1 fatcat:wa2urvsku5fqbdi7bkzjg3mzxi

Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists

Hao-Ting Wang, Jonathan Smallwood, Janaina Mourao-Miranda, Cedric Huchuan Xia, Theodore D. Satterthwaite, Danielle S. Bassett, Danilo Bzdok
2020 NeuroImage  
The 21st century marks the emergence of "big data" with a rapid increase in the availability of data sets with multiple measurements.  ...  The complexity of such "big data" repositories offer new opportunities and pose new challenges for systems neuroscience.  ...  for a small fraction of the variation in the data.  ... 
doi:10.1016/j.neuroimage.2020.116745 pmid:32278095 fatcat:7samuqfqwvdvdnhafps3zslmjy

Looking for Razors and Needles in a Haystack: Multifaceted Analysis of Suicidal Declarations on Social Media—A Pragmalinguistic Approach

Michal Ptaszynski, Monika Zasko-Zielinska, Michal Marcinczuk, Gniewosz Leliwa, Marcin Fortuna, Kamil Soliwoda, Ida Dziublewska, Olimpia Hubert, Pawel Skrzek, Jan Piesiewicz, Paula Karbowska, Maria Dowgiallo (+5 others)
2021 International Journal of Environmental Research and Public Health  
makes it unsuitable for application in psychological and linguistic studies.  ...  To do that, we firstly collect a large-scale dataset of Reddit posts and annotate it with highly trained and expert annotators under a rigorous annotation scheme.  ...  The core of the dataset remained mostly unchanged, with the exception of correction of a small number of annotation errors caught in the process of constant revision.  ... 
doi:10.3390/ijerph182211759 pmid:34831513 pmcid:PMC8624334 fatcat:hikz6ggmbfe37o72xdvgpc3zfy

Finding the needle in a high-dimensional haystack: Canonical correlation analysis for neuroscientists

Hao-Ting Wang, Jonathan Smallwood, Janaina Mourao-Miranda, Cedric Huchuan Xia, Theodore D. Satterthwaite, Danielle S. Bassett, Danilo Bzdok
2020
The 21st century marks the emergence of "big data" with a rapid increase in the availability of datasets with multiple measurements.  ...  Importantly, CCA is well suited to describing relationships across multiple sets of data, such as in recently available big biomedical datasets.  ...  to big datasets.  ... 
doi:10.18154/rwth-conv-244184 fatcat:ffaeozwrg5e5zliofielrucp3u

Small Data, Data Infrastructures and Big Data

Rob Kitchin, Tracey P. Lauriault
2014 Social Science Research Network  
It is a strategy that has been remarkably successful, enabling the sciences, social sciences and humanities to advance in leaps and bounds.  ...  The production of academic knowledge has progressed for the past few centuries using small data studies characterized by sampled data generated to answer specific questions.  ...  Acknowledgements The research for this paper was funded by a European Research Council Advanced Investigator award (ERC-2012-AdG-323636-SOFTCITY).  ... 
doi:10.2139/ssrn.2376148 fatcat:2kyeh3l2vvdirakhofmradlnna

Big Data: Issues and Challenges Moving Forward

Stephen Kaisler, Frank Armour, J. Alberto Espinosa, William Money
2013 2013 46th Hawaii International Conference on System Sciences  
We analyze the issues and challenges as we begin a collaborative research program into methodologies for big data analysis and design.  ...  Big data refers to data volumes in the range of exabytes (10^18) and beyond. Such volumes exceed the capacity of current on-line storage systems and processing systems.  ...  Finding the needle in the haystack  ... 
doi:10.1109/hicss.2013.645 dblp:conf/hicss/KaislerAEM13 fatcat:a4pfro3bi5afzigzbzdls5p76e

The Image of the City out of the Underlying Scaling of City Artifacts or Locations

Bin Jiang
2013 Annals of the Association of American Geographers  
This scaling refers to the fact that, in an imageable city (a city that can easily be imaged in human minds), small city artifacts are far more common than large ones; or alternatively low dense locations  ...  mapping in general, into a quantitative manner.  ...  Acknowledgement The author would like to thank XXX for his constructive comments that significantly help improve the quality of this paper.  ... 
doi:10.1080/00045608.2013.779503 fatcat:gpcpdkxdajfjbd2nucseozw7h4

Crowdsourcing the Unknown: The Satellite Search for Genghis Khan

Albert Yu-Min Lin, Andrew Huynh, Gert Lanckriet, Luke Barrington, Michael D. Petraglia
2014 PLoS ONE  
An increased groundtruthed accuracy was observed in those participants exposed to the peer feedback loop over those who worked in isolation, suggesting collective reasoning can emerge within networked  ...  Without a pre-existing reference for validation we turn towards consensus, defined by kernel density estimation, to pool human perception for "out of the ordinary" features across a vast landscape.  ...  Independent Reasoning This effort seeks "a needle in a haystack", but in a scenario where the appearance of the needle is undefined.  ... 
doi:10.1371/journal.pone.0114046 pmid:25549335 pmcid:PMC4280225 fatcat:mv3whrwm5bgqzesy7d6mh5ciay

Use cases and challenges in telecom big data analytics

Chung-Min Chen
2016 APSIPA Transactions on Signal and Information Processing  
This paper examines the driving forces of big data analytics in the telecom domain and the benefits it offers.  ...  We provide example use cases of big data analytics and the associated challenges, with the hope to inspire new research ideas that can eventually benefit the practice of the telecommunication industry.  ...  From the perspectives of variety, volume, and velocity, telecom operators, like enterprises in all other verticals, face the following two major challenges: • Needle in a haystack -how to uncover correlation  ... 
doi:10.1017/atsip.2016.20 fatcat:vaocc7lozjhtbdbxlln74yvj4a

Identifying AI talents in LinkedIn database, A machine learning approach

Thomas Roca
2019 Zenodo  
Searching for keywords in profiles' sections can lead to mis-identification of certain profiles, especially for those related to a field rather than an occupation.  ...  We suggest this approach allows to avoid manually labeling the training dataset, granted the assumption that job profiles posted by recruiters are more "ideal-typical" or simply provide a more consistent  ...  Machine learning is usually a good approach when confronted to a "Needle in a haystack" type of problem.  ... 
doi:10.5281/zenodo.2649208 fatcat:ygh2pfxaunhnxguhqnvmkie5ou

Identifying AI talents among LinkedIn members, A machine learning approach

Thomas Roca
2019 Zenodo  
Searching for keywords in profiles' sections can lead to mis-identification of certain profiles, especially for those related to a field rather than an occupation.  ...  We suggest this approach allows to avoid manually labeling the training dataset, granted the assumption that job profiles posted by recruiters are more "ideal-typical" or simply provide a more consistent  ...  Machine learning is usually a good approach when confronted to a "Needle in a haystack" type of problem.  ... 
doi:10.5281/zenodo.3240963 fatcat:etngfwtyszh67chl65gnsyee7i

A Mathematical Approach to Healthcare Insurance Data Analytics

Terungwa Simon Yange, Ishaya Peni Gambo, Theresa Omodunbi, Hettie Abimbola Soriyan
2021 UPI YPTK Journal of Computer Science and Information Technology  
However, the near absence of a mathematical foundation for analytics has become a real challenge amidst the flock of big data marketing activities, especially in healthcare insurance.  ...  A prototype for the model was implemented using Java Programming Language, MapReduce Framework, Association Rule Mining and MongoDB.  ...  The concept of big data analytics involves the collection of large datasets similar to the haystack; there also, exist some small datasets within the given large datasets similar to the needle, existing  ... 
doi:10.35134/jcsitech.v7i4.18 fatcat:yvv3e4mv6jdilfpgcmfktzymeu

Storage and Database Management for Big Data [chapter]

Vijay Gadepally, Jeremy Kepner, Albert Reuther
2016 Big Data  
Acknowledgements The authors wish to thank the LLGrid team at MIT Lincoln Laboratory for their support in setting up the computational environment used to test the performance of Apache Accumulo and SciDB  ...  Databases are designed to pull small chunks of information out (finding a needle in a haystack) and not for sequential access (the forte of distributed filesystems).  ...  For a larger dataset (100s of GB to 10 TB in 2015), the most efficient solution for small requests (less than 5% of the entire dataset) is to use a database.  ... 
doi:10.1201/b19694-4 fatcat:jkayqvvhbzatjnz4kqyi2f5vbq

Semi-supervised teacher-student deep neural network for materials discovery [article]

Daniel Gleaves, Edirisuriya M. Dilanga Siriwardane, Yong Zhao, Nihang Fu, Jianjun Hu
2021 arXiv   pre-print
At the same time, there are a significant amount of unlabelled data available in these databases.  ...  Here we propose a semi-supervised deep neural network (TSDNN) model for high-performance formation energy and synthesizability prediction, which is achieved via its unique teacher-student dual network  ...  This work was supported in part by the South Carolina Honors College Research Program. This work is partially supported by a grant from the University of South Carolina Magellan Scholar Program.  ... 
arXiv:2112.06142v1 fatcat:o7volcq7enb53ka2asv7y5swpm