A Statistically Efficient and Scalable Method for Log-Linear Analysis of High-Dimensional Data

Francois Petitjean, Lloyd Allison, Geoffrey I. Webb
2014 2014 IEEE International Conference on Data Mining  
A good log-linear analysis method requires both high precision and statistical efficiency. High precision means that the risk of false discoveries should be kept very low.  ...  Log-linear analysis is the primary statistical approach to discovering conditional dependencies between the variables of a dataset.  ...  The authors would like to thank Arun Konagurthu, Jilles Vreeken and Ann E. Nicholson for their fruitful comments on this work and for their review of the manuscript.  ... 
doi:10.1109/icdm.2014.23 dblp:conf/icdm/PetitjeanAW14 fatcat:aw525xsk6vaz5m4dqhfsh3qwdm

Spike-and-Slab LASSO Generalized Additive Models and Scalable Algorithms for High-Dimensional Data Analysis [article]

Boyi Guo, Byron C. Jaeger, A.K.M. Fazlur Rahman, D. Leann Long, Nengjun Yi
2022 arXiv   pre-print
A novel two-part spike-and-slab LASSO prior for smooth functions is developed to address the sparsity of signals while providing extra flexibility to select the linear or nonlinear components of smooth  ...  There are proposals that extend the classical generalized additive models (GAMs) to accommodate high-dimensional data (p>>n) using group sparse regularization.  ...  Thirdly, the SSL prior motivates a scalable algorithm, the EM-CD algorithm, for model fitting, and hence is more feasible for high-dimensional data analysis.  ... 
arXiv:2110.14449v3 fatcat:rzbyeuclizay7gpsbi6ra6dmbu

Scaling Log-Linear Analysis to High-Dimensional Data

Francois Petitjean, Geoffrey I. Webb, Ann E. Nicholson
2013 2013 IEEE 13th International Conference on Data Mining  
We develop an efficient approach to log-linear analysis that scales to hundreds of variables by melding the classical statistical machinery of log-linear analysis with advanced data mining techniques from  ...  Association discovery is a fundamental data mining task. The primary statistical approach to association discovery between variables is log-linear analysis.  ...  We believe that we have opened the way for statistically sound discovery of associations between variables in high-dimensional data, and hope that this will prove to be a powerful addition to the data  ... 
doi:10.1109/icdm.2013.17 dblp:conf/icdm/PetitjeanWN13 fatcat:fxjel4jkjjes5boczelydsy4xa

4S: Scalable subspace search scheme overcoming traditional Apriori processing

Hoang Vu Nguyen, Emmanuel Muller, Klemens Bohm
2013 2013 IEEE International Conference on Big Data  
In many real-world applications, data is collected in multi-dimensional spaces. However, not all dimensions are relevant for data analysis.  ...  Existing methods have tried to tackle this by utilizing Apriori search schemes. However, they show poor scalability and miss high quality subspaces.  ...  SCALABLE COMPUTATION OF L 2 Our goal is to have a correlation measure that captures both linear and non-linear correlation.  ... 
doi:10.1109/bigdata.2013.6691596 dblp:conf/bigdataconf/NguyenMB13 fatcat:qnwnta4tcrhbvi6mcweoky4oz4

NCVis: Noise Contrastive Approach for Scalable Visualization [article]

Aleksandr Artemenkov, Maxim Panov
2020 arXiv   pre-print
Modern methods for data visualization via dimensionality reduction, such as t-SNE, usually have performance issues that prohibit their application to large amounts of high-dimensional data.  ...  In this work, we propose NCVis -- a high-performance dimensionality reduction method built on a sound statistical basis of noise contrastive estimation.  ...  The classical linear approaches to dimensionality reduction, such as Principal Component Analysis (PCA; [5] [6] [7] ), are computationally efficient and widely used for data preprocessing and feature  ... 
arXiv:2001.11411v1 fatcat:6a5fkhdf6vgkvgcmw2pvqsxwpe

Scalable Visualization for High-Dimensional Single-Cell Data

Juho Kim, Nate Russell, Jian Peng
2016 Biocomputing 2017  
Here, we present a computational tool that allows efficient visualization of high-dimensional single-cell data onto a low-dimensional (2D or 3D) space while preserving the similarity structure between  ...  State-of-the-art technologies for single-cell analysis have been developed to measure the properties of single-cells and detect hidden information.  ...  Acknowledgments This study was supported by a Sloan Research Fellowship and a National Center for Supercomputing Applications (NCSA) Fellowship of University of Illinois at Urbana-Champaign.  ... 
doi:10.1142/9789813207813_0057 pmid:27897012 fatcat:b3lqureo6bhvfboovupazc2wxq

Linear-time Detection of Non-linear Changes in Massively High Dimensional Time Series

Hoang-Vu Nguyen, Jilles Vreeken
2016 Proceedings of the 2016 SIAM International Conference on Data Mining  
To this end, we propose LIGHT, a linear-time algorithm for robustly detecting non-linear changes in massively high dimensional time series.  ...  Extensive empirical evaluation on both synthetic and real-world data show that LIGHT outperforms state of the art with up to 100% improvement in both quality and efficiency.  ...  Acknowledgements The authors are supported by the Cluster of Excellence "Multimodal Computing and Interaction" within the Excellence Initiative of the German Federal Government.  ... 
doi:10.1137/1.9781611974348.93 dblp:conf/sdm/NguyenV16a fatcat:7eps3y5nw5c35lksvj7kzxcwaa

Linear-time Detection of Non-linear Changes in Massively High Dimensional Time Series [article]

Hoang-Vu Nguyen, Jilles Vreeken
2015 arXiv   pre-print
To this end, we propose LIGHT, a linear-time algorithm for robustly detecting non-linear changes in massively high dimensional time series.  ...  Extensive empirical evaluation on both synthetic and real-world data show that LIGHT outperforms state of the art with up to 100% improvement in both quality and efficiency.  ...  Acknowledgements The authors are supported by the Cluster of Excellence "Multimodal Computing and Interaction" within the Excellence Initiative of the German Federal Government.  ... 
arXiv:1510.08385v1 fatcat:fxslv3fs4rdevkmkgsiy6yg3ie

Regularized Parametric Regression for High-dimensional Survival Analysis

Yan Li, Kevin S. Xu, Chandan K. Reddy
2016 Proceedings of the 2016 SIAM International Conference on Data Mining  
We employ a generalized linear model to approximate the negative log-likelihood and use the elastic net as a sparsity-inducing penalty to effectively deal with high-dimensional data.  ...  The presence of incomplete observations due to censoring brings unique challenges in this domain and differentiates survival analysis techniques from other standard regression methods.  ...  Acknowledgments This work was supported in part by the US National Science Foundation grants IIS-1527827 and IIS-1231742.  ... 
doi:10.1137/1.9781611974348.86 dblp:conf/sdm/LiXR16 fatcat:dqbkbmv67beorpbg5f275gaz64

Algorithmic and statistical challenges in modern large-scale data analysis are the focus of MMDS 2008

Michael W. Mahoney, Lek-Heng Lim, Gunnar E. Carlsson
2008 SIGKDD Explorations  
We provide a report for the ACM SIGKDD community about the 2008 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2008), its origin in MMDS 2006, and future directions for this interdisciplinary  ...  MMDS 2008 originally grew out of discussions about our vision for the next-generation of algorithmic, mathematical, and statistical analysis methods for complex large-scale data sets.  ...  The goals of MMDS 2008 were (1) to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets; and (2) to bring together  ... 
doi:10.1145/1540276.1540294 fatcat:tzzasuhsj5eb7bsqxgdglhldbq

Unbiased Multivariate Correlation Analysis

Yisen Wang, Simone Romano, Vinh Nguyen, James Bailey, Xingjun Ma, Shu-Tao Xia
2017 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
Correlation measures are a key element of statistics and machine learning, and essential for a wide range of data analysis tasks.  ...  It employs a correction for chance using a statistical model of independence to address the issue of bias.  ...  Acknowledgments This research is supported by the National Natural Science Foundation of China (No. 61371078).  ... 
doi:10.1609/aaai.v31i1.10778 fatcat:qhjvuct4unftlg7plw6xdv7e6q

Algorithmic and Statistical Challenges in Modern Large-Scale Data Analysis are the Focus of MMDS 2008 [article]

Michael W. Mahoney, Lek-Heng Lim, Gunnar E. Carlsson
2008 arXiv   pre-print
The goals of MMDS 2008 were (1) to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets; and (2) to bring together  ...  computer scientists, statisticians, mathematicians, and data analysis practitioners to promote cross-fertilization of ideas.  ...  for providing an interesting perspective on these problems that we incorporated here; and to each  ... 
arXiv:0812.3702v1 fatcat:ihribi3vibb7xhpx7xtpo5nxyq

Learning Sparse Log-Ratios for High-Throughput Sequencing Data [article]

Elliott Gordon-Rodriguez, Thomas P Quinn, John P Cunningham
2021 bioRxiv   pre-print
However, the space of these log-ratios grows combinatorially with the dimension of the input, and as a result, existing learning algorithms do not scale to increasingly common high-dimensional datasets  ...  In the context of high-throughput genetic sequencing data, and Compositional Data more generally, an important class of features are the log-ratios between subsets of the input variables.  ...  basic building blocks for statistical analysis.  ... 
doi:10.1101/2021.02.11.430695 fatcat:hepgni7uabbpnl6so32j5hybfq

Adaptive Randomized Dimension Reduction on Massive Data [article]

Gregory Darnell, Stoyan Georgiev, Sayan Mukherjee, Barbara E Engelhardt
2015 arXiv   pre-print
In this paper we develop an approach for dimension reduction that exploits the assumption of low rank structure in high dimensional data to gain both computational and statistical advantages.  ...  One approach to implementing scalable algorithms is to compress data into a low dimensional latent space using dimension reduction methods.  ...  BEE is pleased to acknowledge support from grants NIH R00 HG006265 and NIH R01 MH101822. SG would like to acknowledge Uwe Ohler, Jonathan Pritchard and Ankan Saha.  ... 
arXiv:1504.03183v1 fatcat:yz6lrheik5ccpgn5kldx4dhqdy

Toward efficient indexing structure for scalable content-based music retrieval

Jialie Shen, Mei Tao, Qiang Qu, Dacheng Tao, Yong Rui
2019 Multimedia Systems  
To support high-quality content-based retrieval over such a large volume of music data, how to develop indexing structure with good effectiveness, efficiency and scalability becomes an important research  ...  including efficiency, scalability and effectiveness.  ...  Basic idea of LDR is to apply linear statistical analysis to map the original high-dimensional features to low-dimensional ones by eliminating the redundant information from the original feature space.  ... 
doi:10.1007/s00530-019-00613-z fatcat:wctz35yq7rhejeihojyblajzpy