197 Hits in 2.7 sec

Significance and Recovery of Block Structures in Binary Matrices with Noise [chapter]

Xing Sun, Andrew Nobel
2006 Lecture Notes in Computer Science  
Frequent itemset mining (FIM) is one of the core problems in the field of Data Mining and occupies a central place in its literature.  ...  We begin by establishing several results concerning the extremal behavior of submatrices of ones in a binary matrix with random entries.  ...  [28] assessed the significance of bi-clusters in a real-valued matrix using likelihood-based weights, a normal approximation and a standard Bonferroni bound to account for the multiplicity of submatrices  ... 
doi:10.1007/11776420_11 fatcat:gu4qi73knjhrxdbkoqkc7x6diy

Greedy Search-Binary PSO Hybrid for Biclustering Gene Expression Data

Shyama Das, Sumam Mary Idicula
2010 International Journal of Computer Applications  
As a useful data mining technique, biclustering identifies local patterns from gene expression data.  ...  A bicluster of a gene expression dataset is a subset of genes which exhibit similar expression patterns along a subset of conditions.  ...  Moreover, clustering happens to partition the genes into disjoint sets, i.e., each gene is associated with a single biological function, which in fact is in contradiction to the biological system [1].  ... 
doi:10.5120/651-908 fatcat:7klls6iavzevvpfd3rsho3gzva

Partitioning a matrix with non-guillotine cuts to minimize the maximum cost

Aristide Mingozzi, Serena Morigi
2002 Discrete Applied Mathematics  
We consider the problem of partitioning a matrix of m rows and n columns of non-negative integers into M smaller submatrices.  ...  With each submatrix is associated a cost equal to the sum of its elements. The objective is to minimize the cost of the submatrix of maximum cost.  ...  The second application deals with the balanced subdivision of a rectangular mining area among M mining companies.  ... 
doi:10.1016/s0166-218x(00)00286-9 fatcat:mxj2avhc2zebfosyyskvh5k7mu
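
The objective described in this snippet (each submatrix costs the sum of its elements, and the partition is judged by its most expensive submatrix) can be illustrated with a small sketch. It only evaluates the objective for a partition that is already given; it does not search for an optimal partition and does not check that the rectangles actually tile the matrix. The rectangle encoding and the function names are assumptions made for this illustration, not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical encoding: (row_start, row_end, col_start, col_end), end-exclusive.
Rect = Tuple[int, int, int, int]


def submatrix_cost(matrix: List[List[int]], rect: Rect) -> int:
    """Cost of one submatrix: the sum of its elements."""
    r0, r1, c0, c1 = rect
    return sum(sum(row[c0:c1]) for row in matrix[r0:r1])


def max_partition_cost(matrix: List[List[int]], rects: List[Rect]) -> int:
    """Objective value of a given partition: the cost of its costliest submatrix."""
    return max(submatrix_cost(matrix, rect) for rect in rects)


matrix = [
    [1, 2, 3],
    [4, 5, 6],
]
# Three rectangles covering the 2 x 3 matrix: the left column,
# the top-right 1 x 2 block, and the bottom-right 1 x 2 block.
partition = [(0, 2, 0, 1), (0, 1, 1, 3), (1, 2, 1, 3)]

print(max_partition_cost(matrix, partition))
```

A solver for the problem would compare this value across candidate partitions into M submatrices and keep the smallest.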

On the maximal size of Large-Average and ANOVA-fit Submatrices in a Gaussian Random Matrix [article]

Xing Sun, Andrew B. Nobel
2010 arXiv   pre-print
We investigate the maximal size of distinguished submatrices of a Gaussian random matrix.  ...  Of interest are submatrices whose entries have average greater than or equal to a positive constant, and submatrices whose entries are well-fit by a two-way ANOVA model.  ...  We would also like to thank John Hartigan for pointing out the use of the Gaussian comparison principle as an alternative way of obtaining the bounds of Proposition 1.  ... 
arXiv:1009.0562v1 fatcat:d6hrdp3reja7rc63ro35sh52ae

Partition of a Binary Matrix into k (k ≥ 3) Exclusive Row and Column Submatrices Is Difficult

Peiqiang Liu, Daming Zhu, Jinjie Xiao, Qingsong Xie, Yanyan Mao
2014 Mathematical Problems in Engineering  
Biclustering in matrices with binary entries ("0"/"1") can be simplified into the problem of finding submatrices with entries of "1."  ...  Biclustering aims at finding a bicluster, that is, a subset of objects that exhibit similar behavior across a subset of attributes, or vice versa.  ...  Moreover, the complexity of some variants of finding bicliques in bipartite graphs is open, for example, the maximum ±1 edge weight biclique problem [15].  ... 
doi:10.1155/2014/934630 fatcat:s6ok6rn3frdzvoyshpcu77lxx4

Aggregated 2D range queries on clustered points

Nieves R. Brisaboa, Guillermo De Bernardo, Roberto Konow, Gonzalo Navarro, Diego Seco
2016 Information Systems  
Efficient processing of aggregated range queries on two-dimensional grids is a common requirement in information retrieval and data mining systems, for example in Geographic Information Systems and OLAP  ...  Our experimental evaluation shows that this technique can speed up aggregated queries up to more than an order of magnitude, with a small space overhead.  ...  To do this, we traverse the tree as in a top-k range query, but we only output weights whose value is in [w1, w2]. Moreover, we discard submatrices whose maximum weight is below w2.  ... 
doi:10.1016/j.is.2016.03.004 fatcat:4jrb2sthlbd5znzqmx4uq4ikri

SiBIC: A Tool for Generating a Network of Biclusters Captured by Maximal Frequent Itemset Mining [chapter]

Kei-ichiro Takahashi, David A. duVerle, Sohiya Yotsukura, Ichigaku Takigawa, Hiroshi Mamitsuka
2018 Methods in Molecular Biology  
Acknowledgements Part of this research has been supported by MEXT KAKENHI #16H02868 and #17H01783, ACCEL and PRESTO of JST and FiDiPro of Tekes.  ...  Click 'NODES' and then click 'W.DEG' (weighted degree) to sort the table. 3. Click a cell in the row of the node with the maximum weighted degree. 4.  ...  Gene Set Networks To visualize the biclusters, we use gene set networks, each being a weighted graph, where a node corresponds to a coexpressed gene set and an edge indicates the difference of experimental  ... 
doi:10.1007/978-1-4939-8561-6_8 pmid:30030806 fatcat:jcwzay6gffcz3mmdeosessfh7y

On the maximal size of large-average and ANOVA-fit submatrices in a Gaussian random matrix

Xing Sun, Andrew B. Nobel
2013 Bernoulli  
Running title: Maximal submatrices of a Gaussian random matrix Keywords: analysis of variance, data mining, Gaussian random matrix, large average submatrix, random matrix theory, second moment method  ...  We investigate the maximal size of distinguished submatrices of a Gaussian random matrix.  ...  In particular, the vertex set V of G is the disjoint union of two sets V_1 and V_2, with |V_1| = m and |V_2| = n, corresponding to the rows and columns of X, respectively.  ... 
doi:10.3150/11-bej394 pmid:24194673 pmcid:PMC3816128 fatcat:wmzzsjshnzfdxg5q2ywjcbss3a

Mining discrete patterns via binary matrix factorization

Bao-Hong Shen, Shuiwang Ji, Jieping Ye
2009 Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '09  
A best approximation on such data has a minimum set of inconsistent entries, i.e., mismatches between the given binary data and the approximate matrix.  ...  Mining discrete patterns in binary data is important for subsampling, compression, and clustering.  ...  ., two disjoint submatrices. By this, two child nodes of the root are constructed.  ... 
doi:10.1145/1557019.1557103 dblp:conf/kdd/ShenJY09 fatcat:hntbxwti7fguvfg2o5ow7rdf4y

Using SVD on Clusters to Improve Precision of Interdocument Similarity Measure

Wen Zhang, Fan Xiao, Bin Li, Siguang Zhang
2016 Computational Intelligence and Neuroscience  
Thirdly, two corpora, a Chinese corpus and an English corpus, are used to evaluate the performances of the proposed methods.  ...  Firstly, we make a survey of existing linear algebra methods for LSI, including both SVD based methods and non-SVD based methods.  ...  Classic weighting schemes [20, 21] are proposed on the basis of information about the frequency distribution of index terms within the whole collection or within the relevant and nonrelevant sets of  ... 
doi:10.1155/2016/1096271 pmid:27579031 pmcid:PMC4992544 fatcat:uhgrhvkr25fk3nvslbadbdtcaq

Homology Computation of Large Point Clouds using Quantum Annealing [article]

Raouf Dridi, Hedayat Alghassi
2016 arXiv   pre-print
In this paper, we present a quantum annealing pipeline for computation of homology of large point clouds. The pipeline takes as input a graph approximating the given point cloud.  ...  It uses quantum annealing to compute a clique covering of the graph and then uses this cover to construct a Mayer-Vietoris complex.  ...  It consists of partitioning the vertex set of G into k non-empty and fixed-sized subsets so that the total weight of edges connecting distinct subsets is minimized.  ... 
arXiv:1512.09328v3 fatcat:pvx6jnwtkvds3lbrv4d6iif5le

Data Ranking and Clustering via Normalized Graph Cut Based on Asymmetric Affinity [chapter]

Olexiy Kyrgyzov, Isabelle Bloch, Yuan Yang, Joe Wiart, Antoine Souloumiac
2013 Lecture Notes in Computer Science  
The first method requires a priori known class labeled data that can be utilized, e.g., for a calibration phase of a brain-computer interface (BCI).  ...  In this paper, we present an extension of the state-of-the-art normalized graph cut method based on asymmetry of the affinity matrix.  ...  Generally speaking, nCut is the maximum a posteriori estimation because its value depends on the number of entries in submatrices TL, TR, BL, BR, see Figure 1.  ... 
doi:10.1007/978-3-642-41184-7_57 fatcat:ynrmzn335ves5n6jaqxkpmaeuy

Finding Biclusters by Random Projections [chapter]

Stefano Lonardi, Wojciech Szpankowski, Qiaofeng Yang
2004 Lecture Notes in Computer Science  
Given a matrix X composed of symbols, a bicluster is a submatrix of X obtained by removing some of the rows and some of the columns of X in such a way that each row of what is left reads the same string  ...  A detailed probabilistic analysis of the algorithm and an asymptotic study of the statistical significance of the solutions are given. We report results of extensive simulations on synthetic data.  ...  This problem has a variety of applications ranging from computational biology to data mining.  ... 
doi:10.1007/978-3-540-27801-6_8 fatcat:3xq7vcfvcnfsnffuusb5khcjc4
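
The bicluster definition quoted in this snippet (a submatrix in which every remaining row reads the same string) can be checked directly for a candidate choice of rows and columns. The sketch below only verifies that definition; it is not the random-projection algorithm of the paper, and the function name and the list-of-strings representation of X are assumptions made for this illustration.

```python
from typing import List, Sequence


def is_bicluster(X: List[str], rows: Sequence[int], cols: Sequence[int]) -> bool:
    """Return True if every selected row reads the same string on the selected columns."""
    # Collect the distinct strings obtained by restricting each row to `cols`.
    strings = {"".join(X[r][c] for c in cols) for r in rows}
    return len(strings) <= 1


# Hypothetical 3 x 4 matrix of symbols, one string per row.
X = [
    "abca",
    "bbcb",
    "abcc",
]

print(is_bicluster(X, rows=[0, 1, 2], cols=[1, 2]))  # every row reads "bc"
print(is_bicluster(X, rows=[0, 1, 2], cols=[0, 1]))  # rows read "ab", "bb", "ab"
```

Verifying a candidate is cheap; the difficulty the snippet alludes to lies in finding large row and column subsets that pass this check.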

Finding biclusters by random projections

Stefano Lonardi, Wojciech Szpankowski, Qiaofeng Yang
2006 Theoretical Computer Science  
Given a matrix X composed of symbols, a bicluster is a submatrix of X obtained by removing some of the rows and some of the columns of X in such a way that each row of what is left reads the same string  ...  A detailed probabilistic analysis of the algorithm and an asymptotic study of the statistical significance of the solutions are given. We report results of extensive simulations on synthetic data.  ...  This problem has a variety of applications ranging from computational biology to data mining.  ... 
doi:10.1016/j.tcs.2006.09.023 fatcat:moai2q3rk5gblodxq6qyglioqu

Robust Calibration for Localization in Clustered Wireless Sensor Networks

Jung Jin Cho, Yu Ding, Yong Chen, Jiong Tang
2010 IEEE Transactions on Automation Science and Engineering  
To use the FAST-LTS, one needs to input a trimming parameter, which is a function of the sensor redundancy in a network.  ...  Applying the robust estimators available from robust statistics research to a wireless sensor network, however, faces a number of computational challenges.  ...  (a) Disconnected clusters. (b) Connected clusters. The design matrix consists of disjoint submatrices.  ... 
doi:10.1109/tase.2009.2013475 fatcat:wjtkmor3mbduhg7mn45j6jk6ja
Showing results 1-15 out of 197 results