Compressing Multisets with Large Alphabets [article]

Daniel Severo, James Townsend, Ashish Khisti, Alireza Makhzani, Karen Ullrich
2021 arXiv   pre-print
Current methods that optimally compress multisets are not suitable for high-dimensional symbols, as their compute time scales linearly with alphabet size.  ...  Compressing a multiset as an ordered sequence with off-the-shelf codecs is computationally more efficient, but has a sub-optimal compression rate, as bits are wasted encoding the order between symbols.  ...  Related work To the best of our knowledge, there are no previous works that present a method which is both computationally feasible and rate-optimal for compressing multisets of i.i.d. symbols with large  ... 
arXiv:2107.09202v1 fatcat:xpmxlyp2nfbkllsnrhcnbdvupa

Compressing combinatorial objects [article]

Christian Steinruecken
2016 arXiv   pre-print
However, there are many types of non-sequential data for which good compression techniques are still largely unexplored.  ...  Near-optimal compression methods are described for certain types of permutations, combinations and multisets; and the conditions for optimality are made explicit for each method.  ...  alphabet X .  ... 
arXiv:1601.03689v1 fatcat:wcscbo4wezal7bs2szxnro2wkq

Tight Bounds on Profile Redundancy and Distinguishability

Jayadev Acharya, Hirakendu Das, Alon Orlitsky
2012 Neural Information Processing Systems  
A sufficient statistic for all these properties is the data's profile, the multiset of the number of times each data element appears.  ...  In compression, it is called redundancy and represents the least additional number of bits over the entropy needed to encode the output of any distribution in P.  ...  For sufficiently large k, this value even exceeds n itself, showing that general distributions over large alphabets cannot be compressed or learned at a uniform rate over all alphabet sizes, and as the  ... 
dblp:conf/nips/AcharyaDO12 fatcat:jn2z7ti44be4jp5vqv6eftfani

On Universal Coding of Unordered Data

Lav R. Varshney, Vivek K Goyal
2007 2007 Information Theory and Applications Workshop  
This further implies that finite-alphabet memoryless multisets cannot be encoded universally with vanishing fractional redundancy.  ...  of finite-alphabet memoryless multisets.  ...  Countable Alphabets The previous discussion has dealt with the entropy of finite-alphabet multisets, but what about countable alphabets?  ... 
doi:10.1109/ita.2007.4357578 fatcat:cdhuuhdlazewdozxmrwju526a4

Tight bounds for universal compression of large alphabets

Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, Ananda Theertha Suresh
2013 2013 IEEE International Symposium on Information Theory  
., [1] [2] [3] [4] [5] [6] [7] and references therein, have considered universal compression of sources over large alphabets, often using patterns to avoid infinite redundancy.  ...  To address this fast increase in redundancy with the alphabet size, a new approach was proposed for compression and estimation over large alphabets.  ...  A natural method for compressing a sequence over a large alphabet is to compress its pattern as well as the dictionary that maps the order to the original symbols.  ... 
doi:10.1109/isit.2013.6620751 dblp:conf/isit/AcharyaDJOS13 fatcat:fpfkmqupn5gi5bqnsn76t5eopm

Compressed word problems for inverse monoids [article]

Markus Lohrey
2011 arXiv   pre-print
The compressed word problem for a finitely generated monoid M asks whether two given compressed words over the generators of M represent the same element of M.  ...  For string compression, straight-line programs, i.e., context-free grammars that generate a single string, are used in this paper.  ...  In [27] , Margolis and Meakin presented a large class of finitely presented inverse monoids with decidable word problems.  ... 
arXiv:1106.1000v1 fatcat:kmys7kimafbqlm2morr2g3yfri

Benefiting from Disorder: Source Coding for Unordered Data [article]

Lav R. Varshney, Vivek K. Goyal
2007 arXiv   pre-print
In particular, lossless coding of n letters from a finite alphabet requires Theta(log n) bits and universal lossless coding requires n + o(n) bits for many countable alphabet sources.  ...  ACKNOWLEDGMENTS The authors thank Alon Orlitsky for fruitful discussions; in particular, the results in Section IV-A were developed in collaboration with him. The authors also thank Sanjoy K.  ...  Large-Size Multiset Asymptotics 1) Multiset Mean Squared Error: Assume that the source alphabet X is a subset of the real numbers.  ... 
arXiv:0708.2310v1 fatcat:lth2kyrzqzdknpbxewhbum627q

Super-Linear Indices for Approximate Dictionary Searching [chapter]

Leonid Boytsov
2012 Lecture Notes in Computer Science  
These methods require huge indices whose sizes grow exponentially with respect to the maximum allowable number of errors k.  ...  One approach to compress the full neighborhood is to replace some characters with wildcards. Let us extend the alphabet with a wildcard pseudo-character ? that matches any alphabet character.  ...  This method is not efficient for large k and/or large alphabets, because the size of the full neighborhood is O(n^k |Σ|^k) (where n and |Σ| are the sizes of the pattern and the alphabet, respectively) [21  ... 
doi:10.1007/978-3-642-32153-5_12 fatcat:52enei4um5dp5hd55lstjviywu

Compressing multisets using tries

Vincent Gripon, Michael Rabbat, Vitaly Skachek, Warren J. Gross
2012 2012 IEEE Information Theory Workshop  
We consider the problem of efficient and lossless representation of a multiset of m words drawn with repetition from a set of size 2^n.  ...  with the same words.  ...  CONCLUSION We introduced an algorithm (AlgI) to compress multisets of binary words obtained using a Bernoulli 1/2 source.  ... 
doi:10.1109/itw.2012.6404756 dblp:conf/itw/GriponRSG12 fatcat:kcviahm3xbg5rkhfmo5od6gtba

Weisfeiler-Lehman Graph Kernels

Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, Karsten M. Borgwardt
2011 Journal of machine learning research  
In this article, we propose a family of efficient kernels for large graphs with discrete node labels.  ...  Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis.  ...  S. was funded by the DFG project "Kernels for Large, Labeled Graphs (LaLa)".  ... 
dblp:journals/jmlr/ShervashidzeSLMB11 fatcat:qj5wpmzbozh65pj6azzoeijumq

Minimax Trees in Linear Time with Applications [chapter]

Paweł Gawrychowski, Travis Gagie
2009 Lecture Notes in Computer Science  
Suppose we want to build a good prefix code with which to compress a file, but are given only a sample of its characters.  ...  We are still studying alphabetic minimax trees and have started studying minimax trees with unequal edge costs.  ... 
doi:10.1007/978-3-642-10217-2_28 fatcat:ljkp7az66zeztmfhwgbjpcwcsa

Codes in the Space of Multisets—Coding for Permutation Channels With Impairments

Mladen Kovacevic, Vincent Y. F. Tan
2018 IEEE Transactions on Information Theory  
of symbols from a given finite alphabet.  ...  A general channel model is assumed in which the transmitted multisets are potentially impaired by insertions, deletions, substitutions, and erasures of symbols.  ...  As we have shown, the study of multiset codes over a fixed alphabet reduces to the study of codes in A_m lattices, at least in the large block-length limit.  ... 
doi:10.1109/tit.2017.2789292 fatcat:weas33cgczaejnaf4yeoyl2b6m

Optimal Prefix Free Codes with Partial Sorting

Jérémy Barbay
2019 Algorithms  
s deferred data structure to partially sort a multiset accordingly to the queries performed on it (known since 1988).  ...  the new analysis technique, such improvement is obtained by combining a new algorithm, inspired by van Leeuwen's algorithm to compute optimal prefix free codes from sorted weights (known since 1976), with  ...  of natural language texts, cited as an example of "large alphabet" application by Moffat [3], and studied by Moura et al  ... 
doi:10.3390/a13010012 fatcat:ibk6k7d6o5fbzc4xwhiianc7xa

Classification using pattern probability estimators

Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Shengjun Pan, Narayana P. Santhanam
2010 2010 IEEE International Symposium on Information Theory  
We motivate and propose LRT's based on pattern probability estimators that are known to achieve low redundancy for universal compression of large alphabet sources.  ...  We are primarily interested in situations where the alphabet of the underlying distributions is large compared to the training data available, which is indeed the case in most practical applications.  ...  In the context of universal compression, it was previously shown in [8] that patterns can be compressed with diminishing per symbol redundancy regardless of alphabet size of the underlying distribution  ... 
doi:10.1109/isit.2010.5513570 dblp:conf/isit/AcharyaDOPS10 fatcat:ykj64mo4pnhp3j2omzmdurpjve

Estimating multiple concurrent processes

Jayadev Acharya, Hirakendu Das, Ashkan Jafarpour, Alon Orlitsky, Shengjun Pan
2012 2012 IEEE International Symposium on Information Theory Proceedings  
For Poisson processes, if any estimator approximates the parameter multiset to within distance ε with error probability δ, then PML approximates the multiset to within distance 2ε with error probability at  ...  For both problems, it is sufficient to consider the observations' profile, the multiset of activity counts, regardless of their process identities.  ...  of large alphabet data sources.  ... 
doi:10.1109/isit.2012.6283551 dblp:conf/isit/AcharyaDJOP12 fatcat:cqezmcvlbbfzfllevbnltkk7y4