Compressing Multisets with Large Alphabets
[article]
2021
arXiv
pre-print
Current methods that optimally compress multisets are not suitable for high-dimensional symbols, as their compute time scales linearly with alphabet size. ...
Compressing a multiset as an ordered sequence with off-the-shelf codecs is computationally more efficient, but has a sub-optimal compression rate, as bits are wasted encoding the order between symbols. ...
Related work To the best of our knowledge, there are no previous works that present a method which is both computationally feasible and rate-optimal for compressing multisets of i.i.d. symbols with large ...
arXiv:2107.09202v1
fatcat:xpmxlyp2nfbkllsnrhcnbdvupa
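As a rough illustration of the "wasted bits" mentioned in the snippet above: a multiset of n symbols with multiplicities m_1, ..., m_k has n! / (m_1! ... m_k!) distinct orderings, so a codec that stores it as an ordered sequence can spend up to log2 of that count on order information. The function name `order_bits` is hypothetical and this is only a counting sketch, not the method of the listed paper.

```python
from collections import Counter
from math import factorial, log2

def order_bits(multiset):
    """Bits needed to single out one ordering of the given multiset."""
    counts = Counter(multiset)
    n = sum(counts.values())
    # Number of distinct orderings: n! / (m_1! * m_2! * ... * m_k!)
    orderings = factorial(n)
    for m in counts.values():
        orderings //= factorial(m)  # exact: every partial quotient is an integer
    return log2(orderings)

print(order_bits("aab"))   # 3 orderings: aab, aba, baa
print(order_bits("aaaa"))  # a single ordering, so no order information
```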
Compressing combinatorial objects
[article]
2016
arXiv
pre-print
However, there are many types of non-sequential data for which good compression techniques are still largely unexplored. ...
Near-optimal compression methods are described for certain types of permutations, combinations and multisets; and the conditions for optimality are made explicit for each method. ...
alphabet X . ...
arXiv:1601.03689v1
fatcat:wcscbo4wezal7bs2szxnro2wkq
Tight Bounds on Profile Redundancy and Distinguishability
2012
Neural Information Processing Systems
A sufficient statistic for all these properties is the data's profile, the multiset of the number of times each data element appears. ...
In compression, it is called redundancy and represents the least additional number of bits over the entropy needed to encode the output of any distribution in P. ...
For sufficiently large k, this value even exceeds n itself, showing that general distributions over large alphabets cannot be compressed or learned at a uniform rate over all alphabet sizes, and as the ...
dblp:conf/nips/AcharyaDO12
fatcat:jn2z7ti44be4jp5vqv6eftfani
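The profile defined in the snippet above (the multiset of the number of times each data element appears) can be sketched in a few lines. The function name `profile` and the choice to represent the multiset as a sorted list are assumptions made for illustration.

```python
from collections import Counter

def profile(data):
    """Return the multiset of multiplicities, represented as a sorted list."""
    counts = Counter(data)          # element -> number of appearances
    return sorted(counts.values())  # keep only the counts, drop the labels

print(profile("abracadabra"))  # a:5, b:2, r:2, c:1, d:1
```

Because the element labels are discarded, two samples that differ only by a relabeling of symbols share the same profile.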
On Universal Coding of Unordered Data
2007
2007 Information Theory and Applications Workshop
This further implies that finite-alphabet memoryless multisets cannot be encoded universally with vanishing fractional redundancy. ...
of finite-alphabet memoryless multisets. ...
Countable Alphabets The previous discussion has dealt with the entropy of finite-alphabet multisets, but what about countable alphabets? ...
doi:10.1109/ita.2007.4357578
fatcat:cdhuuhdlazewdozxmrwju526a4
Tight bounds for universal compression of large alphabets
2013
2013 IEEE International Symposium on Information Theory
., [1] [2] [3] [4] [5] [6] [7] and references therein, have considered universal compression of sources over large alphabets, often using patterns to avoid infinite redundancy. ...
To address this fast increase in redundancy with the alphabet size, a new approach was proposed for compression and estimation over large alphabets. ...
A natural method for compressing a sequence over a large alphabet is to compress its pattern as well as the dictionary that maps the order to the original symbols. ...
doi:10.1109/isit.2013.6620751
dblp:conf/isit/AcharyaDJOS13
fatcat:fpfkmqupn5gi5bqnsn76t5eopm
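The pattern-plus-dictionary split described in the snippet above can be sketched as follows, assuming the usual definition in which each symbol is replaced by the index of its first appearance. The function names `pattern_and_dictionary` and `reconstruct` are hypothetical, and the sketch only separates the two parts; it does not compress either of them.

```python
def pattern_and_dictionary(seq):
    """Split a sequence into its pattern and the dictionary of symbols."""
    index_of = {}   # symbol -> order of first appearance (1-based)
    pattern = []
    for sym in seq:
        if sym not in index_of:
            index_of[sym] = len(index_of) + 1
        pattern.append(index_of[sym])
    dictionary = list(index_of)  # dicts preserve insertion order (Python 3.7+)
    return pattern, dictionary

def reconstruct(pattern, dictionary):
    """Invert the split: map each index back to its original symbol."""
    return [dictionary[i - 1] for i in pattern]

pattern, dictionary = pattern_and_dictionary("abracadabra")
print(pattern)
print(dictionary)
```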
Compressed word problems for inverse monoids
[article]
2011
arXiv
pre-print
The compressed word problem for a finitely generated monoid M asks whether two given compressed words over the generators of M represent the same element of M. ...
For string compression, straight-line programs, i.e., context-free grammars that generate a single string, are used in this paper. ...
In [27] , Margolis and Meakin presented a large class of finitely presented inverse monoids with decidable word problems. ...
arXiv:1106.1000v1
fatcat:kmys7kimafbqlm2morr2g3yfri
Benefiting from Disorder: Source Coding for Unordered Data
[article]
2007
arXiv
pre-print
In particular, lossless coding of n letters from a finite alphabet requires Theta(log n) bits and universal lossless coding requires n + o(n) bits for many countable alphabet sources. ...
ACKNOWLEDGMENTS The authors thank Alon Orlitsky for fruitful discussions; in particular, the results in Section IV-A were developed in collaboration with him. The authors also thank Sanjoy K. ...
Large-Size Multiset Asymptotics 1) Multiset Mean Squared Error: Assume that the source alphabet X is a subset of the real numbers. ...
arXiv:0708.2310v1
fatcat:lth2kyrzqzdknpbxewhbum627q
Super-Linear Indices for Approximate Dictionary Searching
[chapter]
2012
Lecture Notes in Computer Science
These methods require huge indices whose sizes grow exponentially with respect to the maximum allowable number of errors k. ...
One approach to compress the full neighborhood is to replace some characters with wildcards. Let us extend the alphabet with a wildcard pseudo-character ? that matches any alphabet character. ...
This method is not efficient for large k and/or large alphabets, because the size of the full neighborhood is O(n^k |Σ|^k) (where n and |Σ| are the sizes of the pattern and the alphabet, respectively) [21 ...
doi:10.1007/978-3-642-32153-5_12
fatcat:52enei4um5dp5hd55lstjviywu
Compressing multisets using tries
2012
2012 IEEE Information Theory Workshop
We consider the problem of efficient and lossless representation of a multiset of m words drawn with repetition from a set of size 2^n. ...
with the same words. ...
CONCLUSION We introduced an algorithm (AlgI) to compress multisets of binary words obtained using a Bernoulli 1/2 source. ...
doi:10.1109/itw.2012.6404756
dblp:conf/itw/GriponRSG12
fatcat:kcviahm3xbg5rkhfmo5od6gtba
Weisfeiler-Lehman Graph Kernels
2011
Journal of machine learning research
In this article, we propose a family of efficient kernels for large graphs with discrete node labels. ...
Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis. ...
S. was funded by the DFG project "Kernels for Large, Labeled Graphs (LaLa)". ...
dblp:journals/jmlr/ShervashidzeSLMB11
fatcat:qj5wpmzbozh65pj6azzoeijumq
Minimax Trees in Linear Time with Applications
[chapter]
2009
Lecture Notes in Computer Science
Suppose we want to build a good prefix code with which to compress a file, but are given only a sample of its characters. ...
We are still studying alphabetic minimax trees and have started studying minimax trees with unequal edge costs. ...
doi:10.1007/978-3-642-10217-2_28
fatcat:ljkp7az66zeztmfhwgbjpcwcsa
Codes in the Space of Multisets—Coding for Permutation Channels With Impairments
2018
IEEE Transactions on Information Theory
of symbols from a given finite alphabet. ...
A general channel model is assumed in which the transmitted multisets are potentially impaired by insertions, deletions, substitutions, and erasures of symbols. ...
As we have shown, the study of multiset codes over a fixed alphabet reduces to the study of codes in A_m lattices, at least in the large block-length limit. ...
doi:10.1109/tit.2017.2789292
fatcat:weas33cgczaejnaf4yeoyl2b6m
Optimal Prefix Free Codes with Partial Sorting
2019
Algorithms
s deferred data structure to partially sort a multiset according to the queries performed on it (known since 1988). ...
the new analysis technique, such improvement is obtained by combining a new algorithm, inspired by van Leeuwen's algorithm to compute optimal prefix free codes from sorted weights (known since 1976), with ...
of natural language texts, cited as an example of "large alphabet" application by Moffat [3], and studied by Moura et al ...
doi:10.3390/a13010012
fatcat:ibk6k7d6o5fbzc4xwhiianc7xa
Classification using pattern probability estimators
2010
2010 IEEE International Symposium on Information Theory
We motivate and propose LRT's based on pattern probability estimators that are known to achieve low redundancy for universal compression of large alphabet sources. ...
We are primarily interested in situations where the alphabet of the underlying distributions is large compared to the training data available, which is indeed the case in most practical applications. ...
In the context of universal compression, it was previously shown in [8] that patterns can be compressed with diminishing per symbol redundancy regardless of alphabet size of the underlying distribution ...
doi:10.1109/isit.2010.5513570
dblp:conf/isit/AcharyaDOPS10
fatcat:ykj64mo4pnhp3j2omzmdurpjve
Estimating multiple concurrent processes
2012
2012 IEEE International Symposium on Information Theory Proceedings
For Poisson processes, if any estimator approximates the parameter multiset to within distance ε with error probability δ, then PML approximates the multiset to within distance 2ε with error probability at ...
For both problems, it is sufficient to consider the observations' profile, the multiset of activity counts, regardless of their process identities. ...
of large alphabet data sources. ...
doi:10.1109/isit.2012.6283551
dblp:conf/isit/AcharyaDJOP12
fatcat:cqezmcvlbbfzfllevbnltkk7y4