### Compressing Multisets with Large Alphabets
[article]

2021, arXiv, pre-print

arXiv:2107.09202v1
fatcat:xpmxlyp2nfbkllsnrhcnbdvupa

Current methods that optimally compress multisets are not suitable for high-dimensional symbols, as their compute time scales linearly with alphabet size. ... Compressing a multiset as an ordered sequence with off-the-shelf codecs is computationally more efficient, but has a sub-optimal compression rate, as bits are wasted encoding the order between symbols. ... Related work To the best of our knowledge, there are no previous works that present a method which is both computationally feasible and rate-optimal for compressing multisets of i.i.d. symbols with large ...
### Compressing combinatorial objects
[article]

2016, arXiv, pre-print

arXiv:1601.03689v1
fatcat:wcscbo4wezal7bs2szxnro2wkq

However, there are many types of non-sequential data for which good compression techniques are still largely unexplored. ... Near-optimal compression methods are described for certain types of permutations, combinations and multisets; and the conditions for optimality are made explicit for each method. ... alphabet X. ...
### Tight Bounds on Profile Redundancy and Distinguishability

2012, Neural Information Processing Systems

dblp:conf/nips/AcharyaDO12
fatcat:jn2z7ti44be4jp5vqv6eftfani

A sufficient statistic for all these properties is the data's profile, the multiset of the number of times each data element appears. ... In compression, it is called redundancy and represents the least additional number of bits over the entropy needed to encode the output of any distribution in P. ... For sufficiently large k, this value even exceeds n itself, showing that general distributions over large alphabets cannot be compressed or learned at a uniform rate over all alphabet sizes, and as the ...
### On Universal Coding of Unordered Data

2007, 2007 Information Theory and Applications Workshop

doi:10.1109/ita.2007.4357578
fatcat:cdhuuhdlazewdozxmrwju526a4

This further implies that finite-alphabet memoryless multisets cannot be encoded universally with vanishing fractional redundancy. ... of finite-alphabet memoryless multisets. ... Countable Alphabets The previous discussion has dealt with the entropy of finite-alphabet multisets, but what about countable alphabets? ...
### Tight bounds for universal compression of large alphabets

2013, 2013 IEEE International Symposium on Information Theory

doi:10.1109/isit.2013.6620751
dblp:conf/isit/AcharyaDJOS13
fatcat:fpfkmqupn5gi5bqnsn76t5eopm

... [1] [2] [3] [4] [5] [6] [7] and references therein, have considered universal compression of sources over large alphabets, often using patterns to avoid infinite redundancy. ... To address this fast increase in redundancy with the alphabet size, a new approach was proposed for compression and estimation over large alphabets. ... A natural method for compressing a sequence over a large alphabet is to compress its pattern as well as the dictionary that maps the order to the original symbols. ...
### Compressed word problems for inverse monoids
[article]

2011, arXiv, pre-print

arXiv:1106.1000v1
fatcat:kmys7kimafbqlm2morr2g3yfri

The compressed word problem for a finitely generated monoid M asks whether two given compressed words over the generators of M represent the same element of M. ... For string compression, straight-line programs, i.e., context-free grammars that generate a single string, are used in this paper. ... In [27], Margolis and Meakin presented a large class of finitely presented inverse monoids with decidable word problems. ...
### Benefiting from Disorder: Source Coding for Unordered Data
[article]

2007, arXiv, pre-print

arXiv:0708.2310v1
fatcat:lth2kyrzqzdknpbxewhbum627q

In particular, lossless coding of n letters from a finite alphabet requires Theta(log n) bits and universal lossless coding requires n + o(n) bits for many countable alphabet sources. ... ACKNOWLEDGMENTS The authors thank Alon Orlitsky for fruitful discussions; in particular, the results in Section IV-A were developed in collaboration with him. The authors also thank Sanjoy K. ... Large-Size Multiset Asymptotics 1) Multiset Mean Squared Error: Assume that the source alphabet X is a subset of the real numbers. ...
### Super-Linear Indices for Approximate Dictionary Searching
[chapter]

2012, Lecture Notes in Computer Science

doi:10.1007/978-3-642-32153-5_12
fatcat:52enei4um5dp5hd55lstjviywu

These methods require huge indices whose sizes grow exponentially with respect to the maximum allowable number of errors k. ... One approach to compress the full neighborhood is to replace some characters with wildcards. Let us extend the alphabet with a wildcard pseudo-character ? that matches any alphabet character. ... This method is not efficient for large k and/or large alphabets, because the size of the full neighborhood is O(n^k |Σ|^k) (where n and |Σ| are the sizes of the pattern and the alphabet, respectively) [21 ...
### Compressing multisets using tries

2012, 2012 IEEE Information Theory Workshop

doi:10.1109/itw.2012.6404756
dblp:conf/itw/GriponRSG12
fatcat:kcviahm3xbg5rkhfmo5od6gtba

We consider the problem of efficient and lossless representation of a multiset of m words drawn with repetition from a set of size 2^n. ... with the same words. ... CONCLUSION We introduced an algorithm (AlgI) to compress multisets of binary words obtained using a Bernoulli 1/2 source. ...
### Weisfeiler-Lehman Graph Kernels

2011, Journal of Machine Learning Research

dblp:journals/jmlr/ShervashidzeSLMB11
fatcat:qj5wpmzbozh65pj6azzoeijumq

In this article, we propose a family of efficient kernels for large graphs with discrete node labels. ... Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis. ... S. was funded by the DFG project "Kernels for Large, Labeled Graphs (LaLa)". ...
### Minimax Trees in Linear Time with Applications
[chapter]

2009, Lecture Notes in Computer Science

doi:10.1007/978-3-642-10217-2_28
fatcat:ljkp7az66zeztmfhwgbjpcwcsa

Suppose we want to build a good prefix code with which to compress a file, but are given only a sample of its characters. ... We are still studying alphabetic minimax trees and have started studying minimax trees with unequal edge costs. ...
### Codes in the Space of Multisets—Coding for Permutation Channels With Impairments

2018, IEEE Transactions on Information Theory

doi:10.1109/tit.2017.2789292
fatcat:weas33cgczaejnaf4yeoyl2b6m

of symbols from a given finite alphabet. ... A general channel model is assumed in which the transmitted multisets are potentially impaired by insertions, deletions, substitutions, and erasures of symbols. ... As we have shown, the study of multiset codes over a fixed alphabet reduces to the study of codes in A_m lattices, at least in the large block-length limit. ...
### Optimal Prefix Free Codes with Partial Sorting

2019, Algorithms

doi:10.3390/a13010012
fatcat:ibk6k7d6o5fbzc4xwhiianc7xa

...s deferred data structure to partially sort a multiset according to the queries performed on it (known since 1988). ... the new analysis technique, such improvement is obtained by combining a new algorithm, inspired by van Leeuwen's algorithm to compute optimal prefix free codes from sorted weights (known since 1976), with ... of natural language texts, cited as an example of "large alphabet" application by Moffat [3], and studied by Moura et al ...
### Classification using pattern probability estimators

2010, 2010 IEEE International Symposium on Information Theory

doi:10.1109/isit.2010.5513570
dblp:conf/isit/AcharyaDOPS10
fatcat:ykj64mo4pnhp3j2omzmdurpjve

We motivate and propose LRT's based on pattern probability estimators that are known to achieve low redundancy for universal compression of large alphabet sources. ... We are primarily interested in situations where the alphabet of the underlying distributions is large compared to the training data available, which is indeed the case in most practical applications. ... In the context of universal compression, it was previously shown in [8] that patterns can be compressed with diminishing per symbol redundancy regardless of alphabet size of the underlying distribution ...
### Estimating multiple concurrent processes

2012, 2012 IEEE International Symposium on Information Theory Proceedings

doi:10.1109/isit.2012.6283551
dblp:conf/isit/AcharyaDJOP12
fatcat:cqezmcvlbbfzfllevbnltkk7y4

For Poisson processes, if any estimator approximates the parameter multiset to within distance ε with error probability δ, then PML approximates the multiset to within distance 2ε with error probability at ... For both problems, it is sufficient to consider the observations' profile, the multiset of activity counts, regardless of their process identities. ... of large alphabet data sources. ...

*Showing results 1 — 15 out of 740 results*