Benefiting from Disorder: Source Coding for Unordered Data [article]

Lav R. Varshney, Vivek K. Goyal
2007 arXiv pre-print
The order of letters is not always relevant in a communication task. This paper discusses the implications of order irrelevance on source coding, presenting results in several major branches of source coding theory: lossless coding, universal lossless coding, rate-distortion, high-rate quantization, and universal lossy coding. The main conclusions demonstrate that there is a significant rate savings when order is irrelevant. In particular, lossless coding of n letters from a finite alphabet requires Theta(log n) bits and universal lossless coding requires n + o(n) bits for many countable alphabet sources. However, there are no universal schemes that can drive a strong redundancy measure to zero. Results for lossy coding include distribution-free expressions for the rate savings from order irrelevance in various high-rate quantization schemes. Rate-distortion bounds are given, and it is shown that the analogue of the Shannon lower bound is loose at all finite rates.
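As a rough illustration of the finite-alphabet claim, and not the paper's own construction, the sketch below counts the multisets of n letters over an alphabet of k letters. When order is irrelevant, only the letter counts matter, so a fixed-length index into the set of multisets suffices, and for fixed k that index grows like (k - 1) log2 n bits. The function names and the fixed-length indexing scheme are assumptions made for this example; it shows only that a logarithmic number of bits is enough, not the matching lower bound.

```python
from math import comb


def multiset_count(n: int, k: int) -> int:
    """Number of multisets of size n over k letters (stars-and-bars count)."""
    return comb(n + k - 1, k - 1)


def bits_unordered(n: int, k: int) -> int:
    """Bits for a fixed-length index over all multisets of size n (order ignored)."""
    # (M - 1).bit_length() is the exact number of bits needed to index M items.
    return (multiset_count(n, k) - 1).bit_length()


def bits_ordered(n: int, k: int) -> int:
    """Bits for a fixed-length index over all k**n ordered sequences."""
    return (k ** n - 1).bit_length()


if __name__ == "__main__":
    k = 4  # hypothetical alphabet size chosen for the example
    for n in (10, 100, 1000, 10000):
        print(n, bits_ordered(n, k), bits_unordered(n, k))
```

Running the loop shows the ordered cost growing linearly in n while the unordered cost grows only logarithmically, which is the qualitative gap the abstract describes.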
arXiv:0708.2310v1 fatcat:lth2kyrzqzdknpbxewhbum627q