### Sparse Signal Recovery and Acquisition with Graphical Models

Volkan Cevher, Piotr Indyk, Lawrence Carin, Richard Baraniuk
2010 IEEE Signal Processing Magazine
Many applications in digital signal processing, machine learning, and communications feature a linear regression problem in which unknown data points, hidden variables, or codewords are projected into a lower-dimensional space via

y = Φx + n. (1)

In the signal processing context, we refer to x ∈ ℝ^N as the signal, y ∈ ℝ^M as the measurements with M < N, Φ ∈ ℝ^(M×N) as the measurement matrix, and n ∈ ℝ^M as the noise. The measurement matrix Φ is a matrix with random entries in data streaming, an overcomplete dictionary of features in sparse Bayesian learning, or a code matrix in communications [1-3].

Extracting x from y in (1) is ill-posed in general: since M < N, the measurement matrix Φ has a nontrivial null space, and for any vector v in this null space, x + v defines a solution that produces the same observations y. Additional information is therefore necessary to distinguish the true x among the infinitely many possible solutions [1, 2, 4, 5].

It is now well known that sparse representations can provide crucial prior information in this dimensionality reduction; we therefore also refer to the problem of determining x in this particular setting as sparse signal recovery. A signal x has a sparse representation x = Ψα in a basis Ψ ∈ ℝ^(N×N) when K ≪ N coefficients of α can exactly represent or well-approximate the signal x. Inspired by problems in communications, coding, and information theory, we often refer to the application of Φ to x as the encoding of x; the sparse signal recovery problem is then concerned with the decoding of x from y in the presence of noise. In the sequel, we assume the canonical sparsity basis, Ψ = I, without loss of generality.

The sparse signal recovery problem has been the subject of extensive research over the last few decades in several different research communities, including applied mathematics, statistics, and theoretical computer science [1-3, 6]. The goal of this research has been to obtain higher compression rates; stable recovery schemes; low encoding, update, and decoding times; analytical recovery bounds; and resilience to noise.
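The null-space ambiguity in (1) can be seen on a toy, noise-free instance. The sketch below is illustrative only: the 2×3 matrix `PHI`, the vectors, and the helper names are assumptions chosen so the arithmetic is easy to check by hand, not values from the paper.

```python
# Toy instance of y = Φx with M = 2 measurements and N = 3 unknowns (no noise).
# All numbers here are hypothetical, picked for easy hand verification.
PHI = [[1.0, 0.0, 1.0],
       [0.0, 1.0, 1.0]]


def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def num_nonzero(x):
    """Count the nonzero entries of x (the size of its support)."""
    return sum(1 for xi in x if xi != 0.0)


x_sparse = [0.0, 0.0, 2.0]   # a 1-sparse signal
v = [1.0, 1.0, -1.0]         # a vector in the null space of PHI
x_other = [xi + vi for xi, vi in zip(x_sparse, v)]  # a different, dense signal

null_image = matvec(PHI, v)       # PHI v = 0
y_sparse = matvec(PHI, x_sparse)  # measurements of the sparse signal
y_other = matvec(PHI, x_other)    # measurements of the dense signal

print("PHI v        =", null_image)
print("y from x     =", y_sparse)
print("y from x + v =", y_other)
print("nonzeros     :", num_nonzero(x_sparse), "vs", num_nonzero(x_other))
```

Both signals produce identical measurements, so y alone cannot tell them apart. In this toy case only `x_sparse` has a single nonzero entry, which is the kind of prior information that sparsity supplies.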
The momentum behind this research is well justified: underdetermined linear regression problems in tandem with sparse representations underlie the paradigm of signal compression and denoising in signal processing, the tractability and generalization of learning algorithms in machine learning, the stable embedding and decoding properties of codes in information theory, the effectiveness of data streaming algorithms in theoretical computer science, and neuronal information processing and interactions in computational neuroscience.

An application du jour of the sparse signal recovery problem is compressive sensing (CS), which integrates sparse representations with two other key aspects of linear dimensionality reduction: information-preserving projections and tractable recovery algorithms [1, 2, 4-6]. In CS, sparse signals are represented by a union of the (N choose K) K-dimensional subspaces, denoted as x ∈ Σ_K. We call the set of indices corresponding to the nonzero entries the support of x. While the matrix Φ is rank deficient, it can be shown to preserve the information in sparse signals if it satisfies the so-called restricted isometry property (RIP). Intriguingly, a large class of random matrices have the RIP with high probability. Today's state-of-the-art CS systems can robustly and provably recover K-sparse signals from just M = O(K log(N/K)) noisy measurements using sparsity-seeking, polynomial-time optimization solvers or greedy algorithms. When x is compressible, in that it can be closely approximated as K-sparse, CS can recover a close approximation to x from the measurements y. In this manner we can achieve sub-Nyquist signal acquisition; conventional Nyquist-rate acquisition, by contrast, requires uniform sampling at a rate of at least twice the signal's Fourier bandwidth to preserve information.
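As a rough illustration of recovering a K-sparse signal from M = O(K log(N/K)) noisy random measurements, the sketch below uses orthogonal matching pursuit (OMP), one generic greedy algorithm; the text does not single out a particular solver. The problem sizes, the constant 8 in the choice of M, the noise level, and the function name `omp` are all assumptions made for this example.

```python
import numpy as np


def omp(Phi, y, K):
    """Orthogonal matching pursuit: greedily select K columns of Phi to explain y."""
    N = Phi.shape[1]
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(K):
        # Pick the column most correlated with the current residual.
        correlations = np.abs(Phi.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Least-squares fit of y on the columns selected so far.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat


rng = np.random.default_rng(0)
N, K = 512, 4
# Assumed constant 8 in M = O(K log(N/K)); theory only fixes the scaling.
M = int(np.ceil(8 * K * np.log(N / K)))

# Random Gaussian measurement matrix with roughly unit-norm columns.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)

# K-sparse signal with random support and amplitudes in [1, 2] with random signs.
x = np.zeros(N)
true_support = rng.choice(N, size=K, replace=False)
x[true_support] = rng.choice([-1.0, 1.0], size=K) * (1.0 + rng.random(K))

# Noisy measurements y = Phi x + n, as in (1).
y = Phi @ x + 0.01 * rng.standard_normal(M)

x_hat = omp(Phi, y, K)
rel_err = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
print(f"M = {M} measurements for N = {N}, relative error = {rel_err:.4f}")
```

OMP is given the true sparsity level K here, which a practical system would have to estimate or replace with a stopping rule on the residual.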
While such measurement rates based on sparsity are impressive and have the potential to impact a broad set of streaming, coding, and learning problems, sparsity is merely a first-order description of signal structure; in many applications we have considerably more a priori information that previous approaches to CS fail to exploit. In particular, modern signal, image, and video coders directly exploit the fact that even compressible signal coefficients often sport a strong additional structure in the support of the significant coefficients. For instance, the image compression standard JPEG2000 does not merely use the fact that most of the wavelet coefficients of a natural image are small. Rather, it also exploits the fact that the values and locations of the large coefficients have a particular structure that is characteristic of natural images. Coding this structure using an appropriate model enables JPEG2000 and other similar algorithms to compress images close to the maximum amount possible, and significantly better than a naive coder that just assigns bits to each large coefficient independently [1].

By exploiting a priori information on coefficient structure in addition to signal sparsity, we can make CS better, stronger, and faster. The particular approach we will focus on in this tutorial is based on graphical models (GMs) [3, 7-10]. As we will discover, GMs are not only useful for representing the prior information on x, but also lay the foundations for new kinds of measurement systems. GMs enable