Competitive Group Testing and Learning Hidden Vertex Covers with Minimum Adaptivity [chapter]

Peter Damaschke, Azam Sheikh Muhammad
2009 Lecture Notes in Computer Science  
Suppose that we are given a set of n elements, d of which have a property called defective. A group test can check for any subset, called a pool, whether it contains a defective. It is known that a nearly optimal number of O(d log(n/d)) pools in 2 stages (where tests within a stage are done in parallel) is sufficient, but then the searcher must know d in advance. Here we explore group testing strategies that use a nearly optimal number of pools and a few stages although d is not known. We prove a lower bound of Ω(log d / log log d) stages and a more general pools vs. stages tradeoff. This is almost tight, since O(log d) stages are sufficient for a strategy with O(d log n) pools. As opposed to this negative result, we devise a randomized strategy using O(d log(n/d)) pools in 3 stages, with any desired success probability 1 − ε. With some additional measures even 2 stages are enough. Open questions concern the optimal constant factors and practical implications. A related problem motivated by, e.g., biological network analysis is to learn hidden vertex covers of a small size k in unknown graphs by edge group tests. (Does a given subset of vertices contain an edge?) We give a 1-stage strategy using O(k^3 log n) pools, with any parameterized algorithm for vertex cover enumeration as a decoder. During the course of this work we also provide a classification of types of randomized search strategies in general.
Pages 84-95.
... to a group test. A negative pool is a pool without defectives, thus responding No to a group test. Group testing has several applications, most notably in biological and chemical testing, but also in communication networks, information gathering, compression, streaming algorithms, etc.; see for instance [9, 10, 15, 20, 21, 22] and further pointers therein. Throughout this paper, log means log_2 if no other base is mentioned. By the information-theoretic lower bound, at least log (n choose d) ≈ d log(n/d) pools are needed to find d defectives even if the number d is known in advance, and it is an easy exercise to devise an adaptive query strategy using O(d log(n/d)) pools. Here, a strategy is called adaptive if queries are asked sequentially, that is, every pool can be prepared based on the outcomes of all earlier queries.
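To make the notions of group test, information-theoretic bound, and adaptive strategy concrete, the following is a minimal Python sketch. It is not the strategy from the paper or from the cited references: it uses plain repeated binary splitting, which needs roughly d log n pools rather than the sharper O(d log(n/d)), and it does not need to know d. The names make_oracle, find_one, adaptive_search, and info_lower_bound are hypothetical helpers introduced only for this illustration.

```python
from math import ceil, comb, log2


def info_lower_bound(n, d):
    """Information-theoretic bound: ceil(log2 of the number of d-subsets)."""
    return ceil(log2(comb(n, d)))


def make_oracle(defectives):
    """Simulate group tests against a hidden defective set; count pools used."""
    defectives = set(defectives)
    count = {"pools": 0}

    def test(pool):
        count["pools"] += 1
        return any(x in defectives for x in pool)

    return test, count


def find_one(test, items):
    """Binary search for one defective in `items`, known to contain one."""
    while len(items) > 1:
        half = items[: len(items) // 2]
        # A positive half contains a defective; otherwise the rest must.
        items = half if test(half) else items[len(half):]
    return items[0]


def adaptive_search(test, n):
    """Sequentially find all defectives in range(n); every pool depends
    on the outcomes of all earlier pools (assumed simple splitting rule)."""
    remaining = list(range(n))
    found = []
    while remaining and test(remaining):
        x = find_one(test, remaining)
        found.append(x)
        remaining.remove(x)
    return sorted(found)


if __name__ == "__main__":
    test, count = make_oracle({3, 11})
    print(adaptive_search(test, 16), "pools used:", count["pools"])
    print("lower bound:", info_lower_bound(16, 2))
```

Each defective costs one test of the remaining set plus at most ⌈log n⌉ splitting tests, so the total stays within d(⌈log n⌉ + 1) + 1 pools, but every pool waits for the previous answer, which is exactly the time cost that staged strategies try to avoid.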
For many applications, however, the time consumption of adaptive strategies is hardly acceptable, and strategies that work in a few stages are strongly preferred: The pools for every stage must be prepared in advance, depending on the outcomes of earlier stages, and then they are queried in parallel. Any 1-stage strategy needs Ω(d^2 log n / log d) pools, as a consequence of the lower bound for d-separable matrices [6], which are pooling designs that can distinguish between any two possible sets of at most d defectives. The same lower bound was already well known for d-disjunct matrices, i.e., special d-separable matrices that also allow very simple decoding of the test results [18, 24]. On the other hand, O(d^2 log n) pools are sufficient. The currently best factor is 4.28; see [8] and the references therein. The first 2-stage strategy using a number of pools within a constant factor of optimum, more precisely 7.54 d log(n/d), was developed in [14] and later improved to essentially 4 d log(n/d) [19] and finally 1.9 d log(n/d), or even 1.44 d log(n/d) for large enough d [8]. These strategies use stage 1 to find O(d) candidate elements including all defectives, which are then tested individually in stage 2. Such 2-stage strategies are called "trivial" (an easily misunderstood but established term); they are of independent interest as an intermediate type of strategy between 1-stage and "truly 2-stage" strategies. Note that a trivial 2-stage strategy requires no subsequent decoding. The 2-stage strategies still require the knowledge of an upper bound d on the number of defectives, and they guarantee an almost optimal query complexity only relative to this d, which can be much larger than the true number of defectives in the particular case. As opposed to this, adaptive strategies with O(d log(n/d)) pools do not need any prior knowledge of d.
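The shape of a trivial 2-stage strategy described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the pooling designs of [14], [19], or [8]: stage 1 uses random pools in which each element is included independently with probability 1/d, and the number of pools num_pools is a free parameter chosen by the caller. The function name trivial_two_stage and its parameters are hypothetical.

```python
import random


def trivial_two_stage(test, n, d, num_pools, seed=0):
    """Two-stage search over range(n), given an assumed upper bound d >= 1.

    Stage 1: random pools, all fixed before any outcome is seen.
    Stage 2: individual tests of the surviving candidates.
    Returns (sorted defectives, number of stage-2 candidates).
    """
    rng = random.Random(seed)

    # Stage 1: prepare every pool in advance (assumed inclusion rate 1/d).
    pools = [[x for x in range(n) if rng.random() < 1.0 / d]
             for _ in range(num_pools)]
    outcomes = [test(pool) for pool in pools]  # conceptually in parallel

    # Any element appearing in a negative pool cannot be defective.
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared.update(pool)
    candidates = [x for x in range(n) if x not in cleared]

    # Stage 2: test each candidate alone; no further decoding is needed.
    found = [x for x in candidates if test([x])]
    return found, len(candidates)


if __name__ == "__main__":
    hidden = {5, 42, 77}  # hypothetical defective set for the demo

    def test(pool):
        return any(x in hidden for x in pool)

    found, num_candidates = trivial_two_stage(test, n=100, d=3, num_pools=40)
    print(found, "stage-2 tests:", num_candidates)
```

A defective never lies in a negative pool, so the output is always correct; the randomness and the choice of d only affect how many candidates reach stage 2. If d is underestimated in this sketch, the pools become too dense, few pools are negative, and stage 2 degenerates toward individual testing, which mirrors the dependence on prior knowledge of d noted above.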
Beginning with [3, 16, 17], substantial work has been done to minimize the constant factor in O(d log(n/d)), called the competitive ratio. The currently best results are in [25]. Our problem with unknown d was also raised in [22], and several batching strategies have been proposed and studied experimentally. To the best of our knowledge, the present work is the first to establish rigorous results for this question: Can we take the best of two worlds and perform group testing without prior knowledge of d in a few stages, using a number of pools close to the information-theoretic lower bound? This question is not only of theoretical interest. If the number d of defectives varies a lot between the problem instances, then the conservative policy of assuming some "large enough" d systematically requires unnecessarily many tests, while a strategy with underestimated d even fails to find all defectives. It is fairly obvious that a 1-stage strategy cannot do better than n individual tests. On the bright side, O(log d) stages are sufficient to accommodate a strategy with O(d log(n/d)) pools: Simply double the assumed d in every other stage, and apply the best 2-stage strategy repeatedly, including a check whether all defectives have been found. In this paper we prove that
doi:10.1007/978-3-642-03409-1_9 fatcat:z627dt4foncctmisjtyk7nylhy