A Database Index to Large Biological Sequences

Ela Hunt, Malcolm P. Atkinson, Robert W. Irving
2001 Very Large Data Bases Conference  
We present an approach to searching genetic DNA sequences using an adaptation of the suffix tree data structure deployed on the general-purpose persistent Java platform, PJama. Our implementation technique is novel, in that it allows us to build suffix trees on disk for arbitrarily large sequences, for instance for the longest human chromosome consisting of 263 million letters. We propose to use such indexes as an alternative to the current practice of serial scanning. We describe our tree ... ion algorithm, analyse the performance of our index, and discuss the interplay of the data structure with object store architectures. Early measurements are presented.
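
To illustrate the contrast the abstract draws between an index and serial scanning, here is a minimal in-memory sketch. It is not the authors' disk-based construction on PJama: it builds a naive, uncompressed suffix trie (quadratic space, suitable only for tiny inputs), and the names `Node`, `build_suffix_trie`, `index_search` and `serial_scan` are hypothetical, chosen for this example only.

```python
from typing import Dict, List


class Node:
    """One node of an uncompressed suffix trie (illustrative helper)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        # Start positions of every suffix whose path passes through this node.
        self.starts: List[int] = []


def build_suffix_trie(text: str) -> Node:
    """Insert every suffix of `text`. O(n^2) space: an assumption-laden toy,
    not the on-disk suffix tree described in the paper."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.starts.append(start)
    return root


def index_search(root: Node, pattern: str) -> List[int]:
    """Return all start positions of a non-empty `pattern` by walking the trie.
    The walk takes len(pattern) steps, independent of the text length."""
    node = root
    for ch in pattern:
        nxt = node.children.get(ch)
        if nxt is None:
            return []
        node = nxt
    return list(node.starts)


def serial_scan(text: str, pattern: str) -> List[int]:
    """Baseline: test the pattern at every position of the text."""
    return [
        i
        for i in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, i)
    ]


if __name__ == "__main__":
    dna = "ACGTACGA"
    trie = build_suffix_trie(dna)
    print(index_search(trie, "ACG"), serial_scan(dna, "ACG"))
```

Both functions return the same positions; the difference is that the indexed lookup walks one path whose length depends on the query, whereas the scan visits every position of the sequence on each query.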
dblp:conf/vldb/HuntAI01 fatcat:zx3urj3o45hcfpakr2spefbocy