Compressed Suffix Arrays for Massive Data [chapter]

Jouni Sirén
2009 Lecture Notes in Computer Science  
We present a fast, space-efficient algorithm for constructing compressed suffix arrays (CSAs). The algorithm requires O(n log n) time in the worst case and only O(n) bits of extra space in addition to the CSA. As the basic step, we describe an algorithm for merging two CSAs. We show that the construction algorithm can be parallelized in a symmetric multiprocessor system, and discuss the possibility of a distributed implementation. We also describe a parallel implementation of the algorithm, capable of indexing several gigabytes per hour.
doi:10.1007/978-3-642-03784-9_7 fatcat:bv2v3wzm3jgmdlowvexkurhcaa
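
The abstract names merging two CSAs as the basic step of the construction but does not spell the step out. As a rough illustration only, the sketch below merges two plain (uncompressed) suffix arrays by ranking each suffix of one text among the suffixes of the other and then interleaving. This is an assumption about the general shape of such a merge, not the chapter's algorithm: it materializes suffixes as strings and ignores the O(n)-bit space bound entirely. The function names and the distinct terminators `$` and `#` are hypothetical choices made for the example.

```python
from bisect import bisect_left


def suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def merge_suffix_arrays(a, sa_a, b, sa_b):
    """Merge the suffix arrays of texts a and b into one sorted sequence.

    Entries of the result are (text_id, position), with text_id 0 for a
    and 1 for b. Assumes a and b end in distinct terminator characters
    that occur nowhere else, so no two suffixes compare equal.
    """
    # Suffixes of a in sorted order (illustration only: quadratic space).
    sorted_a = [a[i:] for i in sa_a]

    merged = []
    next_a = 0  # index of the first suffix of a not yet emitted
    for j in sa_b:
        # Rank of this suffix of b: how many suffixes of a are smaller.
        rank = bisect_left(sorted_a, b[j:])
        # Emit the suffixes of a that precede it, then the suffix of b.
        merged.extend((0, i) for i in sa_a[next_a:rank])
        next_a = rank
        merged.append((1, j))
    # Whatever is left of a sorts after every suffix of b.
    merged.extend((0, i) for i in sa_a[next_a:])
    return merged


if __name__ == "__main__":
    a, b = "banana$", "ananas#"
    for text_id, pos in merge_suffix_arrays(a, suffix_array(a), b, suffix_array(b)):
        print(text_id, pos, (a, b)[text_id][pos:])
```

Because the suffixes of `b` are visited in sorted order, their ranks in `a` never decrease, so a single left-to-right pass over both arrays is enough to interleave them. A space-efficient variant would have to obtain the ranks from the compressed index itself rather than from explicit suffix strings.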