QLZCClust: Quaternary Lempel-Ziv complexity based clustering of the RNA-seq read block segments

Ashis Kumer Biswas, Baoju Zhang, Xiaoyong Wu, Jean X. Gao
2013 13th IEEE International Conference on BioInformatics and BioEngineering  
The next-generation sequencing platform RNA-seq provides quantitative expression data that exhibit distinctive sequence patterns at the level of short-read segments, and these patterns have been found useful for clustering those segments. However, the resulting clusters do not reflect the functional chemistry of the noncoding RNAs (ncRNAs), whose functions are closely tied to their secondary structures. Clustering the read block segments by their structural profiles, rather than by their sequence patterns, would therefore be essential and useful. We propose the QLZCClust (Quaternary Lempel-Ziv Complexity based Clustering) method, an extension of the popular Lempel-Ziv algorithm that computes pairwise secondary structure distances. We applied QLZCClust to the short-read segments obtained from an RNA-seq experiment and found that it can separate most miRNAs and tRNAs. Moreover, it can be used to detect structural similarities among different classes of ncRNAs. We compared our algorithm with clustering based on two other structural distance measures, SimTree edit distance and RNAz based distance, and found that our method performs better.
978-1-4799-3163-7/13/$31.00 ©2013 IEEE
doi:10.1109/bibe.2013.6701596 dblp:conf/bibe/BiswasZWG13 fatcat:temuz4ojbbhmfhcz73njpqhhne
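
The abstract describes a pairwise structural distance built on Lempel-Ziv complexity but gives no formula. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the 1976 Lempel-Ziv phrase-counting complexity and one common normalized distance built from it, d(S, Q) = max(c(SQ) - c(S), c(QS) - c(Q)) / max(c(S), c(Q)). The abstract does not define the quaternary encoding of secondary structures, so the sketch accepts any string over a small alphabet. The function names and the toy dot-bracket strings are hypothetical.

```python
def lz_complexity(s: str) -> int:
    """Count the phrases in the LZ76 exhaustive-history parsing of s."""
    n = len(s)
    i = 0
    phrases = 0
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from a match that
        # starts before position i (the match may overlap the phrase).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def lz_distance(s: str, q: str) -> float:
    """Normalized LZ-complexity distance between two symbol strings.

    Assumed form (not taken from the paper):
    max(c(sq) - c(s), c(qs) - c(q)) / max(c(s), c(q)).
    """
    cs, cq = lz_complexity(s), lz_complexity(q)
    denom = max(cs, cq)
    if denom == 0:  # both strings are empty
        return 0.0
    return max(lz_complexity(s + q) - cs, lz_complexity(q + s) - cq) / denom


if __name__ == "__main__":
    # Hypothetical toy structures in dot-bracket notation.
    hairpin_a = "((((....))))"
    hairpin_b = "(((......)))"
    cloverleaf = "((..((..))..((..))..))"
    for name, other in [("hairpin_b", hairpin_b), ("cloverleaf", cloverleaf)]:
        print(name, lz_distance(hairpin_a, other))
```

The resulting pairwise distances could be assembled into a matrix and passed to any standard clustering routine. The clustering procedure used in the paper is not specified in the abstract.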