An Efficient Horizontal and Vertical Method for Online DNA Sequence Compression

Kamta Nath Mishra, Dr. Anupam Aaggarwal, Dr. Edries Abdelhadi, Dr. Prakash C. Srivastava
2010 International Journal of Computer Applications  
DNA matching has become one of the most used biometric identification methods during the last several years. DNA stores the information for creating and organizing an organism. It can be thought of as a string over the alphabet {A, C, G, T, N}, where A, C, G, and T denote the four chemical components that make it up. Here, N represents an unknown nucleotide, which may be A, C, G, or T. Sequence sizes range from millions to billions of nucleotides.
... n of DNA is interesting for both practical reasons (such as reduced storage and transmission cost) and functional reasons (such as inferring structure and function from compression models). We present a new lossless compression algorithm that compresses data first horizontally and then vertically. It is based on substitution and statistical methods. We claim that our algorithm achieves one of the best compression ratios on benchmark DNA sequences in comparison to other DNA sequence compression methods.
General Terms: DNA Sequence Compression and Identification
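For orientation, the simplest lossless baseline for a string over {A, C, G, T, N} packs each known base into 2 bits and records the positions of N separately. The sketch below shows only that baseline, under the assumption that N is rare; it is not the horizontal and vertical algorithm presented in the paper, and the names `pack`, `unpack`, and `bits_per_base` are hypothetical.

```python
# Hypothetical baseline, NOT the paper's algorithm: fixed 2-bit codes for
# A, C, G, T, with positions of the unknown nucleotide N stored separately.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Return (packed_bytes, n_positions, length) for a string over {A, C, G, T, N}."""
    n_positions = [i for i, base in enumerate(seq) if base == "N"]
    packed = bytearray((len(seq) + 3) // 4)  # four bases per byte
    for i, base in enumerate(seq):
        # N is written as a placeholder code 0 and restored on unpack;
        # any symbol outside the alphabet raises KeyError.
        code = 0 if base == "N" else CODES[base]
        packed[i // 4] |= code << (2 * (i % 4))
    return bytes(packed), n_positions, len(seq)


def unpack(packed, n_positions, length):
    """Invert pack(): rebuild the original string exactly (lossless)."""
    bases = [BASES[(packed[i // 4] >> (2 * (i % 4))) & 3] for i in range(length)]
    for i in n_positions:
        bases[i] = "N"
    return "".join(bases)


def bits_per_base(packed, length):
    """Bits per base of the packed payload only (N positions are extra)."""
    return 8 * len(packed) / length
```

With one byte per character as the uncompressed reference, this baseline spends about 2 bits per base on the payload, plus the cost of the N positions. A substitution or statistical method is only worthwhile if it goes below that figure.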
doi:10.5120/757-954 fatcat:mr4hmmniyjbdjoyp2eil2wsk5e