Finding genes in DNA using decision trees and dynamic programming

S Salzberg, X Chen, J Henderson, K Fasman
1996 Proceedings. International Conference on Intelligent Systems for Molecular Biology  
This study demonstrates the use of decision tree classifiers as the basis for a general gene-finding system. The system uses a dynamic programming algorithm that finds the optimal segmentation of a DNA sequence into coding and non-coding regions (exons and introns). The optimality property is dependent on a separate scoring function that takes a subsequence and assigns to it a score reflecting the probability that the sequence is an exon. In this study, the scoring functions were sets of decision trees and rules that were combined to give the probability estimate. Experimental results on a newly collected database of human DNA sequences are encouraging, and some new observations about the structure of classifiers for the gene-finding problem have emerged from this study. We also provide descriptions of a new probability chain model that produces very accurate filters to find donor and acceptor sites.
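
The segmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`toy_score`, `best_segmentation`), the G/C-content scorer, and the 0.8/0.2 probabilities are assumptions made up for this example, standing in for the decision-tree scoring functions the paper describes. The dynamic program only assumes that some scorer returns a log-probability-like value for a subsequence under a label, and it picks the alternating exon/intron segmentation with the highest total score.

```python
import math
from typing import Callable, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)
Scorer = Callable[[str, int, int, str], float]


def toy_score(seq: str, start: int, end: int, label: str) -> float:
    """Hypothetical stand-in for a learned scorer (NOT the paper's trees).

    Returns a log-probability-like score in which G/C-rich stretches
    favor the 'exon' label and A/T-rich stretches favor 'intron'.
    """
    sub = seq[start:end]
    gc = sum(base in "GC" for base in sub)
    at = len(sub) - gc
    p_gc = 0.8 if label == "exon" else 0.2  # assumed toy parameters
    return gc * math.log(p_gc) + at * math.log(1.0 - p_gc)


def best_segmentation(
    seq: str,
    score: Scorer = toy_score,
    labels: Tuple[str, ...] = ("exon", "intron"),
) -> Tuple[float, List[Segment]]:
    """Return (total score, segments) for the best alternating segmentation."""
    n = len(seq)
    if n == 0:
        return 0.0, []
    # best[j][t]: best score for seq[:j] when the last segment has label t.
    best = [{t: float("-inf") for t in labels} for _ in range(n + 1)]
    # back[j][t]: (start of last segment, label of the segment before it).
    back: List[dict] = [{t: None for t in labels} for _ in range(n + 1)]

    for j in range(1, n + 1):
        for t in labels:
            for i in range(j):
                seg = score(seq, i, j, t)
                if i == 0:
                    # The segment seq[0:j] is the first one; nothing precedes it.
                    candidates: List[Tuple[float, Optional[str]]] = [(seg, None)]
                else:
                    # Adjacent segments must carry different labels.
                    candidates = [
                        (best[i][u] + seg, u) for u in labels if u != t
                    ]
                for total, prev_label in candidates:
                    if total > best[j][t]:
                        best[j][t] = total
                        back[j][t] = (i, prev_label)

    # Trace back from the best-scoring label at the end of the sequence.
    t = max(labels, key=lambda lab: best[n][lab])
    total = best[n][t]
    segments: List[Segment] = []
    j = n
    while j > 0:
        i, prev_label = back[j][t]
        segments.append((i, j, t))
        j, t = i, prev_label
    segments.reverse()
    return total, segments


if __name__ == "__main__":
    total, segments = best_segmentation("GGGGAAAAGGGG")
    for start, end, label in segments:
        print(start, end, label)
```

The sketch tries every segment start for every end position, so it makes a quadratic number of scorer calls. That is fine for a short illustration. A practical gene finder would restrict segment boundaries to candidate donor and acceptor sites, which is where site filters such as those mentioned in the abstract would come in.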
pmid:8877520 fatcat:t5l4b5tu7nazxn3ww3oak7hsoi