Vocabulon: a dictionary model approach for reconstruction and localization of transcription factor binding sites

C. Sabatti, L. Rohlin, K. Lange, J. C. Liao
2004 Bioinformatics  
Motivation: Gene expression arrays enable measurement of transcription levels for a large number of genes, or for all genes in a genome. To better interpret these results and to use them to reconstruct transcription networks, information on the locations of binding sites for regulatory proteins across the entire genome is needed. This remains an open problem, in particular for Escherichia coli.
Results: We describe the first application of dictionary-style models to the study of transcription factor binding sites in an entire genome. Vocabulon's unique feature is that it can both reconstruct binding sites characterized by unknown motifs and impute the locations of known binding sites in long sequences through a simultaneous search. On the one hand, the dictionary model specifies a probability for the entire sequence, taking into account all possible binding sites simultaneously; this greatly reduces the number of false positives. On the other hand, the ability to refine motif descriptions as more binding sites are identified increases the sensitivity of the method. We illustrate these properties with examples in E. coli. The results of gene expression arrays are used both to guide the search and to corroborate it.
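The idea of a dictionary model that assigns one probability to the whole sequence, summing over every possible placement of binding sites, can be illustrated with a small sketch. This is a generic word-segmentation model, not the Vocabulon implementation: it assumes a hypothetical dictionary of four single-letter background words plus motif words defined by position weight matrices (PWMs), scans only the forward strand, and uses made-up usage frequencies. A forward recursion sums over all parses of the sequence; combining it with a backward recursion gives the posterior probability that a motif word starts at each position, which is one way of imputing site locations.

```python
"""Sketch of a dictionary (word-segmentation) model for a DNA sequence.

Assumptions (hypothetical, not taken from Vocabulon): the dictionary holds
four single-letter background words plus motif words described by position
weight matrices (PWMs); only the forward strand is scanned; plain floats are
used, so genome-length sequences would need log-space or scaling.
"""
from typing import Dict, List, Tuple

PWM = List[Dict[str, float]]  # one probability column per motif position


def consensus_pwm(consensus: str, p_match: float = 0.97) -> PWM:
    """Build a toy PWM that favours `consensus` at every position."""
    p_other = (1.0 - p_match) / 3.0
    return [{b: (p_match if b == c else p_other) for b in "ACGT"} for c in consensus]


def word_prob(seq: str, start: int, pwm: PWM) -> float:
    """Probability that the PWM emits seq[start:start + len(pwm)]."""
    p = 1.0
    for offset, column in enumerate(pwm):
        p *= column[seq[start + offset]]
    return p


def forward(seq, rho_letter, motifs, rho_motif) -> List[float]:
    """F[i] = summed weight of every parse of the prefix seq[:i]."""
    n = len(seq)
    F = [0.0] * (n + 1)
    F[0] = 1.0
    for i in range(1, n + 1):
        # Parses whose last word is the background letter seq[i-1].
        F[i] = F[i - 1] * rho_letter[seq[i - 1]]
        # Parses whose last word is a motif occurrence ending at position i.
        for name, pwm in motifs.items():
            w = len(pwm)
            if i >= w:
                F[i] += F[i - w] * rho_motif[name] * word_prob(seq, i - w, pwm)
    return F


def backward(seq, rho_letter, motifs, rho_motif) -> List[float]:
    """B[i] = summed weight of every parse of the suffix seq[i:]."""
    n = len(seq)
    B = [0.0] * (n + 1)
    B[n] = 1.0
    for i in range(n - 1, -1, -1):
        # First word of the suffix is a background letter...
        B[i] = rho_letter[seq[i]] * B[i + 1]
        # ...or a motif occurrence starting at position i.
        for name, pwm in motifs.items():
            w = len(pwm)
            if i + w <= n:
                B[i] += rho_motif[name] * word_prob(seq, i, pwm) * B[i + w]
    return B


def site_posteriors(seq, rho_letter, motifs, rho_motif) -> Tuple[float, Dict[str, List[float]]]:
    """Total sequence weight and P(motif word starts at i | seq) for each i."""
    F = forward(seq, rho_letter, motifs, rho_motif)
    B = backward(seq, rho_letter, motifs, rho_motif)
    total = F[len(seq)]  # sum over all parses, i.e. all site configurations
    post = {}
    for name, pwm in motifs.items():
        w = len(pwm)
        post[name] = [
            F[i] * rho_motif[name] * word_prob(seq, i, pwm) * B[i + w] / total
            for i in range(len(seq) - w + 1)
        ]
    return total, post


# Hypothetical toy dictionary: usage frequencies sum to 1 (4 * 0.24 + 0.04).
RHO_LETTER = {b: 0.24 for b in "ACGT"}
MOTIFS = {"TATA": consensus_pwm("TATA")}
RHO_MOTIF = {"TATA": 0.04}

if __name__ == "__main__":
    total, post = site_posteriors("GCTATAGC", RHO_LETTER, MOTIFS, RHO_MOTIF)
    for i, p in enumerate(post["TATA"]):
        print(f"TATA starting at {i}: posterior {p:.3f}")
```

On the toy sequence `GCTATAGC`, most of the posterior mass for a TATA word falls on the site starting at position 2. Because every candidate site has to compete with the background-letter explanation of the same stretch of sequence inside a single normalized sum, weak partial matches receive small posteriors; this is one way to see why a whole-sequence model can suppress false positives. In a model of this kind, the posteriors could in turn be used to re-estimate the PWMs and usage frequencies, which is the sort of motif refinement the abstract describes.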
doi:10.1093/bioinformatics/bti083 pmid:15509602 fatcat:hag36no2brdedn5dgwwrniz32q