Novel methods to optimize gene and statistic test for evaluation – an application for Escherichia coli

Tran Tuan-Anh, Le Thi Ly, Ngo Quoc Viet, Pham The Bao
2017 BMC Bioinformatics  
Since their discovery, recombinant proteins have become increasingly widely used across the life sciences. The global pharmaceutical market was valued at $87 billion in 2008, and sales of industrial enzymes exceeded $4 billion in 2012. This is strong evidence of the great potential of recombinant proteins. However, native genes introduced into a host can be incompatible with the host's expression system in codon usage bias, GC content, repeat regions, and Shine-Dalgarno sequence, so the ... can fall significantly. Hence, we propose novel methods for gene optimization based on neural networks, Bayesian theory, and Euclidean distance. Results: The correlation coefficients of our neural network are 0.86, 0.73, and 0.90 in the training, validation, and testing processes, respectively. In addition, genes optimized by our methods appear to be associated with highly expressed genes and give reasonable codon adaptation index values. Furthermore, genes optimized by the proposed methods closely match previous experimental data. Conclusion: The proposed methods have high potential for gene optimization and for further research in gene expression. We built a demonstration program using Matlab R2014a under Mac OS X. The program was published both as a standalone executable and as Matlab function files. The developed program can be accessed from
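For readers unfamiliar with the codon adaptation index (CAI) mentioned in the abstract, the following is a minimal sketch of the standard definition: each codon gets a weight equal to its count divided by the count of the most frequent synonymous codon in a reference set of highly expressed genes, and the CAI of a gene is the geometric mean of those weights. This is not the authors' Matlab program. The three-amino-acid codon table, the function names, and the 0.5 pseudocount for unseen codons are assumptions made only for illustration.

```python
import math
from collections import Counter

# Toy subset of the standard genetic code (assumption: illustration only;
# a real calculation would cover every amino acid with synonymous codons).
SYNONYMOUS = {
    "K": ["AAA", "AAG"],
    "E": ["GAA", "GAG"],
    "F": ["TTT", "TTC"],
}


def codons(seq):
    """Split a coding sequence into codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_genes):
    """Weight of each codon: its count over the count of the most used synonym."""
    counts = Counter(c for gene in reference_genes for c in codons(gene))
    weights = {}
    for family in SYNONYMOUS.values():
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Assumption: unseen codons get a pseudocount of 0.5 to avoid log(0).
            weights[c] = max(counts[c], 0.5) / top
    return weights


def cai(gene, weights):
    """Geometric mean of codon weights; codons without a weight are skipped."""
    logs = [math.log(weights[c]) for c in codons(gene) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical reference set standing in for highly expressed host genes.
reference = ["".join(["AAA", "AAA", "AAG", "GAA", "GAA", "GAG", "TTC", "TTC", "TTT"])]
weights = relative_adaptiveness(reference)
preferred_score = cai("AAAGAATTC", weights)  # uses only the most frequent codons
rare_score = cai("AAGGAGTTT", weights)       # uses only the less frequent codons
```

A gene built from the host's preferred codons scores close to 1, while one built from rarely used synonyms scores lower, which is the sense in which an optimized gene can be said to have a "reasonable" CAI.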
doi:10.1186/s12859-017-1517-z pmid:28187713 pmcid:PMC5303253 fatcat:3rvwf42rvbaodlchwxb42jnjve