How to find soluble proteins : a comprehensive analysis of alpha/beta hydrolases for recombinant expression in E. coli [article]

Markus Koschorreck, Markus Fischer, Sandra Barth, Jürgen Pleiss, Universität Stuttgart
2006
When screening libraries derived by expression cloning, the expression of active proteins in E. coli can be limited by the formation of inclusion bodies. In these cases it would be desirable to enrich gene libraries for coding sequences whose gene products are soluble in E. coli, and thus to improve the efficiency of screening. Previously, Wilkinson and Harrison showed that solubility can be predicted from amino acid composition (Biotechnology 1991, 9(5):443-448). We have applied this analysis to members of the alpha/beta hydrolase fold family to predict their solubility in E. coli. Alpha/beta hydrolases are a highly diverse family with more than 1800 proteins, which have been grouped into homologous families and superfamilies. Results: The predicted solubility in E. coli depends on hydrolase size, the phylogenetic origin of the host organism, and the homologous family and superfamily to which the hydrolase belongs. In general, small hydrolases are predicted to be more soluble than large hydrolases, and eukaryotic hydrolases are predicted to be less soluble in E. coli than prokaryotic ones. However, combining phylogenetic origin and size leads to more complex conclusions: hydrolases of prokaryotic, fungal and metazoan origin are predicted to be most soluble if they are of small, medium and large size, respectively. We observed large variations in predicted solubility between hydrolases from different homologous families and from different taxa. Conclusion: A comprehensive analysis of all alpha/beta hydrolase sequences allows more efficient screening for new soluble alpha/beta hydrolases through the use of libraries that contain more soluble gene products. Screening of hydrolases from families whose members are hard to express as soluble proteins in E. coli should first be done in coding sequences of organisms from the phylogenetic groups with the highest average predicted solubility for proteins of this family. The tools developed here can be used to identify attractive target genes for expression using protein sequences p [...]
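As a rough illustration of composition-based analysis, the sketch below computes two sequence features commonly associated with the Wilkinson-Harrison approach (the fraction of turn-forming residues N, G, P, S and the average net charge per residue) and averages any per-protein value by group, for example by phylogenetic origin. It is not the authors' tool: the function names, the record layout and the choice of features are assumptions made for illustration, and no model coefficients are included because the abstract does not give them.

```python
from collections import Counter, defaultdict

# Residues treated as turn-forming in this sketch (assumption).
TURN_FORMING = "NGPS"


def composition_features(seq):
    """Return length, turn-forming fraction and average net charge of a protein sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Fraction of residues that are N, G, P or S.
    turn_fraction = sum(counts[a] for a in TURN_FORMING) / n
    # (basic residues - acidic residues) per residue.
    charge_average = ((counts["R"] + counts["K"]) - (counts["D"] + counts["E"])) / n
    return {"length": n, "turn_fraction": turn_fraction, "charge_average": charge_average}


def mean_by_group(records, key, value):
    """Average records[value] within each distinct records[key] (hypothetical helper)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec[value])
    return {g: sum(vals) / len(vals) for g, vals in groups.items()}
```

A per-protein solubility score derived from such features could then be passed through `mean_by_group` with `key` set to the taxon or homologous family, which mirrors the kind of group-wise comparison the abstract describes.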
doi:10.18419/opus-833 fatcat:3mznsxvsjrairhg55cjxcloyz4