User-friendly algorithms for estimating completeness and diversity in randomized protein-encoding libraries

W. M. Patrick, A. E. Firth, J. M. Blackburn
2003 Protein Engineering Design & Selection  
Directed evolution of proteins depends on the production of molecular diversity by random mutagenesis. While a number of methods have been developed for introducing this diversity, the best ways to sample it are not always clear. Here we used simple statistics to analyse completeness and diversity in randomized libraries generated by oligonucleotide-directed mutagenesis, error-prone polymerase chain reaction (epPCR) and in vitro recombination of highly homologous sequences. For
... directed mutagenesis, we derive equations to estimate how complete a given library is expected to be and also to predict the size of library required to give a fixed probability of being 100% complete. We describe the statistical bases for computer programs which estimate the number of distinct variants represented in epPCR and shuffled libraries, dubbed PEDEL and DRIVeR, respectively. These programs allow the user to calculate (rather than guess) the diversity represented in a given library and also provide empirical guidelines for maximizing this diversity. PEDEL and DRIVeR are available at www.bio.cam.ac.uk/~blackburn/stats.html.
doi:10.1093/protein/gzg057 pmid:12874379 fatcat:ymk4p5jsqndylcyusbhivgjj6y
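
The two quantities the abstract mentions for directed mutagenesis (expected completeness of a library, and the library size needed for a given probability of being 100% complete) can be illustrated with a minimal sketch. It assumes all variants are equally likely and that each clone is an independent draw, and it uses the textbook Poisson approximation; the equations actually derived in the paper, and the PEDEL and DRIVeR programs, are not reproduced here. The function names and the NNK example are hypothetical, chosen only for illustration.

```python
import math


def expected_completeness(library_size: float, num_variants: float) -> float:
    """Expected fraction of all possible variants present in the library.

    Assumption: every variant is equally likely and clones are independent,
    so the chance that a given variant is missing is about exp(-L / V).
    """
    return 1.0 - math.exp(-library_size / num_variants)


def size_for_full_coverage(num_variants: float, probability: float) -> float:
    """Library size L at which P(every variant is present) equals `probability`.

    Assumption: P(complete) is approximated by (1 - exp(-L / V)) ** V.
    Solving for L gives L = -V * ln(1 - probability ** (1 / V)).
    expm1 is used so the subtraction stays accurate when V is large.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    missing_chance = -math.expm1(math.log(probability) / num_variants)
    return -num_variants * math.log(missing_chance)


if __name__ == "__main__":
    # Hypothetical example: three codons randomized with NNK (32 codons each).
    variants = 32 ** 3
    print(expected_completeness(variants, variants))
    print(size_for_full_coverage(variants, 0.95))
```

Under these assumptions, a library containing exactly as many clones as there are possible variants is expected to cover only about 63% of them (1 - 1/e), which is why the size required for near-certain completeness is several times the number of variants.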