An alphabetic code based atomic level molecular similarity search in databases

Saranya Nallusamy, Samuel Selvaraj
2012 Bioinformation  
Atomic-level molecular similarity and diversity studies have gained considerable importance through their wide application in bioinformatics and chemo-informatics for drug design. The availability of large volumes of data on chemical compounds requires new methodologies that search these archives efficiently and effectively, in less time and with optimal use of computational power. We describe an alphabetic algorithm for similarity searching based on atom-atom bonding preference for ligands. We represented 70 cyclin-dependent kinase 2 inhibitors as strings of pre-defined alphabetic characters so that they could be searched using known protein sequence alignment tools. A common pattern extracted from this set of compounds was then used in database searches to retrieve similar active compounds. The area under the receiver operating characteristic (ROC) curve was used to discriminate between similar and dissimilar compounds in the databases. An average retrieval rate of about 60% was obtained in cross-validation using the home-grown dataset and the directory of useful decoys (DUD, formerly known as the ZINC database) data. The approach should aid the effective retrieval of similar compounds by database search.

Background: Molecular similarity and diversity studies have gained importance through their wide application in the fields of bioinformatics and chemo-informatics [1, 2]. The main goal of structure-based drug design (SBDD) is to find novel lead compounds with potent and specific activity. Based on the principle "similar molecules exert similar activity", ligand similarity searching has gained importance in virtual screening strategies [3, 4]. Ligand similarity can be assessed by comparing structures using 1D, 2D and 3D approaches such as the Tanimoto coefficient [2, 5], SMILES [6], CoMFA [7] and CoMSIA [8], among others [1, 9, 10]. While 1D descriptors cover chemical nomenclature and physicochemical and biological properties, 2D descriptors provide information on fragment counts, topological indices, molecular connectivity and graphical representation, and 3D descriptors detail molecular surface, volume and interaction energies. Each descriptor has its own importance in the search of related open access
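
As a rough illustration of the evaluation step mentioned in the abstract, the sketch below computes the area under the ROC curve for a ranked list of compounds. It is not the authors' implementation: the function name `roc_auc`, the example scores and the active/decoy labels are all hypothetical, and the pairwise (Mann-Whitney) formulation is only one common way to obtain the AUC.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: similarity score per compound (higher = more similar to the query)
    labels: 1 for a known active, 0 for a decoy
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    decoys = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")

    # Count active/decoy pairs in which the active is ranked higher;
    # ties count as half a win.
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Hypothetical scores for six compounds (assumed values, not from the paper).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 would mean every active is ranked above every decoy, while 0.5 corresponds to a ranking no better than chance.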
doi:10.6026/97320630008498 pmid:22829718 pmcid:PMC3398777 fatcat:mxdvu2ha7jcsvmezwvkva4p7de