Fast protein 3D surface search
Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication - ICUIMC '13
Functionally annotating protein structures of unknown function is an important challenge in bioinformatics. One informatics approach to predicting the function of a protein is to analyze the functions of other, structurally similar proteins. The ability to search for and retrieve similar protein structures from large datasets is crucial to this approach. Here, we propose a novel approach for efficient protein structure search in which protein structures are represented as vectors by 3D-Zernike
... ptor (3DZD). The surface shape of a protein's tertiary structure is compactly represented by the 3DZD encoding. This simplified representation accelerates structural search from a full day to a matter of seconds. However, further speedup is required for scenarios in which multiple users access the database at the same time. We address this need by applying fast k-nearest neighbor algorithms to the 3DZDs. The results show that the proposed methods significantly improve search speed. In addition, we introduce an extended approach for protein structure search based on methods that exploit characteristics of the 3DZD. Experiments show that search time was reduced by 75.41% with the fast k-nearest neighbor algorithm, 88.7% with the extended fast k-nearest neighbor algorithm, 88.84% with the fast threshold-based nearest neighbor algorithm, and 91.53% with the fast extended threshold-based nearest neighbor algorithm. In a simulated test case, the extended threshold-based algorithm, which had the highest speed improvement in the initial test case, showed a speed improvement of up to 87.48% compared to linear scan.
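To make the contrast between linear scan and a fast k-nearest neighbor search concrete, the following is a minimal sketch in Python. It is not the authors' algorithm, which the abstract does not specify. It assumes one common speedup, partial distance elimination, in which the distance computation for a candidate is abandoned as soon as its running sum exceeds the current k-th best distance. The function names, the squared Euclidean metric, and the vector length of 121 used in the demo are all assumptions made for illustration.

```python
import heapq
import random


def squared_distance(a, b):
    """Full squared Euclidean distance between two equal-length vectors."""
    dist = 0.0
    for x, y in zip(a, b):
        dist += (x - y) ** 2
    return dist


def linear_scan_knn(query, database, k):
    """Baseline: compute the full distance to every vector, then keep the k smallest."""
    scored = [(squared_distance(query, vec), idx) for idx, vec in enumerate(database)]
    return sorted(scored)[:k]


def partial_distance_knn(query, database, k):
    """k-NN with early abandonment (an assumed technique, not taken from the paper).

    A max-heap of the k best candidates is kept as (-distance, -index) pairs,
    so heap[0] always holds the current worst of the k best.
    """
    heap = []
    for idx, vec in enumerate(database):
        bound = -heap[0][0] if len(heap) == k else float("inf")
        dist = 0.0
        abandoned = False
        for x, y in zip(query, vec):
            dist += (x - y) ** 2
            if dist > bound:
                # This candidate cannot enter the top k; skip remaining dimensions.
                abandoned = True
                break
        if abandoned:
            continue
        if len(heap) < k:
            heapq.heappush(heap, (-dist, -idx))
        elif dist < bound:
            heapq.heapreplace(heap, (-dist, -idx))
    return sorted((-neg_dist, -neg_idx) for neg_dist, neg_idx in heap)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 121  # hypothetical descriptor length, chosen only for this demo
    database = [[rng.random() for _ in range(dim)] for _ in range(1000)]
    query = [rng.random() for _ in range(dim)]
    print(partial_distance_knn(query, database, 5))
```

Both functions return the same neighbors; the second simply avoids finishing distance computations that cannot change the result. A threshold-based search could follow the same pattern with a fixed bound in place of the k-th best distance, though whether this matches the threshold-based algorithms evaluated above is not stated in the abstract.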