Integrating Cross-Linking Experiments with Ab Initio Protein–Protein Docking

Thom Vreven, Devin K. Schweppe, Juan D. Chavez, Chad R. Weisbrod, Sayaka Shibata, Chunxiang Zheng, James E. Bruce, Zhiping Weng
2018 Journal of Molecular Biology  
Ab initio protein-protein docking algorithms often rely on experimental data to identify the most likely complex structure. We integrated protein-protein docking with the experimental data of chemical cross-linking followed by mass spectrometry. We tested our approach using 19 cases that resulted from an exhaustive search of the Protein Data Bank for protein complexes with cross-links identified in our experiments. We implemented cross-links as constraints based on Euclidean distance or void-volume distance. For most test cases, the rank of the top-scoring near-native prediction improved at least twofold compared with docking without the cross-link information, and the success rate for the top 5 predictions nearly tripled. Our results demonstrate the delicate balance between retaining correct predictions and eliminating false positives. Several test cases had multiple components with distinct interfaces, and we present an approach for assigning cross-links to the interfaces. Employing the symmetry information for these cases further improved the performance of complex structure prediction.

... prediction [7-15]. The experimental data can either be used to guide computational prediction [16, 17] or to filter predictions in a post-processing step [11]. In this study, we integrated the ab initio protein-protein docking algorithm ZDOCK [18-21] with the experimental data of chemical cross-linking followed by mass spectrometry. Cross-linking reagents can form covalent bonds with protein residues that are closer in distance than the length of the linker. Trypsin digestion of cross-linked proteins, followed by mass spectrometry, identifies protein residues that were cross-linked. The cross-linking reagent has a maximum length; therefore, the cross-linking data give an upper bound for the geometric distance between paired residues. Cross-linking data have been used extensively to validate or guide protein-protein docking predictions [11, 22-25], and various approaches were developed to integrate the constraints with the docking algorithms [11, 26-28]. Systematic investigations of the performance using large data sets were, however, carried out only using simulated cross-linking data [27]. Here we present a data set that is derived from our proteome-wide experiments [29-34], all of which use the same linker.
The data set was searched against the known structures in the Protein Data Bank [35] and yielded 19 test cases. Although the resulting collection of test cases is limited in size, it enabled us to compare the effectiveness of several integration schemes and develop a new algorithm for associating the cross-links with specific interfaces in higher-order protein-protein complexes.

RESULTS AND DISCUSSION

Overall approach

We used ZDOCK [18-21] with input component proteins obtained from X-ray crystallography or through homology modeling using X-ray crystallography template structures. The ZDOCK algorithm was integrated with experimental cross-linking data to generate only predictions that satisfy the cross-links. The following three approaches were tested:

(1) Filtering the predictions from a standard ZDOCK calculation using the Euclidean distance between cross-linked sites. Although Euclidean distances are fast to compute and therefore applicable to large sets of predictions, the cross-linking distances could be underestimated because the Euclidean path is allowed to pass through protein-occupied space.

(2) Filtering the ZDOCK predictions using the Xwalk algorithm [27, 36] to determine the shortest path that is allowed to pass only through protein-unoccupied space (void-volume). Although physically more accurate than Euclidean distances, the grid-based algorithm is orders of magnitude more expensive computationally.

(3) Restricting ZDOCK to search only the space that satisfies the Euclidean cross-linking constraints. This approach yields more retained predictions than the filtering methods and therefore may improve performance.

We performed cross-linking and mass spectrometry experiments with the lysine-reactive BDP-NHP chemical [30] and then used the ReACT [30] algorithm to identify the cross-linked sites.
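As an illustration of approach (1) above, the following minimal sketch filters docking poses by the Euclidean distance between cross-linked sites. It is not the authors' implementation: the function names, the dictionary-based data layout, and the choice of one representative coordinate per residue are all assumptions made for this example, and the distance cutoff is left as a required parameter because the excerpt does not state the value used.

```python
import math

def satisfies_crosslinks(receptor_xyz, ligand_xyz, crosslinks, max_distance):
    """Return True if every cross-linked residue pair is within max_distance (in Å).

    receptor_xyz, ligand_xyz: dicts mapping a residue label to an (x, y, z) tuple
    (hypothetical layout; one representative atom per residue).
    crosslinks: list of (receptor_label, ligand_label) pairs.
    """
    return all(
        math.dist(receptor_xyz[r], ligand_xyz[l]) <= max_distance
        for r, l in crosslinks
    )

def filter_predictions(receptor_xyz, ligand_poses, crosslinks, max_distance):
    """Return the indices of the poses that satisfy all cross-links.

    ligand_poses: list of dicts, one per docking prediction, giving the
    ligand residue coordinates after that prediction's rigid-body transform.
    """
    return [
        i for i, pose in enumerate(ligand_poses)
        if satisfies_crosslinks(receptor_xyz, pose, crosslinks, max_distance)
    ]

# Toy usage with made-up coordinates and an illustrative 35 Å cutoff.
receptor = {"K10": (0.0, 0.0, 0.0)}
poses = [{"K55": (10.0, 0.0, 0.0)}, {"K55": (50.0, 0.0, 0.0)}]
kept = filter_predictions(receptor, poses, [("K10", "K55")], max_distance=35.0)
print(kept)  # → [0]
```

Because the check is a handful of distance evaluations per pose, it scales easily to large prediction sets, which is the advantage the text attributes to the Euclidean filter.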
We used our previously published cross-linking data [25, 29-34, 37] and unpublished data; because these data were all generated with the same cross-linking chemical, they allow for a systematic study of the computational algorithms. We then retained only the 2000 heteromeric cross-linked proteins, as most ab initio protein-docking ... S1 lists the experimentally detected cross-links. Case 3C had 35 cross-links, and the other cases had between 1 and 7, with an average of just over 2. Based on the bound structures, the Euclidean distances between the cross-linked sites were under 35 Å and the void-volume distances under 40 Å, except for cases 2C, 2D, and 9, whose cross-linked sites were at slightly greater distances, and case 3C, which showed several much larger distances, which are likely related ...
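To make the difference between Euclidean and void-volume distances concrete, the sketch below computes a shortest path that avoids protein-occupied grid cells. It is a simplified stand-in, not the Xwalk algorithm: it runs Dijkstra's algorithm on a regular grid with 26-connected neighbors, and the function name, the uniform atom radius, the grid spacing, and the margin are all assumptions chosen for illustration.

```python
import heapq
import math
from itertools import product

def void_volume_distance(start, end, atoms, atom_radius=2.0, spacing=1.0, margin=5.0):
    """Length of the shortest grid path from start to end that avoids cells
    within atom_radius of any atom. Returns math.inf if no path exists.
    All default parameter values are illustrative assumptions."""
    points = [start, end] + list(atoms)
    lo = [min(p[k] for p in points) - margin for k in range(3)]
    n = [int(math.ceil((max(p[k] for p in points) + margin - lo[k]) / spacing)) + 1
         for k in range(3)]

    def to_cell(p):
        return tuple(int(round((p[k] - lo[k]) / spacing)) for k in range(3))

    def center(c):
        return tuple(lo[k] + c[k] * spacing for k in range(3))

    # Mark grid cells whose centers fall inside any atom sphere.
    occupied = set()
    reach = int(math.ceil(atom_radius / spacing))
    for a in atoms:
        ca = to_cell(a)
        for d in product(range(-reach, reach + 1), repeat=3):
            c = tuple(ca[k] + d[k] for k in range(3))
            if math.dist(center(c), a) <= atom_radius:
                occupied.add(c)

    src, dst = to_cell(start), to_cell(end)
    # The cross-linked sites sit on the protein, so their own cells stay open.
    occupied.discard(src)
    occupied.discard(dst)

    steps = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        dist, c = heapq.heappop(heap)
        if c == dst:
            return dist
        if dist > best.get(c, math.inf):
            continue  # stale queue entry
        for d in steps:
            nb = tuple(c[k] + d[k] for k in range(3))
            if nb in occupied or any(not 0 <= nb[k] < n[k] for k in range(3)):
                continue
            nd = dist + spacing * math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
            if nd < best.get(nb, math.inf):
                best[nb] = nd
                heapq.heappush(heap, (nd, nb))
    return math.inf

# Toy usage: one blocking atom midway between two sites 10 Å apart.
free = void_volume_distance((0, 0, 0), (10, 0, 0), atoms=[])
blocked = void_volume_distance((0, 0, 0), (10, 0, 0), atoms=[(5, 0, 0)])
print(free, blocked)
```

In this toy case the unobstructed path equals the Euclidean distance, while the obstructed path is longer, which mirrors why the void-volume distances reported above exceed the Euclidean ones. The grid search visits many cells per pair, so it is far more costly than a single distance evaluation.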
doi:10.1016/j.jmb.2018.04.010 pmid:29665372 pmcid:PMC6084434 fatcat:5ie7meonr5ekvelw4qnlplbgba