Sequence-based prediction of protein binding regions and drug–target interactions
Journal of Cheminformatics
AbstractIdentifying drug–target interactions (DTIs) is important for drug discovery. However, searching all drug–target spaces poses a major bottleneck. Therefore, recently many deep learning models have been proposed to address this problem. However, the developers of these deep learning models have neglected interpretability in model construction, which is closely related to a model's performance. We hypothesized that training a model to predict important regions on a protein sequence would
... crease DTI prediction performance and provide a more interpretable model. Consequently, we constructed a deep learning model, named Highlights on Target Sequences (HoTS), which predicts binding regions (BRs) between a protein sequence and a drug ligand, as well as DTIs between them. To train the model, we collected complexes of protein–ligand interactions and protein sequences of binding sites and pretrained the model to predict BRs for a given protein sequence–ligand pair via object detection employing transformers. After pretraining the BR prediction, we trained the model to predict DTIs from a compound token designed to assign attention to BRs. We confirmed that training the BRs prediction model indeed improved the DTI prediction performance. The proposed HoTS model showed good performance in BR prediction on independent test datasets even though it does not use 3D structure information in its prediction. Furthermore, the HoTS model achieved the best performance in DTI prediction on test datasets. Additional analysis confirmed the appropriate attention for BRs and the importance of transformers in BR and DTI prediction. The source code is available on GitHub (https://github.com/GIST-CSBL/HoTS).