Multi-modal Protein Knowledge Graph Construction and Applications [article]

Siyuan Cheng, Xiaozhuan Liang, Zhen Bi, Huajun Chen, Ningyu Zhang
2022 arXiv pre-print
Existing data-centric methods for protein science generally cannot sufficiently capture and leverage biological knowledge, which may be crucial for many protein tasks. To facilitate research in this field, we create ProteinKG65, a knowledge graph for protein science. Using the Gene Ontology and the UniProt knowledge base as a basis, we transform and integrate various kinds of knowledge, aligning descriptions to GO terms and protein sequences to protein entities. ProteinKG65 is mainly dedicated to providing a specialized protein knowledge graph, bringing the knowledge of the Gene Ontology to protein function and structure prediction. The current version contains about 614,099 entities and 5,620,437 triples (including 5,510,437 protein-GO triples and 110,000 GO-GO triples). We also illustrate the potential applications of ProteinKG65 with a prototype. Our dataset can be downloaded at
arXiv:2207.10080v2 fatcat:fwwvqkps5zfcxiwxyj5s2si6fi