Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences
2021
Proceedings of the National Academy of Sciences of the United States of America
In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In the life sciences, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Protein language modeling at the scale of evolution is a logical step toward predictive and generative artificial intelligence for biology. To this end, we use unsupervised
doi:10.1073/pnas.2016239118
pmid:33876751
pmcid:PMC8053943
fatcat:3bdyww5e75csbazdbf3ur4vcfu