Federated pretraining and fine tuning of BERT using clinical notes from multiple silos [article]

Dianbo Liu, Tim Miller
2020, arXiv preprint
Large-scale contextual representation models, such as BERT, have significantly advanced natural language processing (NLP) in recent years. However, in certain areas such as healthcare, accessing diverse large-scale text data from multiple institutions is extremely challenging due to privacy and regulatory constraints. In this article, we show that it is possible to both pretrain and fine-tune BERT models in a federated manner using clinical texts from different silos without moving the data.
arXiv:2002.08562v1
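
The abstract describes training BERT across silos without moving the data. A minimal sketch of one common way to realize this, federated averaging (each institution trains locally on its own notes and only model weights are sent to a coordinating server for averaging), is shown below. The tiny encoder, the toy training objective, and the simulated silos are illustrative stand-ins, not the paper's actual model, data, or protocol.

```python
# Minimal sketch of federated averaging over private text silos.
# Assumption: a FedAvg-style protocol; the toy model and data are illustrative,
# not the setup used in the paper.
import copy
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a BERT-style encoder (masked-LM head omitted)."""
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.ff = nn.Linear(hidden, vocab_size)

    def forward(self, token_ids):
        return self.ff(self.embed(token_ids))

def local_train(global_model, token_batches, epochs=1, lr=1e-3):
    """One silo: copy the global weights, train on local notes, return weights."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for ids in token_batches:
            logits = model(ids)
            # Toy objective: predict each token id from its own representation.
            loss = loss_fn(logits.view(-1, logits.size(-1)), ids.view(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model.state_dict()

def federated_average(state_dicts):
    """Server: average parameters across silos without seeing any raw text."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg

# Simulated silos: each holds its own batches of token ids, which never leave the silo.
silos = [[torch.randint(0, 1000, (8, 16)) for _ in range(3)] for _ in range(2)]
global_model = TinyEncoder()
for round_idx in range(2):  # communication rounds
    local_weights = [local_train(global_model, data) for data in silos]
    global_model.load_state_dict(federated_average(local_weights))
```

The same loop structure covers both stages the abstract mentions: pretraining corresponds to running it with a language-modeling objective, and fine-tuning to running it with a task-specific head and labels, in each case exchanging only weights rather than clinical text.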