Semi-supervised segmentation and genome annotation [article]

Rachel C.W. Chan, Matthew McNeil, Eric G. Roberts, Mickaël Mendez, Maxwell W. Libbrecht, Michael M. Hoffman
2020 bioRxiv pre-print
Segmentation and genome annotation methods automatically discover joint signal patterns in whole-genome datasets. Previously, researchers trained these algorithms in a fully unsupervised way, with no prior knowledge of the functions of particular regions. Adding information provided by expert-created annotations to supervise training could improve the annotations created by these methods. We implemented semi-supervised learning using virtual evidence in the annotation method Segway. Additionally, we defined a positionally tolerant precision and recall metric for scoring genome annotations based on the proximity of each annotation feature to the truth set. We demonstrate semi-supervised Segway's ability to learn patterns corresponding to provided transcription start sites on a specified supervision label, and subsequently recover other transcription start sites in unseen data on the same supervision label.
doi:10.1101/2020.01.30.926923 fatcat:h6iymyr2tzeafdyqivzi6hgvfe
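
The abstract describes a positionally tolerant precision and recall metric but does not give its formula. The following is a minimal sketch of one plausible reading, not the authors' definition: each feature is reduced to a single integer coordinate on one chromosome, and a feature counts as a hit if a feature from the other set lies within a fixed distance. The function names and the `tolerance` parameter are hypothetical.

```python
from bisect import bisect_left


def _has_neighbor(sorted_positions, query, tolerance):
    """Return True if any value in sorted_positions is within tolerance of query."""
    # First position that is >= query - tolerance; it is a neighbor only if
    # it also does not exceed query + tolerance.
    i = bisect_left(sorted_positions, query - tolerance)
    return i < len(sorted_positions) and sorted_positions[i] <= query + tolerance


def tolerant_precision_recall(predicted, truth, tolerance):
    """Score predicted feature positions against truth positions.

    Assumption (not from the paper): precision is the fraction of predicted
    positions with a truth position within `tolerance` base pairs, and recall
    is the fraction of truth positions with a predicted position within
    `tolerance` base pairs.
    """
    pred_sorted = sorted(predicted)
    truth_sorted = sorted(truth)
    pred_hits = sum(_has_neighbor(truth_sorted, p, tolerance) for p in pred_sorted)
    truth_hits = sum(_has_neighbor(pred_sorted, t, tolerance) for t in truth_sorted)
    # Empty inputs are scored as 0.0 here; that convention is also an assumption.
    precision = pred_hits / len(pred_sorted) if pred_sorted else 0.0
    recall = truth_hits / len(truth_sorted) if truth_sorted else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    predicted_tss = [100, 480, 2000]
    known_tss = [110, 500, 900]
    print(tolerant_precision_recall(predicted_tss, known_tss, tolerance=25))
```

With a tolerance of zero this reduces to exact-position matching, so the tolerance controls how much positional error is forgiven when comparing an annotation to the truth set.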