Punctuation Restoration in Spanish Customer Support Transcripts using Transfer Learning

Xiliang Zhu, Shayna Gardiner, David Rossouw, Tere Roldán and Simon Corston-Oliver
2022 Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing (unpublished)
Automatic Speech Recognition (ASR) systems typically produce unpunctuated transcripts that have poor readability. In addition, building a punctuation restoration system is challenging for low-resource languages, especially for domain-specific applications. In this paper, we propose a Spanish punctuation restoration system designed for a real-time customer support transcription service. To address the data sparsity of Spanish transcripts in the customer support domain, we introduce two ... learning-based strategies: 1) domain adaptation using out-of-domain Spanish text data; 2) crosslingual transfer learning leveraging in-domain English transcript data. Our experimental results show that these strategies improve the accuracy of the Spanish punctuation restoration system.
doi:10.18653/v1/2022.deeplo-1.9 fatcat:aqvfzub7ifh5zlfagwx7paco7y