Corpus CEFALA-1: Base de dados audiovisual de locutores para estudos de biometria, fonética e fonologia / Corpus CEFALA-1: Audiovisual Database of Speakers for Biometric, Phonetic and Phonology Studies

Arlindo Follador Neto, Adelino Pinheiro Silva, Hani Camille Yehia
2019 Revista de Estudos da Linguagem  
Keywords: speaker corpus; biometrics; phonetics and phonology; audiovisual database.
Abstract: Human speech has been studied in different areas of knowledge, ranging from biometrics to phonetics and phonology. In research conducted in such areas, speech samples are necessary resources for obtaining results and validating hypotheses. For this purpose, samples from different speakers and with different contents are stored in audio files and organized into databases. Such databases support the continuity, practicality and reliability of studies by eliminating the difficult and time-consuming step of data collection. Moreover, they allow consistent comparisons between different studies. However, freely accessible databases in the Portuguese language, or recorded in controlled environments, are rarely found. The objective of this paper is to construct a free and public database of Brazilian Portuguese, named Corpus CEFALA-1. The database comprises audiovisual speech samples from 104 speakers, recorded in a studio following a specific collection protocol. The paper presents the processing, segmentation and organization methodologies applied to the speech samples, as well as statistical analyses, an application to biometric verification and preliminary phonetic-phonological analyses of the corpus.
doi:10.17851/2237-2083.27.1.191-212 fatcat:ckcr4nvs2nfelbi6uxsfcblclq