Speech Synthesis from ECoG using Densely Connected 3D Convolutional Neural Networks: [article]

Miguel Angrick, Christian Herff, Emily Mugler, Matthew C. Tate, Marc W. Slutzky, Dean J. Krusienski, Tanja Schultz
2018 bioRxiv   pre-print
Objective: Direct synthesis of speech from neural signals could provide a fast and natural way of communication to people with neurological diseases. Invasively-measured brain activity (electrocorticography; ECoG) supplies the necessary temporal and spatial resolution to decode fast and complex processes such as speech production. A number of impressive advances in speech decoding using neural signals have been achieved in recent years, but the complex dynamics are still not fully understood.
more » ... wever, it is unlikely that simple linear models can capture the relation between neural activity and continuous spoken speech. Approach: Here we show that deep neural networks can be used to map ECoG from speech production areas onto an intermediate representation of speech (logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology which is well-suited to work with the small amount of data available from each participant. Main results: In a study with six participants, we achieved correlations up to r=0.69 between the reconstructed and original logMel spectrograms. We transfered our prediction back into an audible waveform by applying a Wavenet vocoder. The vocoder was conditioned on logMel features that harnessed a much larger, pre-existing data corpus to provide the most natural acoustic output. Significance. To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings during speech production using deep neural networks.
doi:10.1101/478644 fatcat:5eizj7fu4fc3tm2o4737wubg6a