1,962 Hits in 6.7 sec

Kernel Convolution Model for Decoding Sounds from Time-Varying Neural Responses

Ali Faisal, Anni Nora, Jaeho Seol, Hanna Renvall, Riitta Salmelin
2015 2015 International Workshop on Pattern Recognition in NeuroImaging  
In this study we present a kernel-based convolution model to characterize neural responses to natural sounds by decoding their time-varying acoustic features.  ...  Convolution models typically decode frequencies that appear at a certain time point in the sound signal by using neural responses from that time point until a certain fixed duration of the response.  ...  We thank Elia Formisano, Tom Mitchell, Giancarlo Valente, and Gustavo Sudre for valuable discussions.  ...
doi:10.1109/prni.2015.10 dblp:conf/prni/FaisalNSRS15 fatcat:temdokmekfdcnkrwdstsnldhui
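As a rough illustration of the windowed-decoding idea described in this abstract (not the authors' code), the sketch below uses ridge regression to predict a time-varying acoustic feature from a fixed-duration window of simulated neural responses; the synthetic data and all sizes are assumptions.

```python
# Minimal sketch: decode a time-varying sound feature from a lagged
# window of neural responses with ridge regression.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
T, n_sensors, lag = 500, 30, 10            # hypothetical sizes

neural = rng.standard_normal((T, n_sensors))            # MEG-like responses
feature = neural[:, 0] + 0.1 * rng.standard_normal(T)   # toy acoustic feature

# Stack responses from t .. t+lag-1 to predict the feature at time t,
# mirroring the fixed-duration response window described in the abstract.
X = np.stack([neural[t:t + lag].ravel() for t in range(T - lag)])
y = feature[:T - lag]

model = Ridge(alpha=1.0).fit(X[:400], y[:400])
print("decoding r:", np.corrcoef(model.predict(X[400:]), y[400:])[0, 1])
```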

Dynamic time-locking mechanism in the cortical representation of spoken words

A. Nora, A. Faisal, J. Seol, H. Renvall, E. Formisano, R. Salmelin
2020 eNeuro  
We aimed at determining how well the models, differing in their representation of temporal information, serve to decode and reconstruct spoken words from MEG recordings in 16 healthy volunteers.  ...  In this study, computational modeling of cortical responses to spoken words highlights the relevance of temporal tracking of spectrotemporal features especially for speech.  ...  sound feature analysis, Tiina Lindh-Knuutila for assistance with the corpus vectors, and Sasa Kivisaari for comments on this manuscript.  ... 
doi:10.1523/eneuro.0475-19.2020 pmid:32513662 pmcid:PMC7470935 fatcat:sntut6ihgfd2tkmmoyozrrfkya

Dynamic time-locking mechanism in the cortical representation of spoken words [article]

Anni Nora, Ali Faisal, Jaeho Seol, Hanna Renvall, Elia Formisano, Riitta Salmelin
2019 bioRxiv   pre-print
We aimed at determining how well the models, differing in their representation of temporal information, serve to decode and reconstruct spoken words from MEG recordings in 16 healthy volunteers.  ...  We discovered that time-locking of the cortical activation to the unfolding speech input is crucial for the encoding of the acoustic-phonetic features.  ...  sound feature analysis, Tiina Lindh-Knuutila for assistance with the corpus vectors, and Sasa Kivisaari for comments on this manuscript.  ... 
doi:10.1101/730838 fatcat:k66ptrjhpbba7dmityrtkhmxka

Towards efficient models for real-time deep noise suppression [article]

Sebastian Braun, Hannes Gamper, Chandan K.A. Reddy, Ivan Tashev
2021 arXiv   pre-print
With recent research advancements, deep learning models are becoming attractive and powerful choices for speech enhancement in real-time applications.  ...  during inference time.  ...  As shown in Fig. 2 , the structure has L symmetric convolutional and deconvolutional encoder and decoder layers with kernels of size (2, 3) in time and frequency dimensions.  ... 
arXiv:2101.09249v2 fatcat:e26mygy4vfhc3jznpktdg53f3m
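The abstract mentions L symmetric convolutional and deconvolutional layers with (2, 3) kernels in time and frequency. Below is a minimal sketch of such a symmetric encoder-decoder; the depth (L = 3), channel counts, and strides are illustrative assumptions, not the paper's configuration.

```python
# Sketch only: symmetric conv/deconv encoder-decoder with (2, 3) kernels
# in (time, frequency). Depth and channel counts are assumed.
import torch
import torch.nn as nn

L, ch = 3, [1, 16, 32, 64]
enc = nn.ModuleList(
    nn.Conv2d(ch[i], ch[i + 1], kernel_size=(2, 3), stride=(1, 2))
    for i in range(L))
dec = nn.ModuleList(
    nn.ConvTranspose2d(ch[i + 1], ch[i], kernel_size=(2, 3), stride=(1, 2))
    for i in reversed(range(L)))

x = torch.randn(1, 1, 100, 257)   # (batch, ch, time frames, freq bins)
for layer in enc:
    x = torch.relu(layer(x))
for layer in dec:
    x = torch.relu(layer(x))
print(x.shape)  # close to the input shape; real models add output_padding
                # or cropping to align encoder/decoder sizes exactly
```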

Traffic Data Imputation Using Deep Convolutional Neural Networks

Ouafa Benkraouda, Bilal Thonnam Thodi, Hwasoo Yeo, Monica Menendez, Saif Eddin Jabari
2020 IEEE Access  
Using a convolutional encoder-decoder architecture, we show that a well-trained neural network can learn spatio-temporal traffic speed dynamics from time-space diagrams.  ...  We demonstrate this for a homogeneous road section using simulated vehicle trajectories and then validate it using real-world data from the Next Generation Simulation (NGSIM) program.  ...  We proposed a convolutional encoder-decoder neural network model to learn traffic speed dynamics from space-time diagrams.  ...
doi:10.1109/access.2020.2999662 fatcat:6zbg7yfkhbdg5gfb6ewti7ykaa
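A hedged sketch of the general approach: treat the time-space speed diagram as an image, mask out the unobserved cells, and train a small convolutional encoder-decoder to reconstruct it from the observed (probe-vehicle) cells. The network size and the synthetic diagram are assumptions.

```python
# Illustrative sketch: impute a time-space speed diagram by training a
# small conv encoder-decoder with a loss on observed cells only.
import torch
import torch.nn as nn

speed = torch.rand(1, 1, 64, 64) * 100.0       # toy (space x time) speeds, km/h
mask = (torch.rand_like(speed) < 0.2).float()  # ~20% of cells observed

net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):
    pred = net(speed * mask)                    # input: partially observed diagram
    loss = ((pred - speed) ** 2 * mask).mean()  # supervise observed cells only
    opt.zero_grad(); loss.backward(); opt.step()

filled = net(speed * mask)                      # dense imputed diagram
```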

Deep Neural Networks and End-to-End Learning for Audio Compression [article]

Daniela N. Rim, Inseon Jang, Heeyoul Choi
2021 arXiv   pre-print
In addition, our approach allows the separation of the encoder and decoder, which is necessary for compression tasks.  ...  To the best of our knowledge, this is the first end-to-end learning for a single audio compression model with RNNs, and our model achieves a Signal to Distortion Ratio (SDR) of 20.54.  ...  Several combinations of stride and kernel sizes were tried. We also varied the number of filters.  ...
arXiv:2105.11681v2 fatcat:dq2epuvqpvbtbfv3jrd5ckup64
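The SDR figure quoted above can be computed in its plain form, SDR = 10 log10(||s||^2 / ||s - s_hat||^2), as in the short sketch below (note that BSS-eval style SDR definitions differ slightly; the toy noise level is chosen to land near 20 dB).

```python
# Quick sketch of the plain Signal-to-Distortion Ratio on toy signals.
import numpy as np

s = np.random.randn(16000)                 # reference audio
s_hat = s + 0.09 * np.random.randn(16000)  # decoded reconstruction

sdr = 10 * np.log10(np.sum(s**2) / np.sum((s - s_hat)**2))
print(f"SDR: {sdr:.2f} dB")                # ~20-21 dB at this noise level
```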

SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation using Optimally Smoothed Spectral Mapping [article]

Vinay Kothapally, Wei Xia, Shahram Ghorbani, John H.L. Hansen, Wei Xue, Jing Huang
2020 arXiv   pre-print
In this study, we propose 'SkipConvNet' where we replace each skip connection with multiple convolutional modules to provide the decoder with intuitive feature maps rather than the encoder's output to improve  ...  One of the most popular variants of these FCNs is the 'U-Net', which is an encoder-decoder network with skip connections.  ...  Unlike [14] , which uses convolutions with varying kernels in parallel, we use standard convolutions followed by normalization and non-linear activation for the encoder and the decoder respectively.  ...
arXiv:2007.09131v1 fatcat:kts3iwehjbaizcwxz3xo6ydrle

SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation Using Optimally Smoothed Spectral Mapping

Vinay Kothapally, Wei Xia, Shahram Ghorbani, John H.L. Hansen, Wei Xue, Jing Huang
2020 Interspeech 2020  
from the source, h(t) is the room impulse response (RIR), and n(t) is background additive noise.  ...  In this study, we propose 'SkipConvNet' where we replace each skip connection with multiple convolutional modules to provide the decoder with intuitive feature maps rather than the encoder's output to improve  ...  Unlike [19] , which uses convolutions with varying kernels in parallel, we use standard convolutions followed by normalization and non-linear activation for the encoder and the decoder respectively.  ...
doi:10.21437/interspeech.2020-2048 dblp:conf/interspeech/KothapallyXGHX020 fatcat:gfje7i2g45cwhbnix5ihc7iixi
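A minimal sketch of the core idea shared by both SkipConvNet entries above: route each U-Net skip connection through a small stack of convolution, normalization, and activation modules before it reaches the decoder. Channel counts and module depth are assumptions, not the paper's settings.

```python
# Sketch: a convolutional module placed on a U-Net skip path.
import torch
import torch.nn as nn

class SkipConv(nn.Module):
    def __init__(self, ch, n_convs=2):
        super().__init__()
        self.block = nn.Sequential(*[
            layer for _ in range(n_convs)
            for layer in (nn.Conv2d(ch, ch, 3, padding=1),
                          nn.BatchNorm2d(ch), nn.ReLU())])

    def forward(self, enc_feat):
        return self.block(enc_feat)   # refined features for the decoder

skip = SkipConv(ch=32)
enc_feat = torch.randn(1, 32, 64, 64)
dec_input = skip(enc_feat)            # concatenated with decoder activations
print(dec_input.shape)
```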

Neural Decoding of Inferior Colliculus Multiunit Activity for Sound Category identification with temporal correlation and deep learning [article]

Fatma Ozcan, Ahmet Alkan
2022 bioRxiv   pre-print
Using pre-trained convolutional neural networks (CNNs), features of the images were extracted and the type of sound heard was classified.  ...  It is thought that the time-frequency correlation characteristics of sounds may be reflected in auditory assembly responses in the midbrain and that this may play an important role in identification of  ...  Faisal et al, in their study, proposed a kernel convolution model to characterise neural responses to natural sounds.  ... 
doi:10.1101/2022.08.24.505211 fatcat:77kgi7rn2zdynmqjlstwpfnssi
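A hedged sketch of the feature-extraction step described here: render the neural activity as time-frequency images, then use a pre-trained CNN as a fixed feature extractor. The model choice (torchvision ResNet-18, which needs a recent torchvision) and the input shapes are my assumptions, not the authors'.

```python
# Sketch: pre-trained CNN as a fixed feature extractor for classification.
import torch
from torchvision import models

cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
cnn.fc = torch.nn.Identity()          # drop classifier; keep 512-d features
cnn.eval()

images = torch.randn(4, 3, 224, 224)  # stand-in for time-frequency images
with torch.no_grad():
    feats = cnn(images)               # (4, 512) features for a classifier
print(feats.shape)
```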

Multi-Channel Speech Enhancement using Graph Neural Networks

Panagiotis Tzirakis, Anurag Kumar, Jacob Donley
2021 arXiv   pre-print
Recently proposed methods tackle this problem by incorporating deep neural network models with spatial filtering techniques such as the minimum variance distortionless response (MVDR) beamformer.  ...  Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones.  ...  All (de-)convolution of the encoder (decoder) use 3 × 3 kernel size, with stride 2 × 2, and no padding.  ... 
arXiv:2102.06934v1 fatcat:gbplu5a7hbh5pbzlfazgk7vyhe
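For reference, the classical MVDR beamformer mentioned in the abstract computes, per frequency bin, w = R_n^{-1} d / (d^H R_n^{-1} d), where d is the steering vector and R_n the noise covariance. A toy NumPy sketch with assumed array sizes:

```python
# Sketch of the classical MVDR beamformer weights for one frequency bin.
import numpy as np

rng = np.random.default_rng(0)
M = 6                                   # microphones
d = rng.standard_normal(M) + 1j * rng.standard_normal(M)     # steering vector
noise = rng.standard_normal((M, 1000)) + 1j * rng.standard_normal((M, 1000))
Rn = noise @ noise.conj().T / 1000 + 1e-6 * np.eye(M)        # noise covariance

w = np.linalg.solve(Rn, d)
w /= d.conj() @ w                       # distortionless constraint: w^H d = 1
print(abs(w.conj() @ d))                # ~1.0: target passes undistorted
```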

A hierarchical sparse coding model predicts acoustic feature encoding in both auditory midbrain and cortex

Qingtian Zhang, Xiaolin Hu, Bo Hong, Bo Zhang, Frédéric E. Theunissen
2019 PLoS Computational Biology  
The auditory pathway consists of multiple stages, from the cochlear nucleus to the auditory cortex.  ...  We used linear regression to decode these features from the response amplitudes of model units (Materials and Methods).  ...
doi:10.1371/journal.pcbi.1006766 fatcat:22xw7qbwlnapdbpcgi4cyakrky

Deep Dense and Convolutional Autoencoders for Unsupervised Anomaly Detection in Machine Condition Sounds [article]

Alexandrine Ribeiro, Luis Miguel Matos, Pedro Jose Pereira, Eduardo C. Nunes, Andre L. Ferreira, Paulo Cortez, Andre Pilastri
2020 arXiv   pre-print
The two methods involve deep autoencoders, based on dense and convolutional architectures that use mel-spectrogram-processed sound features.  ...  This technical report describes two methods that were developed for Task 2 of the DCASE 2020 challenge.  ...  The AE models are based on Dense and Convolutional Neural Networks (CNN).  ...
arXiv:2006.10417v2 fatcat:clqtupdyh5g35am6sqzttagqxu
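A minimal sketch of the reconstruction-error scheme such systems typically use (the network size, training loop, and synthetic data are assumptions): train an autoencoder on frames from normal machine sounds, then score test frames by how poorly they are reconstructed.

```python
# Sketch: dense autoencoder on mel-spectrogram frames; the anomaly
# score is the per-frame reconstruction error.
import torch
import torch.nn as nn

n_mels = 64
ae = nn.Sequential(                       # encoder -> bottleneck -> decoder
    nn.Linear(n_mels, 32), nn.ReLU(),
    nn.Linear(32, 8), nn.ReLU(),
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, n_mels))

frames = torch.randn(1000, n_mels)        # stand-in for normal-machine frames
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(100):                      # train to reconstruct normal sounds
    loss = ((ae(frames) - frames) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

test = torch.randn(5, n_mels)
with torch.no_grad():
    score = ((ae(test) - test) ** 2).mean(dim=1)   # high error => anomalous
print(score)
```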

DDSP: Differentiable Digital Signal Processing [article]

Jesse Engel, Lamtharn Hantrakul, Chenjie Gu, Adam Roberts
2020 arXiv   pre-print
Most generative models of audio directly generate samples in one of two domains: time or frequency.  ...  without losing the expressive power of neural networks.  ...  Convolution via matrix multiplication scales as O(n^3), which is intractable for such large kernel sizes.  ...
arXiv:2001.04643v1 fatcat:igkkrj7wlncnbnpbdnxvyrbmji
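The remark about large kernel sizes refers to the standard workaround of performing long convolutions as multiplication in the frequency domain, which costs O(n log n) instead. A short sketch comparing FFT-based and direct convolution (sizes illustrative):

```python
# Sketch: long FIR filtering via FFT vs. direct convolution.
import numpy as np

signal = np.random.randn(16000)
kernel = np.random.randn(4000)            # a "large" FIR kernel

n = len(signal) + len(kernel) - 1
nfft = 1 << (n - 1).bit_length()          # next power of two >= n
fast = np.fft.irfft(
    np.fft.rfft(signal, nfft) * np.fft.rfft(kernel, nfft), nfft)[:n]

direct = np.convolve(signal, kernel)      # O(n^2) reference
print(np.max(np.abs(fast - direct)))      # ~1e-10: same result, far cheaper
```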

Similarity measures for vocal-based drum sample retrieval using deep convolutional auto-encoders [article]

Adib Mehrabi, Keunwoo Choi, Simon Dixon, Mark Sandler
2018 arXiv   pre-print
We use a linear mixed effect regression model to show how features learned by convolutional auto-encoders (CAEs) perform as predictors for perceptual similarity between sounds.  ...  Meanwhile, deep learning methods have demonstrated state-of-the-art results for matching vocal imitations to imitated sounds, yet little is known about how well learned features represent the perceptual  ...  ACKNOWLEDGEMENTS This work is supported by EPSRC grants for the Media and Arts Technology Doctoral Training Centre (EP/G03723X/1) and FAST IMPACt project (EP/L019981/1).  ... 
arXiv:1802.05178v1 fatcat:hnv3o2spsfauzbcc6itk72mhyy
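As a toy illustration of using learned CAE embeddings as similarity predictors (the embedding size and the cosine-similarity choice are assumptions, not the paper's exact pipeline):

```python
# Sketch: similarity between auto-encoder embeddings of two sounds.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

emb_imitation = np.random.randn(128)   # CAE embedding of a vocal imitation
emb_drum = np.random.randn(128)        # CAE embedding of a drum sample
print(cosine(emb_imitation, emb_drum)) # predictor fed to the regression model
```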

Decoding speech from non-invasive brain recordings [article]

Alexandre Défossez, Charlotte Caucheteux, Jérémy Rapin, Ori Kabeli, Jean-Rémi King
2022 arXiv   pre-print
19.1% out of 2,604 segments for EEG recordings -- hence allowing the decoding of phrases absent from the training set.  ...  Together, these results delineate a promising path to decode natural language processing in real time from non-invasive recordings of brain activity.  ...  For example, the Mel spectrogram is often targeted for neural decoding because it represents sounds similarly to the cochlea [Mermelstein, 1976] .  ...
arXiv:2208.12266v1 fatcat:3neona2zxrdubltcohsetathgq
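A short sketch of the mel-spectrogram decoding target mentioned in the snippet, computed with librosa under typical (assumed) parameter values, not the paper's:

```python
# Sketch: mel spectrogram, a cochlea-like time-frequency representation.
import numpy as np
import librosa

audio = np.random.randn(3 * 16000).astype(np.float32)  # 3 s stand-in for speech
mel = librosa.feature.melspectrogram(
    y=audio, sr=16000, n_fft=512, hop_length=128, n_mels=80)
log_mel = librosa.power_to_db(mel)      # (80 mel bands, time frames)
print(log_mel.shape)
```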
Showing results 1 — 15 out of 1,962 results