Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals [article]

Jing Shi, Xuankai Chang, Pengcheng Guo, Shinji Watanabe, Yusuke Fujita, Jiaming Xu, Bo Xu, Lei Xie
2020 arXiv   pre-print
We extend the standard sequence-to-sequence model to a conditional multi-sequence model, which explicitly models the relevance between multiple output sequences with the probabilistic chain rule.  ...  Neural sequence-to-sequence models are well established for applications which can be cast as mapping a single input sequence into a single output sequence.  ...  other sequence-to-multi-sequence problems for mixture signals, as a general machine learning framework.  ... 
arXiv:2006.14150v1 fatcat:dhkrd3twqndjrmajndgwn6ldji

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection [article]

Yuki Takashima, Yusuke Fujita, Shinji Watanabe, Shota Horiguchi, Paola García, Kenji Nagamatsu
2021 arXiv   pre-print
In this paper, we present a conditional multitask learning method for end-to-end neural speaker diarization (EEND).  ...  We optimize speaker diarization conditioned on speech activity and overlap detection that are subtasks of speaker diarization, based on the probabilistic chain rule.  ...  To take the order between tasks into account, we employ a conditional parallel mapping [21] that models the relevance between multiple output sequences explicitly via the probabilistic chain rule.  ... 
arXiv:2106.04078v1 fatcat:vhijmcvvozavlktuzwd5ughjhe

Biologically-Inspired Spike-Based Automatic Speech Recognition of Isolated Digits Over a Reproducing Kernel Hilbert Space

Kan Li, José C. Príncipe
2018 Frontiers in Neuroscience  
sequence.  ...  Audio signals are converted into multi-channel spike trains using a biologically-inspired leaky integrate-and-fire (LIF) spike generator.  ...  ACKNOWLEDGMENTS We would like to thank Dr. John G. Harris for his helpful discussions during the research.  ... 
doi:10.3389/fnins.2018.00194 pmid:29666568 pmcid:PMC5891646 fatcat:weji6gclmzbrjl5frmbipfmjwy

Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities

Angela Yao, Juergen Gall, Luc Van Gool, Raquel Urtasun
2011 Neural Information Processing Systems  
A common approach for handling the complexity and inherent ambiguities of 3D human pose estimation is to use pose priors learned from training data.  ...  Existing approaches, however, are either too simplistic (linear), too complex to learn, or can only learn latent spaces from "simple data", i.e., single activities such as walking or running.  ...  The corresponding pose is computed by projecting back to the data space via the Gaussian process mapping learned in the GPLVM.  ... 
dblp:conf/nips/YaoGGU11 fatcat:5knxxlghv5eipfybrdeh7hyiyq

CDN-MEDAL: Two-stage Density and Difference Approximation Framework for Motion Analysis [article]

Synh Viet-Uyen Ha, Cuong Tien Nguyen, Hung Ngoc Phan, Nhat Minh Chung, Phuong Hoai Ha
2021 arXiv   pre-print
However, the techniques have only provided limited descriptions of scenes' properties while requiring heavy computations, as their single-valued mapping functions are learned to approximate the temporal  ...  On the other hand, statistical learning in imagery domains has been a prevalent approach with high adaptation to dynamic context transformation, notably using Gaussian Mixture Models (GMM) with its generalization  ...  For modeling tasks, we seek to establish a universal multi-modular statistical mapping function on the RGB color space, which would require optimizing the loss not just on any single pixel, but for b block  ... 
arXiv:2106.03776v4 fatcat:hoiazfi4zfgf3hxv7ybiwinp5e

A modular kernel approach for integrative analysis of protein domain boundaries

Paul D Yoo, Bing Zhou, Albert Y Zomaya
2009 BMC Genomics  
One of the key features of this profiling technique is the use of multiple structural alignments of remote homologues to create an extended sequence profile and combines the structural information with  ...  This profile can capture the sequence characteristics of an entire structural superfamily and extend a range of profiles generated from sequence similarity alone.  ...  fall in the interval [-1, 1] to be fed into networks; (4) target levels were assigned to each profile (positive, +1, for domain boundary residues and negative, -1, for non-boundary residues); (5) a hold-out  ... 
doi:10.1186/1471-2164-10-s3-s21 pmid:19958485 pmcid:PMC2788374 fatcat:dhkowfzqnrdajivnum255ugsue

Machine Speech Chain with One-shot Speaker Adaptation

Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
2018 Interspeech 2018  
In previous work, we developed a closed-loop speech chain model based on deep learning, in which the architecture enabled the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components  ...  In the speech chain loop mechanism, ASR also benefits from the ability to further learn an arbitrary speaker's characteristics from the generated speech waveform, resulting in a significant improvement  ...  Sequence-to-Sequence ASR A sequence-to-sequence [6] architecture is a type of neural network that directly models the conditional probability P(y|x) between two sequences x and y.  ... 
doi:10.21437/interspeech.2018-1558 dblp:conf/interspeech/TjandraS018 fatcat:qqp7dvxgsvgj3blwj73eguna24

Towards Neural Mixture Recommender for Long Range Dependent User Sequences

Jiaxi Tang, Francois Belletti, Sagar Jain, Minmin Chen, Alex Beutel, Can Xu, Ed H. Chi
2019 The World Wide Web Conference on - WWW '19  
We then propose a neural Multi-temporal-range Mixture Model (M3) as a tailored solution to deal with both short-term and long-term dependencies.  ...  Understanding temporal dynamics has proved to be highly valuable for accurate recommendation. Sequential recommenders have been successful in modeling the dynamics of users and items over time.  ...  of a Multi-temporal-range Mixture Model, or M3 for short.  ... 
doi:10.1145/3308558.3313650 dblp:conf/www/TangBJCBXC19 fatcat:t4df6axuznfvrlnoqrbhc4g6ky

Machine Speech Chain with One-shot Speaker Adaptation [article]

Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
2018 arXiv   pre-print
In previous work, we developed a closed-loop speech chain model based on deep learning, in which the architecture enabled the automatic speech recognition (ASR) and text-to-speech synthesis (TTS) components  ...  In the speech chain loop mechanism, ASR also benefits from the ability to further learn an arbitrary speaker's characteristics from the generated speech waveform, resulting in a significant improvement  ...  Sequence-to-Sequence ASR A sequence-to-sequence [6] architecture is a type of neural network that directly models the conditional probability P(y|x) between two sequences x and y.  ... 
arXiv:1803.10525v1 fatcat:6i3ajnqyjvcwxhqhvyxqrfbd6i

Latent Variable Algorithms for Multimodal Learning and Sensor Fusion [article]

Lijiang Guo
2019 arXiv   pre-print
We design a co-learning mechanism to encourage co-adaption and independent learning of each sensor at the same time, and propose a regularization based co-learning method.  ...  In the third part, we extend the siamese structure to sensor fusion for robust acoustic event detection.  ...  We would like to thank Dr. Geoffrey Fox, Dr. Minje Kim, Dr. Francesco Nesta, Dr. Michael Ryoo and Dr. Lantao Liu for helpful discussions.  ... 
arXiv:1904.10450v1 fatcat:6634ghs74fcd3fz3l4nov4rb3m

Accounting for Room Acoustics in Audio-Visual Multi-Speaker Tracking

Yutong Ban, Xiaofei Li, Xavier Alameda-Pineda, Laurent Girin, Radu Horaud
2018 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
However, practical methods must be robust to changes in acoustic conditions, e.g. reverberation.  ...  We investigate how to combine state-of-the-art audio-source localization techniques with Bayesian multi-person tracking.  ...  This allows the audio-visual mapping to be learned over the entire field of view.  ... 
doi:10.1109/icassp.2018.8462100 dblp:conf/icassp/BanLAGH18 fatcat:n6jqwr2oqrao5fyyg6q2uzic7a

Single Channel Speech Separation Using Factorial Dynamics

John R. Hershey, Trausti T. Kristjansson, Steven J. Rennie, Peder A. Olsen
2006 Neural Information Processing Systems  
Remarkably, the system surpasses human recognition performance in many conditions. The models of speech use temporal dynamics to help infer the source speech signals, given mixed speech signals.  ...  In this model, dynamics are modeled using a layered combination of one or two Markov chains: one for long-term dependencies and another for short-term dependencies.  ...  These are learned from training data where the grammar state sequences and acoustic state sequences are known for each utterance.  ... 
dblp:conf/nips/HersheyKRO06 fatcat:v4lr4wooxvh3xjsgpc544xbr5q

Multirate Coupled Hidden Markov Models and Their Application to Machining Tool-Wear Classification

Özgür Çetin, Mari Ostendorf, Gary D. Bernard
2007 IEEE Transactions on Signal Processing  
Scales in the multi-rate HMMs are organized in a coarse-to-fine manner with Markov conditional independence assumptions within and across scales, allowing for a parsimonious representation of both short- and  ...  This paper introduces multi-rate coupled hidden Markov models (multi-rate HMMs for short) for multiscale modeling of nonstationary processes, extending traditional HMMs from single to multiple time scales  ...  Unlike either model, each scale-based state variable sequence in multi-rate coupled HMMs constitutes a Markov chain when conditioned on coarser ancestor variables.  ... 
doi:10.1109/tsp.2007.893972 fatcat:mfb57xu7bjdi5ftpafinperbz4

Online Anomaly Detection Under Markov Statistics With Controllable Type-I Error

Huseyin Ozkan, Fatih Ozkan, Suleyman S. Kozat
2016 IEEE Transactions on Signal Processing  
The presented study is the first to provide the online implementation of Neyman-Pearson (NP) characterization for the problem such that the NP optimality, i.e., maximum detection power at a specified false  ...  In this regard, the proposed algorithm is highly novel and appropriate especially for the applications requiring sequential data processing at large scales/high rates due to its parameter-tuning free computational  ...  For example, two different sequences and mapping to the same state sequence , where is the state of the 'th observation in , are considered "same" up to small variations.  ... 
doi:10.1109/tsp.2015.2504345 fatcat:sqpexvbqvngv3ggcqblfq6pgra

Tracking the Active Speaker Based on a Joint Audio-Visual Observation Model

Israel D. Gebru, Sileye Ba, Georgios Evangelidis, Radu Horaud
2015 2015 IEEE International Conference on Computer Vision Workshop (ICCVW)  
Both visual and auditory observations are explained by a recently proposed weighted-data mixture model, while several options for the speaking turns dynamics are fulfilled by a multi-case transition model  ...  Any multi-party conversation system benefits from speaker diarization, that is, the assignment of speech signals among the participants.  ...  For example, [5] proposed a multi-speaker tracker using approximate inference implemented with a Markov chain Monte Carlo particle filter (MCMC-PF).  ... 
doi:10.1109/iccvw.2015.96 dblp:conf/iccvw/GebruBEH15 fatcat:lruasrz6sfgn7imwdwdd7gne2y