230 Hits in 1.5 sec

Monotonic Multihead Attention [article]

Xutai Ma, Juan Pino, James Cross, Liezl Puzon, Jiatao Gu
2019 arXiv   pre-print
In this paper, we propose a new attention mechanism, Monotonic Multihead Attention (MMA), which extends the monotonic attention mechanism to multihead attention.  ...  Recent approaches for this task either apply a fixed policy on a state-of-the-art Transformer model, or a learnable monotonic attention on a weaker recurrent neural network-based structure.  ...  We thus propose monotonic multihead attention (MMA), which combines the strengths of multilayer multihead attention and monotonic attention.  ... 
arXiv:1909.12406v1 fatcat:spxstjt3kbbkhfrr5d6hja742u
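
The snippets above describe MMA only at a high level. For reference, the per-head building block is the hard monotonic attention of Raffel et al. (2017), whose expected alignment during training has a closed-form recurrence. The NumPy sketch below is an illustrative O(I*J^2) implementation of that recurrence, not the authors' code; in MMA each decoder-encoder head would keep its own selection probabilities p and alignment alpha.

```python
import numpy as np

def expected_monotonic_alignment(p):
    """Expected alignment of hard monotonic attention (Raffel et al., 2017).

    p: (I, J) per-step selection probabilities, p[i, j] in (0, 1), typically
       sigmoid(energy) for decoder step i and encoder frame j.
    Returns alpha: (I, J), where alpha[i, j] is the probability that decoder
    step i halts on (attends to) encoder frame j.
    """
    I, J = p.shape
    alpha = np.zeros((I, J))
    prev = np.zeros(J)
    prev[0] = 1.0  # the scan for the first output starts at the first frame
    for i in range(I):
        for j in range(J):
            # previous step halted at some k <= j, frames k..j-1 were skipped
            total = 0.0
            for k in range(j + 1):
                skip = np.prod(1.0 - p[i, k:j])  # empty product = 1.0
                total += prev[k] * skip
            alpha[i, j] = p[i, j] * total
        prev = alpha[i]
    return alpha
```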

Enhancing Monotonic Multihead Attention for Streaming ASR [article]

Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
2020 arXiv   pre-print
We investigate monotonic multihead attention (MMA) by extending hard monotonic attention to Transformer-based automatic speech recognition (ASR) for online streaming applications.  ...  Chunkwise attention on each MA head is extended to the multihead counterpart. Finally, we propose head-synchronous beam search decoding to guarantee stable streaming inference.  ...  Monotonic multihead attention (MMA): In this section, we review hard monotonic attention (HMA) [16], monotonic chunkwise attention (MoChA) [17], and monotonic multihead attention (MMA) [27] as an  ... 
arXiv:2005.09394v3 fatcat:oxcw5m7m2fcxpjjrhybynq66wq
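
The snippet reviews MoChA, where once a monotonic head halts at some encoder frame, soft attention is applied over a fixed-width chunk ending at that frame. Below is a minimal inference-time sketch of that chunkwise step; it is an illustration under assumed names (`energies`, `halt_idx`, `chunk_width`), not the paper's implementation.

```python
import numpy as np

def chunkwise_attention(energies, halt_idx, chunk_width):
    """Soft attention restricted to the chunk of `chunk_width` encoder frames
    ending at the halting position `halt_idx` (inclusive), as in MoChA inference.

    energies: (J,) unnormalised attention scores over encoder frames.
    Returns (start, weights), where weights sums to 1 over the chunk.
    """
    start = max(0, halt_idx - chunk_width + 1)
    chunk = energies[start:halt_idx + 1]
    weights = np.exp(chunk - chunk.max())  # numerically stable softmax over the chunk
    weights /= weights.sum()
    return start, weights
```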

Enhancing Monotonic Multihead Attention for Streaming ASR

Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara
2020 Interspeech 2020  
We investigate monotonic multihead attention (MMA) by extending hard monotonic attention to Transformer-based automatic speech recognition (ASR) for online streaming applications.  ...  Chunkwise attention on each MA head is extended to the multihead counterpart. Finally, we propose head-synchronous beam search decoding to guarantee stable streaming inference.  ...  Monotonic multihead attention (MMA): In this section, we review hard monotonic attention (HMA) [16], monotonic chunkwise attention (MoChA) [17], and monotonic multihead attention (MMA) [27] as an  ... 
doi:10.21437/interspeech.2020-1780 dblp:conf/interspeech/InagumaMK20a fatcat:z5apfhgj6rgvfnsejsetvpox2e

Mutually-Constrained Monotonic Multihead Attention for Online ASR [article]

Jaeyun Song, Hajin Shim, Eunho Yang
2021 arXiv   pre-print
Despite the feature of real-time decoding, Monotonic Multihead Attention (MMA) shows comparable performance to the state-of-the-art offline methods in machine translation and automatic speech recognition  ...  Specifically, we derive the expected alignments from monotonic attention by considering the boundaries of other heads and reflect them in the learning process.  ...  Monotonic Multihead Attention: MMA [14] applies the MA mechanism to Transformer [3] by making each of the multiple heads of decoder-encoder attention learn monotonic alignments as MA.  ... 
arXiv:2103.14302v1 fatcat:72cvyncs5fhabobvhh5ovqw5ry
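
The abstract does not spell out the mutual constraint, but the quantity it refers to, each head's boundary (expected halting position), can be read off the per-head alignments. The snippet below is a hypothetical illustration of computing and comparing such boundaries across heads; it is not the paper's regularizer.

```python
import numpy as np

def expected_boundaries(alpha):
    """alpha: (H, I, J) per-head expected alignments (heads, decoder steps, frames),
    each row alpha[h, i, :] summing to 1.
    Returns (H, I): the expected halting frame of each head at each decoder step."""
    J = alpha.shape[-1]
    frames = np.arange(J)
    return (alpha * frames).sum(axis=-1)  # E[j] under each head's alignment

# e.g. the spread between the fastest and slowest head at decoder step i could
# then be inspected or penalised: spread_i = b[:, i].max() - b[:, i].min()
```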

Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention [article]

Xutai Ma, Hongyu Gong, Danni Liu, Ann Lee, Yun Tang, Peng-Jen Chen, Wei-Ning Hsu, Phillip Koehn, Juan Pino
2022 arXiv   pre-print
We also introduce variational monotonic multihead attention (V-MMA) to handle the challenge of inefficient policy learning in simultaneous speech translation.  ... 
arXiv:2110.08250v2 fatcat:wcaduwxmc5h2bboh5yeaqqcena

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition [article]

Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie
2020 arXiv   pre-print
In this work, we propose a novel online E2E-ASR system by using Streaming Chunk-Aware Multihead Attention (SCAMA) and a latency control memory equipped self-attention network (LC-SAN-M).  ...  Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention.  ...  Figure 3 shows the visualization of attention in the last layer of E2E1 and E2E7. The general trend of full sequence multihead attention is monotonic.  ... 
arXiv:2006.01712v1 fatcat:6fpcj2y2bfgf5c5xkdewioytc4

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie
2020 Interspeech 2020  
In this work, we propose a novel online E2E-ASR system by using Streaming Chunk-Aware Multihead Attention (SCAMA) and a latency control memory equipped self-attention network (LC-SAN-M).  ...  Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention.  ...  Figure 3 shows the visualization of attention in the last layer of E2E1 and E2E7. The general trend of full sequence multihead attention is monotonic.  ... 
doi:10.21437/interspeech.2020-1972 dblp:conf/interspeech/ZhangGLLGYX20 fatcat:qes5faccuvcobkbrgkhl4xrcea

Towards Online End-to-end Transformer Automatic Speech Recognition [article]

Emiru Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe
2019 arXiv   pre-print
In this paper, we extend it towards an entire online E2E ASR system by introducing an online decoding process inspired by monotonic chunkwise attention (MoChA) into the Transformer decoder.  ...  Our novel MoChA training and inference algorithms exploit the unique properties of Transformer, whose attentions are not always monotonic or peaky, and have multiple heads and residual connections of the  ...  We set d_model = 256 and M = 4 for the multihead attentions.  ... 
arXiv:1910.11871v1 fatcat:462quwuou5duvjqwf2cdu5a3xa
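
For context, the quoted configuration (d_model = 256 with M = 4 heads, i.e. 64 dimensions per head) corresponds to a standard multi-head attention layer. A minimal PyTorch sketch of such a layer is shown below; this only illustrates the dimensions and is not the authors' MoChA-enabled Transformer decoder.

```python
import torch

# d_model = 256 with M = 4 heads, as in the quoted setup (each head sees 256 / 4 = 64 dims)
mha = torch.nn.MultiheadAttention(embed_dim=256, num_heads=4, batch_first=True)

queries = torch.randn(1, 10, 256)   # (batch, target length, d_model)
memory = torch.randn(1, 50, 256)    # (batch, source length, d_model)
output, attn_weights = mha(queries, memory, memory)
print(output.shape, attn_weights.shape)  # torch.Size([1, 10, 256]) torch.Size([1, 10, 50])
```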

SimulMT to SimulST: Adapting Simultaneous Text Translation to End-to-End Simultaneous Speech Translation [article]

Xutai Ma, Juan Pino, Philipp Koehn
2020 arXiv   pre-print
We investigate how to adapt simultaneous text translation methods such as wait-k and monotonic multihead attention to end-to-end simultaneous speech translation by introducing a pre-decision module.  ...  Monotonic Multihead Attention (MMA) extends monotonic attention (Raffel et al., 2017; Arivazhagan et al., 2019) to Transformer-based models.  ...  For the models with monotonic multihead attention, we first train a model without latency with λ_latency = 0.  ... 
arXiv:2011.02048v1 fatcat:jfawf45j4jabdcxh65vod3s72e
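
Of the two policies named in the snippet, wait-k is the simpler fixed one: read k source units first, then alternate one write per read. A minimal sketch of that schedule follows (illustrative; the pre-decision module that turns speech frames into source units is not shown, and the function name is an assumption).

```python
def wait_k_policy(k, num_read, num_written, source_finished):
    """Return 'READ' or 'WRITE' under a wait-k schedule.

    num_read: source units consumed so far; num_written: target tokens emitted.
    Once the source is finished, the model can only write.
    """
    if source_finished:
        return "WRITE"
    # the t-th target token may be written once k + t - 1 source units are available
    return "WRITE" if num_read - num_written >= k else "READ"
```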

Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems [article]

Mohd Abbas Zaidi, Beomseok Lee, Nikhil Kumar Lakumarapu, Sangha Kim, Chanwoo Kim
2021 arXiv   pre-print
Motivated by these improvements, we propose to add Decision Attentive Regularization (DAR) to Monotonic Multihead Attention (MMA) based SimulST systems.  ...  Monotonic Hard Attention [2], Monotonic Chunkwise Attention (MoChA) [3], Monotonic Infinite Lookback Attention (MILk) [4] and Monotonic Multihead Attention (MMA) [5].  ...  Recent approaches use monotonic attention to learn a flexible policy. Different variants of monotonic attention have been proposed, such as  ... 
arXiv:2110.15729v1 fatcat:twyyluiy7beyjcfhtftns5jnny

Memory Controlled Sequential Self Attention for Sound Recognition

Arjun Pankajakshan, Helen L. Bear, Vinod Subramanian, Emmanouil Benetos
2020 Interspeech 2020  
We extend the proposed idea with a multi-head self attention mechanism where each attention head processes the audio embedding with explicit attention width values.  ...  We show that our memory controlled self attention model achieves an event-based F-score of 33.92% on the URBAN-SED dataset, outperforming the F-score of 20.10% reported by the model without self attention  ...  Also, we cannot expect a monotonic model behavior based on the attention width value.  ... 
doi:10.21437/interspeech.2020-1953 dblp:conf/interspeech/PankajakshanBSB20 fatcat:drllzx7vbjbjrluamegazlcdpm
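
The snippet's idea of an explicit attention width per head can be pictured as a banded mask on self-attention: head h may only attend within +/- widths[h] frames of the query position. The sketch below is a generic illustration of such masks, not the paper's memory-controlled mechanism; `widths` is an assumed parameter.

```python
import numpy as np

def banded_attention_masks(seq_len, widths):
    """Boolean masks restricting each self-attention head to a window of
    +/- widths[h] frames around the query position (True = may attend)."""
    idx = np.arange(seq_len)
    dist = np.abs(idx[:, None] - idx[None, :])        # (T, T) frame distances
    return np.stack([dist <= w for w in widths])      # (H, T, T)

masks = banded_attention_masks(seq_len=8, widths=[1, 2, 4])
print(masks.shape)  # (3, 8, 8)
```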

Enhancing Monotonicity for Robust Autoregressive Transformer TTS

Xiangyu Liang, Zhiyong Wu, Runnan Li, Yanqing Liu, Sheng Zhao, Helen Meng
2020 Interspeech 2020  
Therefore, in this paper, we propose a monotonicity enhancing approach with the combined use of Stepwise Monotonic Attention (SMA) and multi-head attention for a Transformer-based TTS system.  ...  However, this approach lacks a monotonic constraint and suffers from issues like pronunciation skipping.  ...  Propose a novel monotonicity-enhanced attention approach with the combined use of multi-head attention and Stepwise Monotonic Attention for Transformer.  ... 
doi:10.21437/interspeech.2020-1751 dblp:conf/interspeech/LiangWLLZM20 fatcat:z2jt443ahnghzbbwmysyyy3uvi
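
Stepwise Monotonic Attention constrains the alignment to either stay at its current encoder position or advance by exactly one position per decoder step. Assuming that standard formulation (a sketch only, not the paper's multi-head combination), the expected alignment can be propagated as below, with p[i, j] the probability of staying at position j at decoder step i.

```python
import numpy as np

def stepwise_monotonic_alignment(p):
    """p: (I, J) probabilities of *staying* at encoder position j at decoder step i.
    Returns alpha: (I, J) expected alignment when each step either stays in place
    or moves forward by exactly one position."""
    I, J = p.shape
    alpha = np.zeros((I, J))
    alpha[0, 0] = 1.0  # the alignment starts at the first encoder position
    for i in range(1, I):
        stay = alpha[i - 1] * p[i]                       # mass that stays at j
        move = np.roll(alpha[i - 1] * (1.0 - p[i]), 1)   # mass that advances from j-1 to j
        move[0] = 0.0  # nothing enters position 0 from the left; mass stepping past
                       # the last frame is dropped in this simple sketch
        alpha[i] = stay + move
    return alpha
```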

Head-synchronous Decoding for Transformer-based Streaming ASR [article]

Mohan Li, Catalin Zorila, Rama Doddipatla
2021 arXiv   pre-print
Since DACS employs a truncation threshold to determine the halting position, some of the attention weights are cut off prematurely, which might impact the stability and precision of decoding.  ...  However, like any other online approach, the DACS-based attention heads in each of the Transformer decoder layers operate independently (or asynchronously) and lead to diverged attending positions.  ...  These include Hard Monotonic Attention (HMA) [11], Monotonic Chunkwise Attention (MoChA) [12, 13] and Monotonic Truncated Attention (MTA) [14]; ii) Triggered attention methods are conditioned on  ... 
arXiv:2104.12631v1 fatcat:dni2uyyuebcqbbksep5zpxsq5q
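
As the snippet describes, DACS halts a head once its accumulated halting probabilities pass a truncation threshold. Below is a minimal sketch of that halting rule (an illustration with assumed names; the paper's head-synchronous extension, which aligns halting positions across heads, is not shown).

```python
import numpy as np

def dacs_halting_position(halting_probs, threshold=0.999):
    """Return the frame index at which accumulated halting probability first
    reaches `threshold` (the truncation point); attention beyond it is cut off.

    halting_probs: (J,) per-frame halting probabilities in (0, 1) for one head."""
    cum = np.cumsum(halting_probs)
    over = np.nonzero(cum >= threshold)[0]
    if len(over) > 0:
        return int(over[0])
    return len(halting_probs) - 1  # never reached: attend up to the last frame seen
```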

Low Latency End-to-End Streaming Speech Recognition with a Scout Network [article]

Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Liang Lu, Guoli Ye, Ming Zhou
2020 arXiv   pre-print
The attention-based Transformer model has achieved promising results for speech recognition (SR) in the offline mode.  ...  To solve the first problem, Transformer-based monotonic chunkwise attention (MoChA) [12] and trigger attention mechanism (TA) [13] have been proposed to replace the global encoder-decoder attention  ...  The multihead attention mechanism is proposed, where in each head, weights are formed from queries (Q ∈ R^d) and keys (K ∈ R^d) and then applied to values (V ∈ R^d) as Multihead(Q, K, V) = Concat(head_1:  ... 
arXiv:2003.10369v4 fatcat:2iczpkgbnvgbhj2ulirctjkv54
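
The quoted formula is truncated by the snippet; the standard definition (Vaswani et al., 2017) is Multihead(Q, K, V) = Concat(head_1, ..., head_h) W^O with head_i = softmax(Q W_i^Q (K W_i^K)^T / sqrt(d_k)) V W_i^V. A compact NumPy sketch of exactly that computation (single example, no masking) is given below.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(Q, K, V, Wq, Wk, Wv, Wo):
    """Q: (Lq, d); K, V: (Lk, d); Wq, Wk, Wv: (h, d, d_k); Wo: (h * d_k, d)."""
    d_k = Wq.shape[-1]
    heads = []
    for wq, wk, wv in zip(Wq, Wk, Wv):
        scores = (Q @ wq) @ (K @ wk).T / np.sqrt(d_k)   # (Lq, Lk) scaled dot products
        heads.append(softmax(scores) @ (V @ wv))        # (Lq, d_k) per-head context
    return np.concatenate(heads, axis=-1) @ Wo          # (Lq, d) after output projection

# toy check with d = 8, h = 2 heads, d_k = 4
rng = np.random.default_rng(0)
d, h, d_k, Lq, Lk = 8, 2, 4, 3, 5
out = multihead_attention(rng.normal(size=(Lq, d)), rng.normal(size=(Lk, d)),
                          rng.normal(size=(Lk, d)),
                          rng.normal(size=(h, d, d_k)), rng.normal(size=(h, d, d_k)),
                          rng.normal(size=(h, d, d_k)), rng.normal(size=(h * d_k, d)))
print(out.shape)  # (3, 8)
```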

Low Latency End-to-End Streaming Speech Recognition with a Scout Network

Chengyi Wang, Yu Wu, Liang Lu, Shujie Liu, Jinyu Li, Guoli Ye, Ming Zhou
2020 Interspeech 2020  
The attention-based Transformer model has achieved promising results for speech recognition (SR) in the offline mode.  ...  To solve the first problem, Transformer-based monotonic chunkwise attention (MoChA) [12] and trigger attention mechanism (TA) [13] have been proposed to replace the global encoder-decoder attention  ...  The multihead attention mechanism is proposed, where in each head, weights are formed from queries (Q ∈ R^d) and keys (K ∈ R^d) and then applied to values (V ∈ R^d) as Multihead(Q, K, V) = Concat(head_1  ... 
doi:10.21437/interspeech.2020-1292 dblp:conf/interspeech/00020L0LYZ20 fatcat:3nz3qpekjbh6rif4guael4hsfq
Showing results 1–15 out of 230 results