Efficient Purely Convolutional Text Encoding [article]

Szymon Malik, Adrian Lancucki, Jan Chorowski
2018 arXiv   pre-print
In this work, we focus on a lightweight convolutional architecture that creates fixed-size vector embeddings of sentences. Such representations are useful for building NLP systems, including conversational agents. Our work derives from a recently proposed recursive convolutional architecture for auto-encoding text paragraphs at byte level. We propose alterations that significantly reduce training time and the number of parameters, and improve auto-encoding accuracy. Finally, we evaluate the representations created by our model on tasks from the SentEval benchmark suite, and show that it can serve as a better, yet fairly low-resource alternative to popular bag-of-words embeddings.
arXiv:1808.01160v1 fatcat:3bduopux6ve2vmbqtqscjxzl2y

GGP with Advanced Reasoning and Board Knowledge Discovery [article]

Adrian Łańcucki
2014 arXiv   pre-print
The quality of General Game Playing (GGP) matches suffers from slow state-switching and weak knowledge modules. Instantiation and Propositional Networks offer great performance gains over Prolog-based reasoning, but do not scale well. In this publication mGDL, a variant of GDL stripped of function constants, has been defined as a basis for simple reasoning machines. mGDL allows rules to be mapped easily to C++ functions. 253 out of 270 tested GDL rule sheets conformed to mGDL without any modifications; the rest required minor changes. A revised (m)GDL to C++ translation scheme has been reevaluated; it brought gains ranging from 28% to 7300% over YAP Prolog, managing to compile even demanding rule sheets in under a few seconds. For strengthening game knowledge, spatial features inspired by similar successful techniques from computer Go have been proposed. Because they required a Euclidean metric, a small board extension to GDL has been defined through a set of ground atomic sentences. An SGA-based genetic algorithm has been designed for tweaking game parameters and conducting self-plays, so the features could be mined from meaningful game records. The approach has been tested on a small cluster, giving performance gains of up to 20% more wins against the baseline UCT player. Implementations of the proposed ideas constitute the core of GGP Spatium, a small C++/Python GGP framework created for developing compact GGP players and problem solvers.
arXiv:1401.5813v1 fatcat:wi3fiprlhnbwzg275p6z4ge67e

One TTS Alignment To Rule Them All [article]

Rohan Badlani, Adrian Łancucki, Kevin J. Shih, Rafael Valle, Wei Ping, Bryan Catanzaro
2021 arXiv   pre-print
Speech-to-text alignment is a critical component of neural text-to-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words. Most non-autoregressive end-to-end TTS models rely on durations extracted from external sources. In this paper we leverage the alignment mechanism proposed in RAD-TTS as a generic alignment learning framework, easily applicable to a variety of neural TTS models. The framework combines the forward-sum algorithm, the Viterbi algorithm, and a simple and efficient static prior. In our experiments, the alignment learning framework improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS). Specifically, it improves alignment convergence speed of existing attention-based mechanisms, simplifies the training pipeline, and makes the models more robust to errors on long utterances. Most importantly, the framework improves the perceived speech synthesis quality, as judged by human evaluators.
arXiv:2108.10447v1 fatcat:ua2hbehfareoxnfkfxbawo26ee
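
The Viterbi step named in the abstract above can be illustrated with a minimal sketch. It assumes a matrix of frame-to-token log-probabilities is already available and only extracts a hard monotonic alignment from it; the forward-sum loss and the static prior are not shown. The function names `monotonic_viterbi` and `durations`, and the toy matrix, are hypothetical and not taken from the paper.

```python
def monotonic_viterbi(logp):
    """Best monotonic path through logp[t][n] (T frames x N tokens).

    The path starts at token 0, ends at token N-1, and at each frame
    either stays on the current token or advances by exactly one.
    """
    T, N = len(logp), len(logp[0])
    if T < N:
        raise ValueError("need at least one frame per token")
    neg_inf = float("-inf")
    score = [[neg_inf] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    score[0][0] = logp[0][0]
    for t in range(1, T):
        for n in range(min(t + 1, N)):
            stay = score[t - 1][n]
            move = score[t - 1][n - 1] if n > 0 else neg_inf
            if move > stay:
                score[t][n], back[t][n] = move + logp[t][n], n - 1
            else:
                score[t][n], back[t][n] = stay + logp[t][n], n
    # Backtrack from the last token at the last frame.
    path = [N - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def durations(path, n_tokens):
    """Number of frames assigned to each token along the path."""
    counts = [0] * n_tokens
    for n in path:
        counts[n] += 1
    return counts


# Toy example: 4 frames, 2 tokens.
logp = [[0.0, -5.0], [-1.0, -2.0], [-4.0, -0.5], [-3.0, -0.1]]
path = monotonic_viterbi(logp)
print(path, durations(path, 2))
```

Per-token frame counts of this kind are one way to obtain the durations that non-autoregressive models otherwise take from external sources.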

FastPitch: Parallel Text-to-speech with Pitch Prediction [article]

Adrian Łańcucki
2021 arXiv   pre-print
We present FastPitch, a fully-parallel text-to-speech model based on FastSpeech, conditioned on fundamental frequency contours. The model predicts pitch contours during inference. By altering these predictions, the generated speech can be more expressive, better match the semantics of the utterance, and in the end more engaging to the listener. Uniformly increasing or decreasing pitch with FastPitch generates speech that resembles the voluntary modulation of voice. Conditioning on frequency contours improves the overall quality of synthesized speech, making it comparable to state-of-the-art. It does not introduce an overhead, and FastPitch retains the favorable, fully-parallel Transformer architecture, with over 900x real-time factor for mel-spectrogram synthesis of a typical utterance.
arXiv:2006.06873v2 fatcat:6nklt2iqobblxk2z4ah5xqfsrq

Named Entity Recognition and Linking Augmented with Large-Scale Structured Data [article]

Paweł Rychlikowski, Bartłomiej Najdecki, Adrian Łańcucki, Adam Kaczmarek
2021 arXiv   pre-print
In this paper we describe our submissions to the 2nd and 3rd SlavNER Shared Tasks held at BSNLP 2019 and BSNLP 2021, respectively. The tasks focused on the analysis of Named Entities in multilingual Web documents in Slavic languages with rich inflection. Our solution takes advantage of large collections of both unstructured and structured documents. The former serve as data for unsupervised training of language models and embeddings of lexical units. The latter refers to Wikipedia and its structured counterpart, Wikidata, our source of lemmatization rules and real-world entities. With the aid of those resources, our system could recognize, normalize and link entities, while being trained with only small amounts of labeled data.
arXiv:2104.13456v1 fatcat:z5y3anpmzzbtpaissmccpra3re

Robust Training of Vector Quantized Bottleneck Models [article]

Adrian Łańcucki, Jan Chorowski, Guillaume Sanchez, Ricard Marxer, Nanxin Chen, Hans J.G.A. Dolfing, Sameer Khurana, Tanel Alumäe, Antoine Laurent
2020 arXiv   pre-print
In this paper we demonstrate methods for reliable and efficient training of discrete representation using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs). Discrete latent variable models have been shown to learn nontrivial representations of speech, applicable to unsupervised voice conversion and reaching state-of-the-art performance on unit discovery tasks. For unsupervised representation learning, they became viable alternatives to continuous latent variable models such as the Variational Auto-Encoder (VAE). However, training deep discrete variable models is challenging, due to the inherent non-differentiability of the discretization operation. In this paper we focus on VQ-VAE, a state-of-the-art discrete bottleneck model shown to perform on par with its continuous counterparts. It quantizes encoder outputs with on-line k-means clustering. We show that the codebook learning can suffer from poor initialization and non-stationarity of clustered encoder outputs. We demonstrate that these can be successfully overcome by increasing the learning rate for the codebook and periodic data-dependent codeword re-initialization. As a result, we achieve more robust training across different tasks, and significantly increase the usage of latent codewords even for large codebooks. This has practical benefit, for instance, in unsupervised representation learning, where large codebooks may lead to disentanglement of latent representations.
arXiv:2005.08520v1 fatcat:c6cjp2vkn5bprefp5siy63y2s4
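
The idea of data-dependent codeword re-initialization mentioned in the abstract above can be sketched roughly as follows. The abstract does not give the paper's exact procedure, so this is a simplified stand-in: any codeword that no encoder output in the current batch selects is replaced with a randomly chosen encoder output. The helper names (`nearest`, `quantize`, `reinit_unused`) and the toy data are hypothetical.

```python
import random


def nearest(codebook, x):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - v) ** 2 for c, v in zip(codebook[k], x)),
    )


def quantize(codebook, batch):
    """Map each encoder output in the batch to its nearest codeword index."""
    return [nearest(codebook, x) for x in batch]


def reinit_unused(codebook, batch, rng):
    """Replace codewords unused on this batch with random batch items.

    Assumption: a simplified stand-in for the paper's re-initialization.
    """
    used = set(quantize(codebook, batch))
    for k in range(len(codebook)):
        if k not in used:
            codebook[k] = list(rng.choice(batch))
    return codebook


rng = random.Random(0)
batch = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
codebook = [[0.0, 0.0], [100.0, 100.0]]  # the second codeword is never selected
before = quantize(codebook, batch)
reinit_unused(codebook, batch, rng)
after = quantize(codebook, batch)
print(before, after)
```

In this toy run the far-away codeword is unused before the call and becomes the nearest codeword for at least one batch item afterwards, which is the increase in codeword usage the abstract refers to.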

Lattice Generation in Attention-Based Speech Recognition Models

Michał Zapotoczny, Piotr Pietrzak, Adrian Łańcucki, Jan Chorowski
2019 Interspeech 2019  
Attention-based neural speech recognition models are frequently decoded with beam search, which produces a tree of hypotheses. In many cases, such as when using external language models, numerous decoding hypotheses need to be considered, requiring large beam sizes during decoding. We demonstrate that it is possible to merge certain nodes in a tree of hypotheses, in order to obtain a decoding lattice, which increases the number of decoding hypotheses without increasing the number of candidates that are scored by the neural network. We propose a convolutional architecture, which facilitates comparing states of the model at different pi The experiments are carried out on the Wall Street Journal dataset, where the lattice decoder obtains lower word error rates with smaller beam sizes than an otherwise similar architecture with regular beam search.
doi:10.21437/interspeech.2019-2667 dblp:conf/interspeech/ZapotocznyPLC19 fatcat:7crzhi2vivde3bdza2gstteoke

Aligned Contrastive Predictive Coding [article]

Jan Chorowski, Grzegorz Ciesielski, Jarosław Dzikowski, Adrian Łańcucki, Ricard Marxer, Mateusz Opala, Piotr Pusz, Paweł Rychlikowski, Michał Stypułkowski
2021 arXiv   pre-print
We investigate the possibility of forcing a self-supervised model trained using a contrastive predictive loss to extract slowly varying latent representations. Rather than producing individual predictions for each of the future representations, the model emits a sequence of predictions shorter than that of the upcoming representations to which they will be aligned. In this way, the prediction network solves a simpler task of predicting the next symbols, but not their exact timing, while the encoding network is trained to produce piece-wise constant latent codes. We evaluate the model on a speech coding task and demonstrate that the proposed Aligned Contrastive Predictive Coding (ACPC) leads to higher linear phone prediction accuracy and lower ABX error rates, while being slightly faster to train due to the reduced number of prediction heads.
arXiv:2104.11946v3 fatcat:l3g6cltlgjgpdni2ubj54gs2ru

Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw [article]

Jan Chorowski, Grzegorz Ciesielski, Jarosław Dzikowski, Adrian Łańcucki, Ricard Marxer, Mateusz Opala, Piotr Pusz, Paweł Rychlikowski, Michał Stypułkowski
2021 arXiv   pre-print
We present a number of low-resource approaches to the tasks of the Zero Resource Speech Challenge 2021. We build on the unsupervised representations of speech proposed by the organizers as a baseline, derived from CPC and clustered with the k-means algorithm. We demonstrate that simple methods of refining those representations can narrow the gap, or even improve upon the solutions which use a high computational budget. The results lead to the conclusion that the CPC-derived representations are still too noisy for training language models, but stable enough for simpler forms of pattern matching and retrieval.
arXiv:2106.11603v1 fatcat:vxbb4t65nnedleoenaxkxkpbje

A Talker Ensemble: the University of Wrocław's Entry to the NIPS 2017 Conversational Intelligence Challenge [article]

Jan Chorowski, Adrian Łańcucki, Szymon Malik, Maciej Pawlikowski, Paweł Rychlikowski, Paweł Zykowski
2018 arXiv   pre-print
We present Poetwannabe, a chatbot submitted by the University of Wrocław to the NIPS 2017 Conversational Intelligence Challenge, in which it ranked first ex-aequo. It is able to conduct a conversation with a user in a natural language. The primary functionality of our dialogue system is context-aware question answering (QA), while its secondary function is maintaining user engagement. The chatbot is composed of a number of sub-modules, which independently prepare replies to the user's prompts and assess their own confidence. To answer questions, our dialogue system relies heavily on factual data, sourced mostly from Wikipedia and DBpedia, data of real user interactions in public forums, as well as data concerning general literature. Where applicable, modules are trained on large datasets using GPUs. However, to comply with the competition's requirements, the final system is compact and runs on commodity hardware.
arXiv:1805.08032v1 fatcat:q7fetfwih5cbbcttle44vokqby

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning [article]

Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass
2020 arXiv   pre-print
Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech. LVMs admit an intuitive probabilistic interpretation where the latent structure shapes the information extracted from the signal. Even though LVMs have recently seen a renewed interest due to the introduction of Variational Autoencoders (VAEs), their use for speech representation learning remains largely unexplored. In this work, we ... Convolutional Deep Markov Model (ConvDMM), a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks. This unsupervised model is trained using black box variational inference. A deep convolutional neural network is used as an inference network for structured variational approximation. When trained on a large-scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extracting methods on linear phone classification and recognition on the Wall Street Journal dataset. Furthermore, we found that ConvDMM complements self-supervised methods like Wav2Vec and PASE, improving on the results achieved with any of the methods alone. Lastly, we find that ConvDMM features enable learning better phone recognizers than any other features in an extreme low-resource regime with few labeled training examples.
arXiv:2006.02547v2 fatcat:6x67rbwgqraprmovzje4nkny2i

Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees

Jan Chorowski, Adrian Łańcucki, Bartosz Kostka, Michał Zapotoczny
2019 Interspeech 2019  
Deep neural acoustic models benefit from context-dependent (CD) modeling of output symbols. We consider direct training of CTC networks with CD outputs, and identify two issues. The first one is frame-level normalization of probabilities in CTC, which induces strong language modeling behavior that leads to overfitting and interference with external language models. The second one is poor generalization in the presence of numerous lexical units like triphones or tri-chars. We mitigate the former with utterance-level normalization of probabilities. The latter typically requires reducing the CD symbol inventory with state-tying decision trees, which have to be transferred from classical GMM-HMM systems. We replace the trees with a CD symbol embedding network, which saves parameters and ensures generalization to unseen and undersampled CD symbols. The embedding network is trained together with the rest of the acoustic model and removes one of the last cases in which neural systems have to be bootstrapped from GMM-HMM ones.
doi:10.21437/interspeech.2019-2720 dblp:conf/interspeech/ChorowskiLKZ19 fatcat:aeo74hwxczcj7cndgkcfifnqu4

Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees [article]

Jan Chorowski, Adrian Lancucki, Bartosz Kostka, Michal Zapotoczny
2019 arXiv   pre-print
Deep neural acoustic models benefit from context-dependent (CD) modeling of output symbols. We consider direct training of CTC networks with CD outputs, and identify two issues. The first one is frame-level normalization of probabilities in CTC, which induces strong language modeling behavior that leads to overfitting and interference with external language models. The second one is poor generalization in the presence of numerous lexical units like triphones or tri-chars. We mitigate the former with utterance-level normalization of probabilities. The latter typically requires reducing the CD symbol inventory with state-tying decision trees, which have to be transferred from classical GMM-HMM systems. We replace the trees with a CD symbol embedding network, which saves parameters and ensures generalization to unseen and undersampled CD symbols. The embedding network is trained together with the rest of the acoustic model and removes one of the last cases in which neural systems have to be bootstrapped from GMM-HMM ones.
arXiv:1901.04379v2 fatcat:gh7ry67moba5pdii3tewy4gwre

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass
2020 Interspeech 2020  
Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech. LVMs admit an intuitive probabilistic interpretation where the latent structure shapes the information extracted from the signal. Even though LVMs have recently seen a renewed interest due to the introduction of Variational Autoencoders (VAEs), their use for speech representation learning remains largely unexplored. In this work, we ... Convolutional Deep Markov Model (ConvDMM), a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks. This unsupervised model is trained using black box variational inference. A deep convolutional neural network is used as an inference network for structured variational approximation. When trained on a large-scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extracting methods on linear phone classification and recognition on the Wall Street Journal dataset. Furthermore, we found that ConvDMM complements self-supervised methods like Wav2Vec and PASE, improving on the results achieved with any of the methods alone. Lastly, we find that ConvDMM features enable learning better phone recognizers than any other features in an extreme low-resource regime with few labelled training examples.
doi:10.21437/interspeech.2020-3084 dblp:conf/interspeech/KhuranaLHCLMG20 fatcat:2resntl7wzhoxi2elxcfgqkjsq

Continuous Population-Based Incremental Learning with Mixture Probability Modeling for Dynamic Optimization Problems [chapter]

Adrian Lancucki, Jan Chorowski, Krzysztof Michalak, Patryk Filipiak, Piotr Lipinski
2014 Lecture Notes in Computer Science  
This paper proposes a multimodal extension of PBILC based on Gaussian mixture models for solving dynamic optimization problems. By tracking multiple optima, the algorithm is able to follow the changes in objective functions more efficiently than in the unimodal case. The approach was validated on a set of synthetic benchmarks including Moving Peaks, dynamization of the Rosenbrock function and compositions of functions from the IEEE CEC'2009 competition. The results obtained in the experiments proved the efficiency of the approach in solving dynamic problems with a number of competing peaks.
doi:10.1007/978-3-319-10840-7_55 fatcat:c2ypmjgiarft7fjhlgvtkhekbm