A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf
.
Filters
Efficient Purely Convolutional Text Encoding
[article]
2018
arXiv
pre-print
In this work, we focus on a lightweight convolutional architecture that creates fixed-size vector embeddings of sentences. Such representations are useful for building NLP systems, including conversational agents. Our work derives from a recently proposed recursive convolutional architecture for auto-encoding text paragraphs at byte level. We propose alternations that significantly reduce training time, the number of parameters, and improve auto-encoding accuracy. Finally, we evaluate the
arXiv:1808.01160v1
fatcat:3bduopux6ve2vmbqtqscjxzl2y
more »
... entations created by our model on tasks from SentEval benchmark suite, and show that it can serve as a better, yet fairly low-resource alternative to popular bag-of-words embeddings.
GGP with Advanced Reasoning and Board Knowledge Discovery
[article]
2014
arXiv
pre-print
Quality of General Game Playing (GGP) matches suffers from slow state-switching and weak knowledge modules. Instantiation and Propositional Networks offer great performance gains over Prolog-based reasoning, but do not scale well. In this publication mGDL, a variant of GDL stripped of function constants, has been defined as a basis for simple reasoning machines. mGDL allows to easily map rules to C++ functions. 253 out of 270 tested GDL rule sheets conformed to mGDL without any modifications;
arXiv:1401.5813v1
fatcat:wi3fiprlhnbwzg275p6z4ge67e
more »
... e rest required minor changes. A revised (m)GDL to C++ translation scheme has been reevaluated; it brought gains ranging from 28% to 7300% over YAP Prolog, managing to compile even demanding rule sheets under few seconds. For strengthening game knowledge, spatial features inspired by similar successful techniques from computer Go have been proposed. For they required an Euclidean metric, a small board extension to GDL has been defined through a set of ground atomic sentences. An SGA-based genetic algorithm has been designed for tweaking game parameters and conducting self-plays, so the features could be mined from meaningful game records. The approach has been tested on a small cluster, giving performance gains up to 20% more wins against the baseline UCT player. Implementations of proposed ideas constitutes the core of GGP Spatium - a small C++/Python GGP framework, created for developing compact GGP Players and problem solvers.
One TTS Alignment To Rule Them All
[article]
2021
arXiv
pre-print
Speech-to-text alignment is a critical component of neural textto-speech (TTS) models. Autoregressive TTS models typically use an attention mechanism to learn these alignments on-line. However, these alignments tend to be brittle and often fail to generalize to long utterances and out-of-domain text, leading to missing or repeating words. Most non-autoregressive endto-end TTS models rely on durations extracted from external sources. In this paper we leverage the alignment mechanism proposed in
arXiv:2108.10447v1
fatcat:ua2hbehfareoxnfkfxbawo26ee
more »
... AD-TTS as a generic alignment learning framework, easily applicable to a variety of neural TTS models. The framework combines forward-sum algorithm, the Viterbi algorithm, and a simple and efficient static prior. In our experiments, the alignment learning framework improves all tested TTS architectures, both autoregressive (Flowtron, Tacotron 2) and non-autoregressive (FastPitch, FastSpeech 2, RAD-TTS). Specifically, it improves alignment convergence speed of existing attention-based mechanisms, simplifies the training pipeline, and makes the models more robust to errors on long utterances. Most importantly, the framework improves the perceived speech synthesis quality, as judged by human evaluators.
FastPitch: Parallel Text-to-speech with Pitch Prediction
[article]
2021
arXiv
pre-print
We present FastPitch, a fully-parallel text-to-speech model based on FastSpeech, conditioned on fundamental frequency contours. The model predicts pitch contours during inference. By altering these predictions, the generated speech can be more expressive, better match the semantic of the utterance, and in the end more engaging to the listener. Uniformly increasing or decreasing pitch with FastPitch generates speech that resembles the voluntary modulation of voice. Conditioning on frequency
arXiv:2006.06873v2
fatcat:6nklt2iqobblxk2z4ah5xqfsrq
more »
... urs improves the overall quality of synthesized speech, making it comparable to state-of-the-art. It does not introduce an overhead, and FastPitch retains the favorable, fully-parallel Transformer architecture, with over 900x real-time factor for mel-spectrogram synthesis of a typical utterance.
Named Entity Recognition and Linking Augmented with Large-Scale Structured Data
[article]
2021
arXiv
pre-print
In this paper we describe our submissions to the 2nd and 3rd SlavNER Shared Tasks held at BSNLP 2019 and BSNLP 2021, respectively. The tasks focused on the analysis of Named Entities in multilingual Web documents in Slavic languages with rich inflection. Our solution takes advantage of large collections of both unstructured and structured documents. The former serve as data for unsupervised training of language models and embeddings of lexical units. The latter refers to Wikipedia and its
arXiv:2104.13456v1
fatcat:z5y3anpmzzbtpaissmccpra3re
more »
... ured counterpart - Wikidata, our source of lemmatization rules, and real-world entities. With the aid of those resources, our system could recognize, normalize and link entities, while being trained with only small amounts of labeled data.
Robust Training of Vector Quantized Bottleneck Models
[article]
2020
arXiv
pre-print
In this paper we demonstrate methods for reliable and efficient training of discrete representation using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs). Discrete latent variable models have been shown to learn nontrivial representations of speech, applicable to unsupervised voice conversion and reaching state-of-the-art performance on unit discovery tasks. For unsupervised representation learning, they became viable alternatives to continuous latent variable models such as the
arXiv:2005.08520v1
fatcat:c6cjp2vkn5bprefp5siy63y2s4
more »
... tional Auto-Encoder (VAE). However, training deep discrete variable models is challenging, due to the inherent non-differentiability of the discretization operation. In this paper we focus on VQ-VAE, a state-of-the-art discrete bottleneck model shown to perform on par with its continuous counterparts. It quantizes encoder outputs with on-line k-means clustering. We show that the codebook learning can suffer from poor initialization and non-stationarity of clustered encoder outputs. We demonstrate that these can be successfully overcome by increasing the learning rate for the codebook and periodic date-dependent codeword re-initialization. As a result, we achieve more robust training across different tasks, and significantly increase the usage of latent codewords even for large codebooks. This has practical benefit, for instance, in unsupervised representation learning, where large codebooks may lead to disentanglement of latent representations.
Lattice Generation in Attention-Based Speech Recognition Models
2019
Interspeech 2019
Attention-based neural speech recognition models are frequently decoded with beam search, which produces a tree of hypotheses. In many cases, such as when using external language models, numerous decoding hypotheses need to be considered, requiring large beam sizes during decoding. We demonstrate that it is possible to merge certain nodes in a tree of hypotheses, in order to obtain a decoding lattice, which increases the number of decoding hypotheses without increasing the number of candidates
doi:10.21437/interspeech.2019-2667
dblp:conf/interspeech/ZapotocznyPLC19
fatcat:7crzhi2vivde3bdza2gstteoke
more »
... hat are scored by the neural network. We propose a convolutional architecture, which facilitates comparing states of the model at different pi The experiments are carried on the Wall Street Journal dataset, where the lattice decoder obtains lower word error rates with smaller beam sizes, than an otherwise similar architecture with regular beam search.
Aligned Contrastive Predictive Coding
[article]
2021
arXiv
pre-print
We investigate the possibility of forcing a self-supervised model trained using a contrastive predictive loss to extract slowly varying latent representations. Rather than producing individual predictions for each of the future representations, the model emits a sequence of predictions shorter than that of the upcoming representations to which they will be aligned. In this way, the prediction network solves a simpler task of predicting the next symbols, but not their exact timing, while the
arXiv:2104.11946v3
fatcat:l3g6cltlgjgpdni2ubj54gs2ru
more »
... ding network is trained to produce piece-wise constant latent codes. We evaluate the model on a speech coding task and demonstrate that the proposed Aligned Contrastive Predictive Coding (ACPC) leads to higher linear phone prediction accuracy and lower ABX error rates, while being slightly faster to train due to the reduced number of prediction heads.
Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw
[article]
2021
arXiv
pre-print
We present a number of low-resource approaches to the tasks of the Zero Resource Speech Challenge 2021. We build on the unsupervised representations of speech proposed by the organizers as a baseline, derived from CPC and clustered with the k-means algorithm. We demonstrate that simple methods of refining those representations can narrow the gap, or even improve upon the solutions which use a high computational budget. The results lead to the conclusion that the CPC-derived representations are
arXiv:2106.11603v1
fatcat:vxbb4t65nnedleoenaxkxkpbje
more »
... till too noisy for training language models, but stable enough for simpler forms of pattern matching and retrieval.
A Talker Ensemble: the University of Wrocław's Entry to the NIPS 2017 Conversational Intelligence Challenge
[article]
2018
arXiv
pre-print
We present Poetwannabe, a chatbot submitted by the University of Wrocław to the NIPS 2017 Conversational Intelligence Challenge, in which it ranked first ex-aequo. It is able to conduct a conversation with a user in a natural language. The primary functionality of our dialogue system is context-aware question answering (QA), while its secondary function is maintaining user engagement. The chatbot is composed of a number of sub-modules, which independently prepare replies to user's prompts and
arXiv:1805.08032v1
fatcat:q7fetfwih5cbbcttle44vokqby
more »
... sess their own confidence. To answer questions, our dialogue system relies heavily on factual data, sourced mostly from Wikipedia and DBpedia, data of real user interactions in public forums, as well as data concerning general literature. Where applicable, modules are trained on large datasets using GPUs. However, to comply with the competition's requirements, the final system is compact and runs on commodity hardware.
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning
[article]
2020
arXiv
pre-print
Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech. LVMs admit an intuitive probabilistic interpretation where the latent structure shapes the information extracted from the signal. Even though LVMs have recently seen a renewed interest due to the introduction of Variational Autoencoders (VAEs), their use for speech representation learning remains largely unexplored. In this work, we
arXiv:2006.02547v2
fatcat:6x67rbwgqraprmovzje4nkny2i
more »
... Convolutional Deep Markov Model (ConvDMM), a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks. This unsupervised model is trained using black box variational inference. A deep convolutional neural network is used as an inference network for structured variational approximation. When trained on a large scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extracting methods on linear phone classification and recognition on the Wall Street Journal dataset. Furthermore, we found that ConvDMM complements self-supervised methods like Wav2Vec and PASE, improving on the results achieved with any of the methods alone. Lastly, we find that ConvDMM features enable learning better phone recognizers than any other features in an extreme low-resource regime with few labeled training examples.
Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees
2019
Interspeech 2019
Deep neural acoustic models benefit from context-dependent (CD) modeling of output symbols. We consider direct training of CTC networks with CD outputs, and identify two issues. The first one is frame-level normalization of probabilities in CTC, which induces strong language modeling behavior that leads to overfitting and interference with external language models. The second one is poor generalization in the presence of numerous lexical units like triphones or tri-chars. We mitigate the former
doi:10.21437/interspeech.2019-2720
dblp:conf/interspeech/ChorowskiLKZ19
fatcat:aeo74hwxczcj7cndgkcfifnqu4
more »
... with utterance-level normalization of probabilities. The latter typically requires reducing the CD symbol inventory with state-tying decision trees, which have to be transferred from classical GMM-HMM systems. We replace the trees with a CD symbol embedding network, which saves parameters and ensures generalization to unseen and undersampled CD symbols. The embedding network is trained together with the rest of the acoustic model and removes one of the last cases in which neural systems have to be bootstrapped from GMM-HMM ones.
Towards Using Context-Dependent Symbols in CTC Without State-Tying Decision Trees
[article]
2019
arXiv
pre-print
Deep neural acoustic models benefit from context-dependent (CD) modeling of output symbols. We consider direct training of CTC networks with CD outputs, and identify two issues. The first one is frame-level normalization of probabilities in CTC, which induces strong language modeling behavior that leads to overfitting and interference with external language models. The second one is poor generalization in the presence of numerous lexical units like triphones or tri-chars. We mitigate the former
arXiv:1901.04379v2
fatcat:gh7ry67moba5pdii3tewy4gwre
more »
... with utterance-level normalization of probabilities. The latter typically requires reducing the CD symbol inventory with state-tying decision trees, which have to be transferred from classical GMM-HMM systems. We replace the trees with a CD symbol embedding network, which saves parameters and ensures generalization to unseen and undersampled CD symbols. The embedding network is trained together with the rest of the acoustic model and removes one of the last cases in which neural systems have to be bootstrapped from GMM-HMM ones.
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning
2020
Interspeech 2020
Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech. LVMs admit an intuitive probabilistic interpretation where the latent structure shapes the information extracted from the signal. Even though LVMs have recently seen a renewed interest due to the introduction of Variational Autoencoders (VAEs), their use for speech representation learning remains largely unexplored. In this work, we
doi:10.21437/interspeech.2020-3084
dblp:conf/interspeech/KhuranaLHCLMG20
fatcat:2resntl7wzhoxi2elxcfgqkjsq
more »
... Convolutional Deep Markov Model (ConvDMM), a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks. This unsupervised model is trained using black box variational inference. A deep convolutional neural network is used as an inference network for structured variational approximation. When trained on a large scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extracting methods on linear phone classification and recognition on the Wall Street Journal dataset. Furthermore, we found that ConvDMM complements self-supervised methods like Wav2Vec and PASE, improving on the results achieved with any of the methods alone. Lastly, we find that ConvDMM features enable learning better phone recognizers than any other features in an extreme low-resource regime with few labelled training examples.
Continuous Population-Based Incremental Learning with Mixture Probability Modeling for Dynamic Optimization Problems
[chapter]
2014
Lecture Notes in Computer Science
This paper proposes a multimodal extension of PBILC based on Gaussian mixture models for solving dynamic optimization problems. By tracking multiple optima, the algorithm is able to follow the changes in objective functions more efficiently than in the unimodal case. The approach was validated on a set of synthetic benchmarks including Moving Peaks, dynamization of the Rosenbrock function and compositions of functions from the IEEE CEC'2009 competition. The result obtained in the experiments
doi:10.1007/978-3-319-10840-7_55
fatcat:c2ypmjgiarft7fjhlgvtkhekbm
more »
... ved the efficiency of the approach in solving dynamic problems with a number of competing peaks.
« Previous
Showing results 1 — 15 out of 25 results