
Tangent: Automatic Differentiation Using Source Code Transformation in Python [article]

Bart van Merriënboer, Alexander B. Wiltschko, Dan Moldovan
2017 arXiv   pre-print
Automatic differentiation (AD) is an essential primitive for machine learning programming systems. Tangent is a new library that performs AD using source code transformation (SCT) in Python. It takes numeric functions written in a syntactic subset of Python and NumPy as input, and generates new Python functions which calculate a derivative. This approach to automatic differentiation is different from existing packages popular in machine learning, such as TensorFlow and Autograd. Advantages are that Tangent generates gradient code in Python which is readable by the user, easy to understand and debug, and has no runtime overhead. Tangent also introduces abstractions for easily injecting logic into the generated gradient code, further improving usability.
arXiv:1711.02712v1 fatcat:4ewylp4uivcsfh6dmjrf2nze5y
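
To make the SCT idea concrete, here is a hand-written sketch of the kind of readable, tape-free derivative code such a tool can emit for a toy function. Both f and the generated dfdx below are illustrative only, not Tangent's actual output.

```python
# Illustration of the source-code-transformation (SCT) idea: the AD tool
# reads the source of f and emits a new, readable Python function for the
# derivative. The generated code below is hand-written for illustration;
# it is not Tangent's actual output.

def f(x):
    y = x * x
    z = y + 3.0 * x
    return z

# What an SCT tool could emit for df/dx: plain Python, no runtime tape.
def dfdx(x):
    # Forward pass (recomputed so intermediates are available).
    y = x * x
    # Reverse pass: propagate the adjoint dz = 1.0 back to x.
    dz = 1.0
    dy = dz             # z = y + 3x  ->  dz/dy = 1
    dx = 3.0 * dz       # z = y + 3x  ->  direct dz/dx term = 3
    dx += 2.0 * x * dy  # y = x*x    ->  dy/dx = 2x
    return dx

assert dfdx(2.0) == 2.0 * 2.0 + 3.0  # f'(x) = 2x + 3, so f'(2) = 7
```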

Blocks and Fuel: Frameworks for deep learning [article]

Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, Yoshua Bengio
2015 arXiv   pre-print
We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel. Blocks is based on Theano, a linear algebra compiler with CUDA-support. It facilitates the training of complex neural network models by providing parametrized Theano operations, attaching metadata to Theano's symbolic computational graph, and providing an extensive set of utilities to assist training the networks, e.g. training algorithms, logging, monitoring, visualization, and serialization. Fuel provides a standard format for machine learning datasets. It allows the user to easily iterate over large datasets, performing many types of pre-processing on the fly.
arXiv:1506.00619v1 fatcat:xp6wxgav4rcg5pgzityagso63i
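
As a concept illustration of the streaming, on-the-fly preprocessing that Fuel standardizes, here is a plain-Python sketch; the names minibatch_stream and whiten are hypothetical, and this is not Fuel's actual API.

```python
# Generic sketch of stream-based data iteration with on-the-fly
# preprocessing, the pattern Fuel standardizes. Plain Python for
# illustration; this is not Fuel's actual API.
import numpy as np

def minibatch_stream(data, batch_size, transforms=()):
    """Yield preprocessed minibatches without materializing the
    transformed dataset in memory."""
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        for transform in transforms:
            batch = transform(batch)
        yield batch

whiten = lambda x: (x - x.mean()) / (x.std() + 1e-8)
data = np.random.rand(1000, 784).astype(np.float32)
for batch in minibatch_stream(data, batch_size=128, transforms=[whiten]):
    pass  # feed `batch` to the training loop
```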

Multiscale sequence modeling with a learned dictionary [article]

Bart van Merriënboer, Amartya Sanyal, Hugo Larochelle, Yoshua Bengio
2017 arXiv   pre-print
We propose a generalization of neural network sequence models. Instead of predicting one symbol at a time, our multi-scale model makes predictions over multiple, potentially overlapping multi-symbol tokens. A variation of the byte-pair encoding (BPE) compression algorithm is used to learn the dictionary of tokens that the model is trained with. When applied to language modelling, our model has the flexibility of character-level models while maintaining many of the performance benefits of word-level models. Our experiments show that this model performs better than a regular LSTM on language modeling tasks, especially for smaller models.
arXiv:1707.00762v2 fatcat:qgwajxvtv5bpzo3zqylpjbthei
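
The dictionary-learning step builds on byte-pair encoding. Below is a minimal sketch of standard BPE merge learning, assuming the usual formulation rather than the paper's exact variant.

```python
# Minimal byte-pair-encoding (BPE) dictionary learner, the algorithm the
# paper adapts to build its multi-symbol token dictionary. Standard BPE
# sketch, not the paper's exact variant.
from collections import Counter

def learn_bpe(corpus, num_merges):
    """corpus: list of strings; returns the list of learned merges."""
    # Represent each word as a tuple of symbols (characters initially).
    words = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Replace every occurrence of the best pair with a merged symbol.
        merged = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] += freq
        words = merged
    return merges

print(learn_bpe(["low", "lower", "lowest", "low"], num_merges=3))
```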

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches [article]

Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio
2014 arXiv   pre-print
Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. The neural machine translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation. In this paper, we focus on analyzing the properties of the neural machine translation using two models; RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network. We show that the neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase. Furthermore, we find that the proposed gated recursive convolutional network learns a grammatical structure of a sentence automatically.
arXiv:1409.1259v2 fatcat:i2l2qmkyfjakdmu7663edwfnaq
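
A bare-bones sketch of the encoder-decoder pattern under analysis, using plain tanh RNN cells with random, untrained weights; real models use GRU or LSTM units, learned embeddings, and a softmax output layer. It shows the fixed-length bottleneck the paper probes.

```python
# Bare-bones encoder-decoder sketch with plain tanh RNN cells, to show the
# fixed-length bottleneck the paper analyzes. All weights here are random
# and untrained; this is an illustration, not a working translator.
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden/embedding size

W_enc, U_enc = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))
W_dec, U_dec = 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))

def encode(xs):
    """Compress a variable-length sequence into one fixed-length vector."""
    h = np.zeros(d)
    for x in xs:
        h = np.tanh(W_enc @ x + U_enc @ h)
    return h  # the fixed-length representation

def decode(c, steps):
    """Unroll the decoder from the fixed-length context c."""
    h, y, out = c, np.zeros(d), []
    for _ in range(steps):
        h = np.tanh(W_dec @ y + U_dec @ h)
        y = h  # feed the hidden state back as the previous "output"
        out.append(y)
    return out

source = [rng.normal(size=d) for _ in range(5)]  # 5 source "words"
print(len(decode(encode(source), steps=3)))      # 3 decoded vectors
```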

GradMax: Growing Neural Networks using Gradient Information [article]

Utku Evci, Bart van Merriënboer, Thomas Unterthiner, Max Vladymyrov, Fabian Pedregosa
2022 arXiv   pre-print
The architecture and the parameters of neural networks are often optimized independently, which requires costly retraining of the parameters whenever the architecture is modified. In this work we instead focus on growing the architecture without requiring costly retraining. We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics. We achieve the latter by maximizing the gradients of the new weights and find the optimal initialization efficiently by means of the singular value decomposition (SVD). We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in a variety of vision tasks and architectures.
arXiv:2201.05125v3 fatcat:s6hpjlhhs5cs5nyqjgop7mov24
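
A rough sketch of the growing step described above, under simplifying assumptions (linear activations, a single grown layer); the helper gradmax_init is hypothetical, and the paper's exact objective and constraints differ in detail.

```python
# Rough sketch of the GradMax idea: add k new neurons whose outgoing
# weights start at zero (so the network's function is unchanged) and whose
# incoming weights are chosen via an SVD to maximize the gradient norm of
# those outgoing weights. Simplified linear-activation version; see the
# paper for the exact objective and constraints.
import numpy as np

def gradmax_init(h_prev, g_next, k, scale=1e-3):
    """h_prev: (batch, d_in) activations entering the grown layer.
    g_next: (batch, d_out) loss gradients at the layer above.
    Returns (w_in, w_out) for k new neurons."""
    # Cross-covariance between upstream gradients and incoming activations.
    m = g_next.T @ h_prev                   # (d_out, d_in)
    # Top-k right singular vectors maximize the resulting gradient norm.
    _, _, vt = np.linalg.svd(m, full_matrices=False)
    w_in = scale * vt[:k]                   # (k, d_in) incoming weights
    w_out = np.zeros((g_next.shape[1], k))  # outgoing weights start at zero
    return w_in, w_out

h = np.random.randn(32, 64)   # activations from the previous layer
g = np.random.randn(32, 10)   # gradients flowing into the next layer
w_in, w_out = gradmax_init(h, g, k=4)
print(w_in.shape, w_out.shape)  # (4, 64) (10, 4)
```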

Tangent: Automatic differentiation using source-code transformation for dynamically typed array programming [article]

Bart van Merriënboer, Dan Moldovan, Alexander B. Wiltschko
2018 arXiv   pre-print
The need to efficiently calculate first- and higher-order derivatives of increasingly complex models expressed in Python has stressed or exceeded the capabilities of available tools. In this work, we explore techniques from the field of automatic differentiation (AD) that can give researchers expressive power, performance and strong usability. These include source-code transformation (SCT), flexible gradient surgery, efficient in-place array operations, higher-order derivatives as well as mixing of forward and reverse mode AD. We implement and demonstrate these ideas in the Tangent software library for Python, the first AD framework for a dynamic language that uses SCT.
arXiv:1809.09569v2 fatcat:4rdr3fcaxnbk5hdjumz4il4gyu
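
Reverse mode is sketched under the 2017 Tangent entry above; for the forward mode that this paper mixes with it, here is a minimal dual-number illustration. Tangent itself implements forward mode by rewriting source, not with a Dual class.

```python
# Minimal forward-mode AD with dual numbers, one of the two modes the
# paper combines. Illustrative only; Tangent implements both modes by
# transforming source code rather than overloading operators.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot  # value and tangent (derivative)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

def f(x):
    return x * x + 3.0 * x

x = Dual(2.0, 1.0)          # seed dx/dx = 1
print(f(x).val, f(x).dot)   # 10.0 7.0  (f(2) = 10, f'(2) = 7)
```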

Improving supervision for students at a distance: videoconferencing for group meetings

Karen D. Könings, Daniela Popa, Maike Gerken, Bas Giesbers, Bart C. Rienties, Cees P.M. van der Vleuten, Jeroen J.G. van Merriënboer
2015 Innovations in Education & Teaching International  
Satisfaction is closely related to motivation for learning (Könings, Brand-Gruwel, & van Merriënboer, 2011) and the likelihood of finishing the thesis (Ives & Rowley, 2005). ... This bias was unavoidable, as educational innovations depend on staff members who are motivated to leave routine procedures and try something new (Fullan, 2007; Könings, Brand-Gruwel, & van Merriënboer, ...)
doi:10.1080/14703297.2015.1004098 fatcat:xv57gobtz5c3fpddd2xooq5dgq

Automatic differentiation in ML: Where we are and where we should be going [article]

Bart van Merriënboer, Olivier Breuleux, Arnaud Bergeron, Pascal Lamblin
2019 arXiv   pre-print
Author contributions and acknowledgements: Bart van Merriënboer worked on the design and implementation of the IR, as well as the design of the AD system. ...
arXiv:1810.11530v2 fatcat:g2chgpagsvhn5daeka26diuwle

On the interplay between noise and curvature and its effect on optimization and generalization [article]

Valentin Thomas, Fabian Pedregosa, Bart van Merriënboer, Pierre-Antoine Manzagol, Yoshua Bengio, Nicolas Le Roux
2020 arXiv   pre-print
The speed at which one can minimize an expected loss using stochastic methods depends on two properties: the curvature of the loss and the variance of the gradients. While most previous works focus on one or the other of these properties, we explore how their interaction affects optimization speed. Further, as the ultimate goal is good generalization performance, we clarify how both curvature and noise are relevant to properly estimate the generalization gap. Realizing that the limitations of some existing works stem from a confusion between these matrices, we also clarify the distinction between the Fisher matrix, the Hessian, and the covariance matrix of the gradients.
arXiv:1906.07774v2 fatcat:gbqfmylv75bs7nhyjjc3jsd7wy
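
A small numerical illustration of the three matrices being distinguished, for linear least squares: there the Fisher coincides with the Hessian, while the gradient covariance is a distinct, residual-dependent object. A sketch, not the paper's experimental setup.

```python
# Numerical illustration of the three matrices the paper distinguishes,
# for linear least squares l_i(w) = 0.5 * (w @ x_i - y_i) ** 2. For this
# model the Fisher equals the Hessian, while the covariance of the
# per-sample gradients depends on the residuals.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
w = rng.normal(size=d)  # an arbitrary point in parameter space

residuals = X @ w - y
grads = residuals[:, None] * X                 # per-sample gradients, (n, d)

hessian = X.T @ X / n                          # curvature of the average loss
fisher = X.T @ X / n                           # E_y[g g^T] under the model;
                                               # coincides with the Hessian here
centered = grads - grads.mean(axis=0)
covariance = centered.T @ centered / n         # noise of the stochastic gradient

print(np.allclose(hessian, fisher))            # True for this model
print(np.linalg.norm(covariance - hessian))    # generally nonzero
```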

On the Properties of Neural Machine Translation: Encoder–Decoder Approaches

Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio
2014 Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation  
Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. The neural machine translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation. In this paper, we focus on analyzing the properties of the neural machine translation using two models; RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network. We show that the neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase. Furthermore, we find that the proposed gated recursive convolutional network learns a grammatical structure of a sentence automatically.
doi:10.3115/v1/w14-4012 dblp:conf/ssst/ChoMBB14 fatcat:zogr4hmywfetnfv4fk3pwho6di

Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation [article]

Jean Pouget-Abadie, Dzmitry Bahdanau, Bart van Merrienboer, Kyunghyun Cho, Yoshua Bengio
2014 arXiv   pre-print
The authors of (Cho et al., 2014a) have shown that the recently introduced neural network translation systems suffer from a significant drop in translation quality when translating long sentences, unlike existing phrase-based translation systems. In this paper, we propose a way to address this issue by automatically segmenting an input sentence into phrases that can be easily translated by the neural network translation model. Once each segment has been independently translated by the neural machine translation model, the translated clauses are concatenated to form a final translation. Empirical results show a significant improvement in translation quality for long sentences.
arXiv:1409.1257v2 fatcat:2ixsjh6qvzby7jcznupx6pyqkq
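
A toy sketch of the segment-translate-concatenate pipeline; both segment and translate below are stand-ins, since the paper learns the segmentation points and translates each segment with an RNN Encoder-Decoder.

```python
# Toy sketch of the segment-translate-concatenate pipeline the paper
# proposes. The segmenter and the translator here are stand-ins for the
# learned components.
def segment(sentence, max_len=5):
    """Stand-in segmenter: split into chunks of at most max_len words.
    The paper instead learns segmentation points that keep each chunk
    easy to translate."""
    words = sentence.split()
    return [" ".join(words[i:i + max_len])
            for i in range(0, len(words), max_len)]

def translate(segment_text):
    """Stand-in for the neural machine translation model."""
    return segment_text.upper()

def translate_long(sentence):
    # Translate each segment independently, then concatenate.
    return " ".join(translate(s) for s in segment(sentence))

print(translate_long("this long sentence would otherwise degrade the "
                     "translation quality of the encoder decoder model"))
```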

Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
2014 Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)  
The projection was done by the recently proposed Barnes-Hut-SNE (van der Maaten, 2013). ...
doi:10.3115/v1/d14-1179 dblp:conf/emnlp/ChoMGBBSB14 fatcat:uiy743kyojcknh7pjgs4x33osa

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation [article]

Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio
2014 arXiv   pre-print
The projection was done by the recently proposed Barnes-Hut-SNE (van der Maaten, 2013). ...
arXiv:1406.1078v3 fatcat:5gl2ci3wbnagzgbe5mtlqh6guu

Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation

Jean Pouget-Abadie, Dzmitry Bahdanau, Bart van Merrienboer, Kyunghyun Cho, Yoshua Bengio
2014 Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation  
The authors of (Cho et al., 2014a) have shown that the recently introduced neural network translation systems suffer from a significant drop in translation quality when translating long sentences, unlike existing phrase-based translation systems. In this paper, we propose a way to address this issue by automatically segmenting an input sentence into phrases that can be easily translated by the neural network translation model. Once each segment has been independently translated by the neural machine translation model, the translated clauses are concatenated to form a final translation. Empirical results show a significant improvement in translation quality for long sentences.
doi:10.3115/v1/w14-4009 dblp:conf/ssst/Pouget-AbadieBM14 fatcat:yj4oop4sxfcxxecfbez3exdcc4

Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis [article]

Courtney Paquette, Bart van Merriënboer, Elliot Paquette, Fabian Pedregosa
2021 arXiv   pre-print
Average-case analysis computes the complexity of an algorithm averaged over all possible inputs. Compared to worst-case analysis, it is more representative of the typical behavior of an algorithm, but remains largely unexplored in optimization. One difficulty is that the analysis can depend on the probability distribution of the inputs to the model. However, we show that this is not the case for a class of large-scale problems trained with first-order methods including random least squares and one-hidden layer neural networks with random weights. In fact, the halting time exhibits a universality property: it is independent of the probability distribution. With this barrier for average-case analysis removed, we provide the first explicit average-case convergence rates showing a tighter complexity not captured by traditional worst-case analysis. Finally, numerical simulations suggest this universality property holds for a more general class of algorithms and problems.
arXiv:2006.04299v3 fatcat:25fyuxwklrhxjfll2hmnt3cgfy
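
A small simulation in the spirit of the universality claim: gradient descent on random least-squares instances drawn from two different entry distributions halts after a similar number of iterations. Illustrative only; the paper's scaling and model classes are more precise.

```python
# Compare gradient-descent halting times on random least-squares problems
# whose entries come from different distributions. Under the universality
# claim, the iteration counts should be close.
import numpy as np

def halting_time(sample, n=400, d=200, tol=1e-6, max_iter=10000):
    rng = np.random.default_rng(0)
    A = sample(rng, (n, d)) / np.sqrt(n)     # normalized random matrix
    b = sample(rng, (n,)) / np.sqrt(n)
    x = np.zeros(d)
    lr = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / smoothness constant
    for t in range(max_iter):
        grad = A.T @ (A @ x - b)
        if np.linalg.norm(grad) < tol:
            return t
        x -= lr * grad
    return max_iter

gaussian = lambda rng, s: rng.normal(size=s)
rademacher = lambda rng, s: rng.choice([-1.0, 1.0], size=s)
print(halting_time(gaussian), halting_time(rademacher))  # similar counts
```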
Showing results 1–15 of 248.