1,361 Hits in 6.4 sec

Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks [article]

R. Thomas McCoy, Robert Frank, Tal Linzen
2020 arXiv   pre-print
In neural network models, inductive biases could in theory arise from any aspect of the model architecture.  ...  However, the only factor that consistently contributed a hierarchical bias across tasks was the use of a tree-structured model rather than a model with sequential recurrence, suggesting that human-like  ...  Pavia Center for Neurocognition, Epistemology, and Theoretical Syntax, the Penn State Dept. of Computer Science and Engineering, and the MIT Dept. of Brain and Cognitive Sciences.  ... 
arXiv:2001.03632v1 fatcat:imioa7x2zbhcpc5yze2jw2u4n4

Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-Sequence Networks

R. Thomas McCoy, Robert Frank, Tal Linzen
2020 Transactions of the Association for Computational Linguistics  
In neural network models, inductive biases could in theory arise from any aspect of the model architecture.  ...  However, the only factor that consistently contributed a hierarchical bias across tasks was the use of a tree-structured model rather than a model with sequential recurrence, suggesting that human-like  ...  Pavia Center for Neurocognition, Epistemology, and Theoretical Syntax, the Penn State Department of Computer Science and Engineering, and the MIT Department of Brain and Cognitive Sciences.  ... 
doi:10.1162/tacl_a_00304 fatcat:rwvmrt6ofzdmhomlhkpvl4q6ni
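
As a concrete illustration of the architectural contrast the paper probes, here is a minimal sketch (my own simplification, not the paper's models) of the same composition function applied in linear order versus parse-tree order:

```python
# Minimal sketch (not the paper's models): the same composition function
# applied in sequential order vs. in the order given by a parse tree.

def combine(left, right):
    # Stand-in for a learned recurrent/recursive cell; here: tuple nesting.
    return (left, right)

def sequential_encode(tokens):
    """Left-to-right recurrence: structure mirrors linear order."""
    state = tokens[0]
    for tok in tokens[1:]:
        state = combine(state, tok)
    return state

def tree_encode(tree):
    """Recursive composition: structure mirrors the parse tree."""
    if isinstance(tree, str):          # leaf
        return tree
    left, right = tree
    return combine(tree_encode(left), tree_encode(right))

tokens = ["the", "bird", "can", "sing"]
parse  = (("the", "bird"), ("can", "sing"))
print(sequential_encode(tokens))   # ((('the', 'bird'), 'can'), 'sing')
print(tree_encode(parse))          # (('the', 'bird'), ('can', 'sing'))
```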

Tree Transformer: Integrating Tree Structures into Self-Attention [article]

Yau-Shian Wang and Hung-Yi Lee and Yun-Nung Chen
2019 arXiv   pre-print
This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder in order to encourage the attention heads to follow tree structures.  ...  With a training procedure identical to BERT's, the experiments demonstrate the effectiveness of Tree Transformer in terms of inducing tree structures, better language modeling, and further learning  ...  This work was financially supported by the Young Scholar Fellowship Program of the Ministry of Science and Technology (MOST) in Taiwan, under Grant 108-2636-E002-003.  ...
arXiv:1909.06639v2 fatcat:46z2iyuevrhhdfmgpxupnqyt5m

Tree Transformer: Integrating Tree Structures into Self-Attention

Yaushian Wang, Hung-Yi Lee, Yun-Nung Chen
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
This paper proposes Tree Transformer, which adds an extra constraint to the attention heads of the bidirectional Transformer encoder in order to encourage the attention heads to follow tree structures.  ...  With a training procedure identical to BERT's, the experiments demonstrate the effectiveness of Tree Transformer in terms of inducing tree structures, better language modeling, and further learning  ...  This work was financially supported by the Young Scholar Fellowship Program of the Ministry of Science and Technology (MOST) in Taiwan, under Grant 108-2636-E002-003.  ...
doi:10.18653/v1/d19-1098 dblp:conf/emnlp/WangLC19 fatcat:4bibiy4llrhjfd7jozktclks6q
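
The constraint described above can be pictured with a hard attention mask, a simplification of the paper's soft constituent attention; the segmentation `spans` below is a hypothetical input, not the authors' learned one:

```python
# Hedged sketch (not the authors' code): restrict each attention head so a
# position can only attend within its constituent span.

import numpy as np

def constituent_mask(seq_len, spans):
    """spans: list of (start, end) half-open intervals covering 0..seq_len."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    for start, end in spans:
        mask[start:end, start:end] = True   # attend freely inside the span
    return mask

def masked_attention(scores, mask):
    """Suppress (via -inf) attention logits that cross constituent borders."""
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.random.randn(6, 6)
mask = constituent_mask(6, [(0, 2), (2, 6)])   # two constituents
print(masked_attention(scores, mask).round(2)) # block-diagonal attention
```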

Recursive Top-Down Production for Sentence Generation with Latent Trees [article]

Shawn Tan and Yikang Shen and Timothy J. O'Donnell and Alessandro Sordoni and Aaron Courville
2020 arXiv   pre-print
To this end, we present a dynamic programming algorithm that marginalises over latent binary tree structures with N leaves, allowing us to compute the likelihood of a sequence of N tokens under a latent tree model, which we maximise to train a recursive neural function.  ...  Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks. arXiv preprint arXiv:2001.03632. Daichi Mochihashi and Eiichiro Sumita. 2008.  ...
arXiv:2010.04704v1 fatcat:ykx4tqlfurg6ndb5kgh6oxyca4
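
The flavor of that dynamic program can be sketched as follows (assumptions mine: a toy span score stands in for the paper's recursive neural function); it marginalises over all binary trees on N leaves in O(N^3) time rather than enumerating the Catalan-many trees:

```python
# Chart-based marginalisation over latent binary trees (toy sketch).

from functools import lru_cache

def span_score(i, j):
    # Hypothetical local score for building the span [i, j); unit for the demo.
    return 1.0

def inside(leaf_scores):
    n = len(leaf_scores)

    @lru_cache(maxsize=None)
    def chart(i, j):
        if j - i == 1:                       # single leaf
            return leaf_scores[i]
        # Marginalise over every split point k of the span [i, j).
        return sum(chart(i, k) * chart(k, j) * span_score(i, j)
                   for k in range(i + 1, j))

    return chart(0, n)

# With unit scores, the total equals the Catalan number C_{n-1}, i.e. the
# number of binary trees over n leaves: C_3 = 5 for n = 4.
print(inside([1.0, 1.0, 1.0, 1.0]))   # 5.0
```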

Tree-structured decoding with doubly-recurrent neural networks

David Alvarez-Melis, Tommi S. Jaakkola
2017 International Conference on Learning Representations  
The experimental results show the effectiveness of this architecture at recovering latent tree structure in sequences and at mapping sentences to simple functional programs.  ...  That is, in response to an encoded vector representation, co-evolving recurrences are used to realize the associated tree and the labels for the nodes in the tree.  ...  The authors would like to thank the anonymous reviewers for their constructive comments.  ... 
dblp:conf/iclr/Alvarez-MelisJ17 fatcat:3gdog2vvqfa57lbookf2jdz55i
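
Schematically, the doubly-recurrent decoder combines two recurrences at every node, one from the parent and one from the previous sibling; the sketch below is my own simplification, not the authors' exact equations:

```python
# Doubly-recurrent decoding step (schematic).

def cell(state, inp):
    # Stand-in for a learned RNN cell.
    return [0.5 * s + 0.5 * x for s, x in zip(state, inp)]

def decode_node(parent_state, prev_sibling_state, node_input):
    ancestral = cell(parent_state, node_input)       # top-down information
    fraternal = cell(prev_sibling_state, node_input) # left-to-right information
    # Merge the two co-evolving recurrences into the node's state.
    return [0.5 * (a + f) for a, f in zip(ancestral, fraternal)]

root = [1.0, 0.0]
zero = [0.0, 0.0]
child1 = decode_node(root, zero, [0.2, 0.2])
child2 = decode_node(root, child1, [0.4, 0.4])  # conditions on its elder sibling
print(child1, child2)
```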

BP-Transformer: Modelling Long-Range Context via Binary Partitioning [article]

Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang
2019 arXiv   pre-print
In this paper, adopting a fine-to-coarse attention mechanism on multi-scale spans via binary partitioning (BP), we propose BP-Transformer (BPT for short).  ...  The Transformer model is widely successful on many natural language processing tasks. However, the quadratic complexity of self-attention limits its application to long text.  ...  the commonsensible inductive bias of language, such as sequential or syntax structure.  ...
arXiv:1911.04070v1 fatcat:bmwoc42rfjdznluapxmsftljdm
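
A rough sketch of the binary partitioning idea (details assumed by me, not taken from the paper's code): recursively halving the sequence yields 2n - 1 span nodes, and each token's attention set is the O(log n) path of spans containing it, rather than all n tokens:

```python
# Binary partitioning of a sequence into multi-scale spans (toy sketch).

def partition(start, end, spans):
    spans.append((start, end))
    if end - start > 1:
        mid = (start + end) // 2
        partition(start, mid, spans)
        partition(mid, end, spans)
    return spans

def attended_spans(pos, spans):
    """Spans a token attends to: every span on its root-to-leaf path."""
    return [s for s in spans if s[0] <= pos < s[1]]

spans = partition(0, 8, [])
print(len(spans))                 # 15 span nodes for n = 8 (2n - 1)
print(attended_spans(5, spans))   # spans containing token 5, coarse to fine
```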

POETREE: Interpretable Policy Learning with Adaptive Decision Trees [article]

Alizée Pace, Alex J. Chan, Mihaela van der Schaar
2022 arXiv   pre-print
This policy learning method outperforms the state-of-the-art on real and synthetic medical datasets, both in understanding, quantifying, and evaluating observed behaviour and in accurately  ...  in decision tree policies that adapt over time with patient information.  ...  Many thanks to group members of the van der Schaar Lab and to our reviewers for their valuable feedback.  ...
arXiv:2203.08057v1 fatcat:3ga2hdoczrdcbmke7g25pf7j4q
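
As a much-simplified picture of the setting (POETREE itself learns soft, probabilistic trees that adapt over time with patient information), one can behaviour-clone observed clinician actions into an interpretable decision tree; the sketch below assumes scikit-learn is available and uses invented toy data:

```python
# Behaviour cloning into an interpretable decision-tree policy (toy sketch).

from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical observations: [heart_rate, blood_pressure] -> action (0=wait, 1=treat)
X = [[80, 120], [95, 140], [110, 150], [70, 115], [105, 145], [75, 118]]
y = [0, 1, 1, 0, 1, 0]

policy = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(policy, feature_names=["heart_rate", "blood_pressure"]))
print(policy.predict([[100, 142]]))   # action recommended for a new patient
```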

What they do when in doubt: a study of inductive biases in seq2seq learners [article]

Eugene Kharitonov, Rahma Chaabouni
2021 arXiv   pre-print
Further, we connect to Solomonoff's theory of induction and propose to use description length as a principled and sensitive measure of inductive biases.  ...  Furthermore, Transformer and LSTM-based learners show a bias toward the hierarchical induction over the linear one, while CNN-based learners prefer the opposite.  ...  In particular, we apply the description length metric to investigate learners' biases toward the mem and comp rules that explain the training examples in our setup.  ... 
arXiv:2006.14953v2 fatcat:gguete7rnnbzdlqmuppggpfvii
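
The description-length idea can be made concrete with a prequential (online) code, sketched below under my own simplifications: train on a growing prefix, pay -log2 p(next example) in bits for each new example, and sum; a learner whose bias matches the underlying rule compresses the data into fewer bits:

```python
# Prequential description length as a probe of inductive bias (toy sketch).

import math

def prequential_description_length(examples, fit, prob):
    """fit(prefix) -> model; prob(model, example) -> predicted probability."""
    total_bits = 0.0
    for t in range(1, len(examples)):
        model = fit(examples[:t])                 # train on what came before
        total_bits += -math.log2(prob(model, examples[t]))
    return total_bits

# Toy learner: predicts the majority label seen so far (Laplace-smoothed).
def fit(prefix):
    ones = sum(prefix)
    return (ones + 1) / (len(prefix) + 2)

def prob(p_one, example):
    return p_one if example == 1 else 1.0 - p_one

print(prequential_description_length([1, 1, 1, 0, 1, 1], fit, prob))
```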

Can RNNs trained on harder subject-verb agreement instances still perform well on easier ones? [article]

Hritik Bansal, Gantavya Bhatt, Sumeet Agarwal
2021 arXiv   pre-print
In this work, we investigate RNN models with varying inductive biases trained on selectively chosen 'hard' agreement instances, i.e., sentences with at least one agreement attractor.  ...  However, we observe that several RNN types, including the ONLSTM which has a soft structural inductive bias, surprisingly fail to perform well on sentences without attractors when trained solely on sentences  ...  Does syntax need to grow on trees? Aaron Mueller, Garrett Nicolai, Panayiota Petrou-Zeniou, Natalia Talmina, and Tal Linzen. 2020.  ...
arXiv:2010.04976v2 fatcat:ivbyjydq6re5lcea3iufpgm6b4
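
For readers unfamiliar with the terminology, an agreement attractor is an intervening noun whose number differs from the subject's, as in "The keys to the cabinet are ..."; the toy check below (entirely my own illustration) marks such 'hard' instances:

```python
# Detecting agreement attractors between subject and verb (toy illustration).

def has_attractor(tagged_tokens, subj_idx, verb_idx):
    """tagged_tokens: (word, pos, number) triples; number in {'sg', 'pl'}."""
    subj_number = tagged_tokens[subj_idx][2]
    return any(pos == "NOUN" and number != subj_number
               for _, pos, number in tagged_tokens[subj_idx + 1:verb_idx])

sentence = [("keys", "NOUN", "pl"), ("to", "ADP", None),
            ("the", "DET", None), ("cabinet", "NOUN", "sg"),
            ("are", "VERB", "pl")]
print(has_attractor(sentence, subj_idx=0, verb_idx=4))   # True: 'cabinet' attracts
```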

Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges [article]

Triet H. M. Le, Hao Chen, M. Ali Babar
2020 arXiv   pre-print
To facilitate further research and applications of DL in this field, we provide a comprehensive review to categorize and investigate existing DL methods for source code modeling and generation.  ...  To address the limitations of the traditional source code models, we formulate common program learning tasks under an encoder-decoder framework.  ...  Other studies proposed DL models (Recursive Neural Networks [267], Tree-LSTM [263], or CNN [185]) to work directly on the hierarchical structure of a parse tree. Recently, Zhang et al.  ...
arXiv:2002.05442v1 fatcat:bt7dtzrcnjfk5jn6kmin2ruqii
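
The structural input that such tree-based models consume can be produced by any standard parser; for instance, Python's built-in ast module yields the parse tree that a Recursive Neural Network or Tree-LSTM would then encode:

```python
# Obtaining the hierarchical structure of source code with Python's stdlib.

import ast

source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)

# Walk the abstract syntax tree and list node types: the raw material
# for tree-structured encoders.
print([type(node).__name__ for node in ast.walk(tree)])
# e.g. ['Module', 'FunctionDef', 'arguments', 'Return', ...]
```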

Amanuensis: The Programmer's Apprentice [article]

Thomas Dean, Maurice Chiang, Marcus Gomez, Nate Gruver, Yousef Hindy, Michelle Lam, Peter Lu, Sophia Sanchez, Rohun Saxena, Michael Smith, Lucy Wang, Catherine Wong
2018 arXiv   pre-print
This document provides an overview of the material covered in a course taught at Stanford in the spring quarter of 2018.  ...  The course draws upon insight from cognitive and systems neuroscience to implement hybrid connectionist and symbolic reasoning systems that leverage and extend the state of the art in machine learning  ...  What does the programmer know and what does she need to be told in order to provide you with assistance?  ... 
arXiv:1807.00082v2 fatcat:piwexqa2xvgg5ec5xwkswstswy

Machine Learning-based Analysis of Program Binaries: A Comprehensive Study

Hongfa Xue, Shaowen Sun, Guru Venkataramani, Tian Lan
2019 IEEE Access  
To meet these challenges, machine learning-based binary code analysis frameworks attract substantial attention due to their automated feature extraction and drastically reduced efforts needed on large-scale  ...  In this paper, we provide a taxonomy of machine learning-based binary code analysis, describe the recent advances and key findings on the topic, and discuss the key challenges and opportunities.  ...  As mentioned in the previous section, source code offers rich structural information, such as syntax trees and variable names, made available through the source lines of program code, compared to binary code  ...
doi:10.1109/access.2019.2917668 fatcat:fwjpykkdpjev7pzkhaoily4zci

Natural Language Processing, Electronic Health Records, and Clinical Research [chapter]

Feifan Liu, Chunhua Weng, Hong Yu
2012 Health Informatics Series  
With the increasingly broad adoption of EHR worldwide, there is a growing need to widen the use of EHR data to support clinical research [2].  ...  The induction of a decision tree is a top-down process that reduces information content by mapping inputs to fewer outputs while seeking a trade-off between accuracy and simplicity.  ...
doi:10.1007/978-1-84882-448-5_16 fatcat:qz5jc3lgunf5rbukasurbbn4oq
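
The entropy-reduction criterion behind top-down tree induction can be made concrete with a short worked example (toy labels invented for illustration):

```python
# Information gain of a candidate split in top-down decision-tree induction.

import math

def entropy(labels):
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(labels, left, right):
    n = len(labels)
    return entropy(labels) - (len(left) / n * entropy(left)
                              + len(right) / n * entropy(right))

labels = ["flu", "flu", "cold", "cold", "flu", "cold"]
left, right = labels[:3], labels[3:]            # a candidate split
print(round(information_gain(labels, left, right), 3))   # 0.082 bits gained
```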

The learnability of abstract syntactic principles

Amy Perfors, Joshua B. Tenenbaum, Terry Regier
2011 Cognition  
These generalizations must be guided by some inductive bias (some abstract knowledge) that leads them to prefer the correct hypotheses even in the absence of directly supporting evidence.  ...  hierarchical phrase structures rather than linear sequences of words (e.g., Chomsky, 1965, 1971, 1980; Crain & Nakayama, 1987).  ...  have an initial bias to treat syntax as a system of rules for mapping between thoughts and sequences of sounds, then this could effectively amount to an implicit bias for hierarchical phrase structure  ...
doi:10.1016/j.cognition.2010.11.001 pmid:21186021 fatcat:c4c6r6degjenviogz76oapkv5y
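
The paper's Bayesian framing can be sketched in a few lines (all numbers below are hypothetical): each grammar hypothesis is scored by prior times likelihood, so a hierarchical grammar can overcome a simplicity prior once enough data favour it:

```python
# Bayesian comparison of grammar hypotheses (schematic, hypothetical numbers).

import math

def log_posterior(log_prior, per_sentence_log_likelihoods):
    return log_prior + sum(per_sentence_log_likelihoods)

n_sentences = 1000
# The linear grammar is a priori simpler (higher prior) but fits each
# sentence slightly worse than the hierarchical grammar.
linear       = log_posterior(math.log(0.7), [-5.00] * n_sentences)
hierarchical = log_posterior(math.log(0.3), [-4.95] * n_sentences)
print("hierarchical wins:", hierarchical > linear)   # True: data outweigh prior
```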
Showing results 1 — 15 out of 1,361 results