
Understanding intermediate layers using linear classifier probes [article]

Guillaume Alain, Yoshua Bengio
2018 arXiv   pre-print
We use linear classifiers, which we refer to as "probes", trained entirely independently of the model itself. This helps us better understand the roles and dynamics of the intermediate layers.  ...  We propose to monitor the features at every layer of a model and measure how suitable they are for classification.  ...  Rather, we want to demonstrate the usefulness of the linear classifier probes as a way to better understand what is happening in their deep networks.  ... 
arXiv:1610.01644v4 fatcat:3vl3jn7idzeaxnegy3r7sq3ufu
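
The probing idea in this abstract (fit a linear classifier on frozen features from each layer and compare how well each one classifies) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the synthetic XOR-like data, the two stand-in "layers" (`input` and `hidden`), the helper names `train_linear_probe` and `probe_accuracy`, and the choice to score the probe on its own training data.

```python
import numpy as np

def train_linear_probe(features, labels, steps=500, lr=0.5):
    """Fit a logistic-regression probe on frozen features; returns (w, b)."""
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels  # gradient of the cross-entropy w.r.t. the logits
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(features, labels, w, b):
    """Fraction of examples the linear probe classifies correctly."""
    preds = (features @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

# Hypothetical data: XOR-like labels that are not linearly separable in x.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(400, 2))
y = (x[:, 0] * x[:, 1] > 0).astype(float)

# Stand-ins for activations captured at two depths of a model (assumed, not real).
layer_activations = {
    "input": x,
    "hidden": np.column_stack([x, x[:, 0] * x[:, 1]]),
}

# The probes are trained independently of any model; only features are read.
accuracies = {}
for name, feats in layer_activations.items():
    w, b = train_linear_probe(feats, y)
    accuracies[name] = probe_accuracy(feats, y, w, b)

print(accuracies)
```

In this toy setup the probe on `hidden` should score well above the probe on `input`, which is the kind of per-layer comparison the abstract describes. A real study would read activations from a trained network and evaluate on held-out data.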

Interpreting Intentionally Flawed Models with Linear Probes

Mara Graziani, Henning Muller, Vincent Andrearczyk
2019 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)  
In this paper, we probe the activations of intermediate layers with linear classification and regression.  ...  Linear classifier probes could be used to further confirm this hypothesis.  ...  Linear classifier probes measure the linear separability of the classes at intermediate layers of the DNN. CAVs interpret the DNN internal state in terms of human-friendly concepts.  ... 
doi:10.1109/iccvw.2019.00096 dblp:conf/iccvw/GrazianiMA19 fatcat:4vuhgrvedvdrfbcotcbp67bqdi

Undivided Attention: Are Intermediate Layers Necessary for BERT? [article]

Sharath Nittur Sridhar, Anthony Sarah
2020 arXiv   pre-print
Additionally, we use the central kernel alignment (CKA) similarity metric and probing classifiers to demonstrate that removing intermediate layers has little impact on the learned self-attention representations  ...  However, a strong justification for the inclusion of these intermediate layers remains missing in the literature.  ...  We then use attention-based probing classifiers [7] to analyze the contributions of the intermediate blocks.  ... 
arXiv:2012.11881v1 fatcat:6sieqvm64vbf5oxwfyoxhn4y3m

On the Hierarchical Information in a Single Contextualised Word Representation (Student Abstract)

Dean L. Slack, Mariann Hardey, Noura Al Moubayed
To assess the presence of hierarchical information throughout the networks, the linear classifiers are trained using representations produced by each intermediate layer of BERT and ELMo variants.  ...  Using labelled constituency trees, we train simple linear classifiers on top of single contextualised word representations for ancestor sentiment analysis tasks at multiple constituency levels of a sentence  ...  II) Improved understanding of layer-wise performance over different constituent levels of a sentence can be used to make informed decisions regarding the appropriate selection of embedding layer to use  ... 
doi:10.1609/aaai.v34i10.7231 fatcat:itjoahgt5zgzdk7hnfcsshctuq

Probing Classifiers: Promises, Shortcomings, and Advances [article]

Yonatan Belinkov
2021 arXiv   pre-print
This article critically reviews the probing classifiers framework, highlighting their promises, shortcomings, and advances.  ...  The basic idea is simple -- a classifier is trained to predict some linguistic property from a model's representations -- and has been used to examine a wide variety of models and properties.  ...  supervision on intermediate layers.  ... 
arXiv:2102.12452v4 fatcat:x7qfinepf5hydkbiba3qvbfeti

What do pre-trained code models know about code? [article]

Anjan Karmakar, Romain Robbes
2021 arXiv   pre-print
We show how probes can be used to identify whether models are deficient in (understanding) certain code properties, characterize different model layers, and get insight into the model sample-efficiency  ...  In this paper, we construct four probing tasks (probing for surface-level, syntactic, structural, and semantic information) for pre-trained code models.  ...  Importantly, the probing classifier, which is usually a linear classifier, is simple with no hidden layers of its own.  ... 
arXiv:2108.11308v1 fatcat:jhpp35jgwnaajdvxz5owekurly

On the Interplay Between Fine-tuning and Sentence-level Probing for Linguistic Knowledge in Pre-trained Transformers [article]

Marius Mosbach, Anna Khokhlova, Michael A. Hedderich, Dietrich Klakow
2020 arXiv   pre-print
positive effect on probing accuracy that is larger than just using the pre-trained model with a strong pooling method.  ...  Our analysis reveals that while fine-tuning indeed changes the representations of a pre-trained model and these changes are typically larger for higher layers, only in very few cases, fine-tuning has a  ...  We would also like to thank the reviewers for their useful comments and feedback, in particular R1.  ... 
arXiv:2010.02616v1 fatcat:2fqphct7hva5jhjxfadolkutwi

Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation [article]

Po-Han Chi, Pei-Hung Chung, Tsung-Han Wu, Chun-Cheng Hsieh, Yen-Hao Chen, Shang-Wen Li, Hung-yi Lee
2021 arXiv   pre-print
In probing experiments, we find that the latent representations encode richer information of both phoneme and speaker than that of the last layer.  ...  Moreover, we use some simple probing models to measure how much the information of the speaker and phoneme is encoded in latent representations.  ...  We use a simple classifier to probe each layer, and we find that the representations of the intermediate layers contain more phonetic and speaker information than that of the last layer.  ... 
arXiv:2005.08575v5 fatcat:yvlysgrruff4foptjtapsqb3ym

A Closer Look at How Fine-tuning Changes BERT [article]

Yichu Zhou, Vivek Srikumar
2022 arXiv   pre-print
In this work, we study the English BERT family and use two probing techniques to analyze how fine-tuning changes the space.  ...  Given the prevalence of pre-trained contextualized representations in today's NLP, there have been many efforts to understand what information they contain, and why they seem to be universally successful  ...  ., 2020) starts with using linear classifiers as the probe. Hewitt and Liang (2019) pointed out that a linear probe is not sufficient to evaluate a representation.  ... 
arXiv:2106.14282v3 fatcat:zsv36n3akra7zovoyo4rjjwkf4

OOD-Probe: A Neural Interpretation of Out-of-Domain Generalization [article]

Zining Zhu, Soroosh Shahtalebi, Frank Rudzicz
2022 arXiv   pre-print
Instead, we propose a flexible framework that evaluates OOD systems with finer granularity using a probing module that predicts the originating domain from intermediate representations.  ...  For example, the information about rotation (on RotatedMNIST) is the most visible on the lower layers, while the information about style (on VLCS and PACS) is the most visible on the middle layers.  ...  Setting up the probing module The most popular type of classifier in the probing literature is a fully-connected linear classifier.  ... 
arXiv:2208.12352v1 fatcat:v75f7o5winc5zklrqygyu4xldy

Adversarial TCAV – Robust and Effective Interpretation of Intermediate Layers in Neural Networks [article]

Rahul Soni, Naresh Shah, Chua Tat Seng, Jimmy D. Moore
2020 arXiv   pre-print
Interpreting neural network decisions and the information learned in intermediate layers is still a challenge due to the opaque internal state and shared non-linear interactions.  ...  For robustness, we define it as the ability of an intermediate layer to be consistent in its recall rate (the effectiveness) for different random seeds.  ...  Conclusion & Future Scope TCAV is an excellent approach to probe intermediate layers of the neural network and gain human-level understanding of what concept a layer has learned.  ... 
arXiv:2002.03549v2 fatcat:strmcnjwdzgntot6wjlzwklvna

What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis [article]

Shammur Absar Chowdhury, Nadir Durrani, Ahmed Ali
2021 arXiv   pre-print
Using diagnostic classifiers, we answered these questions.  ...  We no longer understand what features are learned, where they are preserved, and how they inter-operate.  ...  We use a framework based on probing classifiers [1, 49].  ... 
arXiv:2107.00439v1 fatcat:6tnntz35nne5hacaxynrr5grdu

Do Vision Transformers See Like Convolutional Neural Networks? [article]

Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy
2022 arXiv   pre-print
Finally, we study the effect of (pretraining) dataset scale on intermediate features and transfer learning, and conclude with a discussion on connections to new architectures such as the MLP-Mixer.  ...  internal representation structure of ViTs and CNNs on image classification benchmarks, we find striking differences between the two architectures, such as ViT having more uniform representations across all layers  ...  classification with linear probes. We do this across different layers of the model, training linear probes to classify image label with closed-form few-shot linear regression similar to Dosovitskiy et  ... 
arXiv:2108.08810v2 fatcat:nju5i5wbbncavpit3pi2gcbsoe

What Does an End-to-End Dialect Identification Model Learn About Non-Dialectal Information?

Shammur A. Chowdhury, Ahmed Ali, Suwon Shon, James Glass
2020 Interspeech 2020  
Our findings also suggest that the CNN layers of the end-to-end model mirror feature extractors capturing voice-specific information, while the fully-connected layers encode more dialectal information.  ...  We design several proxy tasks to understand the model's ability to represent speech input for differentiating non-dialectal information, such as (a) gender and voice identity of speakers, (b) languages  ...  The performance of the proxy tasks used to probe the intermediate layer representations of ADI-17 models are presented in Table 1 and Table 2.  ... 
doi:10.21437/interspeech.2020-2235 dblp:conf/interspeech/ChowdhuryASG20 fatcat:j47ar7t7jjdafmsrb6mnhjfjte

Probing for the Usage of Grammatical Number [article]

Karim Lasri, Tiago Pimentel, Alessandro Lenci, Thierry Poibeau, Ryan Cotterell
2022 arXiv   pre-print
In this paper, we try to find encodings that the model actually uses, introducing a usage-based probing setup.  ...  Finally, we identify in which layers information about grammatical number is transferred from a noun to its head verb.  ...  Many researchers have expressed a preference for linear classifiers in probing (Alain and Bengio, 2016; Ettinger et al., 2016; Hewitt and Manning, 2019), suggesting that a less complex classifier gives  ... 
arXiv:2204.08831v2 fatcat:dagkuyn4y5blrk4fdcpglf4imu