A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit <a rel="external noopener" href="https://arxiv.org/pdf/2106.04935v1.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<span class="release-stage" >pre-print</span>
Neural Transfer Learning (TL) is becoming ubiquitous in Natural Language Processing (NLP), thanks to its high performance on many tasks, especially in low-resourced scenarios. Notably, TL is widely used for neural domain adaptation to transfer valuable knowledge from high-resource to low-resource domains. In the standard fine-tuning scheme of TL, a model is initially pre-trained on a source domain and subsequently fine-tuned on a target domain and, therefore, source and target domains are<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2106.04935v1">arXiv:2106.04935v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/blerqmjokfc73insvdudxnpzhy">fatcat:blerqmjokfc73insvdudxnpzhy</a> </span>
more »... d using the same architecture. In this paper, we show through interpretation methods that such scheme, despite its efficiency, is suffering from a main limitation. Indeed, although capable of adapting to new domains, pre-trained neurons struggle with learning certain patterns that are specific to the target domain. Moreover, we shed light on the hidden negative transfer occurring despite the high relatedness between source and target domains, which may mitigate the final gain brought by transfer learning. To address these problems, we propose to augment the pre-trained model with normalised, weighted and randomly initialised units that foster a better adaptation while maintaining the valuable source knowledge. We show that our approach exhibits significant improvements to the standard fine-tuning scheme for neural domain adaptation from the news domain to the social media domain on four NLP tasks: part-of-speech tagging, chunking, named entity recognition and morphosyntactic tagging.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20210612041306/https://arxiv.org/pdf/2106.04935v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/3a/35/3a35958d6ab66f965e20312390462aa60e54dc78.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2106.04935v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>