Adversarial Learning for Chinese NER from Crowd Annotations
[article]
2018
arXiv
pre-print
In this paper, we propose an approach to performing crowd annotation learning for Chinese Named Entity Recognition (NER) to make full use of the noisy sequence labels from multiple annotators. ...
Inspired by adversarial learning, our approach uses a common Bi-LSTM and a private Bi-LSTM for representing annotator-generic and -specific information. ...
Conclusions: In this paper, we presented an approach to performing crowd annotation learning based on the idea of adversarial training for Chinese Named Entity Recognition (NER). ...
arXiv:1801.05147v1
fatcat:dl3kqnrfizeyzo3qdunr2arvfe
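The snippet above describes a shared (annotator-generic) Bi-LSTM and a private (annotator-specific) Bi-LSTM trained adversarially. The PyTorch sketch below is one minimal way to wire that idea together, assuming a gradient-reversal discriminator over the shared representation; the module names, dimensions, and pooling choice are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses gradients so the shared encoder
    is trained to fool the annotator discriminator (adversarial objective)."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class CrowdAdversarialNER(nn.Module):
    def __init__(self, vocab_size, n_tags, n_annotators, emb=100, hidden=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        # Annotator-generic (shared) and annotator-specific encoders.
        self.common_lstm = nn.LSTM(emb, hidden // 2, bidirectional=True, batch_first=True)
        self.private_lstm = nn.LSTM(emb, hidden // 2, bidirectional=True, batch_first=True)
        self.tagger = nn.Linear(2 * hidden, n_tags)           # per-token tag scores
        self.discriminator = nn.Linear(hidden, n_annotators)  # guesses which annotator labeled the sentence

    def forward(self, tokens, lamb=1.0):
        x = self.embed(tokens)                         # (batch, seq_len, emb)
        common, _ = self.common_lstm(x)
        private, _ = self.private_lstm(x)
        tag_scores = self.tagger(torch.cat([common, private], dim=-1))
        # Pooled shared representation goes to the discriminator through gradient reversal.
        pooled = GradReverse.apply(common.mean(dim=1), lamb)
        annotator_scores = self.discriminator(pooled)
        return tag_scores, annotator_scores
```

Training would combine a sequence-labeling loss over tag_scores (e.g. a CRF or per-token cross-entropy) with the discriminator's annotator-classification loss; the reversed gradient discourages annotator-specific noise from leaking into the shared encoder.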
Multifeature Named Entity Recognition in Information Security Based on Adversarial Learning
2019
Security and Communication Networks
to obtain labelled data from crowd annotations. ...
We use the generative adversarial network to find common features in crowd annotations and then consider them in conjunction with the domain dictionary feature and sentence dependency feature as additional ...
Yang et al. demonstrated the usability of the GAN model for NER using Chinese crowd-sourced annotations [19]. ...
doi:10.1155/2019/6417407
fatcat:gom7nevw3bdgheqcqvzyvjhtg4
Leveraging Part-of-Speech Tagging Features and a Novel Regularization Strategy for Chinese Medical Named Entity Recognition
2022
Mathematics
Chinese Medical Named Entity Recognition (Chinese-MNER) aims to identify potential entities and their categories from unstructured Chinese medical text. ...
Moreover, the limited amount of annotated Chinese-MNER data can easily lead to over-fitting during training. ...
The cMedQANER and cEHRNER datasets are annotated from the Chinese community question answering and the Chinese electronic health records, respectively. ...
doi:10.3390/math10091386
doaj:511970fd72de4e3da5f1d32fcafd9cf0
fatcat:cb47j3wcmjblfjjyhdptbodeyq
CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition
[article]
2020
arXiv
pre-print
Therefore, Chinese Word Segmentation (CWS) is usually considered the first step for Chinese NER. ...
In this paper, we investigate a Convolutional Attention Network called CAN for Chinese NER, which consists of a character-based convolutional neural network (CNN) with local-attention layer and a gated ...
[49] leveraged a character-level BiLSTM to extract higher-level features from crowd annotations.
arXiv:1904.02141v3
fatcat:qxiwyexmpzeh7cliven2zxoeem
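The CAN-NER snippet above mentions a character-based CNN with a local-attention layer feeding a gated recurrent layer. The PyTorch sketch below shows a stack of that general shape; the layer sizes, the use of full self-attention rather than a strictly windowed one, and the plain BiGRU head are simplifying assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class CharCNNAttnGRU(nn.Module):
    """Character embeddings -> CNN -> self-attention -> BiGRU -> per-character tag scores."""
    def __init__(self, n_chars, n_tags, emb=64, channels=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(n_chars, emb)
        self.conv = nn.Conv1d(emb, channels, kernel_size=3, padding=1)
        self.attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.gru = nn.GRU(channels, hidden // 2, bidirectional=True, batch_first=True)
        self.out = nn.Linear(hidden, n_tags)

    def forward(self, chars):                      # chars: (batch, seq_len) character ids
        x = self.embed(chars)                      # (batch, seq_len, emb)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)  # local n-gram features
        x, _ = self.attn(x, x, x)                  # re-weight character features across the sentence
        x, _ = self.gru(x)                         # gated recurrent context
        return self.out(x)                         # (batch, seq_len, n_tags)
```

A CRF layer on top of the per-character scores would be the usual final step for sequence labeling.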
Multi-Source Cross-Lingual Model Transfer: Learning What to Share
[article]
2019
arXiv
pre-print
Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. ...
Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. ...
Denote the annotated corpus for a source language l ∈ S as X_l, where (x, y) ∼ X_l is a sample drawn from X_l. ...
arXiv:1810.03552v3
fatcat:mim5ubvnv5ghdkqk3xeyhqfhfm
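For the notation quoted above (an annotated corpus X_l for each source language l ∈ S), a generic way to picture multi-source training is an empirical risk summed over all source corpora, as in the illustrative objective below; the paper's actual method additionally learns what to share between languages, which this simple form does not capture.

```latex
% Illustrative multi-source objective: a shared model f_theta fit on all source corpora X_l.
\min_{\theta} \;\; \sum_{l \in S} \; \mathbb{E}_{(x,\, y) \sim X_l}
  \left[ \mathcal{L}\!\left( f_{\theta}(x),\, y \right) \right]
```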
Multi-Source Cross-Lingual Model Transfer: Learning What to Share
2019
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. ...
Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. ...
As it is prohibitive to obtain training data for all languages of interest, cross-lingual transfer learning (CLTL) offers the possibility of learning models for a target language using annotated data from ...
doi:10.18653/v1/p19-1299
dblp:conf/acl/ChenAHWC19
fatcat:d2l42ilxxzbitjyfl2haontvgq
DuQM: A Chinese Dataset of Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models
[article]
2022
arXiv
pre-print
For this purpose, we create a Chinese dataset, DuQM, which contains natural questions with linguistic perturbations to evaluate the robustness of question matching models. ...
In this paper, we focus on studying robustness evaluation of Chinese question matching. ...
Since all annotators are linguistic experts from our internal data team rather than crowd workers, we do not need inter-annotator agreement (IAA) to measure annotation quality. ...
arXiv:2112.08609v2
fatcat:zeupt4upp5a3ziwsrdjqx4kk2i
Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
[article]
2018
arXiv
pre-print
We then compare (i) annotation projection and (ii) bilingual word embedding based direct transfer strategies for cross-lingual AM, finding that the former performs considerably better and almost eliminates the loss from cross-lingual transfer. ...
As a result, acquiring (high-quality) datasets for new languages comes at a high cost, be it in terms of training and/or hiring expert annotators or querying large crowds in crowd-sourcing experiments. ...
arXiv:1807.08998v1
fatcat:2o67o3iklncz7dlovd7zkow25q
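The entry above compares annotation projection against direct transfer with bilingual word embeddings. As a rough illustration of the projection idea, the Python sketch below copies source-side BIO labels onto target tokens through word alignments; the alignment format and the "first aligned label wins" rule are assumptions made for the example, not the authors' exact procedure.

```python
def project_labels(src_labels, alignments, tgt_len, default="O"):
    """Project BIO labels from source tokens onto target tokens via word alignments.

    src_labels : list of BIO tags, one per source token, e.g. ["B-Claim", "I-Claim", "O"]
    alignments : iterable of (src_idx, tgt_idx) word-alignment pairs
    tgt_len    : number of target tokens
    """
    tgt_labels = [default] * tgt_len
    for src_idx, tgt_idx in sorted(alignments, key=lambda pair: pair[1]):
        if tgt_labels[tgt_idx] == default:   # first aligned label wins
            tgt_labels[tgt_idx] = src_labels[src_idx]
    return tgt_labels

# Toy example with a word-order change between source and target:
print(project_labels(["B-Claim", "I-Claim", "O"], [(0, 1), (1, 2), (2, 0)], 3))
# -> ['O', 'B-Claim', 'I-Claim']
```

Naive projection like this can produce inconsistent BIO sequences (e.g. an I- tag without a preceding B- tag), which is one reason projected data usually needs a repair or filtering step.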
A survey in Adversarial Defences and Robustness in NLP
[article]
2022
arXiv
pre-print
These methods are not only used for defending neural networks against adversarial attacks but also serve as a regularization mechanism during training, preventing the model from overfitting. ...
Strong adversarial attacks have been proposed by various authors for computer vision and Natural Language Processing (NLP). ...
NER techniques fall into four major categories: (i) rule-based NER, which works on handcrafted rules; (ii) unsupervised learning approaches, which rely on unsupervised methods such as clustering ...
arXiv:2203.06414v2
fatcat:2ukd44px35e7ppskzkaprfw4ha
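The survey snippet above notes that adversarial methods double as a regularizer during training. A common instance of that idea is adding a small gradient-based perturbation to the input embeddings and training on the perturbed input as well; the sketch below is a generic FGSM-style version of that trick, not a method taken from the survey itself.

```python
import torch

def adversarial_regularization_loss(model, embeddings, labels, loss_fn, epsilon=1e-2):
    """Adversarial training used as a regularizer (generic FGSM-style sketch).

    embeddings : input embeddings with requires_grad=True
    labels     : gold labels expected by loss_fn
    """
    # Loss on the clean input, and its gradient w.r.t. the embeddings.
    clean_loss = loss_fn(model(embeddings), labels)
    grad, = torch.autograd.grad(clean_loss, embeddings, retain_graph=True)

    # Small perturbation in the direction that increases the loss.
    perturbation = epsilon * grad.sign()
    adv_loss = loss_fn(model(embeddings + perturbation.detach()), labels)

    # Fit the clean input while staying robust to the perturbed one.
    return clean_loss + adv_loss
```

Backpropagating through the combined loss encourages predictions that stay stable in a small neighbourhood of each training point, which is the regularization effect the survey refers to.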
Cyberspace Security Using Adversarial Learning and Conformal Prediction
2015
Intelligent Information Management
This paper advances new directions for cyber security using adversarial learning and conformal prediction in order to enhance network and computing service defenses against adaptive, malicious, persistent ...
The motivation for using conformal prediction and its immediate offspring, semi-supervised learning and transduction, comes first and foremost from their support for discriminative and non-parametric ...
There is offense and there is defense, with both attempting to guess and learn from each other. This is the core of adversarial learning. ...
doi:10.4236/iim.2015.74016
fatcat:wqiu3pkl6zeurlr3mizdahhgd4
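The entry above builds on conformal prediction and transduction for cyber defense. As background for readers unfamiliar with the former, the sketch below shows plain split conformal prediction for classification: calibrate a nonconformity threshold on held-out data, then return prediction sets with roughly the desired coverage. It is a generic illustration, not the paper's system.

```python
import numpy as np

def conformal_threshold(calib_probs, calib_labels, alpha=0.1):
    """Split conformal calibration: nonconformity = 1 - probability of the true class."""
    n = len(calib_labels)
    scores = 1.0 - calib_probs[np.arange(n), calib_labels]
    # Finite-sample corrected quantile targeting ~(1 - alpha) coverage.
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    return q

def prediction_set(test_probs, q):
    """All classes whose nonconformity score falls below the calibrated threshold."""
    return np.where(1.0 - test_probs <= q)[0]

# Usage with a 3-class toy example:
calib_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]])
calib_labels = np.array([0, 1, 2])
q = conformal_threshold(calib_probs, calib_labels, alpha=0.2)
print(prediction_set(np.array([0.5, 0.3, 0.2]), q))   # -> [0], only class 0 is in the set
```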
RuBQ: A Russian Dataset for Question Answering over Wikidata
[article]
2020
arXiv
pre-print
The dataset creation started with a large collection of question-answer pairs from online quizzes. ...
The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification. ...
We are grateful to Yandex.Toloka for their data annotation grant. ...
arXiv:2005.10659v1
fatcat:4fyptlackrafngtbehb4mlmkhm
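RuBQ pairs questions with SPARQL queries over Wikidata. To make the setup concrete, the sketch below runs one hand-written query of that kind against the public Wikidata endpoint; the example question and query are illustrative and are not taken from the dataset.

```python
import requests

# Question: "What is the capital of Russia?"
# Wikidata identifiers: Q159 = Russia, P36 = capital.
query = """
SELECT ?capitalLabel WHERE {
  wd:Q159 wdt:P36 ?capital .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "rubq-example/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["capitalLabel"]["value"])   # -> "Moscow"
```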
RuBQ: A Russian Dataset for Question Answering over Wikidata
[chapter]
2020
Lecture Notes in Computer Science
The dataset creation started with a large collection of question-answer pairs from online quizzes. ...
The proposed dataset generation pipeline proved to be efficient and can be employed in other data annotation projects. ...
We are grateful to Yandex.Toloka for their data annotation grant. ...
doi:10.1007/978-3-030-62466-8_7
fatcat:bo2c5mp7unhhhbdxkuzfv5ujpy
A Vietnamese Dataset for Evaluating Machine Reading Comprehension
[article]
2020
arXiv
pre-print
In addition, we conduct experiments with state-of-the-art MRC methods for English and Chinese as the first experimental models on UIT-ViQuAD. ...
We also estimate human performance on the dataset and compare it to the experimental results of powerful machine learning models. ...
., 2018) was released, adding over 50,000 unanswerable questions written adversarially by crowd-workers based on the original ones. ...
arXiv:2009.14725v3
fatcat:t7de6vsxcjgf7ey6xtt5mtn4qy
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
[article]
2020
arXiv
pre-print
Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. ...
To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders XTREME benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual ...
Acknowledgements We'd like to thank Jon Clark for sharing with us the TyDiQA Gold Passage data and for valuable feedback. ...
arXiv:2003.11080v5
fatcat:uplhdbuxgrfszpl7nrn5esbpoi
Deep learning based question answering system in Bengali
2020
Journal of Information and Telecommunication
We collect a smaller human-annotated QA dataset from Bengali Wikipedia, covering popular topics from Bangladeshi culture, for evaluating our models. ...
Unlike for English, there is no benchmark large-scale QA dataset for Bengali, no pretrained language model that can be adapted for Bengali question answering, and no human baseline score for QA has ...
He received his PhD in Computer Science from the University of Calgary, Canada. He has authored more than 150 peer-reviewed research papers. ...
doi:10.1080/24751839.2020.1833136
fatcat:ltwrsufie5hrrezjtv2tu56fjy
Showing results 1 — 15 out of 108 results