Adversarial Learning for Chinese NER from Crowd Annotations [article]

YaoSheng Yang and Meishan Zhang and Wenliang Chen and Wei Zhang and Haofen Wang and Min Zhang
2018 arXiv   pre-print
In this paper, we propose an approach to performing crowd annotation learning for Chinese Named Entity Recognition (NER) to make full use of the noisy sequence labels from multiple annotators.  ...  Inspired by adversarial learning, our approach uses a common Bi-LSTM and a private Bi-LSTM for representing annotator-generic and -specific information.  ...  Conclusions In this paper, we presented an approach to performing crowd annotation learning based on the idea of adversarial training for Chinese Named Entity Recognition (NER).  ... 
arXiv:1801.05147v1 fatcat:dl3kqnrfizeyzo3qdunr2arvfe

Multifeature Named Entity Recognition in Information Security Based on Adversarial Learning

Han Zhang, Yuanbo Guo, Tao Li
2019 Security and Communication Networks  
to obtain labelled data from crowd annotations.  ...  We use the generative adversarial network to find common features in crowd annotations and then consider them in conjunction with the domain dictionary feature and sentence dependency feature as additional  ...  Yang et al. demonstrated the usability of the GAN model for NER using Chinese crowd-sourced annotations [19].  ... 
doi:10.1155/2019/6417407 fatcat:gom7nevw3bdgheqcqvzyvjhtg4

Leveraging Part-of-Speech Tagging Features and a Novel Regularization Strategy for Chinese Medical Named Entity Recognition

Miao Jiang, Xin Zhang, Chonghao Chen, Taihua Shao, Honghui Chen
2022 Mathematics  
Chinese Medical Named Entity Recognition (Chinese-MNER) aims to identify potential entities and their categories from the unstructured Chinese medical text.  ...  What is more, the limited amount of annotated Chinese-MNER data can easily lead to the over-fitting problem while training.  ...  The cMedQANER and cEHRNER datasets are annotated from the Chinese community question answering and the Chinese electronic health records, respectively.  ... 
doi:10.3390/math10091386 doaj:511970fd72de4e3da5f1d32fcafd9cf0 fatcat:cb47j3wcmjblfjjyhdptbodeyq

CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition [article]

Yuying Zhu, Guoxin Wang, Börje F. Karlsson
2020 arXiv   pre-print
Therefore, Chinese Word Segmentation (CWS) is usually considered as the first step for Chinese NER.  ...  In this paper, we investigate a Convolutional Attention Network called CAN for Chinese NER, which consists of a character-based convolutional neural network (CNN) with local-attention layer and a gated  ...  [49] leveraged character-level BiLSTM to extract higher-level features from crowd-annotations. V.  ... 
arXiv:1904.02141v3 fatcat:qxiwyexmpzeh7cliven2zxoeem

Multi-Source Cross-Lingual Model Transfer: Learning What to Share [article]

Xilun Chen, Ahmed Hassan Awadallah, Hany Hassan, Wei Wang, Claire Cardie
2019 arXiv   pre-print
Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages.  ...  Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks.  ...  Denote the annotated corpus for a source language l ∈ S as X_l, where (x, y) ∼ X_l is a sample drawn from X_l.  ... 
arXiv:1810.03552v3 fatcat:mim5ubvnv5ghdkqk3xeyhqfhfm

Multi-Source Cross-Lingual Model Transfer: Learning What to Share

Xilun Chen, Ahmed Hassan Awadallah, Hany Hassan, Wei Wang, Claire Cardie
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages.  ...  Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks.  ...  As it is prohibitive to obtain training data for all languages of interest, cross-lingual transfer learning (CLTL) offers the possibility of learning models for a target language using annotated data from  ... 
doi:10.18653/v1/p19-1299 dblp:conf/acl/ChenAHWC19 fatcat:d2l42ilxxzbitjyfl2haontvgq

DuQM: A Chinese Dataset of Linguistically Perturbed Natural Questions for Evaluating the Robustness of Question Matching Models [article]

Hongyu Zhu, Yan Chen, Jing Yan, Jing Liu, Yu Hong, Ying Chen, Hua Wu, Haifeng Wang
2022 arXiv   pre-print
For this purpose, we create a Chinese dataset namely DuQM which contains natural questions with linguistic perturbations to evaluate the robustness of question matching models.  ...  In this paper, we focus on studying robustness evaluation of Chinese question matching.  ...  Since all annotators are linguistic experts from our internal data team instead of crowd-sourcing, we do not need to use IAA to measure the annotation quality.  ... 
arXiv:2112.08609v2 fatcat:zeupt4upp5a3ziwsrdjqx4kk2i

Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need! [article]

Steffen Eger and Johannes Daxenberger and Christian Stab and Iryna Gurevych
2018 arXiv   pre-print
the loss from cross-lingual transfer.  ...  We then compare (i) annotation projection and (ii) bilingual word embeddings based direct transfer strategies for cross-lingual AM, finding that the former performs considerably better and almost eliminates  ...  As a result, acquiring (high-quality) datasets for new languages comes at a high cost, be it in terms of training and/or hiring expert annotators or querying large crowds in crowd-sourcing experiments.  ... 
arXiv:1807.08998v1 fatcat:2o67o3iklncz7dlovd7zkow25q

A survey in Adversarial Defences and Robustness in NLP [article]

Shreya Goyal, Sumanth Doddapaneni, Mitesh M. Khapra, Balaraman Ravindran
2022 arXiv   pre-print
These methods are not just used for defending neural networks from adversarial attacks, but also used as a regularization mechanism during training, saving the model from overfitting.  ...  Strong adversarial attacks are proposed by various authors for computer vision and Natural Language Processing (NLP).  ...  NER techniques fall into four major categories, (i) Rule based NER, which works on handcrafted rules (ii) Unsupervised learning approaches, which works on unsupervised learning approaches such as clustering  ... 
arXiv:2203.06414v2 fatcat:2ukd44px35e7ppskzkaprfw4ha

Cyberspace Security Using Adversarial Learning and Conformal Prediction

Harry Wechsler
2015 Intelligent Information Management  
This paper advances new directions for cyber security using adversarial learning and conformal prediction in order to enhance network and computing services defenses against adaptive, malicious, persistent  ...  The motivation for using conformal prediction and its immediate off-spring, those of semi-supervised learning and transduction, comes from them first and foremost supporting discriminative and non-parametric  ...  There is offense and there is defense with both attempting to guess and learn from each other. This is the core for adversarial learning.  ... 
doi:10.4236/iim.2015.74016 fatcat:wqiu3pkl6zeurlr3mizdahhgd4

RuBQ: A Russian Dataset for Question Answering over Wikidata [article]

Vladislav Korablinov, Pavel Braslavski
2020 arXiv   pre-print
The dataset creation started with a large collection of question-answer pairs from online quizzes.  ...  The data underwent automatic filtering, crowd-assisted entity linking, automatic generation of SPARQL queries, and their subsequent in-house verification.  ...  We are grateful to Yandex.Toloka for their data annotation grant.  ... 
arXiv:2005.10659v1 fatcat:4fyptlackrafngtbehb4mlmkhm

RuBQ: A Russian Dataset for Question Answering over Wikidata [chapter]

Vladislav Korablinov, Pavel Braslavski
2020 Lecture Notes in Computer Science  
The dataset creation started with a large collection of question-answer pairs from online quizzes.  ...  The proposed dataset generation pipeline proved to be efficient and can be employed in other data annotation projects.  ...  We are grateful to Yandex.Toloka for their data annotation grant.  ... 
doi:10.1007/978-3-030-62466-8_7 fatcat:bo2c5mp7unhhhbdxkuzfv5ujpy

A Vietnamese Dataset for Evaluating Machine Reading Comprehension [article]

Kiet Van Nguyen, Duc-Vu Nguyen, Anh Gia-Tuan Nguyen, Ngan Luu-Thuy Nguyen
2020 arXiv   pre-print
Besides, we conduct experiments on state-of-the-art MRC methods for English and Chinese as the first experimental models on UIT-ViQuAD.  ...  We also estimate human performance on the dataset and compare it to the experimental results of powerful machine learning models.  ...  ., 2018) was released with adding over 50,000 unanswerable questions created adversarially by crowd-workers according to the original ones.  ... 
arXiv:2009.14725v3 fatcat:t7de6vsxcjgf7ey6xtt5mtn4qy

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization [article]

Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson
2020 arXiv   pre-print
Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks.  ...  To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual  ...  Acknowledgements We'd like to thank Jon Clark for sharing with us the TyDiQA Gold Passage data and for valuable feedback.  ... 
arXiv:2003.11080v5 fatcat:uplhdbuxgrfszpl7nrn5esbpoi

Deep learning based question answering system in Bengali

Tasmiah Tahsin Mayeesha, Abdullah Md Sarwar, Rashedur M. Rahman
2020 Journal of Information and Telecommunication  
We collect a smaller human annotated QA dataset from Bengali Wikipedia with popular topics from Bangladeshi culture for evaluating our models.  ...  Unlike English, there is no benchmark large scale QA dataset collected for Bengali, no pretrained language model that can be modified for Bengali question answering and no human baseline score for QA has  ...  He received his PhD in Computer Science from University of Calgary, Canada. He has authored more than 150 peer-reviewed research papers.  ... 
doi:10.1080/24751839.2020.1833136 fatcat:ltwrsufie5hrrezjtv2tu56fjy