On the impact of knowledge-based linguistic annotations in the quality of scientific embeddings

Andres Garcia-Silva, Ronald Denaux, Jose Manuel Gomez-Perez
2021 Future generations computer systems  
In this paper, we conduct a comprehensive study on the use of explicit linguistic annotations to generate embeddings from a scientific corpus and quantify their impact in the resulting representations.  ...  However, until now we did not have a precise understanding of the impact that such individual annotations and their possible combinations may have in the quality of the embeddings.  ...  Acknowledgements We gratefully acknowledge the EU Horizon 2020 research and innovation programme under grant agreement No. 825627 (ELG).  ... 
doi:10.1016/j.future.2021.02.019 fatcat:5lxgpsauvvab7de5p7hndiwmfe

Scalable Construction and Reasoning of Massive Knowledge Bases

Xiang Ren, Nanyun Peng, William Yang Wang
2018 Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts  
In today's information-based society, there is abundant knowledge out there carried in the form of natural language texts (e.g., news articles, social media posts, scientific publications), which spans  ...  Traditional IE systems assume abundant human annotations for training high quality machine learning models, which is impractical when trying to deploy IE systems to a broad range of domains, settings and  ...  In the third part, we describe recent advances in knowledge base reasoning. We start with a gentle introduction to the literature, focusing on path-based and embedding-based methods.  ... 
doi:10.18653/v1/n18-6003 dblp:conf/naacl/RenPW18 fatcat:t57e7rwinjfbbgxzddcerishwi

RNN Embeddings for Identifying Difficult to Understand Medical Words

Hanna Pylieva, Artem Chernodub, Natalia Grabar, Thierry Hamon
2019 Proceedings of the 18th BioNLP Workshop and Shared Task  
We introduce novel embeddings obtained from an RNN, FrnnMUTE (French RNN Medical Understandability Text Embeddings), which reach up to an 87.0 F1 score in the identification of difficult words.  ...  We also note that adding pre-trained FastText word embeddings to the feature set substantially improves the performance of the model which classifies words according to their difficulty.  ...  Acknowledgments This work has been partly funded by the French ANR (grant number ANR-17-CE19-0016-01) as part of the project CLEAR (Communication, Literacy, Education, Accessibility, Readability).  ... 
doi:10.18653/v1/w19-5011 dblp:conf/bionlp/PylievaCGH19 fatcat:h52p3mw6ong5nmbrv4x2wuju7u

Speculation detection for Chinese clinical notes: Impacts of word segmentation and embedding models

Shaodian Zhang, Tian Kang, Xingting Zhang, Dong Wen, Noémie Elhadad, Jianbo Lei
2016 Journal of Biomedical Informatics  
We experiment on a novel dataset of 36,828 clinical notes with 5103 gold-standard speculation annotations on 2000 notes, and compare the systems in which word embeddings are calculated based on word segmentations  ...  We propose a sequence labeling based system for speculation detection, which relies on features from bag of characters, bag of words, character embedding, and word embedding.  ...  This study was supported by the National Natural Science Foundation of China (NSFC) Grant # 81171426 and #81471756, and National Institute of General Medical Sciences Grant R01GM114355.  ... 
doi:10.1016/j.jbi.2016.02.011 pmid:26923634 pmcid:PMC5282586 fatcat:us6u2btb2zeqbo4d3ge7fybwwa

Aspect Based Sentiment Analysis into the Wild

Caroline Brun, Vassilina Nikoulina
2018 Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis  
In this paper, we test state-of-the-art Aspect Based Sentiment Analysis (ABSA) systems trained on a widely used dataset on actual data.  ...  We then analyse the results in performance of different versions of the same system on both datasets. We also propose light adaptation methods to increase system robustness.  ...  The annotations have been performed by a single annotator, expert linguist with a very good knowledge of the SemEval2016 annotation guidelines, using BRAT (Stenetorp et al., 2012).  ... 
doi:10.18653/v1/w18-6217 dblp:conf/wassa/BrunN18 fatcat:k6mgo7fqongdvb6g544ex7hzli

Scientific document summarization via citation contextualization and scientific discourse

Arman Cohan, Nazli Goharian
2017 International Journal on Digital Libraries  
We evaluate our proposed method on two scientific summarization datasets in the biomedical and computational linguistics domains.  ...  We propose three approaches for contextualizing citations which are based on query reformulation, word embeddings, and supervised learning.  ...  Our proposed approaches are based on query reformulations, word embeddings, and domain-specific knowledge.  ... 
doi:10.1007/s00799-017-0216-8 fatcat:4zwdaqixnzei3i6yegahz3gxge

An overview of NexusLinguarum use cases: Current status and challenges

Sara Carvalho, Ilan Kernerman
2021 Zenodo  
In addition, it describes the cooperation with the other WGs of NexusLinguarum. URL: https://lexicala.com/review/2021/an-overview-of-nexuslinguarum-use-cases-current-status-and-challenges/  ...  Working Group 4 (WG4) of the NexusLinguarum COST Action – European network for Web-centred linguistic data science (CA18209) – is dedicated to applying and validating the Action's methodologies and technologies  ...  Acknowledgement The use case is based on the project "Bilingual automatic terminology extraction" funded by the Research Council of Lithuania (LMTLT, agreement No. P-MIP-20-282).  ... 
doi:10.5281/zenodo.5729078 fatcat:t3z3chvtmjbofdyk72wxzofmku

Pre-Reordering for Neural Machine Translation: Helpful or Harmful?

Jinhua Du, Andy Way
2017 Prague Bulletin of Mathematical Linguistics  
In this paper, we firstly investigate the impact of pre-reordered source-side data on NMT, and then propose to incorporate features for the pre-reordering model in SMT as input factors into NMT (factored  ...  The features, namely parts-of-speech (POS), word class and reordered index, are encoded as feature vectors and concatenated to the word embeddings to provide extra knowledge for NMT.  ...  Acknowledgements We would like to thank the reviewers for their valuable and constructive comments. Thanks Dr. Jian Zhang for his initial idea and work on pre-reordered SMT.  ... 
doi:10.1515/pralin-2017-0018 fatcat:ahgbt7pzunbbpj2lhmyqltr4fe

Leveraging Unannotated Texts for Scientific Relation Extraction

Qin Dai, Naoya Inoue, Paul Reisert, Kentaro Inui
2018 IEICE transactions on information and systems  
Our experiments on the RANIS corpus [1] prove the effectiveness of the proposed model on relation extraction from scientific articles. Key words: relation extraction, scientific document, word embedding  ...  Summary: A tremendous amount of knowledge is present in the ever-growing scientific literature.  ...  In Table 8, we compare the impact of using different SRW c on the performance of scientific RE.  ... 
doi:10.1587/transinf.2018edp7180 fatcat:btw3qvfxcbd6bbys3jma6lfhmq

Scientia Potentia Est – On the Role of Knowledge in Computational Argumentation [article]

Anne Lauscher, Henning Wachsmuth, Iryna Gurevych, Goran Glavaš
2021 arXiv   pre-print
Despite extensive research efforts in the recent years, computational modeling of argumentation remains one of the most challenging areas of natural language processing (NLP).  ...  In this survey paper, we fill this gap by (1) proposing a pyramid of types of knowledge required in CA tasks, (2) analysing the state of the art with respect to the reliance and exploitation of these types  ...  Our selection of prominent papers for the in-depth analysis was guided by the following set of (sometimes mutually conflicting) criteria: (1) maximize the scientific impact of the publications in the sample  ... 
arXiv:2107.00281v2 fatcat:kbargvdhdjaf5eqdx2sxzgut4a

ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks [article]

Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander R. Fabbri, Irene Li, Dan Friedman, Dragomir R. Radev
2019 arXiv   pre-print
Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article's impacts on research community.  ...  We 1) develop and release the first large-scale manually-annotated corpus for scientific papers (on computational linguistics) by enabling faster annotation, and 2) propose summarization methods that integrate  ...  We also thank everyone who helped the evaluation in this work.  ... 
arXiv:1909.01716v3 fatcat:lyjiifl4ljadlmrju7mwfying4

Measuring prominence of scientific work in online news as a proxy for impact [article]

James Ravenscroft, Amanda Clare, Maria Liakata
2020 arXiv   pre-print
The impact made by a scientific paper on the work of other academics has many established metrics, including metrics based on citation counts and social media commenting.  ...  This supports our hypothesis that linguistic prominence in news can be used to suggest the wider non-academic impact of scientific work.  ...  To our knowledge there is no previous work that investigates linguistic prominence within scientific papers.  ... 
arXiv:2007.14454v1 fatcat:5lgmpkx3kvddvojuh5bgyihiny

Low-Resource Adaptation of Neural NLP Models [article]

Farhad Nooralahzadeh
2020 arXiv   pre-print
It is challenging to find annotated data of sufficient amount and quality.  ...  These resources are often based on language data available in large quantities, such as English newswire.  ...  In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.  ... 
arXiv:2011.04372v1 fatcat:626mbe5ba5bkdflv755o35u5pq

Automatic Argumentative-Zoning Using Word2vec [article]

Haixia Liu
2017 arXiv   pre-print
The proposed approach builds sentence representations using learned embeddings based on neural network.  ...  In comparison with document summarization on the articles from social media and newswire, argumentative zoning (AZ) is an important task in scientific paper analysis.  ...  To evaluate the impact from different domains, the first model was trained on different corpus. The characteristics of word embeddings based on different model and dataset are listed in Table 2.  ... 
arXiv:1703.10152v1 fatcat:5jen4wwxi5em7ipbhaz3ixyx5q

ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks

Michihiro Yasunaga, Jungo Kasai, Rui Zhang, Alexander R. Fabbri, Irene Li, Dan Friedman, Dragomir R. Radev
2019 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
Scientific article summarization is challenging: large, annotated corpora are not available, and the summary should ideally include the article's impacts on research community.  ...  We 1) develop and release the first large-scale manually-annotated corpus for scientific papers (on computational linguistics) by enabling faster annotation, and 2) propose summarization methods that integrate  ...  We also thank everyone who helped the evaluation in this work.  ... 
doi:10.1609/aaai.v33i01.33017386 fatcat:jfnidu6emfgzpmj3nfeuri6dfa