
ICE-TEA: In-Context Expansion and Translation of English Abbreviations [chapter]

Waleed Ammar, Kareem Darwish, Ali El Kahki, Khaled Hafez
2011 Lecture Notes in Computer Science  
The wide use of abbreviations in modern texts poses interesting challenges and opportunities in the field of NLP.  ...  This paper addresses two related problems: (1) expansion of abbreviations given a context, and (2) translation of sentences with abbreviations.  ...  Then, Chinese abbreviation-expansion pairs were extracted from monolingual Chinese text, and matched with their English NE translations using the Chinese automatic translation obtained before as a bridge  ... 
doi:10.1007/978-3-642-19437-5_4 fatcat:ccye4ekmx5gdtgwo4rwzoqtmt4

Selected Topics from LVCSR Research for Asian Languages at Tokyo Tech

Sadaoki FURUI
2012 IEICE transactions on information and systems  
We have proposed a new method for automatically generating Chinese abbreviations, and by expanding the vocabulary using the generated abbreviations, we have significantly improved the performance of spoken  ...  For Thai, since there is no word boundary in the written form, we have proposed a new method for automatically creating word-like units from a text corpus, and applied topic and speaking style adaptation  ...  Vocabulary Expansion through Automatic Abbreviation Generation for Chinese Spoken Query-Based Information Retrieval Chinese Abbreviations In Chinese spoken query-based IR, official names of organizations  ... 
doi:10.1587/transinf.e95.d.1182 fatcat:xrbyx236qjdtrh6pqqhlf6py64

Experiments with ad hoc ambiguous abbreviation expansion

Agnieszka Mykowiecka, Malgorzata Marciniak
2019 Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)  
The first one automatically selects all words in text which might be an expansion of an abbreviation according to the language rules.  ...  The paper addresses experiments to expand ad hoc ambiguous abbreviations in medical notes on the basis of morphologically annotated texts, without using additional domain resources.  ...  Acknowledgments This work was supported by the Polish National Science Centre project 2014/15/B/ST6/05186 and by EU structural funds as part of the Smart Growth Operational Programme POIR.01.01.01-00-0328  ... 
doi:10.18653/v1/d19-6207 dblp:conf/acl-louhi/MykowieckaM19 fatcat:2qacjjj645awnbznc4q4jiii2q

Simple Yet Effective Method for Entity Linking in Microblog-Genre Text [chapter]

Qingliang Miao, Huayu Lu, Shu Zhang, Yao Meng
2013 Communications in Computer and Information Science  
In particular, we first use a mention expansion model to identify all possible entities in the knowledge base for a mention based on a variety of sources.  ...  Unlike news text, microblogs pose several new challenges, due to their short, noisy, contextualized and real-time nature.  ...  DBpedia Spotlight [10] is a system for automatically annotating text documents with DBpedia URIs.  ... 
doi:10.1007/978-3-642-41644-6_44 fatcat:3xf6git22rgvtf77qj3j7b6ovi

Improving Translation of Queries with Infrequent Unknown Abbreviations and Proper Names

Wen-Hsiang Lu, Jiun-Hung Lin, Yao-Sheng Chang
2008 International Journal of Computational Linguistics and Chinese Language Processing  
Therefore, in this paper we present a new search-result-based abbreviation translation method and a new two-stage hybrid translation extraction method to solve the problem of extracting translations of  ...  learning algorithm for dealing with online English-Chinese name transliteration.  ...  To automatically collect huge amounts of parallel corpora from the Web in various domains, some researchers have developed feasible techniques of utilizing similar file names, text length, and link structures  ... 
dblp:journals/ijclclp/LuLC08 fatcat:6fkwbqlr5jedhidkms2tc3bnva

Mining atomic Chinese abbreviations with a probabilistic single character recovery model

Jing-Shin Chang, Wei-Lun Teng
2007 Language Resources and Evaluation  
An HMM-based Single Character Recovery (SCR) Model is proposed in this paper to extract a large set of "atomic abbreviation pairs" from a large text corpus.  ...  By an "atomic abbreviation pair," it refers to an abbreviated word and its root word (i.e., unabbreviated form) in which the abbreviation is a single Chinese character.  ...  Acknowledgements The current work was supported by the National Science Council (NSC), Taiwan, Republic of China (ROC), under the contract NSC 93-2213-E-260-015.  ... 
doi:10.1007/s10579-007-9026-8 fatcat:3y6olf3hfrdqlaar4fxxms7dt4

Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora

Zhifei Li, David Yarowsky
2008 Annual Meeting of the Association for Computational Linguistics  
Chinese abbreviations are widely used in modern Chinese texts.  ...  Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them  ...  According to Chang and Lai (2004), approximately 20% of sentences in a typical news article have abbreviated words in them.  ... 
dblp:conf/acl/LiY08 fatcat:toqeoz5e4bfkpezpu2thstegua

Normalization of non-standard words

Richard Sproat, Alan W. Black, Stanley Chen, Shankar Kumar, Mari Ostendorf, Christopher Richards
2001 Computer Speech and Language  
For abbreviation expansion in particular, we investigated both supervised and unsupervised approaches.  ...  We developed a taxonomy of NSWs on the basis of four rather distinct text types: news text, a recipes newsgroup, a hardware-product-specific newsgroup, and real-estate classified ads.  ...  (On the other hand, there seems to be an almost total lack of abbreviations, in the technical sense used here, in Chinese; see, e.g., Sproat, 2000.)  ... 
doi:10.1006/csla.2001.0169 fatcat:6iezmkjervcyvmp5j5b5ii6jgm

CMU in Cross-Language Information Retrieval at NTCIR-3

Yiming Yang, Nianli Ma
2002 NTCIR Conference on Evaluation of Information Access Technologies  
We participated in the Cross-Language Information Retrieval evaluation at NTCIR-3 for the English-Chinese and English-Japanese tasks.  ...  The MT-based approach was most effective among these alternatives in our experiments for English-Chinese retrieval on the NTCIR-2 and 3 data.  ...  in a new language pair and new domain?  ... 
dblp:conf/ntcir/YangM02 fatcat:3pgl4cszxndwpkehyme65kqr6i

BUPTTeam Participation at TAC 2016 Knowledge Base Population

Yongmei Tan, Xiaoguang Li, Di Zheng
2016 Text Analysis Conference  
The Entity Discovery and Linking (EDL) track at NIST TAC-KBP2016 aims to extract named entity mentions from a source collection of textual documents in multiple languages (English, Chinese and Spanish)  ...  The system consists of six components: 1) preprocessing; 2) mention recognition; 3) mention expansion; 4) candidates generation; 5) candidates ranking; 6) clustering.  ...  Introduction The goal of EDL track at Text Analysis Conference (TAC) 2016 is to automatically discover entity mentions from three languages (English, Chinese and Spanish) raw texts and link them to an  ... 
dblp:conf/tac/TanLZ16 fatcat:57ujx4wp2vhm7jbe7q66ybxms4

BUPTTeam Participation at TAC 2015 Knowledge Base Population

Yongmei Tan, Di Zheng, Maolin Li, Xiaojie Wang
2015 Text Analysis Conference  
This year, TEDL is a new trilingual entity discovery and linking task, which brings additional challenges.  ...  In this paper, we propose a novel method to recognize name mentions in raw texts and link them to knowledge base (KB) entries based on the following four steps: 1) preprocessing, 2) Named Entity recognition  ...  Any opinions, findings, and conclusion or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of the Ministry of Education of the People's Republic  ... 
dblp:conf/tac/TanZLW15 fatcat:npgjo5nlinbj5ehpsqntbqpx7e

Baseline Methods for Automatic Disambiguation of Abbreviations in Jewish Law Documents [chapter]

Yaakov HaCohen-Kerner, Ariel Kass, Ariel Peretz
2004 Lecture Notes in Computer Science  
That is, abbreviations should be expanded correctly. Disambiguation of abbreviations is critical for correct understanding not only of the abbreviations themselves but also of the whole text.  ...  Currently, experimental results show that abbreviations are expanded correctly at a rate of almost 60%.  ...  abbreviations, (3) Using learning techniques in order to find the best weighted combinations of methods and (4) Elaborating the model for abbreviation disambiguation for various kinds of Hebrew documents  ... 
doi:10.1007/978-3-540-30228-5_6 fatcat:qphxp4ob75dcxkiecljprcreg4

SCUT-COUCH Textline_NU: An Unconstrained Online Handwritten Chinese Text Lines Dataset

Hanyu Yan, Lianwen Jin, Christian Viard-Gaudin, Harold Mouchere
2010 2010 12th International Conference on Frontiers in Handwriting Recognition  
An unconstrained online handwritten Chinese text lines dataset, SCUT-COUCH2009-TL, a subset of SCUT-COUCH [1], is built to facilitate the research of unconstrained online Chinese text recognition.  ...  The current version of SCUT-COUCH2009-TL has 8,809 text lines (4,813 lines are collected by touch screen LCD and 3,996 by digital pen) and 159,866 characters in total that are written by more than 157 participants  ...  This work is supported in part by the research funding of NSFC (no. U0735004, 60772216) and GDNSF (no. 07118074) and from Atlanstic/University of Nantes.  ... 
doi:10.1109/icfhr.2010.123 dblp:conf/icfhr/YanJVM10 fatcat:q3sb2bmcgjbupjajjuiejhmug4

Robust Extended Tokenization Framework for Romanian by Semantic Parallel Texts Processing

Marius Zubac, Vasile Dadarlat
2013 International Journal on Natural Language Computing  
Then we prove that, for disambiguation purposes, the bilingual text provided by high-profile on-line machine translation services performs at almost the same level as human-originated parallel texts  ...  We also claim that semantic disambiguation performs much better in a bilingual context than in a monolingual one.  ...  In the new types of text presented above there are new sources of ambiguities, the most important being generated by lack of diacritics in Romanian texts.  ... 
doi:10.5121/ijnlc.2013.2602 fatcat:7ugdu2wlrrgh3jnr3plse5exja

Efficient Deep Learning Approach for Dimensionality Reduction using Micro blogs from Big data

Mr. M. Vengateshwaran
2017 International Journal for Research in Applied Science and Engineering Technology  
In this project, we apply deep learning networks to map the high-dimensional representations of micro blog texts to low-dimensional representations.  ...  An important preprocessing step of micro blog text mining is to convert natural language texts into proper numerical representations.  ...  However, compared with the long text, because of the characteristics that short text described weak signals, noise characteristics of the data and automatic classification system of Chinese texts based  ... 
doi:10.22214/ijraset.2017.3002 fatcat:phaqfjnacvajheketlhh5h5bve