ICE-TEA: In-Context Expansion and Translation of English Abbreviations
[chapter]
2011
Lecture Notes in Computer Science
The wide use of abbreviations in modern texts poses interesting challenges and opportunities in the field of NLP. ...
This paper addresses two related problems: (1) expansion of abbreviations given a context, and (2) translation of sentences with abbreviations. ...
Then, Chinese abbreviation-expansion pairs were extracted from monolingual Chinese text, and matched with their English NE translations using the Chinese automatic translation obtained before as a bridge ...
doi:10.1007/978-3-642-19437-5_4
fatcat:ccye4ekmx5gdtgwo4rwzoqtmt4
Selected Topics from LVCSR Research for Asian Languages at Tokyo Tech
2012
IEICE transactions on information and systems
We have proposed a new method for automatically generating Chinese abbreviations, and by expanding the vocabulary using the generated abbreviations, we have significantly improved the performance of spoken ...
For Thai, since there is no word boundary in the written form, we have proposed a new method for automatically creating word-like units from a text corpus, and applied topic and speaking style adaptation ...
Vocabulary Expansion through Automatic Abbreviation Generation for Chinese Spoken Query-Based Information Retrieval
Chinese Abbreviations In Chinese spoken query-based IR, official names of organizations ...
doi:10.1587/transinf.e95.d.1182
fatcat:xrbyx236qjdtrh6pqqhlf6py64
Experiments with ad hoc ambiguous abbreviation expansion
2019
Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)
The first one automatically selects all words in text which might be an expansion of an abbreviation according to the language rules. ...
The paper addresses experiments to expand ad hoc ambiguous abbreviations in medical notes on the basis of morphologically annotated texts, without using additional domain resources. ...
Acknowledgments This work was supported by the Polish National Science Centre project 2014/15/B/ST6/05186 and by EU structural funds as part of the Smart Growth Operational Programme POIR.01.01.01-00-0328 ...
doi:10.18653/v1/d19-6207
dblp:conf/acl-louhi/MykowieckaM19
fatcat:2qacjjj645awnbznc4q4jiii2q
Simple Yet Effective Method for Entity Linking in Microblog-Genre Text
[chapter]
2013
Communications in Computer and Information Science
In particular, we first use a mention expansion model to identify all possible entities in the knowledge base for a mention based on a variety of sources. ...
Unlike news text, microblogs pose several new challenges, due to their short, noisy, contextualized and real-time nature. ...
DBpedia Spotlight [10] is a system for automatically annotating text documents with DBpedia URIs. ...
doi:10.1007/978-3-642-41644-6_44
fatcat:3xf6git22rgvtf77qj3j7b6ovi
Improving Translation of Queries with Infrequent Unknown Abbreviations and Proper Names
2008
International Journal of Computational Linguistics and Chinese Language Processing
Therefore, in this paper we present a new search-result-based abbreviation translation method and a new two-stage hybrid translation extraction method to solve the problem of extracting translations of ...
learning algorithm for dealing with online English-Chinese name transliteration. ...
To automatically collect huge amounts of parallel corpora from the Web in various domains, some researchers have developed feasible techniques of utilizing similar file names, text length, and link structures ...
dblp:journals/ijclclp/LuLC08
fatcat:6fkwbqlr5jedhidkms2tc3bnva
Mining atomic Chinese abbreviations with a probabilistic single character recovery model
2007
Language Resources and Evaluation
An HMM-based Single Character Recovery (SCR) Model is proposed in this paper to extract a large set of "atomic abbreviation pairs" from a large text corpus. ...
By an "atomic abbreviation pair," it refers to an abbreviated word and its root word (i.e., unabbreviated form) in which the abbreviation is a single Chinese character. ...
Acknowledgements The current work was supported by the National Science Council (NSC), Taiwan, Republic of China (ROC), under the contract NSC 93-2213-E-260-015. ...
doi:10.1007/s10579-007-9026-8
fatcat:3y6olf3hfrdqlaar4fxxms7dt4
Unsupervised Translation Induction for Chinese Abbreviations using Monolingual Corpora
2008
Annual Meeting of the Association for Computational Linguistics
Chinese abbreviations are widely used in modern Chinese texts. ...
Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them ...
According to Chang and Lai (2004), approximately 20% of sentences in a typical news article have abbreviated words in them. ...
dblp:conf/acl/LiY08
fatcat:toqeoz5e4bfkpezpu2thstegua
Normalization of non-standard words
2001
Computer Speech and Language
For abbreviation expansion in particular, we investigated both supervised and unsupervised approaches. ...
We developed a taxonomy of NSWs on the basis of four rather distinct text types: news text, a recipes newsgroup, a hardware-product-specific newsgroup, and real-estate classified ads. ...
(On the other hand, there seems to be an almost total lack of abbreviations, in the technical sense used here, in Chinese; see, e.g., Sproat, 2000.) ...
doi:10.1006/csla.2001.0169
fatcat:6iezmkjervcyvmp5j5b5ii6jgm
CMU in Cross-Language Information Retrieval at NTCIR-3
2002
NTCIR Conference on Evaluation of Information Access Technologies
We participated in the Cross-Language Information Retrieval evaluation at NTCIR-3 for the English-Chinese and English-Japanese tasks. ...
The MT-based approach was most effective among these alternatives in our experiments for English-Chinese retrieval on the NTCIR-2 and 3 data. ...
in a new language pair and new domain? ...
dblp:conf/ntcir/YangM02
fatcat:3pgl4cszxndwpkehyme65kqr6i
BUPTTeam Participation at TAC 2016 Knowledge Base Population
2016
Text Analysis Conference
The Entity Discovery and Linking (EDL) track at NIST TAC-KBP2016 aims to extract named entity mentions from a source collection of textual documents in multiple languages (English, Chinese and Spanish) ...
The system consists of six components: 1) preprocessing; 2) mention recognition; 3) mention expansion; 4) candidates generation; 5) candidates ranking; 6) clustering. ...
Introduction The goal of EDL track at Text Analysis Conference (TAC) 2016 is to automatically discover entity mentions from three languages (English, Chinese and Spanish) raw texts and link them to an ...
dblp:conf/tac/TanLZ16
fatcat:57ujx4wp2vhm7jbe7q66ybxms4
BUPTTeam Participation at TAC 2015 Knowledge Base Population
2015
Text Analysis Conference
This year, TEDL is a new trilingual entity discovery and linking task, which brings more challenges. ...
In this paper, we propose a novel method to recognize name mentions in raw texts and link them to knowledge base (KB) entries based on the following four steps: 1) preprocessing, 2) Named Entity recognition ...
Any opinions, findings, and conclusion or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the view of the Ministry of Education of the People's Republic ...
dblp:conf/tac/TanZLW15
fatcat:npgjo5nlinbj5ehpsqntbqpx7e
Baseline Methods for Automatic Disambiguation of Abbreviations in Jewish Law Documents
[chapter]
2004
Lecture Notes in Computer Science
That is, abbreviations should be expanded correctly. Disambiguation of abbreviations is critical for correct understanding not only for the abbreviations themselves but also for the whole text. ...
Currently, experimental results show that abbreviations are expanded correctly at a rate of almost 60%. ...
abbreviations, (3) Using learning techniques in order to find the best weighted combinations of methods and (4) Elaborating the model for abbreviation disambiguation for various kinds of Hebrew documents ...
doi:10.1007/978-3-540-30228-5_6
fatcat:qphxp4ob75dcxkiecljprcreg4
SCUT-COUCH Textline_NU: An Unconstrained Online Handwritten Chinese Text Lines Dataset
2010
2010 12th International Conference on Frontiers in Handwriting Recognition
An unconstrained online handwritten Chinese text lines dataset, SCUT-COUCH2009-TL, a subset of SCUT-COUCH [1], is built to facilitate the research of unconstrained online Chinese text recognition. ...
The current version of SCUT-COUCH2009-TL has 8,809 text lines (4,813 lines are collected by touch screen LCD and 3,996 by digital pen) and 159,866 characters in total that are written by more than 157 participants ...
This work is supported in part by the research funding of NSFC (no. U0735004, 60772216) and GDNSF (no. 07118074) and from Atlanstic/University of Nantes. ...
doi:10.1109/icfhr.2010.123
dblp:conf/icfhr/YanJVM10
fatcat:q3sb2bmcgjbupjajjuiejhmug4
Robust Extended Tokenization Framework for Romanian by Semantic Parallel Texts Processing
2013
International Journal on Natural Language Computing
Then we prove that, for disambiguation purposes, the bilingual text provided by high-profile online machine translation services performs at almost the same level as human-originated parallel texts ...
We also claim that semantic disambiguation performs much better in a bilingual context than in a monolingual one. ...
In the new types of text presented above there are new sources of ambiguities, the most important being generated by lack of diacritics in Romanian texts. ...
doi:10.5121/ijnlc.2013.2602
fatcat:7ugdu2wlrrgh3jnr3plse5exja
Efficient Deep Learning Approach for Dimensionality Reduction using Micro blogs from Big data
2017
International Journal for Research in Applied Science and Engineering Technology
In this project, we apply deep learning networks to map the high-dimensional representations of micro blog texts to low-dimensional representations. ...
An important preprocessing step of micro blog text mining is to convert natural language texts into proper numerical representations. ...
However, compared with the long text, because of the characteristics that short text described weak signals, noise characteristics of the data and automatic classification system of Chinese texts based ...
doi:10.22214/ijraset.2017.3002
fatcat:phaqfjnacvajheketlhh5h5bve