
Pronunciation-Enhanced Chinese Word Embedding

Qinjuan Yang, Haoran Xie, Gary Cheng, Fu Lee Wang, Yanghui Rao
2021 Cognitive Computation  
Specifically, we propose a pronunciation-enhanced Chinese word embedding learning method, where the pronunciations of context characters and target characters are simultaneously encoded into the embeddings  ...  Chinese characters and their sub-character components, which contain rich semantic information, are incorporated to learn Chinese word embeddings.  ...  These methods enhanced the quality of Chinese word embeddings in terms of two distinct perspectives: morphology and semantics.  ... 
doi:10.1007/s12559-021-09850-9 fatcat:gevdxrxwyjaxta34nztetru3hq

Improve word embedding using both writing and pronunciation

Wenhao Zhu, Xin Jin, Jianyue Ni, Baogang Wei, Zhiguo Lu, Zhiqiang Cai
2018 PLoS ONE  
This paper uses the Chinese language, English language and Spanish language as examples and presents several models that integrate word pronunciation characteristics into word embedding.  ...  Therefore, this paper proposes the concept of a pronunciation-enhanced word embedding model (PWE) that integrates speech information into training to fully apply the roles of both speech and writing to  ...  Based on this, this paper proposes the pronunciation-enhanced word embedding model (PWE), which incorporates speech information into the model.  ... 
doi:10.1371/journal.pone.0208785 fatcat:h6jbgyih6bg4jdpkj64khnurqq

Visual and Phonological Feature Enhanced Siamese BERT for Chinese Spelling Error Correction

Yujia Liu, Hongliang Guo, Shuai Wang, Tiejun Wang
2022 Applied Sciences  
Chinese Spelling Check (CSC) aims to detect and correct spelling errors in Chinese.  ...  However, most spelling errors in current benchmark datasets are character pairs in similar pronunciations.  ...  The Feature-enhanced Siamese BERT. Shape, pronunciation, token, segment, and position embeddings are summed up as input for FE-BERT in the left side.  ... 
doi:10.3390/app12094578 fatcat:yb376iikq5c3pdqnvottxrdmga

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT

Dongyang Dai, Zhiyong Wu, Shiyin Kang, Xixin Wu, Jia Jia, Dan Su, Dong Yu, Helen Meng
2019 Interspeech 2019  
In this paper, we propose an end-to-end framework to predict the pronunciation of polyphonic character, which accepts sentence containing polyphonic character as input in the form of Chinese character  ...  The pre-trained BERT model extracts semantic features from raw Chinese character sequence and the NN based classifier predicts the polyphonic character's pronunciation according to BERT output.  ...  And supported by experimental results, these semantic features are useful to enhance the performance of pronunciation prediction.  ... 
doi:10.21437/interspeech.2019-2292 dblp:conf/interspeech/DaiWKW0S0M19 fatcat:446dmztlzzgrzaf5pq44mkeukm

ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information [article]

Zijun Sun, Xiaoya Li, Xiaofei Sun, Yuxian Meng, Xiang Ao, Qing He, Fei Wu, Jiwei Li
2021 arXiv   pre-print
The glyph embedding is obtained based on different fonts of a Chinese character, being able to capture character semantics from the visual features, and the pinyin embedding characterizes the pronunciation  ...  of Chinese characters, which handles the highly prevalent heteronym phenomenon in Chinese (the same character has different pronunciations with different meanings).  ...  Intuitively, the rich semantics behind Chinese character glyphs should enhance the expressiveness of Chinese NLP models.  ... 
arXiv:2106.16038v1 fatcat:rmdmt3dh7fhx3gu4pj4vmnmt4i

Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis

Jingbei Li, Zhiyong Wu, Runnan Li, Pengpeng Zhi, Song Yang, Helen Meng
2019 Interspeech 2019  
However, considering the complex linguistic structure of Chinese, using Chinese characters directly for Mandarin TTS may suffer from the poor linguistic encoding performance, resulting in improper word  ...  tokenization and pronunciation errors.  ...  Here the target word vectors are embedded from a well-trained Word2Vec model trained on the Chinese Wikimedia corpus [20].  ... 
doi:10.21437/interspeech.2019-1118 dblp:conf/interspeech/LiWLZYM19 fatcat:i6k5y6tfybbd7cuhp6u2yomjsa

MFE-NER: Multi-feature Fusion Embedding for Chinese Named Entity Recognition [article]

Jiatong Li, Kui Meng
2021 arXiv   pre-print
In Chinese NER, character substitution is a complicated linguistic phenomenon. Some Chinese characters are quite similar for sharing the same components or having similar pronunciations.  ...  In this paper, we propose a new method, Multi-Feature Fusion Embedding for Chinese Named Entity Recognition (MFE-NER), to strengthen the language pattern of Chinese and handle the character substitution  ...  The first way is word embedding, trying to separate Chinese sentences into words and get the embedding of words, which makes sense but is limited by the accuracy of word segmentation tools.  ... 
arXiv:2109.07877v1 fatcat:2zqda4sppreundfcdab34mlhzq

Phonetic-enriched Text Representation for Chinese Sentiment Analysis with Reinforcement Learning [article]

Haiyun Peng, Yukun Ma, Soujanya Poria, Yang Li, Erik Cambria
2019 arXiv   pre-print
The Chinese pronunciation system offers two characteristics that distinguish it from other languages: deep phonemic orthography and intonation variations.  ...  It functions as disambiguating intonations for each Chinese character (pinyin). Thus, a precise phonetic representation of Chinese is learned.  ...  [22] proposed decomposition of Chinese words into characters and presented a character-enhanced word embedding model (CWE).  ... 
arXiv:1901.07880v1 fatcat:6rx2sle7mfcdpgrp3ev2mkimey

Age Estimates from Name Characters

Jung-Shiuan Liou, Ching-Yen Hsiao, Lork-Yee Chow, Yen-Hao Huang, Yi-Shin Chen
2021 Applied Sciences  
However, the prediction results for ethnic-Chinese Malaysian names (in English) do not reach the same level.  ...  This is due to the linguistic differences among Chinese dialects; the features trained on Taiwanese names cannot be directly applied to English names in Malaysia.  ...  Acknowledgments: Thanks for Chih-Hao Tsai, currently the chairman of UiGathering, for providing "A List of Chinese Names" as part of Taiwanese names.  ... 
doi:10.3390/app11209611 doaj:f95eafc1c3de410096f433d07909b8ea fatcat:pq2gdvytjjcvxmr7spnivzkpme

Pre-lexical phonological processing in reading Chinese characters: An ERP study

Lin Zhou, Manson C.-M. Fong, James W. Minett, Gang Peng, William S-Y. Wang
2014 Journal of Neurolinguistics  
., Chinese characters) are usually composed of radicals which do not correspond to phonemes; instead, some radicals can occur as freestanding sinograms and have their own pronunciations.  ...  this issue by comparing the interference effects exerted by two types of primes on the targets in an event-related potential (ERP) experiment: RADICAL-RELATED primes, which are homophonic with a radical embedded  ...  One intuitive account is that radical pronunciations are always activated, which enhances the correct pronunciation of regular sinograms having identical pronunciations as their phonetic radicals.  ... 
doi:10.1016/j.jneuroling.2014.03.002 fatcat:rzcjkyl23rgexeaij7oxeaoovy

Polyphone Disambiguition in Mandarin Chinese with Semi-Supervised Learning [article]

Yi Shi and Congyi Wang and Yu Chen and Bin Wang
2021 arXiv   pre-print
The majority of Chinese characters are monophonic, while a special group of characters, called polyphonic characters, have multiple pronunciations.  ...  As a prerequisite of performing speech-related generative tasks, the correct pronunciation must be identified among several candidates. This process is called Polyphone Disambiguation.  ...  Word Embedding + BLSTM Cross Ent. 0.843 1,767.104M 2.232M [12] Dai et al.  ... 
arXiv:2102.00621v2 fatcat:ahnwiycy4baizgasvvx5uuqbea

MSFM: Multi-view Semantic Feature Fusion Model for Chinese Named Entity Recognition

2022 KSII Transactions on Internet and Information Systems  
The acquired font shape, font sound, and font meaning features are fused to enhance the semantic information of Chinese characters with different granularities.  ...  Specifically, the MFEM extracts character features, word boundary features, radical features, and pinyin features of Chinese characters.  ...  of Chinese characters, and a special way to symbolize the pronunciation of Chinese characters.  ... 
doi:10.3837/tiis.2022.06.004 fatcat:pvsyhizanvfbjhagdnoachb4bq

CHARM: A Character Level Precoding Method for Chinese Text

Xiaoming Fan, Tuo Shi, Jiayan Cai, Binjun Wang
2021 IEEE Access  
Referring to the idea of word embedding enhancement in Fig. 15, the raw Chinese text was first converted to xqma by CHARM, and the original embedding was replaced by the combination (addition or concatenation  ...  In [17], [18], [22], [26], [27], radicals, strokes, character glyphs, components, and five-strokes codes are used to enhance word embedding respectively.  ... 
doi:10.1109/access.2021.3112190 fatcat:snizeoyuq5bvlfbdwwhpfrybhq

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge [article]

Mutian He, Jingzhou Yang, Lei He, Frank K. Soong
2022 arXiv   pre-print
Experiments show the proposed model significantly reduces pronunciation errors in low-resource, end-to-end Chinese TTS, and the lexicon-reading capability can be transferred to other languages with a smaller  ...  End-to-end TTS requires a large amount of speech/text paired data to cover all necessary knowledge, particularly how to pronounce different words in diverse contexts, so that a neural model may learn such  ...  The base model uses only speaker embeddings, so language embeddings are added to input embeddings and trained. All data are without word segmentation.  ... 
arXiv:2110.09698v2 fatcat:llxafnr62nbtbmcmn6nh7lqn4u

A Mask-Based Model for Mandarin Chinese Polyphone Disambiguation

Haiteng Zhang, Huashan Pan, Xiulin Li
2020 Interspeech 2020  
Besides, Modified Focal Loss can reduce the adverse impacts of the uneven distribution of pronunciation.  ...  Moreover, to mitigate the uneven distribution of pronunciation, we introduce a new loss called Modified Focal Loss. The experimental result shows the effectiveness of the proposed mask-based model.  ...  Their pronunciations cannot be determined simply by the word itself but require more lexical information and contextual information, such as Chinese word segmentation, POS (part of speech) tagging, syntactic  ... 
doi:10.21437/interspeech.2020-1142 dblp:conf/interspeech/ZhangPL20 fatcat:up6utlli65dd5mmvhj7swtoyja