Pronunciation-Enhanced Chinese Word Embedding
2021
Cognitive Computation
Specifically, we propose a pronunciation-enhanced Chinese word embedding learning method, where the pronunciations of context characters and target characters are simultaneously encoded into the embeddings ...
Chinese characters and their sub-character components, which contain rich semantic information, are incorporated to learn Chinese word embeddings. ...
These methods enhanced the quality of Chinese word embeddings in terms of two distinct perspectives: morphology and semantics. ...
doi:10.1007/s12559-021-09850-9
fatcat:gevdxrxwyjaxta34nztetru3hq
Improve word embedding using both writing and pronunciation
2018
PLoS ONE
This paper uses the Chinese language, English language and Spanish language as examples and presents several models that integrate word pronunciation characteristics into word embedding. ...
Therefore, this paper proposes the concept of a pronunciation-enhanced word embedding model (PWE) that integrates speech information into training to fully apply the roles of both speech and writing to ...
Based on this, this paper proposes the pronunciation-enhanced word embedding model (PWE), which incorporates speech information into the model. ...
doi:10.1371/journal.pone.0208785
fatcat:h6jbgyih6bg4jdpkj64khnurqq
Visual and Phonological Feature Enhanced Siamese BERT for Chinese Spelling Error Correction
2022
Applied Sciences
Chinese Spelling Check (CSC) aims to detect and correct spelling errors in Chinese. ...
However, most spelling errors in current benchmark datasets are character pairs in similar pronunciations. ...
The Feature-enhanced Siamese BERT. Shape, pronunciation, token, segment, and position embeddings are summed up as input for FE-BERT in the left side. ...
doi:10.3390/app12094578
fatcat:yb376iikq5c3pdqnvottxrdmga
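The snippet above says that shape, pronunciation, token, segment, and position embeddings are summed to form the model input. A minimal sketch of that element-wise sum follows. It is an illustration only, not the paper's implementation: the table sizes, the random initialisation, and the names `make_table`, `sum_embeddings`, and `input_embedding` are all assumptions.

```python
import random


def make_table(vocab_size, dim, rng):
    """Build a hypothetical lookup table of small random vectors."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def sum_embeddings(vectors):
    """Return the element-wise sum of equally sized vectors."""
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must share the same dimension")
    return [sum(values) for values in zip(*vectors)]


def input_embedding(tables, ids):
    """Look up one vector per feature and sum them into a single input vector."""
    return sum_embeddings([tables[name][ids[name]] for name in tables])


rng = random.Random(0)
dim = 8  # assumed embedding width, chosen only for the example
tables = {
    name: make_table(16, dim, rng)
    for name in ("shape", "pronunciation", "token", "segment", "position")
}
# Hypothetical feature ids for a single character position.
ids = {"shape": 3, "pronunciation": 7, "token": 5, "segment": 0, "position": 2}
vector = input_embedding(tables, ids)
```

Summation keeps the input width fixed no matter how many feature types are added, which is why it requires every table to share one dimension.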
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-Trained BERT
2019
Interspeech 2019
In this paper, we propose an end-to-end framework to predict the pronunciation of polyphonic character, which accepts sentence containing polyphonic character as input in the form of Chinese character ...
The pre-trained BERT model extracts semantic features from raw Chinese character sequence and the NN based classifier predicts the polyphonic character's pronunciation according to BERT output. ...
And supported by experimental results, these semantic features are useful to enhance the performance of pronunciation prediction. ...
doi:10.21437/interspeech.2019-2292
dblp:conf/interspeech/DaiWKW0S0M19
fatcat:446dmztlzzgrzaf5pq44mkeukm
ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information
[article]
2021
arXiv
pre-print
The glyph embedding is obtained based on different fonts of a Chinese character, being able to capture character semantics from the visual features, and the pinyin embedding characterizes the pronunciation ...
of Chinese characters, which handles the highly prevalent heteronym phenomenon in Chinese (the same character has different pronunciations with different meanings). ...
Intuitively, the rich semantics behind Chinese character glyphs should enhance the expressiveness of Chinese NLP models. ...
arXiv:2106.16038v1
fatcat:rmdmt3dh7fhx3gu4pj4vmnmt4i
Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis
2019
Interspeech 2019
However, considering the complex linguistic structure of Chinese, using Chinese characters directly for Mandarin TTS may suffer from the poor linguistic encoding performance, resulting in improper word ...
tokenization and pronunciation errors. ...
Here the target word vectors are embedded from a well-trained Word2Vec model trained on the Chinese Wikimedia corpus [20]. ...
doi:10.21437/interspeech.2019-1118
dblp:conf/interspeech/LiWLZYM19
fatcat:i6k5y6tfybbd7cuhp6u2yomjsa
MFE-NER: Multi-feature Fusion Embedding for Chinese Named Entity Recognition
[article]
2021
arXiv
pre-print
In Chinese NER, character substitution is a complicated linguistic phenomenon. Some Chinese characters are quite similar for sharing the same components or having similar pronunciations. ...
In this paper, we propose a new method, Multi-Feature Fusion Embedding for Chinese Named Entity Recognition (MFE-NER), to strengthen the language pattern of Chinese and handle the character substitution ...
The first way is word embedding, trying to separate Chinese sentences into words and get the embedding of words, which makes sense but is limited by the accuracy of word segmentation tools. ...
arXiv:2109.07877v1
fatcat:2zqda4sppreundfcdab34mlhzq
Phonetic-enriched Text Representation for Chinese Sentiment Analysis with Reinforcement Learning
[article]
2019
arXiv
pre-print
The Chinese pronunciation system offers two characteristics that distinguish it from other languages: deep phonemic orthography and intonation variations. ...
It functions as disambiguating intonations for each Chinese character (pinyin). Thus, a precise phonetic representation of Chinese is learned. ...
[22] proposed decomposition of Chinese words into characters and presented a character-enhanced word embedding model (CWE). ...
arXiv:1901.07880v1
fatcat:6rx2sle7mfcdpgrp3ev2mkimey
Age Estimates from Name Characters
2021
Applied Sciences
However, the prediction results for ethnic-Chinese Malaysian names (in English) do not reach the same level. ...
This is due to the linguistic differences among Chinese dialects; the features trained on Taiwanese names cannot be directly applied to English names in Malaysia. ...
Acknowledgments: Thanks for Chih-Hao Tsai, currently the chairman of UiGathering, for providing "A List of Chinese Names" as part of Taiwanese names. ...
doi:10.3390/app11209611
doaj:f95eafc1c3de410096f433d07909b8ea
fatcat:pq2gdvytjjcvxmr7spnivzkpme
Pre-lexical phonological processing in reading Chinese characters: An ERP study
2014
Journal of Neurolinguistics
., Chinese characters) are usually composed of radicals which do not correspond to phonemes; instead, some radicals can occur as freestanding sinograms and have their own pronunciations. ...
this issue by comparing the interference effects exerted by two types of primes on the targets in an event-related potential (ERP) experiment: RADICAL-RELATED primes, which are homophonic with a radical embedded ...
One intuitive account is that radical pronunciations are always activated, which enhances the correct pronunciation of regular sinograms having identical pronunciations as their phonetic radicals. ...
doi:10.1016/j.jneuroling.2014.03.002
fatcat:rzcjkyl23rgexeaij7oxeaoovy
Polyphone Disambiguition in Mandarin Chinese with Semi-Supervised Learning
[article]
2021
arXiv
pre-print
The majority of Chinese characters are monophonic, while a special group of characters, called polyphonic characters, have multiple pronunciations. ...
As a prerequisite of performing speech-related generative tasks, the correct pronunciation must be identified among several candidates. This process is called Polyphone Disambiguation. ...
Word Embedding + BLSTM | Cross Ent. | 0.843 | 1,767.104M | 2.232M
[12] Dai et al. ...
arXiv:2102.00621v2
fatcat:ahnwiycy4baizgasvvx5uuqbea
MSFM: Multi-view Semantic Feature Fusion Model for Chinese Named Entity Recognition
2022
KSII Transactions on Internet and Information Systems
The acquired font shape, font sound, and font meaning features are fused to enhance the semantic information of Chinese characters with different granularities. ...
Specifically, the MFEM extracts character features, word boundary features, radical features, and pinyin features of Chinese characters. ...
of Chinese characters, and a special way to symbolize the pronunciation of Chinese characters. ...
doi:10.3837/tiis.2022.06.004
fatcat:pvsyhizanvfbjhagdnoachb4bq
CHARM: A Character Level Precoding Method for Chinese Text
2021
IEEE Access
Referring to the idea of word embedding enhancement in Fig. 15, the raw Chinese text was first converted to xqma by CHARM, and the original embedding was replaced by the combination (addition or concatenation ...
In [17], [18], [22], [26], [27], radicals, strokes, character glyphs, components, and five-strokes codes are used to enhance word embedding respectively. ...
doi:10.1109/access.2021.3112190
fatcat:snizeoyuq5bvlfbdwwhpfrybhq
Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge
[article]
2022
arXiv
pre-print
Experiments show the proposed model significantly reduces pronunciation errors in low-resource, end-to-end Chinese TTS, and the lexicon-reading capability can be transferred to other languages with a smaller ...
End-to-end TTS requires a large amount of speech/text paired data to cover all necessary knowledge, particularly how to pronounce different words in diverse contexts, so that a neural model may learn such ...
The base model uses only speaker embeddings, so language embeddings are added to input embeddings and trained. All data are without word segmentation. ...
arXiv:2110.09698v2
fatcat:llxafnr62nbtbmcmn6nh7lqn4u
A Mask-Based Model for Mandarin Chinese Polyphone Disambiguation
2020
Interspeech 2020
Besides, Modified Focal Loss can reduce the adverse impacts of the uneven distribution of pronunciation. ...
Moreover, to mitigate the uneven distribution of pronunciation, we introduce a new loss called Modified Focal Loss. The experimental result shows the effectiveness of the proposed mask-based model. ...
Their pronunciations cannot be determined simply by the word itself but require more lexical information and contextual information, such as Chinese word segmentation, POS (part of speech) tagging, syntactic ...
doi:10.21437/interspeech.2020-1142
dblp:conf/interspeech/ZhangPL20
fatcat:up6utlli65dd5mmvhj7swtoyja
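The snippet above does not describe how the Modified Focal Loss differs from the standard one, so the sketch below shows only the standard focal loss, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), as a point of reference. The function names, the default `gamma` and `alpha`, and the clamp value are assumptions made for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for one example (not the paper's modified variant).

    The (1 - p_t) ** gamma factor shrinks the loss of well-classified
    examples, so rare or hard pronunciations contribute relatively more.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# With gamma = 0 the expression reduces to ordinary cross-entropy.
cross_entropy = focal_loss([0.0, 0.0], target=0, gamma=0.0)
# A confidently correct prediction is down-weighted when gamma > 0.
easy_example = focal_loss([5.0, 0.0], target=0, gamma=2.0)
```

Setting `gamma` to 0 recovers cross-entropy, which makes the effect of the down-weighting term easy to check in isolation.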