
Spelling Correction with Denoising Transformer [article]

Alex Kuznetsov, Hector Urdiales
2021 arXiv   pre-print
This procedure is used to train the production spelling correction model based on a transformer architecture. This model is currently served in the HubSpot product search.  ...  We present a novel method of performing spelling correction on short input strings, such as search queries or individual words.  ...  Conclusion: We presented a novel method for spelling correction: a denoising autoencoder transformer based on a noise generation procedure which generates artificial spelling mistakes in a realistic manner  ... 
arXiv:2105.05977v1 fatcat:utgqze55lncojepfrfykofbnse

Combining a Context Aware Neural Network with a Denoising Autoencoder for Measuring String Similarities [article]

Mehdi Ben Lazreg, Morten Goodwin
2018 arXiv   pre-print
The experimental results show that the resulting metric succeeds in 85.4% of the cases in finding the correct version of a non-standard spelling among the closest words, compared to 63.2% with the established  ...  Non-standard and standard spellings of the same words, and (2) the context of the words.  ...  The next example in Table 2 presents a non-standard spelling for which the approach with denoising autoencoder fails to recognize the correct version in the five closest words: The correct version of  ... 
arXiv:1807.06414v1 fatcat:2tcv2b56qvgkjb5dwa2pt5f6ka

Contextual Text Denoising with Masked Language Models [article]

Yifu Sun, Haoming Jiang
2019 arXiv   pre-print
We propose a new contextual text denoising algorithm based on the ready-to-use masked language model.  ...  Recently, with the help of deep learning models, significant advances have been made in different Natural Language Processing (NLP) tasks.  ...  9 https://github.com/pytorch/fairseq/tree/master/exam such as CoNLL-2014, to further fine-tune the denoising model in a supervised way to improve the performance.  ... 
arXiv:1910.14080v1 fatcat:z6cqbph3jjdgzivxumiowwchsi

An Improved Text Extraction Approach with Auto Encoder for Creating Your Own Audiobook

2022 International Journal of Information Retrieval Research  
Our result analysis demonstrates that with denoising and spell checking, our model has achieved an accuracy of 98.11%, compared to 84.02% without any denoising or spell check mechanism.  ...  As an initial step, deep learning techniques are constructed to denoise the images that are fed to the system. This is followed by text extraction with the help of OCR engines.  ...  From the bar plot, we can infer that the post-processing method, i.e. denoising with spell check, gives an accuracy of about 98.6%, compared to 95% with only denoising and no spell check.  ... 
doi:10.4018/ijirr.289570 fatcat:zjmtlsoxzveuxfn5cw2dzv6wka

Context-aware Stand-alone Neural Spelling Correction [article]

Xiangci Li, Hairong Liu, Liang Huang
2020 arXiv   pre-print
Inspired by this, we address the stand-alone spelling correction problem, which only corrects the spelling of each token without additional token insertion or deletion, by utilizing both spelling information  ...  On the contrary, humans can easily infer the corresponding correct words from their misspellings and surrounding context.  ...  Having a surprisingly robust language processing system to denoise the scrambled spellings, humans can relatively easily solve spelling correction (Rawlinson, 1976).  ... 
arXiv:2011.06642v1 fatcat:bp5qmobl65fjhns6p5zzohlncm

Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data [article]

Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, Jingming Liu
2019 arXiv   pre-print
We pre-train the copy-augmented architecture with a denoising auto-encoder using the unlabeled One Billion Benchmark and make comparisons between the fully pre-trained model and a partially pre-trained  ...  Neural machine translation systems have become state-of-the-art approaches for the Grammatical Error Correction (GEC) task.  ...  We build a statistical spelling error correction system and correct the spelling errors in our training data.  ... 
arXiv:1903.00138v3 fatcat:z3tyvcqg5ndjbehxuuq5aa2hhy

Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data

Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, Jingming Liu
2019 Proceedings of the 2019 Conference of the North  
We pre-train the copy-augmented architecture with a denoising auto-encoder using the unlabeled One Billion Benchmark and make comparisons between the fully pre-trained model and a partially pre-trained  ...  Neural machine translation systems have become state-of-the-art approaches for the Grammatical Error Correction (GEC) task.  ...  We build a statistical spelling error correction system and correct the spelling errors in our training data.  ... 
doi:10.18653/v1/n19-1014 dblp:conf/naacl/ZhaoWSJL19 fatcat:gnstwmpncfemhbhi4gqpynd2ni

Stacked DeBERT: All Attention in Incomplete Data for Text Classification [article]

Gwenaelle Cunha Sergio, Minho Lee
2020 arXiv   pre-print
In this paper, we propose Stacked DeBERT, short for Stacked Denoising Bidirectional Encoder Representations from Transformers.  ...  These intermediate features are given as input to novel denoising transformers which are responsible for obtaining richer input representations.  ...  Our approach consists of obtaining richer input representations from input tokens by stacking denoising transformers on an embedding layer with vanilla transformers.  ... 
arXiv:2001.00137v1 fatcat:7malbdga7jd5bj6pzej6naos5y

Improving Robustness of Neural Machine Translation with Multi-task Learning

Shuyan Zhou, Xiangkai Zeng, Yingqi Zhou, Antonios Anastasopoulos, Graham Neubig
2019 Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1)  
Our model achieves a BLEU score of 32.8 on the shared task French to English dataset, which is 7.1 BLEU points higher than the baseline vanilla transformer trained with clean text.  ...  In this work, we propose a multitask learning algorithm for transformer-based MT systems that is more resilient to this noise.  ...  Denoising text: Sakaguchi et al. (2017) propose a semi-character level recurrent neural network (scRNN) to correct words with scrambled characters.  ... 
doi:10.18653/v1/w19-5368 dblp:conf/wmt/ZhouZZAN19 fatcat:7uwf4rvgzrcatcx2qczpqs5eq4

Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking [article]

Heng-Da Xu, Zhongli Li, Qingyu Zhou, Chao Li, Zizhen Wang, Yunbo Cao, Heyan Huang, Xian-Ling Mao
2021 arXiv   pre-print
Chinese Spell Checking (CSC) aims to detect and correct erroneous characters for user-generated text in the Chinese language.  ...  However, these methods use either heuristics or handcrafted confusion sets to predict the correct character.  ...  ., 2019) with the Transformer library (Wolf et al., 2020).  ... 
arXiv:2105.12306v1 fatcat:hzjipz5y4va5dpx7tqve7urzu4

Pre-Training-Based Grammatical Error Correction Model for the Written Language of Chinese Hearing Impaired Students

Binbin Chen, Jingyu Zhang
2022 IEEE Access  
Via the re-ranking strategy, our model can correct various kinds of errors including spelling and complex syntax errors.  ...  The comparison experiments with baseline models show that our model obtains superior performance either in the hearing impaired students' grammatical error correction or in a common grammatical error correction  ...  model + Spelling correction S1 denotes the N-gram language model for correcting the spelling errors in section 3.1.  ... 
doi:10.1109/access.2022.3159676 fatcat:fhez37kyovbmlnr3rjnmxkayhe

Denoising of Disturbed Signal using Reconstruction Technique of EMD for Railway Bearing Condition Monitoring

Agus Susanto, Budi Artono, Surajet Khonjun, Rizal Mahmud
2020 Zenodo  
This study presents an effective method for denoising noisy signals for bearing condition monitoring.  ...  The Hilbert-Huang spectrum (HHT) of the reconstructed signal was generated by applying the Hilbert transform.  ...  HHT with the signal denoised using the reconstruction technique works more efficiently than HHT without it for bearing condition monitoring Fig. 1 IMF component  ... 
doi:10.5281/zenodo.4418876 fatcat:2k4rz5cnezg7vkxfkwr7i2ztsy

Denoising of Disturbed Signal using Reconstruction Technique of EMD for Railway Bearing Condition Monitoring

Agus Susanto, Budi Artono, Surajet Khonjun, Rizal Mahmud
2020 Zenodo  
This study presents an effective method for denoising noisy signals for bearing condition monitoring.  ...  The Hilbert-Huang spectrum (HHT) of the reconstructed signal was generated by applying the Hilbert transform.  ...  HHT with the signal denoised using the reconstruction technique works more efficiently than HHT without it for bearing condition monitoring Fig. 1 IMF component  ... 
doi:10.5281/zenodo.4418872 fatcat:ufzbzzsqlzcxvhqz4no6qrdckq

Denoising of Disturbed Signal using Reconstruction Technique of EMD for Railway Bearing Condition Monitoring

Agus Susanto, Budi Artono, Surajet Khonjun, Rizal Mahmud
2020 Zenodo  
This study presents an effective method for denoising noisy signals for bearing condition monitoring.  ...  The Hilbert-Huang spectrum (HHT) of the reconstructed signal was generated by applying the Hilbert transform.  ...  HHT with the signal denoised using the reconstruction technique works more efficiently than HHT without it for bearing condition monitoring Fig. 1 IMF component  ... 
doi:10.5281/zenodo.4418875 fatcat:fsbm6mapc5ewzaqhkcx7ctbkoe

Denoising based Sequence-to-Sequence Pre-training for Text Generation

Liang Wang, Wei Zhao, Ruoyu Jia, Sujian Li, Jingming Liu
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
We conduct experiments on two text generation tasks: abstractive summarization, and grammatical error correction.  ...  Meanwhile, we design a hybrid model of Transformer and pointer-generator networks as the backbone architecture for PoDA.  ...  Simple spelling errors are corrected based on edit distance. The dataset statistics are shown in Table 5. and GLEU score from 56.52 to 59.02 (+2.50) for JFLEG.  ... 
doi:10.18653/v1/d19-1412 dblp:conf/emnlp/WangZJLL19 fatcat:b6w57svbkjhrziz5it6wo472ua