Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution [article]

Jiapeng Wang, Chongyu Liu, Lianwen Jin, Guozhi Tang, Jiaxin Zhang, Shuaitao Zhang, Qianying Wang, Yaqiang Wu, Mingxiang Cai
2021 arXiv   pre-print
Specifically, the information extraction branch collects abundant visual and semantic representations from text spotting for multimodal feature fusion and conversely, provides higher-level semantic clues  ...  EPHOIE consists of 1,494 images of examination paper head with complex layouts and background, including a total of 15,771 Chinese handwritten or printed text instances.  ...  After acquiring the features of multi levels from multi sources as representations in a learned common embedding space, our adaptive feature fusion module (AFFM) introduces two multi-layer multi-head self-attention  ... 
arXiv:2102.06732v1 fatcat:sdda3rbjrfey3bajxvw7klkovi

Cursive Character Recognition in Natural Scene Images using a Multilevel Convolutional Neural Network Fusion

Asghar Ali Chandio, Md. Asikuzzaman, Mark R. Pickering
2020 IEEE Access  
In this paper, we propose a multiscale feature aggregation (MSFA) and a multi-level feature fusion (MLFF) network architecture to recognize isolated Urdu characters in natural images.  ...  Index Terms: cursive text recognition, natural scene Urdu character recognition, multi-scale feature aggregation, multi-level feature fusion, convolutional neural network (CNN)  ...  Acknowledgement: The first author is thankful to the University of New South Wales, Australia for supporting his Ph.D. candidature with a scholarship.  ... 
doi:10.1109/access.2020.3001605 fatcat:s2sbgrsoafdl5gpyqdvlziwiw4

KNN and ANN-based Recognition of Handwritten Pashto Letters using Zoning Features

Sulaiman Khan, Hazrat Ali, Zahid Ullah, Nasru Minallah, Shahid Maqsood, Abdul Hafeez
2018 International Journal of Advanced Computer Science and Applications  
This paper presents an intelligent recognition system for handwritten Pashto letters. However, handwritten character recognition is challenging due to the variations in shape and style.  ...  In this work, we have designed a database of moderate size, which encompasses a total of 4488 images, stemming from 102 distinguishing samples for each of the 44 letters in Pashto.  ...  This multi-modal fusion scheme combines both offline and online data, which is indeed a real scenario of data being fed to the network.  ... 
doi:10.14569/ijacsa.2018.091069 fatcat:aan4uvbw2zgxflap3264syws54

Large-Scale Printed Chinese Character Recognition for ID Cards Using Deep Learning and Few Samples Transfer Learning

Yi-Quan Li, Hao-Sen Chang, Daw-Tung Lin
2022 Applied Sciences  
In this study, we developed an automatic OCR system designed to identify up to 13,070 large-scale printed Chinese characters by using deep learning neural networks and fine-tuning techniques.  ...  To expand the diversity of the synthesized training dataset, rotation and zooming data augmentation are applied.  ...  [30] blended a general multimodal deep learning model with five fusion architectures.  ... 
doi:10.3390/app12020907 fatcat:xzn6warh4raffkfhxeppn7n3im

PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition [article]

Dezhi Peng, Lianwen Jin, Yuliang Liu, Canjie Luo, Songxuan Lai
2022 arXiv   pre-print
Handwritten Chinese text recognition (HCTR) has been an active research topic for decades.  ...  PageNet detects and recognizes characters and predicts the reading order between them, which is more robust and flexible when dealing with complex layouts including multi-directional and curved text lines  ...  Xiu et al. (2019) explored the attention-based decoder and proposed a multi-level multimodal fusion network to incorporate both the visual and linguistic semantic information.  ... 
arXiv:2207.14807v1 fatcat:gxi5x2dz5ndc3aabq3mh5v36om

Document AI: Benchmarks, Models and Applications [article]

Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei
2021 arXiv   pre-print
Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents.  ...  Sarkhel & Nandi (2019) extract features at different levels by introducing a pyramidal multi-scale structure.  ...  On this basis, Bukhari et al. (2009) apply it to script-independent handwritten documents. In addition, there are some hybrid models.  ... 
arXiv:2111.08609v1 fatcat:7mg67htkgbgyjg63hlegd32m24

A Literature Survey of Deep Learning Research Areas (Derin Öğrenme Araştırma Alanlarının Literatür Taraması)

M. Mutlu Yapıcı, Adem Tekerek, Nurettin Topaloğlu
2019 Gazi Mühendislik Bilimleri Dergisi  
In this study, Autonomous Vehicles, Natural Language Processing, Handwritten Character Recognition, Signature Verification, Voice  ...  This study investigated DL studies conducted in the most popular and challenging fields such as autonomous vehicles, natural language processing, handwritten character recognition, signature verification  ...  [117] for recognizing unconstrained offline handwritten texts. For the handwritten Arabic character recognition task, Al Jawfi [5] designed a LeNet based network that consists of two stages.  ... 
doi:10.30855/gmbd.2019.03.01 fatcat:2sv7dg7elrfqppcjx5otzmb7pi

A Review of Web Infodemic Analysis and Detection Trends across Multi-modalities using Deep Neural Networks [article]

Chahat Raj, Priyanka Meel
2021 arXiv   pre-print
This review primarily deals with multi-modal fake news detection techniques that include images, videos, and their combinations with text.  ...  These detection techniques apply popular machine learning and deep learning algorithms. Previous work in this domain covers fake news detection vastly among text circulating online.  ...  Their proposed network consists of three components: a CNN-based frequency domain sub-network, a pixel domain sub-network built with CNN-RNN to extract semantic features, and a fusion subnetwork.  ... 
arXiv:2112.00803v1 fatcat:twppg5v37bdozcdloaa6zfk7s4

TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents [article]

Zhanzhan Cheng, Peng Zhang, Can Li, Qiao Liang, Yunlu Xu, Pengfei Li, Shiliang Pu, Yi Niu, Fei Wu
2022 arXiv   pre-print
This paper proposes a unified end-to-end information extraction framework from visually rich documents, where text reading and information extraction can reinforce each other via a well-designed multi-modal  ...  Specifically, the text reading part provides multi-modal features like visual, textual and layout features.  ...  EPHOIE [37] is a Chinese examination paper head dataset, in which each image is cropped from the full examination paper. This dataset contains handwritten information, and is also fully annotated.  ... 
arXiv:2207.06744v1 fatcat:lo3yowbpxzaqrhjuw6qntd6ggy

Bibliometric Analysis of the Application of Convolutional Neural Network in Computer Vision

Huie Chen, Zhenjie Deng
2020 IEEE Access  
Literature samples of CNNs are analyzed by a basic statistic and co-citation network.  ...  This paper analyzes the research progress in the field of Convolutional Neural Networks (CNNs) using the bibliometric method.  ...  It is useful in solving image fusion problems such as multi-focus image fusion and multi-mode image fusion [10].  ... 
doi:10.1109/access.2020.3019336 fatcat:ysjn7al4gjharpr6o3gwmnlxdy

Signature-Based Biometric Authentication [chapter]

Srikanta Pal, Umapada Pal, Michael Blumenstein
2014 Studies in Computational Intelligence  
Biometrics evaluate a person's unique physical or behavioural traits to authenticate their identity.  ...  Therefore a biometric is the measurement and statistical analysis of unchanging biological characteristics.  ...  traits performed at score level fusion.  ... 
doi:10.1007/978-3-319-05885-6_13 fatcat:g5qrfp47cng3tlgeaa62ddcuqm

Front Matter

2020 2020 International Joint Conference on Neural Networks (IJCNN)  
Ting, Yan Fang, Ashwin Sanjay Lele and Arijit Raychowdhury Georgia Institute of Technology, United States P1319 An Efficient Spiking Neural Network for Recognizing Gestures with a DVS Camera on the Loihi  ...  Cluster-level Approach [#21208] Bethehem S.  ... 
doi:10.1109/ijcnn48605.2020.9207579 fatcat:hptkppolhbfn7nz3yangesetpi

Learning Neural Textual Representations for Citation Recommendation

Binh Thanh Kieu, Inigo Jauregi Unanue, Son Bao Pham, Hieu Xuan Phan, Massimo Piccardi
2021 2020 25th International Conference on Pattern Recognition (ICPR)  
to Offline Handwritten Chinese and Japanese Text Line Recognition DAY 1 -Jan 12, 2021 Liebl, Bernhard; Burghardt, Manuel 1412 An Evaluation of DNN Architectures for Page Segmentation of Historical  ...  Attention Network Based Point-View Fusion for 3D Shape Recognition DAY 3 -Jan 14, 2021 Yang, Bang; Zou, Yuexian 50 Visual Oriented Encoder: Integrating Multimodal and Multi-Scale Contexts for  ... 
doi:10.1109/icpr48806.2021.9412725 fatcat:3vge2tpd2zf7jcv5btcixnaikm

Review of research on speech technology

Rubén San-Segundo, Carlos D. Martínez-Hinarejos, Alfonso Ortega
2011 Journal of Speech Sciences  
In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions and also due  ...  This paper also introduces the Spanish Network of Speech Technologies (RTTH.  ...  Acknowledgements The authors want to thank all the contributions from their colleagues working in all the research groups included in the Spanish Network of Speech Technology.  ... 
doi:10.20396/joss.v1i1.15010 fatcat:2okbqwdslzbn7gg3ypfjkksmoi

Table of contents

2021 ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
ASSISTANTS Alkesh Patel, Akanksha Bindal, Hadas Kotek, Christopher Klein, Jason Williams, Apple, United States IVMSP-27.5: AN ADAPTIVE MULTI-SCALE AND MULTI-LEVEL FEATURES FUSION  ...  Lian, Institute of Automation, Chinese Academy of Sciences, China MMSP-6.4: MULTI-TARGET DOA ESTIMATION WITH AN AUDIO-VISUAL FUSION MECHANISM (p. 4280) Xinyuan Qian, Maulik  ... 
doi:10.1109/icassp39728.2021.9414617 fatcat:m5ugnnuk7nacbd6jr6gv2lsfby