ResNet CNN with LSTM Based Tamil Text Detection from Video Frames

I. Muthumani, N. Malmurugan, L. Ganesan
2022 Intelligent Automation and Soft Computing  
The model consists of a text detector, script identifier, and text recognizer. The identification in video frames of textual regions is performed using deep neural networks as object detectors.  ...  The combination of ResNet CNNs and bidirectional LSTMs has high recognition rates for detecting video texts in Tamil cursive script.  ...  Acknowledgement: We thank LetPub for its linguistic assistance during the preparation of this manuscript.  ... 
doi:10.32604/iasc.2022.018030 fatcat:e4r2entusndlpgkjiq7ahbno6y

Detection and recognition of cursive text from video frames

Ali Mirza, Ossama Zeshan, Muhammad Atif, Imran Siddiqi
2020 EURASIP Journal on Image and Video Processing  
This paper presents a comprehensive framework for detection and recognition of textual content in video frames. More specifically, we target cursive scripts taking Urdu text as a case study.  ...  Detection of textual regions in video frames is carried out by fine-tuning deep neural networks based object detectors for the specific case of text detection.  ...  Acknowledgements Authors would like to thank IGNITE for funding this project.  ... 
doi:10.1186/s13640-020-00523-5 fatcat:gavtdy4pyvfhxgatrjxiqipr4y

Camera-Based Bi-lingual Script Identification at Word Level using SFTA Features

2019 International journal of recent technology and engineering  
The multilingual documents are captured from video/camera for identification of script of the text document for automatic reading and editing.  ...  In this paper, an attempt is made to address the problem of script identification from camera captured document images using SFTA features.  ...  They have developed the triplets and extracted the local convolution features with the combination of Bag-of-Visual-Words (BoVW).  ... 
doi:10.35940/ijrte.b2713.078219 fatcat:7zqekrlpabelzlrxiknmjsqdzq

Using Machine Learning Techniques, Textual and Visual Processing in Scalable Concept Image Annotation Challenge

Alexandru Cristea, Adrian Iftene
2016 Conference and Labs of the Evaluation Forum  
For the second subtask, we created a resource that contains triplets (concept1, verb, concept2), where concepts are from the list of concepts provided by the organizers and verb is a relation between concepts  ...  For Subtask 3, we transform the input file in a form used by Subtask 2 and then we used the component developed by our team for Subtask 2.  ...  Special thanks go to all colleagues from the Faculty of Computer Science, second year, group B2, who were involved in this project.  ... 
dblp:conf/clef/CristeaI16 fatcat:vzqjdtm2u5cfpojxhbeesvabcy

Text Detection and Recognition in Imagery: A Survey

Qixiang Ye, David Doermann
2015 IEEE Transactions on Pattern Analysis and Machine Intelligence  
This paper analyzes, compares, and contrasts technical challenges, methods, and the performance of text detection and recognition research in color imagery.  ...  This review provides a fundamental comparison and analysis of the remaining problems in the field.  ...  They would also like to thank Tao Wang of Stanford, Kai Wang of UCSD, Chongzhao Shi of Chinese Academy of Sciences, and Chew Lim Tan of the National University of Singapore for providing images.  ... 
doi:10.1109/tpami.2014.2366765 pmid:26352454 fatcat:cuz3qhkglnahdebxqptbsgpjmm

Deep sparse auto-encoder features learning for Arabic text recognition

Najoua Rahal, Maroua Tounsi, Amir Hussain, Adel M. Alimi
2021 IEEE Access  
We propose a novel hybrid network, combining a Bag-of-Feature (BoF) framework for feature extraction based on a deep Sparse Auto-Encoder (SAE), and Hidden Markov Models (HMMs), for sequence recognition  ...  INDEX TERMS Arabic text recognition, feature learning, bag of features, sparse auto-encoder, hidden Markov models.  ...  such as speech recognition [60] , script identification [61] , and text recognition.  ... 
doi:10.1109/access.2021.3053618 fatcat:p7jhbokjsjbunceuq4lu7xnmci

Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey

Khushboo Khurana, Umesh Deshpande
2021 IEEE Access  
In this article, we have discussed the emerging research directions and various application areas of video-QA.  ...  This paper presents a brief survey of the video captioning techniques and a comprehensive review of existing techniques, datasets, and evaluation metrics for the task of video-QA.  ...  These methods involve the identification and localization of various objects, actions, events, and scenes.  ... 
doi:10.1109/access.2021.3058248 fatcat:bnjmbffxgreb5jkjuxethaqnde

Extraction and Analysis of Fictional Character Networks

Vincent Labatut, Xavier Bost
2019 ACM Computing Surveys  
We first describe the extraction process in a generic way, and explain how its constituting steps are implemented in practice, depending on the medium of the narrative, the goal of the network analysis  ...  A character network is a graph extracted from a narrative, in which vertices represent characters and edges correspond to interactions between them.  ...  Acknowledgments The authors would like to thank the anonymous reviewers for their work and feedback, which helped significantly improve this article. Part of this work was funded by Agorantic FR 3621.  ... 
doi:10.1145/3344548 fatcat:zujg55eixfct7blvj6lxwo4usq

A Comprehensive Review of the Video-to-Text Problem [article]

Jesus Perez-Martin and Benjamin Bustos and Silvio Jamil F. Guimarães and Ivan Sipiran and Jorge Pérez and Grethel Coello Said
2021 arXiv   pre-print
This review categorizes and describes the state-of-the-art techniques for the video-to-text problem. It covers the main video-to-text methods and the ways to evaluate their performance.  ...  The spatiotemporal information present in videos introduces diversity and complexity regarding the visual content and the structure of associated language descriptions.  ...  However, more recent work reaffirms the importance of the explicit visual content identification and local features for video captioning.  ... 
arXiv:2103.14785v3 fatcat:xwzziozwjbghfobtowu5bny6bu

Continuous Human Action Recognition for Human-Machine Interaction: A Review [article]

Harshala Gammulle, David Ahmedt-Aristizabal, Simon Denman, Lachlan Tychsen-Smith, Lars Petersson, Clinton Fookes
2022 arXiv   pre-print
With advances in data-driven machine learning research, a wide variety of prediction models have been proposed to capture spatio-temporal features for the analysis of video streams.  ...  By reviewing a large body of recent related work in the literature, we thoroughly analyse, explain and compare action segmentation methods and provide details on the feature extraction and learning strategies  ...  For the purpose of this review, we categorise these improvements as either bag of freebies (BoF) or bag of specials (BoS).  ... 
arXiv:2202.13096v1 fatcat:mczyeb5vyfgxdiubjhklwjrtlm

Video Description: A Survey of Methods, Datasets and Evaluation Metrics [article]

Nayyer Aafaq, Ajmal Mian, Wei Liu, Syed Zulqarnain Gilani, Mubarak Shah
2019 arXiv   pre-print
Numerous methods, datasets and evaluation metrics have been proposed in the literature, calling the need for a comprehensive survey to focus research efforts in this flourishing new direction.  ...  The past few years have seen a surge of research in this area due to the unprecedented success of deep learning in computer vision and natural language processing.  ...  ACKNOWLEDGEMENTS The authors acknowledge Marcus Rohrbach (Facebook AI Research) for his valuable input. The research was supported by ARC Discovery Grant DP160101458 and DP150102405.  ... 
arXiv:1806.00186v3 fatcat:elxztcpzizhr7clugnbjvvrpte

From image to language and back again

2018 Natural Language Engineering  
In this article, we touch upon all of these topics as we review work involving images and text under the three main headings of image description (Section 2), visually grounded referring expression generation  ...  Work in computer vision and natural language processing involving images and text has been experiencing explosive growth over the past decade, with a particular boost coming from the neural network revolution  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the  ... 
doi:10.1017/s1351324918000086 fatcat:fvxkgjlolra4vns2r5qx4xvg3i

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods [article]

Aditya Mogadala and Marimuthu Kalimuthu and Dietrich Klakow
2020 arXiv   pre-print
The largest of the growths in these fields has been made possible with deep learning, a sub-area of machine learning, which uses the principles of artificial neural networks.  ...  This has created significant interest in the integration of vision and language. The tasks are designed such that they perfectly embrace the ideas of deep learning.  ...  We extend our special thanks to Matthew Kuhn and Stephanie Lund for painstakingly proofing the whole manuscript.  ... 
arXiv:1907.09358v2 fatcat:4fyf6kscy5dfbewll3zs7yzsuq

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow
2021 The Journal of Artificial Intelligence Research  
This has created significant interest in the integration of vision and language.  ...  Much of the growth in these fields has been made possible with deep learning, a sub-area of machine learning that uses artificial neural networks.  ...  We extend our special thanks to Matthew Kuhn and Stephanie Lund for painstakingly proofing the whole manuscript.  ... 
doi:10.1613/jair.1.11688 fatcat:kvfdrg3bwrh35fns4z67adqp6i

Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [article]

Alexander Schindler
2020 arXiv   pre-print
In all of these experiments the audio-based results serve as benchmark for the visual and audio-visual approaches.  ...  The experiments are conducted for three MIR tasks Artist Identification, Music Genre Classification and Cross-Genre Classification.  ...  A set of scripts is provided as well on the website for this. In total, the feature files amount to approximately 40 gigabyte of uncompressed text files.  ... 
arXiv:2002.00251v1 fatcat:6cz6rivc3fbg7fahdsnokxfrk4