Aggregating Local Context for Accurate Scene Text Detection [chapter]

Dafang He, Xiao Yang, Wenyi Huang, Zihan Zhou, Daniel Kifer, C. Lee Giles
2017 Lecture Notes in Computer Science  
First, for identifying text regions we design a novel Convolutional Neural Network (CNN) architecture that aggregates local surrounding information for cascaded, fast and accurate detection.  ...  Here we propose a novel end-to-end scene text detection algorithm.  ...  A text localization algorithm is proposed which efficiently aggregates local context information in detecting candidate text regions.  ... 
doi:10.1007/978-3-319-54193-8_18 fatcat:aemkcqnybvaqnmhgjqqwiqvfca

Cluttered TextSpotter: An End-to-End Trainable Lightweight Scene Text Spotter for Cluttered Environment

Randheer Bagi, Tanima Dutta, Hari Prabhat Gupta
2020 IEEE Access  
It is an end-to-end trainable deep neural network that uses local part information, global structural features, and context cue information of oriented region proposals for spotting text instances.  ...  Scene text spotting aims at simultaneously localizing and recognizing text instances, symbols, and logos in natural scene images.  ...  SCENE TEXT SPOTTING Jaderberg et al. enables feature sharing for detecting text instances.  ... 
doi:10.1109/access.2020.3002808 fatcat:x4kbcajahrc5vgtuxsc6oyyjsa

Single Shot Text Detector with Regional Attention [article]

Pan He, Weilin Huang, Tong He, Qile Zhu, Yu Qiao, Xiaolin Li
2017 arXiv   pre-print
This enhances local details, and also encodes strong context information, allowing the detector to work reliably on multi-scale and multi-orientation text with single-scale images.  ...  It departs from recent FCN-based text detectors which cascade multiple FCN models to achieve an accurate prediction.  ...  The multi-layer aggregations further enhance local detailed information and encode rich context information, resulting in stronger deep features for word prediction.  ... 
arXiv:1709.00138v1 fatcat:u5wlilrd7vdjdpeaejvlx3x6py

Don't only Feel Read: Using Scene text to understand advertisements [article]

Arka Ujjal Dey, Suman K. Ghosh, Ernest Valveny
2019 arXiv   pre-print
We propose a framework for automated classification of Advertisement Images, using not just Visual features but also Textual cues extracted from embedded text.  ...  However for unstructured text like tags, or other Meta data, which is not sequential, the global feature is usually an aggregation (e.g. mean or averaging) of local semantic features.  ...  We aggregate the k corresponding word2vec vectors to generate a global text feature for a given image.  ... 
arXiv:1806.08279v3 fatcat:wdbzg7fd3jda7btbartwh6f2am

Graph Fusion Network for Multi-Oriented Object Detection [article]

Shi-Xue Zhang, Xiaobin Zhu, Jie-Bo Hou, Xu-Cheng Yin
2022 arXiv   pre-print
Our GFNet is extensible and adaptively fuses dense detection boxes to detect more accurate and holistic multi-oriented object instances.  ...  Specifically, we first adopt a locality-aware clustering algorithm to group dense detection boxes into different clusters.  ...  For text detection, we adopt the popular scene text detection method EAST [74] as our detector.  ... 
arXiv:2205.03562v1 fatcat:cis6ro4w5reonotosux5ahkol4

PSENet-based efficient scene text detection

Guanglong Liao, Zhongjie Zhu, Yongqiang Bai, Tingna Liu, Zhibo Xie
2021 EURASIP Journal on Advances in Signal Processing  
In this paper, an efficient scene text detection scheme is proposed based on the Progressive Scale Expansion Network (PSENet).  ...  Text detection is a key technique and plays an important role in computer vision applications, but efficient and precise text detection is still challenging.  ...  Specifically, they want to acknowledge the editor and anonymous reviewers for their valuable comments. Authors' contributions  ... 
doi:10.1186/s13634-021-00808-5 fatcat:ber7ru5qn5cplm25w4smbrkdiq

Image-Text Alignment using Adaptive Cross-attention with Transformer Encoder for Scene Graphs

Juyong Song, Sunghyun Choi
2021 British Machine Vision Conference  
Second, to enhance the usage of scene-graphs that can capture the high-level relation of local features, we introduce transformer encoders for textual scene graphs to align with visual scene graphs.  ...  Global-local and local-local information integration between two modalities are essential for an effective alignment.  ...  In summary, we improve global-local and local-local information integration via context-awareness. Context-awareness is also applied to the loss functions through NT-XEnt.  ... 
dblp:conf/bmvc/Song021 fatcat:mdjyps6wpfevzfm5b5vi3vco6q

Specific Category Region Proposal Network for Text Detection in Natural Scene

Yuanhong Zhong, Cheng Xinyu, Zhou Zhaokun, Zhang Shun, Zhang Jing, Huang Guan
2020 IET Image Processing  
Finally, for the top-ranking region proposals, SCRPN built an end-to-end pipeline for scene text detection directly.  ...  Then, the multiple features of oversegmented regions and text saliency map are used for region aggregation.  ...  For scene text detection, we need to get higher-quality proposals, such as IoU > 0.7.  ... 
doi:10.1049/iet-ipr.2019.0652 fatcat:liwtcqqwp5agrgfy3rstzdj4ee

Convolutional Regression Network for Multi-oriented Text Detection

Junyu Gao, Qi Wang, Yuan Yuan
2019 IEEE Access  
The whole framework can be trained in an end-to-end mechanism which is suitable for detecting multi-oriented texts.  ...  The extensive experiments are conducted on three mainstream scene-text datasets, and the experimental results evidence the proposed CRN achieves competitive performance.  ...  Usually, accurate text localization/detection [15]-[18] is a prerequisite for effectively understanding text.  ... 
doi:10.1109/access.2019.2929819 fatcat:tlsjaarga5c6dawwsncyfrksfy

Rotationally Equivariant 3D Object Detection [article]

Hong-Xing Yu, Jiajun Wu, Li Yi
2022 arXiv   pre-print
context information.  ...  Specifically, we consider the object detection problem in 3D scenes, where an object bounding box should be equivariant regarding the object pose, independent of the scene motion.  ...  We thank Frédo Durand for helpful discussions.  ... 
arXiv:2204.13630v2 fatcat:jptmsghp4zhftpcsjq7lnurfcq

An end-to-end text spotter with text relation networks

Jianguo Jiang, Baole Wei, Min Yu, Gang Li, Boquan Li, Chao Liu, Min Li, Weiqing Huang
2021 Cybersecurity  
Then, a convolution operation is performed on the graph to aggregate semantic information and enhance the intermediate features corresponding to text instances.  ...  In this paper, we propose a novel graph-based method for intermediate semantic features enhancement, called Text Relation Networks.  ...  for more accurate text spotting.  ... 
doi:10.1186/s42400-021-00073-x fatcat:b5qp5c5iynbufid625xt7cb7fe

Bidirectional Regression for Arbitrary-Shaped Text Detection [article]

Tao Sheng, Zhouhui Lian
2021 arXiv   pre-print
., 83.4% F-score for Total-Text, 82.4% F-score for MSRA-TD500, etc.  ...  We evaluate our method on several challenging scene text benchmarks, including both curved and multi-oriented text datasets.  ...  The pixels around text boundaries are chosen as references of the predicted offsets for accurate localization.  ... 
arXiv:2107.06129v1 fatcat:otg5ktymczfgrcc5pclzr2nymq

Accurate Scene Text Detection via Scale-Aware Data Augmentation and Shape Similarity Constraint

Pengwen Dai, Yang Li, Hua Zhang, Jingzhi Li, Xiaochun Cao
2021 IEEE transactions on multimedia  
This paper presents an arbitrary-shape scene text detection method that can achieve better generalization ability and more accurate localization.  ...  SSC encourages the segmentation of text or non-text in the candidate boxes to be similar to the corresponding ground truth, which is helpful to localize more accurate boundaries for arbitrary-shape scene  ...  The second challenge is to accurately localize the arbitrary-shape scene text.  ... 
doi:10.1109/tmm.2021.3073575 fatcat:r4rgthwbk5cw3boixvko3zbpoi

Context-Aware RCNN: A Baseline for Action Detection in Videos [article]

Jianchao Wu, Zhanghui Kuang, Limin Wang, Wayne Zhang, Gangshan Wu
2020 arXiv   pre-print
Our approach can serve as a strong baseline for video action detection and is expected to inspire new ideas for this field. The code is available at .  ...  Consequently, we develop a surprisingly effective baseline (Context-Aware RCNN) and it achieves new state-of-the-art results on two challenging action detection benchmarks of AVA and JHMDB.  ...  This work is supported by SenseTime Research Fund for Young Scholars, the National Science Foundation of China (No. 61921006), Program for Innovative Talents and Entrepreneur in Jiangsu Province, and Collaborative  ... 
arXiv:2007.09861v1 fatcat:cntx4bbblven7jdrwhiwd7hi4y

Snoopertext: A multiresolution system for text detection in complex visual scenes

R. Minetto, N. Thome, M. Cord, J. Fabrizio, B. Marcotegui
2010 2010 IEEE International Conference on Image Processing  
In this paper, we describe a robust and accurate multiresolution approach to detect and classify text regions in such scenarios.  ...  For instance, in an urban context, the detection is very difficult due to large variations in terms of shape, size, color, orientation, and the image may be blurred or have irregular illumination, etc.  ...  Efficient text detection should provide useful information for many applications related to scene understanding. However, no standard efficient solution really emerges on urban context.  ... 
doi:10.1109/icip.2010.5651761 dblp:conf/icip/MinettoTCFM10 fatcat:hkkji6zdvvey7kfwddvhnusxwu