
Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries [article]

Yuting Zhang, Luyao Yuan, Yijie Guo, Zhiyuan He, I-An Huang, Honglak Lee
2017 arXiv   pre-print
Associating image regions with text queries has been recently explored as a new way to bridge visual and linguistic representations.  ...  We formulate a discriminative bimodal neural network (DBNet), which can be trained by a classifier with extensive use of negative samples.  ...  We thank NVIDIA for donating K40c and TITAN X GPUs. We also thank Kibok Lee, Binghao Deng, Jimei Yang, and Ruben Villegas for helpful discussions.  ... 
arXiv:1704.03944v2 fatcat:2nchaeddgzamvjpxcuo342xyma

CSFF-Net: Scene Text Detection Based on Cross-Scale Feature Fusion

Yuan Li, Mayire Ibrayim, Askar Hamdulla
2021 Information  
In the last years, methods for detecting text in real scenes have made significant progress with an increase in neural networks.  ...  To solve this problem, this paper proposes a scene text detection network based on cross-scale feature fusion (CSFF-Net).  ...  Finally, the 3D-Attention module can be easily embedded into modern classification networks for a wide range of tasks due to its generic nature.  ... 
doi:10.3390/info12120524 fatcat:jv5goglnl5eizk2r4hli3t2xwi

Intelligent Micron Optical Character Recognition of DFB Chip Using Deep Convolutional Neural Network

Xudong Wang, Yebin Li, Juanxiu Liu, Jing Zhang, Xiaohui Du, Lin Liu, Yong Liu
2022 IEEE Transactions on Instrumentation and Measurement  
The microcharacter recognition on the distributed feedback (DFB) laser chip is critically essential but a challenging task for the quality control in the incoming chip inspection and optical device manufacturing  ...  which is also of great significance for further accelerating the development of the industrial Internet.  ...  Chunzhong, a Senior and students of Modern Opto-Electronic Measurement and Instrumental Laboratory (MOEMIL), University of Electronic Science and Technology of China (UESTC), Chengdu, China, for their  ... 
doi:10.1109/tim.2022.3154831 fatcat:dnwqcsm7w5bpzjevpgf7rwveai

Real-time Scene Text Detection Based on Global Level and Word Level Features [article]

Fuqiang Zhao, Jionghua Yu, Enjun Xing, Wenming Song, Xue Xu
2022 arXiv   pre-print
It is an extremely challenging task to detect arbitrary shape text in natural scenes on high accuracy and efficiency.  ...  The word-level label is generated by obtaining the minimum axis-aligned rectangle boxes of the shrunk polygon.  ...  Ground Truth Label Generation The label generation includes the global-level and the word-level label generation. Refer to DBNet for more details on the global-level label generation.  ... 
arXiv:2203.05251v1 fatcat:omdlxnz2nvgsrcwemy6sonezzu

Anomaly Detection in Natural Scene Images based on enhanced Fine-Grained Saliency and Fuzzy Logic

Hamam Mokayed, Palaiahnakote Shivakumara, Rajkumar Saini, Marcus Liwicki, Loo Chee Hin, Umapada Pal
2021 IEEE Access  
This work considers such misclassified components, which are part of the text as anomalies, and presents a new idea for detecting such anomalies in the text for improving text detection and recognition  ...  a fuzzy-based classifier.  ...  CONCLUSION AND FUTURE WORK This paper proposes a new method for detecting anomalies in text detection results generated by text detection methods.  ... 
doi:10.1109/access.2021.3103279 fatcat:hc6acq2hm5airiau7pcoulzxzu

Video Object Segmentation with Language Referring Expressions [article]

Anna Khoreva, Anna Rohrbach, Bernt Schiele
2019 arXiv   pre-print
Most state-of-the-art semi-supervised video object segmentation methods rely on a pixel-accurate mask of a target object provided for the first frame of a video.  ...  Leveraging recent advances of language grounding models designed for images, we propose an approach to extend them to video data, ensuring temporally coherent predictions.  ...  DBNet benefits more than MattNet from Oracle boxes, showing its higher potential to generalize to a new domain given better proposals.  ... 
arXiv:1803.08006v3 fatcat:qzv4vpl4ojap3lriyexugtycby

Real-time Naive Learning of Neural Correlates in ECoG Electrophysiology

Zachary V. Freudenburg, Nicolas F. Ramsey, Mark Wronkiewicz, William D. Smart, Robert Pless, Eric C. Leuthardt
2011 International Journal of Machine Learning and Computing  
Electrocorticography (ECoG) is an emerging signal platform for long term implantation of a brain signal recording device, but current approaches rely heavily on screening tasks and trained technicians  ...  We report on the development of a real-time feedback system we call the "Brain Mirror" which is based on the real time, incremental learning of a Deep Belief Network.  ...  However, the generally poor results of the onlinePCA representation and impressive results of the DBNet representation indicate that the DBNet algorithm is much better suited for online training.  ... 
doi:10.7763/ijmlc.2011.v1.40 fatcat:avif4wxm3rdy3ozrreo7wmgxva

DiT: Self-supervised Pre-training for Document Image Transformer [article]

Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
2022 arXiv   pre-print
We leverage DiT as the backbone network in a variety of vision-based Document AI tasks, including document image classification, document layout analysis, table detection as well as text detection for  ...  Experiment results have illustrated that the self-supervised pre-trained DiT model achieves new state-of-the-art results on these downstream tasks, e.g. document image classification (91.11 → 92.69), document  ...  In addition, DBNet [30] is a widely used text detection model for online OCR engines, we also fine-tune a pre-trained DBNet model with FUNSD training data and evaluate its accuracy.  ... 
arXiv:2203.02378v3 fatcat:yiykzkwwq5eafmlh6hqmnbmdl4

Review: Deep Learning on 3D Point Clouds

Saifullahi Aminu Bello, Shangshu Yu, Cheng Wang, Jibril Muhmmad Adam, Jonathan Li
2020 Remote Sensing  
Contrary to existing reviews, this paper provides a general structure for learning with raw point clouds, and various methods were compared based on the general structure.  ...  Deep learning is now the most powerful tool for data processing in computer vision and is becoming the most preferred technique for tasks such as classification, segmentation, and detection.  ...  It has created a new benchmark dataset for 3D shape mobility analysis.  ... 
doi:10.3390/rs12111729 fatcat:hitwxgdfp5avrdaqrjxlialwou

Recent Advances in Vision-Based On-Road Behaviors Understanding: A Critical Survey

Rim Trabelsi, Redouane Khemmar, Benoit Decoux, Jean-Yves Ertaud, Rémi Butteau
2022 Sensors  
a comprehensive understanding of approaches and techniques.  ...  On-road behavior analysis is a crucial and challenging problem in the autonomous driving vision-based area.  ...  Aside from a few number of works that did reuse these metrics [16, 48], new criteria have been proposed by the RBA community to discern the performance of a given model designed for a new tasks and/or  ... 
doi:10.3390/s22072654 pmid:35408269 pmcid:PMC9003377 fatcat:2vrmgz3b25eyxbijeurx5aijv4

Scene recognition under special traffic conditions based on deep multi-task learning

Xiaochang Hu, Xin Xu, Yongqian Xiao, Hongjun Chen, Hongjia Zhang
2020 The Journal of Engineering  
This study presents a deep multi-task classification framework for scene recognition involving special traffic conditions.  ...  The four tasks share the feature map generated by a convolutional neural network followed by task-specific sub-networks which are merged in the end via a joint loss function.  ...  Therefore, a new dataset for special traffic scenes is of necessity and was built up in our work.  ... 
doi:10.1049/joe.2019.1191 fatcat:25qvpe2ur5ftdijk6qvf4kr3wq

YOLOv3_ReSAM: A Small-Target Detection Method

Bailin Liu, Huan Luo, Haotong Wang, Shaoxu Wang
2022 Electronics  
In order to eliminate the loss of spatial feature information and hierarchical information caused by pooling operations in convolution processes and multi-scale operations in multi-layer structures, a  ...  spatial attention mechanism based on residual structure is proposed.  ...  new object as the new cluster center.  ... 
doi:10.3390/electronics11101635 fatcat:vbrmjxdijffi3lyzp6biz7vjza

RFRN: A Recurrent Feature Refinement Network for Accurate and Efficient Scene Text Detection

Guanyu Deng, Yue Ming, Jing-Hao Xue
2020 Neurocomputing  
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version  ...  The second challenge is the demand for a good balance between the inference efficiency and accuracy.  ...  It produces the fused feature maps for predicting the final output masks. We inherit the feature pyramid structure that efficiently refines the context information with skip connection.  ... 
doi:10.1016/j.neucom.2020.10.099 fatcat:22acjwbvqvcd3jqju464sqvgcq

On the Arbitrary-Oriented Object Detection: Classification based Approaches Revisited [article]

Xue Yang, Junchi Yan
2022 arXiv   pre-print
Accordingly, we transform the angular prediction task from a regression problem to a classification one.  ...  For the resulting circularly distributed angle classification problem, we first devise a Circular Smooth Label technique to handle the periodicity of angle and increase the error tolerance to adjacent  ...  To verify its usefulness, we annotate and release a new dataset for this purpose and perform detection evaluation for both rotation and heading with a considerable amount, and more stringent evaluation  ... 
arXiv:2003.05597v4 fatcat:nj6io3aoy5bbtdxammatfauzpy

Review: deep learning on 3D point clouds [article]

Saifullahi Aminu Bello, Shangshu Yu, Cheng Wang
2020 arXiv   pre-print
Deep learning is now the most powerful tool for data processing in computer vision, becoming the most preferred technique for tasks such as classification, segmentation, and detection.  ...  While deep learning techniques are mainly applied to data with a structured grid, point cloud, on the other hand, is unstructured.  ...  In [106], PointNet and PointNet++ are used to designed a generative shape proposal network to generate proposals which are further processed using PointNet for classification and segmentation.  ... 
arXiv:2001.06280v1 fatcat:mv37i5rb6jfxjak4eyfhspai6u