A Survey of Graphical Page Object Detection with Deep Neural Networks

Jwalin Bhatt, Khurram Azeem Hashmi, Muhammad Zeshan Afzal, Didier Stricker
2021 Applied Sciences  
Therefore, we discuss the most relevant deep learning-based approaches and state-of-the-art graphical page object detection in document images.  ...  This work outlines and summarizes the deep learning approaches for detecting graphical page objects in document images.  ...  graphical page objects in document images as an object detection problem.  ... 
doi:10.3390/app11125344 fatcat:5ivlv3n42fhkbmoecyrkmoheqy

A Page Object Detection Method Based on Mask R-CNN

Canhui Xu, Cao Shi, Hengyue Bi, Chuanqi Liu, Yongfeng Yuan, Haoyan Guo, Yinong Chen
2021 IEEE Access  
In this study, block level region object detection is considered among the inherent hierarchical structure for document images.  ...  Page object detection is crucial for document understanding. Different granularities for objects can result in different performances.  ...  Inspired by previous works, in this paper, we utilized Mask R-CNN architecture on document image page object detection.  ... 
doi:10.1109/access.2021.3121152 fatcat:4vm2tkapmjfpdcte4lpcxjfyua

Graphical Object Detection in Document Images [article]

Ranajit Saha and Ajoy Mondal and C. V. Jawahar
2020 arXiv   pre-print
In this paper, we present a novel end-to-end trainable deep learning based framework to localize graphical objects in the document images called as Graphical Object Detection (GOD).  ...  The GOD explores the concept of transfer learning and domain adaptation to handle scarcity of labeled training images for graphical object detection task in the document images.  ...  We test the abilities of Faster R-CNN and Mask R-CNN, originally built for the natural scene images, to cope with detecting graphical objects in the document images.  ... 
arXiv:2008.10843v1 fatcat:4mmtewva3fhh5bnkjeyvksrf4q

IIIT-AR-13K: A New Dataset for Graphical Object Detection in Documents [article]

Ajoy Mondal, Peter Lipps, C. V. Jawahar
2020 arXiv   pre-print
This dataset contains a total of 13k annotated page images with objects in five different popular categories - table, figure, natural image, logo, and signature.  ...  We introduce a new dataset for graphical object detection in business documents, more specifically annual reports.  ...  [14] propose saliency based technique to detect three types of graphical objects - table, figure and mathematical equation. In [21], Mask R-CNN is explored to detect various graphical objects.  ... 
arXiv:2008.02569v1 fatcat:ziunzdryh5fgtkcwrbyqpwt23y

Vision-Based Layout Detection from Scientific Literature using Recurrent Convolutional Neural Networks [article]

Huichen Yang, William H. Hsu
2020 arXiv   pre-print
We consider scientific document layout analysis as an object detection task over digital images, without any additional text features that need to be added into the network during the training process.  ...  In this paper, we present a novel approach to developing an end-to-end learning framework to segment and classify major regions of a scientific document.  ...  embedding [23]) but document page image for input.  ... 
arXiv:2010.11727v1 fatcat:uvluhq4y2ncslgibfva2dxyz7e

Fine-Grained Object Detection over Scientific Document Images with Region Embeddings [article]

Ankur Goswami, Joshua McGrath, Shanan Peters, Theodoros Rekatsinas
2019 arXiv   pre-print
We study the problem of object detection over scanned images of scientific documents.  ...  We find that current object detectors fail to produce properly localized region proposals over such page objects.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views, policies, or endorsements, either expressed or implied  ... 
arXiv:1910.12462v2 fatcat:jfltljzj75ca7h7exoqm7isofm

ScanSSD: Scanning Single Shot Detector for Mathematical Formulas in PDF Document Images [article]

Parag Mali, Puneeth Kukkadapu, Mahshad Mahdavi, Richard Zanibbi
2020 arXiv   pre-print
Given a 600 dpi document page image, a Single Shot Detector (SSD) locates formulas at multiple scales using sliding windows, after which candidate detections are pooled to obtain page-level results.  ...  ScanSSD detects characters in formulas with high accuracy, obtaining a 0.926 f-score, and detects formulas with high recall overall.  ...  This material is based upon work supported by the Alfred P. Sloan  ... 
arXiv:2003.08005v1 fatcat:fx4bieqkgrgzpbdi7pe2kllrk4

Fi-Fo Detector: Figure and Formula Detection Using Deformable Networks

Junaid Younas, Shoaib Ahmed Siddiqui, Mohsin Munir, Muhammad Imran Malik, Faisal Shafait, Paul Lukowicz, Sheraz Ahmed
2020 Applied Sciences  
The proposed approach is evaluated on a publicly available ICDAR-2017 Page Object Detection (POD) dataset and its corrected version.  ...  It produces the state-of-the-art results for formula and figure detection in document images with an f1-score of 0.954 and 0.922, respectively.  ...  [20] presented the most recent method for page object detection in document images. Their approach is based on mask-RCNN for figure, formula, and table detection in document images.  ... 
doi:10.3390/app10186460 doaj:09dae57db0dd4a36a644a5fa6eeda3b3 fatcat:kwpqxa4aafdrhd4crg7bdbqqqy

Visual Detection with Context for Document Layout Analysis

Carlos Soto, Shinjae Yoo
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
To address this, we adapt the object-detection technique Faster R-CNN for document layout detection, incorporating contextual information that leverages the inherently localized nature of article contents  ...  We present 1) a work in progress method to visually segment key regions of scientific articles using an object detection technique augmented with contextual features, and 2) a novel dataset of region-labeled  ...  Acknowledgments This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, DE-SC0012704 and BNL LDRD #17-018.  ... 
doi:10.18653/v1/d19-1348 dblp:conf/emnlp/SotoY19 fatcat:ovn3bqlhlrdgjmrxyor32hv254

A Large Dataset of Historical Japanese Documents with Complex Layouts [article]

Zejiang Shen, Kaixuan Zhang, Melissa Dell
2020 arXiv   pre-print
Deep learning-based approaches for automatic document layout analysis and content extraction have the potential to unlock rich information trapped in historical documents on a large scale.  ...  In particular, little training data exist for Asian languages. To this end, we present HJDataset, a Large Dataset of Historical Japanese Documents with Complex Layouts.  ...  This project is supported in part by NSF Grant #1823616.  ... 
arXiv:2004.08686v1 fatcat:lvqb5xv55nesrfstthtk2tcdr4

TNCR: Table Net Detection and Classification Dataset [article]

Abdelrahman Abdallah, Alexander Berendeyev, Islam Nuradin, Daniyar Nurseitov
2021 arXiv   pre-print
The TNCR dataset can be used for table detection in scanned document images and their classification into 5 different classes. TNCR contains 9428 high-quality labeled images.  ...  In this paper, we have implemented state-of-the-art deep learning-based methods for table detection to create several strong baselines.  ...  CNNs for object detection have been implemented widely in document analysis and image processing [45, 7, 46, 3].  ... 
arXiv:2106.15322v1 fatcat:g4h6wgjb4ffzponytmm3jtzqmy

VTLayout: Fusion of Visual and Text Features for Document Layout Analysis [article]

Shoubin Li, Xuyan Ma, Shuaiqun Pan, Jun Hu, Lin Shi, Qing Wang
2021 arXiv   pre-print
Although many deep-learning-based methods from computer vision have already achieved excellent performance in detecting Figure from documents, they are still unsatisfactory in recognizing the List, Table  ...  In the first stage, the Cascade Mask R-CNN model is applied directly to localize all category blocks of the documents.  ...  Although the Figure, List, and Table in DLA are different from the objects in traditional object detection tasks, some deep-learning-based models can still perform well, such as Faster R-CNN [22], Mask  ... 
arXiv:2108.13297v1 fatcat:f3lgq2swqnduxmzpnh6f7g6va4

Automatic CNN-Based Arabic Numeral Spotting and Handwritten Digit Recognition by Using Deep Transfer Learning in Ottoman Population Registers

Yekta Said Can, M. Erdem Kabadayı
2020 Applied Sciences  
We first used a CNN-based segmentation method for spotting these numerals.  ...  Page segmentation (layout analysis), keyword, number and symbol spotting, handwritten text recognition algorithms are tested on historical documents.  ...  These are widely employed for detecting objects in different image processing applications [39] .  ... 
doi:10.3390/app10165430 fatcat:mxj2ep5cbjesrhwaga35mmcgzq

The Benefits of Close-Domain Fine-Tuning for Table Detection in Document Images [article]

Ángela Casado-García and César Domínguez and Jónathan Heras and Eloy Mata and Vico Pascual
2019 arXiv   pre-print
In this context, such a technique exports the knowledge acquired to detect objects in natural images to detect tables in document images.  ...  To this aim, we train different object detection algorithms (namely, Mask R-CNN, RetinaNet, SSD and YOLO) using the TableBank dataset (a dataset of images of academic documents designed for table detection  ...  objects in document images.  ... 
arXiv:1912.05846v1 fatcat:mp7oqswqondrlavpxeovimcjyy

Layout Aware Semantic Element Extraction for Sustainable Science & Technology Decision Support

Hyuntae Kim, Jongyun Choi, Soyoung Park, Yuchul Jung
2022 Sustainability  
Therefore, this paper proposes LA-SEE (LAME and Vi-SEE), a knowledge graph construction framework that simultaneously extracts meta-information and useful image objects from S&T documents in various layout  ...  In addition, to illustrate the potential power of our SEKG, we provide two promising application scenarios, such as a scientific knowledge guide across multiple S&T documents and questions and answering  ...  Figure 3 describes the proposed Vi-SEE model, which utilizes ISTR [36] to detect objects in pages in the PDF document except for the first page.  ... 
doi:10.3390/su14052802 fatcat:eew4bb5q55ccpavk6yxroogsgq