Polygon-free: Unconstrained Scene Text Detection with Box Annotations [article]

Weijia Wu, Enze Xie, Ruimao Zhang, Wenhai Wang, Hong Zhou, Ping Luo
2022 arXiv   pre-print
Although a polygon is a more accurate representation than an upright bounding box for text detection, the annotations of polygons are extremely expensive and challenging. Unlike existing works that employ fully-supervised training with polygon annotations, this study proposes an unconstrained text detection system termed Polygon-free (PF), in which most existing polygon-based text detectors (e.g., PSENet [33],DB [16]) are trained with only upright bounding box annotations. Our core idea is to
more » ... ansfer knowledge from synthetic data to real data to enhance the supervision information of upright bounding boxes. This is made possible with a simple segmentation network, namely Skeleton Attention Segmentation Network (SASN), that includes three vital components (i.e., channel attention, spatial attention and skeleton attention map) and one soft cross-entropy loss. Experiments demonstrate that the proposed Polygonfree system can combine general detectors (e.g., EAST, PSENet, DB) to yield surprisingly high-quality pixel-level results with only upright bounding box annotations on a variety of datasets (e.g., ICDAR2019-Art, TotalText, ICDAR2015). For example, without using polygon annotations, PSENet achieves an 80.5% F-score on TotalText [3] (vs. 80.9% of fully supervised counterpart), 31.1% better than training directly with upright bounding box annotations, and saves 80%+ labeling costs. We hope that PF can provide a new perspective for text detection to reduce the labeling costs. The code can be found at https://github.com/weijiawu/Unconstrained-Text-Detection-with-Box-Supervisionand-Dynamic-Self-Training.
arXiv:2011.13307v3 fatcat:xwzeylglofckrmarmiy3ru44zu