Evaluation of the Stability of Four Document Segmentation Algorithms

Sebastien Eskenazi, Petra Gomez-Kramer, Jean-Marc Ogier
2016 12th IAPR Workshop on Document Analysis Systems (DAS)
The importance of stable information extraction algorithms for security-related applications, and more generally for industrial use cases, has recently been highlighted. Stability is what makes an algorithm reliable, as it guarantees that results will be reproducible on similar data. Without it, security criteria such as the probability of false positives cannot be quantified. As a consequence, no security application can be built on an unstable algorithm. In a document [...]tion framework, the probability of false positives is the probability that two different results are given for two copies of the same document. This paper builds on our previous work on a stable layout descriptor to study the stability of four segmentation algorithms. We consider a segmentation algorithm stable if it produces the same layout for all copies of the same document. The algorithms studied are two versions of PAL, Voronoi, and JSEG. We compare the stability of these algorithms and study the factors that influence it.
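The stability criterion stated above can be illustrated with a minimal sketch. It assumes each copy of a document has already been reduced to a hashable layout signature (here a plain string); the paper's actual layout descriptor and evaluation metrics are not given in this abstract, so the function names and the pairwise estimate below are hypothetical illustrations only.

```python
from itertools import combinations


def is_stable(layouts):
    """True if every copy of the document produced the same layout signature."""
    return len(set(layouts)) == 1


def false_positive_rate(layouts):
    """Fraction of copy pairs whose layout signatures differ.

    Assumed estimator: a "false positive" is counted whenever two copies
    of the same document yield two different results.
    """
    pairs = list(combinations(layouts, 2))
    if not pairs:
        raise ValueError("need at least two copies of the document")
    mismatches = sum(a != b for a, b in pairs)
    return mismatches / len(pairs)


# Hypothetical signatures for four scanned copies of one document.
copies = ["layout-A", "layout-A", "layout-B", "layout-A"]
print(is_stable(copies))
print(false_positive_rate(copies))
```

Under this reading, an algorithm is stable on a document exactly when the estimated rate for that document is zero.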
doi:10.1109/das.2016.25 dblp:conf/das/EskenaziGO16 fatcat:4eo5f3ua4veupdlqqyil3ejupy