A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit <a rel="external noopener" href="https://hal.archives-ouvertes.fr/hal-01860368/file/icdar-ost_FINAL_certified.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
Massive, Free and Reproducible Groundtruthed Document Image Databases Generation with DocCreator
<span title="">2017</span>
<i title="IEEE">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/bytixui4szbgzkqzgwf65jphty" style="color: black;">2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)</a>
</i>
Whether your research focuses on image restoration, layout analysis, text-graphic separation, binarization, OCR, etc., you need a groundtruthed database to train or evaluate your method. This article presents DocCreator, a multi-platform, open-source software tool that creates many synthetic document images with controlled groundtruth. With DocCreator, you can create complete synthetic images by choosing the text, font, background and layout to use, add various realistic degradations
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/icdar.2017.188">doi:10.1109/icdar.2017.188</a>
<a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/icdar/JournetMV17.html">dblp:conf/icdar/JournetMV17</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/hhwhfkbq4fgcloj74dzwqe2gly">fatcat:hhwhfkbq4fgcloj74dzwqe2gly</a>
</span>
... through, light defect, paper deformation, ink degradation, etc.) on original images, or combine both to increase the size of your database. DocCreator comes in two versions: an online version that is easy to test, and a desktop version that computes faster and does not require uploading copyrighted data. DocCreator is useful for retraining tasks and for determining precisely how robust your algorithm is. It has already been used with favorable results and could help other DIAR researchers produce and share groundtruthed databases.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20201108121150/https://hal.archives-ouvertes.fr/hal-01860368/file/icdar-ost_FINAL_certified.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">Web Archive [PDF]</a>