24,457 Hits in 7.6 sec

Predicting the Ordering of Characters in Japanese Historical Documents [article]

Alex Lamb, Tarin Clanuwat, Siyu Han, Mikel Bober-Irizar, Asanobu Kitamoto
2021 arXiv pre-print
This is because the sequence in classical Japanese is very different from modern Japanese. Ordering characters into a sequence is important for making the document text easily readable and searchable.  ...  Our best-performing system has an accuracy of 98.65% and has a perfect accuracy on 49% of the books in our dataset, suggesting that the technique is able to predict the order of the characters well enough  ...  Challenges in predicting the order of characters in Japanese historical documents. The Kuzushiji writing system was used in Japan for over a thousand years, but due to the standardization of Japanese language  ... 
arXiv:2106.06786v1 fatcat:kclarsc66zerfke4v5ai4r2uci
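The snippet above stresses that character order in classical Japanese differs sharply from modern Japanese. For orientation only, here is a minimal sketch of the kind of naive geometric baseline a learned ordering model is meant to improve on: read vertical columns right to left, and each column top to bottom. This is not the method of Lamb et al.; the input format (one hypothetical `(x_center, y_center)` pair per detected character), the function name `naive_vertical_order`, and the `column_gap` threshold are all assumptions for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical input: (x_center, y_center) of each detected character box,
# in image coordinates (y grows downward).
Point = Tuple[float, float]


def naive_vertical_order(chars: List[Point], column_gap: float) -> List[int]:
    """Return indices of `chars` in a naive vertical reading order.

    Columns are read right to left; within a column, characters are read
    top to bottom. A new column starts whenever the x-distance to the
    previous character (scanning from right to left) exceeds `column_gap`.
    """
    # Scan characters from the rightmost x-center to the leftmost.
    by_x = sorted(range(len(chars)), key=lambda i: -chars[i][0])
    columns: List[List[int]] = []
    prev_x: Optional[float] = None
    for i in by_x:
        x = chars[i][0]
        if prev_x is None or prev_x - x > column_gap:
            columns.append([])  # gap too wide: open a new column
        columns[-1].append(i)
        prev_x = x

    order: List[int] = []
    for col in columns:
        # Within a column, smaller y (higher on the page) is read first.
        order.extend(sorted(col, key=lambda i: chars[i][1]))
    return order
```

For example, `naive_vertical_order([(10, 5), (50, 20), (52, 5), (11, 30)], 20)` returns `[2, 1, 0, 3]`: the right-hand column (indices 2 and 1) is read first, then the left-hand column top to bottom. A fixed rule of this kind is likely to struggle with irregular layouts, which is presumably why the ordering is learned rather than hand-coded.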

KuroNet: Regularized Residual U-Nets for End-to-End Kuzushiji Character Recognition

Alex Lamb, Tarin Clanuwat, Asanobu Kitamoto
2020 SN Computer Science  
Thus there has been a great deal of interest in using machine learning to automatically recognize these historical texts and transcribe them into modern Japanese characters.  ...  The result has been datasets with hundreds of millions of photographs of historical documents which can only be read by a small number of specially trained experts.  ...  This was explored for Japanese historical documents specifically by [14], but using documents from between 1870 and 1945; the characters in their dataset, while old-style prints, are not Kuzushiji, but  ... 
doi:10.1007/s42979-020-00186-z fatcat:4e5bdbmvxzfpzagc2a7ayzgpse

KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning [article]

Tarin Clanuwat, Alex Lamb, Asanobu Kitamoto
2019 arXiv pre-print
Thus there has been a great deal of interest in using Machine Learning to automatically recognize these historical texts and transcribe them into modern Japanese characters.  ...  The result has been datasets with hundreds of millions of photographs of historical documents which can only be read by a small number of specially trained experts.  ...  This was explored for Japanese historical documents specifically by Le Duc et al. (2018), but using documents from between 1870 and 1945; the characters in their dataset, while old-style prints, are  ... 
arXiv:1910.09433v1 fatcat:ap7u6mnaabfxxdknkwauwynqhe

A human-inspired recognition system for premodern Japanese historical documents [article]

Anh Duc Le, Tarin Clanuwat, Asanobu Kitamoto
2019 arXiv pre-print
However, Japanese historical documents not only contain the mentioned problems; pre-modern Japanese characters were also written in cursive and connected to one another.  ...  In this paper, we propose a human-inspired document reading system to recognize multiple lines of premodern Japanese historical documents.  ...  Japanese has used three types of characters: Kanji (Chinese characters used in the Japanese language), Hiragana, and Katakana.  ... 
arXiv:1905.05377v1 fatcat:yw7etdq6efccflm2tpvb7cqnwa

Automated Transcription for Pre-Modern Japanese Kuzushiji Documents by Random Lines Erasure and Curriculum Learning [article]

Anh Duc Le
2020 arXiv pre-print
Recognizing full pages of Japanese historical documents is a challenging problem due to the complex layout/background and difficult writing styles, such as cursive and connected characters.  ...  In this paper, we extend our previous human-inspired recognition system from multiple lines to full pages of Kuzushiji documents.  ...  Conclusion. In this paper, we have proposed random text line erasure for data generation and for training the human-inspired recognition system on full pages of Japanese historical documents by curriculum  ... 
arXiv:2005.02669v1 fatcat:6kerpbee65d27pkecbq2y2xis4

A Large Dataset of Historical Japanese Documents with Complex Layouts [article]

Zejiang Shen, Kaixuan Zhang, Melissa Dell
2020 arXiv pre-print
To this end, we present HJDataset, a Large Dataset of Historical Japanese Documents with Complex Layouts. It contains over 250,000 layout element annotations of seven types.  ...  In addition to bounding boxes and masks of the content regions, it also includes the hierarchical structures and reading orders for layout elements.  ...  This project is supported in part by NSF Grant #1823616.  ... 
arXiv:2004.08686v1 fatcat:lvqb5xv55nesrfstthtk2tcdr4

Automatic Processing of Historical Japanese Mathematics (Wasan) Documents

Yago Diez, Toya Suzuki, Marius Vila, Katsushi Waki
2021 Applied Sciences  
As our database is made up of manual scans of real historical documents, it presents scanning artifacts in the form of image noise and page misalignment.  ...  We pay special attention to the results concerning one particular kanji character, the "ima" kanji, as it is of special importance for the interpretation of Wasan documents.  ...  In order to avoid deleting small parts of Kanji characters, the document starts with an erosion operation to get all the strokes in each Kanji connected.  ... 
doi:10.3390/app11178050 fatcat:v35hp7dco5fxhjlix5cxrvkkdq
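The erosion step mentioned in the snippet can be illustrated with a minimal pure-NumPy sketch of binary erosion using a k x k square structuring element. The snippet does not state the kernel size or the image polarity, so both are assumptions here: the paper is treated as foreground (`True`) and ink as background (`False`), so eroding the paper thickens the dark strokes and closes small gaps between them. If ink were the foreground instead, the equivalent operation would be a dilation. The function name `binary_erode` and the toy page are hypothetical.

```python
import numpy as np


def binary_erode(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Binary erosion with a k x k square structuring element (k odd).

    A pixel stays True only if every pixel in its k x k neighbourhood is True.
    Out-of-image pixels are padded as True so the border is not eroded
    artificially.
    """
    r = k // 2
    padded = np.pad(img.astype(bool), r, mode="constant", constant_values=True)
    h, w = img.shape
    out = np.ones((h, w), dtype=bool)
    # AND together every shifted copy of the image covered by the kernel.
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


# Toy page: True = paper, False = ink. Two short horizontal strokes on row 2
# are separated by a single paper pixel at column 3.
page = np.ones((5, 7), dtype=bool)
page[2, 1:3] = False
page[2, 4:6] = False

eroded = binary_erode(page)
# Eroding the paper spreads the ink by one pixel in every direction, so the
# gap at (2, 3) closes and the two strokes become one connected component.
print(eroded[2, 3])  # False
```

In practice a library routine such as `scipy.ndimage.binary_erosion` would normally be used; the explicit loop is only there to make the neighbourhood-AND definition visible.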

Recognition of Anomalously Deformed Kana Sequences in Japanese Historical Documents

Nam Tuan LY, Kha Cong NGUYEN, Cuong Tuan NGUYEN, Masaki NAKAGAWA
2019 IEICE transactions on information and systems  
This paper presents recognition of anomalously deformed Kana sequences in Japanese historical documents, for which a contest was held by IEICE PRMU 2017.  ...  3: unrestricted sets of characters composed of three or more characters possibly in multiple lines.  ...  Kitamoto and his group at National Institute of Informatics in Japan and the committee members of the IEICE PRMU contest for preparing the datasets and leading this contest.  ... 
doi:10.1587/transinf.2018edp7361 fatcat:fambt3but5b2zmlwdcurgvvx7a

LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis [article]

Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li
2021 arXiv pre-print
Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks.  ...  for challenges in the domain of DIA.  ...  This project is supported in part by NSF Grant OIA-2033558 and funding from the Harvard Data Science Initiative and Harvard Catalyst. Zejiang Shen thanks Doug Downey for suggestions.  ... 
arXiv:2103.15348v2 fatcat:7pz575jey5g63odk7axh7dpjzm

Deep Learning for Historical Document Analysis and Recognition—A Survey

Francesco Lombardi, Simone Marinai
2020 Journal of Imaging  
Nowadays, deep learning methods are employed in a broad range of research fields. The analysis and recognition of historical documents, as we survey in this work, is not an exception.  ...  Our study analyzes the papers published in the last few years on this topic from different perspectives: we first provide a pragmatic definition of historical documents from the point of view of the research  ...  An accuracy of 95% has been reached for the character segmentation task, tested on real images of Japanese historical handwritten official documents.  ... 
doi:10.3390/jimaging6100110 pmid:34460551 pmcid:PMC8321201 fatcat:nevh2ctshzfwtey4girgjtaftq

Cross-Lingual and Cross-Chronological Information Access to Multilingual Historical Documents [chapter]

Biligsaikhan Batjargal
2018 Multilingualism and Bilingualism  
Nowadays, digital collections of historical documents have to handle materials written in many different languages in different time periods.  ...  The proposed method performs computerized analysis on Mongolian historical documents. Named entities such as personal names and place names are extracted by employing a support vector machine.  ...  and (2) how the proposed approach can be applied to other languages in order to provide cross-lingual and cross-chronological information access to multilingual historical documents.  ... 
doi:10.5772/intechopen.72421 fatcat:rv4vo56qebddlaok6yvriakzwq

A Method of Japanese Ancient Text Recognition by Deep Learning

Lehan Chen, Bing Lyu, Hiroyuki Tomiyama, Lin Meng
2020 Procedia Computer Science  
In this experiment, the layout of the text is extracted into a grayscale image through ARU-Net (a neural pixel labeling machine for historical document layout analysis).  ...  The sensory data are forecasted through a data prediction model in the cloud, and the sensory data of an IoT node need to be routed to the cloud, for synchronization purposes, only when the category  ...  In this experiment, ARU-Net (a neural pixel labeling machine for historical document layout analysis) is used to extract the layout of the texts from the pages in the images.  ... 
doi:10.1016/j.procs.2020.06.084 fatcat:ujhquxxf7ja4rhzudhoreb7wbm

Graph-Based Keyword Spotting in Historical Documents Using Context-Aware Hausdorff Edit Distance

Michael Stauffer, Andreas Fischer, Kaspar Riesen
2018 2018 13th IAPR International Workshop on Document Analysis Systems (DAS)  
ACKNOWLEDGMENT. The authors would like to thank Siemens Postal, Parcel & Airport Logistics GmbH for funding this work.  ...  In addition, Japanese character recognition requires a large amount of training data, since thousands of character classes exist in the language.  ...  The training dataset does not include real scene characters, since it is difficult to collect characters of all classes in Japanese.  ... 
doi:10.1109/das.2018.31 dblp:conf/das/Stauffer0R18 fatcat:2r2cjpiitfcs5knjtqbfvcuwsi

Survey on Deep Learning-based Kuzushiji Recognition [article]

Kazuya Ueki, Tomoka Kojima
2020 arXiv pre-print
The high-precision detection and recognition of Kuzushiji, a Japanese cursive script used for transcribing historical documents, has been made possible through the use of deep learning.  ...  Owing to the overwhelming accuracy of the deep learning method demonstrated at the 2012 image classification competition, deep learning has been successfully applied to a variety of other tasks.  ...  In this project, to develop a historical document research support system, the authors studied a  ...  (Footnote 1: Hiragana is one of the three different character sets used in Japanese writing.)  ... 
arXiv:2007.09637v1 fatcat:3z4z542ubzhrlmroehh7xtaznq

Naive Bayesian Prediction of Japanese Annotated Corpus for Textual Semantic Word Formation Classification

Zhoushao Hao, Gengxin Sun
2022 Mathematical Problems in Engineering  
In the research on predicting Japanese semantic word formation patterns, this paper builds a semantic word formation pattern prediction model based on Naive Bayes and conducts simulation experiments.  ...  This paper further improves the accuracy of computer prediction of Japanese semantic word formation patterns by adding part of speech.  ...  Naive Bayesian Prediction Bias. In the Naive Bayesian polynomial model, a document is regarded as a series of ordered collections of words.  ... 
doi:10.1155/2022/8048335 fatcat:thmqz7ipyjeq7kwss2gygnn3la
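For reference, the standard multinomial event model of Naive Bayes (which the snippet appears to call the "polynomial" model) scores a class c for a document d as log P(c) + Σ_w n(w, d) log P(w | c), where n(w, d) is the count of word w in d. The sketch below trains and applies such a classifier with add-one (Laplace) smoothing. It is a generic textbook version, not the authors' model: the feature tokens and the class labels `compound` and `derived` are hypothetical stand-ins, and under this assumption the part-of-speech information the paper adds would simply be appended as extra tokens.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_mnb(docs: List[Tuple[List[str], str]], alpha: float = 1.0):
    """Fit multinomial Naive Bayes with additive (Laplace) smoothing.

    Returns (log_prior, log_lik) where log_lik[c][w] = log P(w | c).
    """
    class_docs = Counter(label for _, label in docs)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        # Denominator: all tokens seen in class c plus alpha per vocabulary word.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik


def predict_mnb(tokens: List[str], log_prior, log_lik) -> str:
    """Return argmax_c of log P(c) + sum over tokens of log P(w | c).

    Repeated tokens contribute once per occurrence; tokens outside the
    training vocabulary are skipped.
    """
    def score(c: str) -> float:
        return log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])

    return max(log_prior, key=score)


# Hypothetical toy data: each document is a token list with a class label.
train = [
    (["kanji", "kanji", "noun"], "compound"),
    (["kanji", "noun"], "compound"),
    (["kana", "suffix", "verb"], "derived"),
    (["kana", "suffix"], "derived"),
]
prior, lik = train_mnb(train)
print(predict_mnb(["kanji", "noun"], prior, lik))  # compound
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps an unseen (word, class) pair from zeroing out a class entirely.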
Showing results 1-15 out of 24,457 results