KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning [article]

Tarin Clanuwat, Alex Lamb, Asanobu Kitamoto
2019 arXiv   pre-print
An approach specifically for character spotting using U-Nets was proposed in (Clanuwat et al., 2018b).  ...  Overall it has been estimated that there are over 3 million books preserved nationwide (Clanuwat et al., 2018a).  ... 
arXiv:1910.09433v1 fatcat:ap7u6mnaabfxxdknkwauwynqhe

Ukiyo-e Analysis and Creativity with Attribute and Geometry Annotation [article]

Yingtao Tian, Tarin Clanuwat, Chikahiko Suzuki, Asanobu Kitamoto
2021 arXiv   pre-print
The study of Ukiyo-e, an important genre of pre-modern Japanese art, focuses on object and style, as does research on other artwork. Such study has benefited from the machine learning community's renewed interest in culturally important topics, leading to interdisciplinary work including image collections, quantitative approaches, and machine learning-based creative applications. These efforts, however, have several drawbacks, and it remains challenging to integrate them into a comprehensive
... To bridge this gap, we propose a holistic approach. We first present a large-scale Ukiyo-e dataset with coherent semantic labels and geometric annotations, then show its value in a quantitative study of the objects in Ukiyo-e paintings using these labels and annotations. We further demonstrate that machine learning methods can help the study of style through soft color decomposition of Ukiyo-e, and finally provide joint insights into object and style by composing sketches and colors using colorization. Dataset available at https://github.com/rois-codh/arc-ukiyoe-faces
arXiv:2106.02267v1 fatcat:qmquntburngizkraabcehp2kvy

A human-inspired recognition system for premodern Japanese historical documents [article]

Anh Duc Le, Tarin Clanuwat, Asanobu Kitamoto
2019 arXiv   pre-print
Recognition of historical documents is a challenging problem due to noisy, damaged characters and backgrounds. Japanese historical documents not only exhibit these problems; their pre-modern characters were also written in cursive and are connected. Therefore, methods based on character segmentation do not work well. This leads to the idea of creating a new recognition system. In this paper, we propose a human-inspired document reading system to recognize multiple lines of
... premodern Japanese historical documents. During reading, people use eye movement to determine the start of a text line. Then, they move their eyes from the current character/word to the next character/word. They can also determine the end of a line or skip a figure to move to the next line. Eye movement integrates with visual processing to drive the reading process in the brain. We employ an attention-based encoder-decoder to implement this recognition system. First, the recognition system detects where to start a text line. Second, the system scans and recognizes character by character until the text line is completed. Then, the system continues to detect the start of the next text line. This process is repeated until the whole document has been read. We tested our human-inspired recognition system on the pre-modern Japanese historical documents provided by the PRMU Kuzushiji competition. The results of the experiments demonstrate the superiority and effectiveness of our proposed system, which achieves Sequence Error Rates of 9.87% and 53.81% on level 2 and level 3 of the dataset, respectively. These results outperform all other systems that participated in the PRMU Kuzushiji competition.
arXiv:1905.05377v1 fatcat:yw7etdq6efccflm2tpvb7cqnwa

Predicting the Ordering of Characters in Japanese Historical Documents [article]

Alex Lamb, Tarin Clanuwat, Siyu Han, Mikel Bober-Irizar, Asanobu Kitamoto
2021 arXiv   pre-print
Japan is a unique country with a distinct cultural heritage, which is reflected in billions of historical documents that have been preserved. However, the change in the Japanese writing system in 1900 made these documents inaccessible to the general public. A major research project has been to make these historical documents accessible and understandable. An increasing amount of research has focused on the character recognition task and the location of characters in the image, yet less research has
... focused on how to predict the sequential ordering of the characters. This is because sequence in classical Japanese is very different from modern Japanese. Ordering characters into a sequence is important for making the document text easily readable and searchable. Additionally, it is a necessary step for any kind of natural language processing on the data (e.g. machine translation, language modeling, and word embeddings). We explore a few approaches to the task of predicting the sequential ordering of the characters: one using simple hand-crafted rules, another using hand-crafted rules with adaptive thresholds, and another using a deep recurrent sequence model trained with teacher forcing. We provide a quantitative and qualitative comparison of these techniques as well as their distinct trade-offs. Our best-performing system has an accuracy of 98.65% and has a perfect accuracy on 49% of the books in our dataset, suggesting that the technique is able to predict the order of the characters well enough for many tasks.
arXiv:2106.06786v1 fatcat:kclarsc66zerfke4v5ai4r2uci

KaoKore: A Pre-modern Japanese Art Facial Expression Dataset [article]

Yingtao Tian, Chikahiko Suzuki, Tarin Clanuwat, Mikel Bober-Irizar, Alex Lamb, Asanobu Kitamoto
2020 arXiv   pre-print
From classifying handwritten digits to generating strings of text, the datasets which have received long-time focus from the machine learning community vary greatly in their subject matter. This has motivated a renewed interest in building datasets which are socially and culturally relevant, so that algorithmic research may have a more direct and immediate impact on society. One such area is in history and the humanities, where better and relevant machine learning models can accelerate research
... across various fields. To this end, newly released benchmarks and models have been proposed for transcribing historical Japanese cursive writing, yet for the field as a whole using machine learning for historical Japanese artworks still remains largely uncharted. To bridge this gap, in this work we propose a new dataset KaoKore which consists of faces extracted from pre-modern Japanese artwork. We demonstrate its value as both a dataset for image classification as well as a creative and artistic dataset, which we explore using generative models. Dataset available at https://github.com/rois-codh/kaokore
arXiv:2002.08595v1 fatcat:ubuprx3zkzfl3pefbaaeazn6uq

KuroNet: Regularized Residual U-Nets for End-to-End Kuzushiji Character Recognition

Alex Lamb, Tarin Clanuwat, Asanobu Kitamoto
2020 SN Computer Science  
Our proposed model KuroNet (which builds on Clanuwat et al. in International conference on document analysis and recognition (ICDAR), 2019) outperforms other models for Kuzushiji recognition.  ... 
doi:10.1007/s42979-020-00186-z fatcat:4e5bdbmvxzfpzagc2a7ayzgpse

Deep Learning for Classical Japanese Literature [article]

Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, David Ha
2018 pre-print
Much of machine learning research focuses on producing models which perform well on benchmark tasks, in turn improving our understanding of the challenges associated with those tasks. From the perspective of ML researchers, the content of the task itself is largely irrelevant, and thus there have increasingly been calls for benchmark tasks to more heavily focus on problems which are of social or cultural relevance. In this work, we introduce Kuzushiji-MNIST, a dataset which focuses on Kuzushiji
... (cursive Japanese), as well as two larger, more challenging datasets, Kuzushiji-49 and Kuzushiji-Kanji. Through these datasets, we wish to engage the machine learning community into the world of classical Japanese literature. Dataset available at https://github.com/rois-codh/kmnist
doi:10.20676/00000341 arXiv:1812.01718v1 fatcat:hjfmg2lstnahhjgz3h7vlc3abq

Kannada-MNIST: A new handwritten digits dataset for the Kannada language

Vinay Uday Prabhu
2019 Zenodo  
Last but not least, we'd like to acknowledge the helpful advice shared by the authors of the K-MNIST paper, Tarin Clanuwat, Alex Lamb (Mila) and David Ha (Google Brain).  ...  Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. 2017 17 Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha.  ... 
doi:10.5281/zenodo.3359689 fatcat:vwdbmyvc6ng5jgjy7phsjj6hzu

MTL2L: A Context Aware Neural Optimiser [article]

Nicholas I-Hsien Kuo, Mehrtash Harandi, Nicolas Fourrier, Christian Walder, Gabriela Ferraro, Hanna Suominen
2020 arXiv   pre-print
Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep Learning for Classical Japanese Literature.  ...  ., 2017), KMNIST (Clanuwat et al., 2018), and Cifar10 (Krizhevsky, 2009).  ... 
arXiv:2007.09343v1 fatcat:3rkm2737gzaf5ao6ieorlalnf4

Model Weight Theft With Just Noise Inputs: The Curious Case of the Petulant Attacker [article]

Nicholas Roberts, Vinay Uday Prabhu, Matthew McAteer
2019 arXiv   pre-print
[2] Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical japanese literature, 2018.  ... 
arXiv:1912.08987v1 fatcat:74c6iottwzg77khpcuzjqoupwe

Kannada-MNIST: A new handwritten digits dataset for the Kannada language [article]

Vinay Uday Prabhu
2019 arXiv   pre-print
Last but not least, we'd like to acknowledge the helpful advice shared by the authors of the K-MNIST paper, Tarin Clanuwat, Alex Lamb (Mila) and David Ha (Google Brain).  ... 
arXiv:1908.01242v1 fatcat:7kirtkmagbhqdob66nqedhlr2i

Enhancing Language Inclusivity in Digital Humanities: Towards Sensitivity and Multilingualism

Aliz Horvath
2021 Modern Languages Open  
s KuLA (Kuzushiji Learning Application) and Tarin Clanuwat et al.'  ...  "the Kuzushiji project" and Clanuwat et al.  ... 
doi:10.3828/mlo.v0i0.382 fatcat:f7v6cdsbtfdelixtf5hdcul7bm

Implicit Regularization via Neural Feature Alignment [article]

Aristide Baratin, Thomas George, César Laurent, R Devon Hjelm, Guillaume Lajoie, Pascal Vincent, Simon Lacoste-Julien
2021 arXiv   pre-print
Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical japanese literature. 2018.  ...  Our setup is to augment 10,000 MNIST training examples with 1000 difficult examples of 2 types: (i) examples with random labels and (ii) examples from the dataset KMNIST (Clanuwat et al., 2018).  ... 
arXiv:2008.00938v3 fatcat:xtcsbf4kcnbn3itixjq3ddrwhy

Depth Uncertainty in Neural Networks [article]

Javier Antorán, James Urquhart Allingham, José Miguel Hernández-Lobato
2020 arXiv   pre-print
Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical japanese literature, 2018. Marc Deisenroth and Jun Wei Ng.  ...  ., 2017); KMNIST (Clanuwat et al., 2018); CIFAR10/100 (Krizhevsky et al., 2009) and Corrupted CIFAR (Hendrycks and Dietterich, 2019); SVHN (Netzer et al., 2011). Figure 12: Fit obtained by a GP  ... 
arXiv:2006.08437v3 fatcat:7s7rwleekfbpvbmm4miqbntwry

Learning from Aggregate Observations [article]

Yivan Zhang, Nontawat Charoenphakdee, Zhenguo Wu, Masashi Sugiyama
2021 arXiv   pre-print
[13] Tarin Clanuwat, Mikel Bober-Irizar, Asanobu Kitamoto, Alex Lamb, Kazuaki Yamamoto, and David Ha. Deep learning for classical japanese literature, 2018.  ... 
arXiv:2004.06316v3 fatcat:57qnfy25gzdulf6tkw6mixsyhu