Deriving Orthographic Data from Classical Japanese Texts with Machine-Learning Methods
Abstract
This project applies machine-learning techniques to extract orthographic data from classical Japanese manuscripts, specifically jibo 字母, the Chinese character matrices underlying cursive Japanese hiragana. Inspired by the National Diet Library’s NDLkotenOCR and KuroNet from the Center for Open Data in the Humanities (CODH), we aim to automate the generation of jibo data from manuscript images. This automation enables large-scale orthographic analysis and scribal attribution, both of which have traditionally required extensive manual effort. By integrating modern computer vision techniques, we seek to build a robust pipeline that identifies jibo and thereby facilitates deeper linguistic and historical insight into classical Japanese texts.
Description
Published in JINMONKON 2025, Proceedings of the annual conference of the Computers and Humanities Special Interest Group of the Information Processing Society of Japan (IPSJ), December 2025.
