
Machine Learning for the Japanese Script Kuzushiji

Did you know that most Japanese people cannot read texts written more than 150 years ago? Pity, isn’t it?

Those texts, of which there are billions of pages preserved, are written in a script called kuzushiji. With the modernisation of elementary schooling in 1900, this script was removed from the curriculum. So now it can be read by only a small number of people, mostly those with PhDs in classical Japanese literature or Japanese history.

Here comes the interesting bit. If kuzushiji is converted to modern Japanese script, it becomes approachable for most people fluent in Japanese, albeit with some difficulties due to changes in grammar and vocabulary (compared to English, it is more difficult than reading Shakespeare).

Having it converted by professionals would be too expensive. But for machine learning, it is just the right task: challenging, but with great prospects, as vast volumes of text can be recognised in seconds. See the article for more details.

In case you wonder what the modern Japanese writing system is like, here’s an excellent explanatory video with examples of what it would look like in English (pic attached).

There are actually three different systems, and no spaces between characters! They are:

  • Hiragana – for function words such as particles and auxiliaries
  • Katakana – for foreign and modern loan words, as well as emotions and slang
  • Kanji – for content-heavy words such as nouns, verbs and adjectives
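
For the curious, the three systems can be told apart programmatically, because each one occupies its own range of Unicode code points. The snippet below is only a rough sketch and not taken from the article: the function names `script_of` and `segment_by_script` are made up for illustration, and the ranges cover just the main Hiragana, Katakana and CJK Unified Ideographs blocks, so rarer characters will come out as "other".

```python
from typing import List, Tuple


def script_of(ch: str) -> str:
    """Classify one character by its Unicode block (simplified ranges)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:   # Katakana block (includes the long-vowel mark)
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:   # CJK Unified Ideographs (most common kanji)
        return "kanji"
    return "other"


def segment_by_script(text: str) -> List[Tuple[str, str]]:
    """Group consecutive characters of the same script into runs."""
    runs: List[Tuple[str, str]] = []
    for ch in text:
        kind = script_of(ch)
        if runs and runs[-1][0] == kind:
            # Same script as the previous character: extend the current run.
            runs[-1] = (kind, runs[-1][1] + ch)
        else:
            runs.append((kind, ch))
    return runs


if __name__ == "__main__":
    # "I drink coffee": a sentence that mixes all three systems.
    for kind, run in segment_by_script("私はコーヒーを飲みます"):
        print(kind, run)
```

Even without spaces, the switches between scripts give a reader (or a program) a hint about where one word ends and the next begins, which is part of why the mixed system works in practice.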

Amazing, isn’t it?

Photo by Yifeng Lu on Unsplash
