Announcement: 25.11.2025 – Lecture (on-site) by Miguel Escobar Varela: "Towards machine-readable Jawi Newspapers using bespoke AI models"
25 November 2025, 16:00, by AAI Webmaster

Photo: "Warta Malaya", cropped
We cordially invite you to this on-site lecture, held in English, on Tuesday, November 25th, 2025, from 16:00 to 18:00 (CET).
Topic:
"Towards machine-readable Jawi Newspapers using bespoke AI models"
Speaker:
Miguel Escobar Varela (Associate Professor)
Affiliation:
National University of Singapore
Date/Time:
November 25th, 2025 (Tuesday), 16:00 – 18:00 (CET)
Language:
English
Place:
University of Hamburg
Asia-Africa-Institute (AAI)
Edmund-Siemers-Allee 1, Ostflügel ("East Wing"), room O-222
20146 Hamburg
Open to the public! No entrance fee!
About this lecture:
In the century between the 1870s and the 1970s, hundreds of Malay-language periodicals circulated around the Malay-speaking world. These periodicals chronicle a fascinating era and have been the focus of intense study by scholars such as William Roff and Ian Proudfoot. Many of these periodicals have been digitized, and comprehensive collections are held at the National Library of Singapore, as well as in other libraries and archives.
Given the availability and size of the collections, the time is ripe for systematic digital analysis. Projects elsewhere in the world have demonstrated the power of analysing historical newspapers at scale using computational methods. Examples include Living With Machines (a partnership between the British Library and several universities in the UK) and Oceanic Exchanges (a partnership among researchers in Finland, Germany, Mexico, the Netherlands, the United Kingdom, and the United States).
What is holding back a similar study of Malay-language newspapers?
- The main obstacle is the script. The majority of these periodicals were published in Jawi, an adaptation of the Perso-Arabic script for the Malay language, which poses significant challenges for digital processing. For one thing, typical Optical Character Recognition (OCR) pipelines don't work well for Jawi.
- Another challenge is that most contemporary Malay readers, including many historians who would be interested in these collections, are less familiar with Jawi than with Rumi (the Romanized version of Malay most commonly used today). Automatic transliteration from Jawi to Rumi is also a complex task, as vowels are often not marked in Jawi.
- In addition, spelling conventions have changed over time, and there are many different approaches to transliterating the same word.
To address these challenges, the Computational Heritage research group at the National University of Singapore has developed specialized AI models for both Jawi OCR and Jawi-to-Rumi transliteration. In this talk, I will describe our progress so far, the challenges we still face and the future directions of our work.
Brief profile:
Miguel Escobar Varela is Associate Professor at the National University of Singapore (NUS), where he serves as Deputy Director of the Centre for Computational Social Science and Humanities (CSSH). In his work, he develops digital methods for studying the cultural heritage of Southeast Asia. His papers, datasets and research software are available at https://miguelescobar.com (link outside the jurisdiction of the University of Hamburg).


