An AI-powered Baybayin Translator is being Developed by UP Mathematicians

July 11, 2023 - By NeP-C Ledesma

Here’s some exciting news for language, communications, and history enthusiasts. For people like me who have been wanting to learn Baybayin but don’t have enough time, a smart, AI-powered translator could help us learn it bit by bit.

A group of Filipino mathematicians has recently achieved a significant breakthrough by inventing a computerized method to convert complete paragraphs and even documents written in the ancient Filipino Baybayin writing system into easily comprehensible text, accessible to non-native readers. Currently, they are diligently developing a comprehensive two-way translator for Baybayin, aiming to facilitate seamless communication and understanding between the ancient script and modern languages.

The University of the Philippines – Diliman College of Science Institute of Mathematics (UPD-CS IM) has achieved a remarkable feat by merging mathematics and technology. Their groundbreaking work has resulted in the creation of what is possibly the world’s first optical character recognition (OCR) system capable of accurately differentiating between complete sections of Baybayin and Latin characters within a text image. This innovative development showcases the potential of mathematics and technology in advancing the understanding and preservation of ancient writing systems.

Thousands of images, months of hard work

In their research paper titled “Block-level Optical Character Recognition System for Automatic Transliterations of Baybayin Texts Using Support Vector Machine,” master’s student Rodney Pino, along with associate professors Dr. Renier Mendoza and Dr. Rachelle Sambayan, devised an algorithm to convert photographs of text into binary data. This data is then processed through a support vector machine (SVM) character classifier, which automatically identifies whether the characters belong to the Baybayin or Latin script. Their approach allows for efficient and accurate transliterations of Baybayin texts.
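To make the “photographs into binary data” step concrete, here is a minimal sketch of one common way to do it: thresholding a grayscale image so that dark pixels become 1 (ink) and light pixels become 0 (background). The fixed threshold of 128, the function names `binarize` and `flatten`, and the tiny 3x3 image are all assumptions for illustration; the article does not say how the researchers’ own preprocessing works.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def binarize(gray: Image, threshold: int = 128) -> Image:
    """Mark a pixel as 1 (ink) if it is darker than the threshold, else 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def flatten(binary: Image) -> List[int]:
    """Turn the 2D binary image into a flat feature vector for a classifier."""
    return [px for row in binary for px in row]


# A made-up 3x3 "photo" of a dark plus sign on a light background.
tiny = [
    [250, 30, 245],
    [20, 25, 35],
    [240, 40, 250],
]

binary = binarize(tiny)
features = flatten(binary)
print(binary)
print(features)
```

A flat vector like `features` is the kind of input a character classifier could consume, although a real system would work with far larger images.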

“SVM is a machine learning algorithm used to solve regression or classification problems,” Pino explained. “We have a dataset for Baybayin characters—let’s say character A and then character BA. SVM uses techniques or mathematical methods that can separate the two datasets to determine characters BA and A.”
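The idea Pino describes, separating two sets of examples with a boundary, can be sketched with a small linear SVM trained by sub-gradient descent on the hinge loss. This is only a toy illustration: the two-dimensional points, the labels (+1 standing in for character A, -1 for character BA), and the parameter values are invented, and the researchers’ actual features and SVM settings are not given in the article.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    points: Sequence[Sequence[float]],
    labels: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 200,
) -> Tuple[List[float], float]:
    """Minimise the L2-regularised hinge loss with batch sub-gradient descent."""
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        sum_yx = [0.0] * dim  # sum of y * x over margin violators
        sum_y = 0             # sum of y over margin violators
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                for j in range(dim):
                    sum_yx[j] += y * x[j]
                sum_y += y
        # Step against the sub-gradient: lam * w - (1/n) * sum_yx.
        w = [wj - lr * (lam * wj - sj / n) for wj, sj in zip(w, sum_yx)]
        b += lr * sum_y / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the boundary x falls."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Invented 2D features: +1 stands in for "A", -1 for "BA".
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
names = {1: "A", -1: "BA"}

w, b = train_linear_svm(points, labels)
print(names[predict(w, b, (2.5, 2.5))])
print(names[predict(w, b, (-1.0, -4.0))])
```

Real character images would be represented by many more features than two, but the principle is the same: the classifier learns a boundary, and new characters are labeled by which side of it they land on.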

It took the group more than three months to collect over a thousand images for each Baybayin character and to gather a total of 110 paragraphs from different websites containing handwritten or typewritten text in Baybayin, in Latin, or in a mix of both. “Adding more character images improves the recognition rate of SVM,” Pino said.

Developing a smart, two-way translator

At present, the OCR system can generate the Latin equivalents of Baybayin characters, producing a transliterated version of the text. The researchers, however, want it to do more than this.

The mathematicians plan to improve the OCR system’s understanding of the context of Baybayin words and phrases, which could eventually lead to a complete translator. They are also working on two-way conversion, so that Latin words, including those with foreign sounds, can be converted into Baybayin. These additions would make the system more versatile in interpreting and translating text.

“We’re trying to refine the software we developed to make it easier for future users to navigate it. We also dream of creating a mobile application that automatically and accurately translates Baybayin characters just by hovering over the phone,” Dr. Mendoza said.

However, there are still some kinks to iron out: Dr. Mendoza said that it was challenging to get the OCR system to translate Baybayin words and sentences accurately. “For now the system can’t distinguish between some Baybayin characters that are similar in writing, such as E and I, and O and U. We also have a lot of words that have different Latin equivalents,” he explained. “The algorithm we used shows all possible translations of the Baybayin words.”
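Showing “all possible translations” amounts to enumerating every combination of the ambiguous vowels. The sketch below does this for a word whose characters have been recognized as a consonant plus a vowel slot. The encoding of characters as `(consonant, slot)` pairs and the name `candidate_readings` are hypothetical, chosen only to illustrate the E/I and O/U ambiguity; they are not the researchers’ data format.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical encoding of a recognised character: (consonant, vowel slot).
# The slot records which vowels the written mark could stand for.
VOWEL_OPTIONS = {
    "a": ["a"],
    "e/i": ["e", "i"],
    "o/u": ["o", "u"],
}

Syllable = Tuple[str, str]


def candidate_readings(word: Sequence[Syllable]) -> List[str]:
    """List every Latin spelling consistent with the ambiguous vowel slots."""
    per_syllable = [
        [consonant + vowel for vowel in VOWEL_OPTIONS[slot]]
        for consonant, slot in word
    ]
    return ["".join(parts) for parts in product(*per_syllable)]


# Two characters, each a consonant with an O/U mark.
word = [("b", "o/u"), ("k", "o/u")]
readings = candidate_readings(word)
print(readings)  # ['boko', 'boku', 'buko', 'buku']
```

Only one of these four readings, “buko”, is the intended Filipino word here. One possible extension, which is an assumption and not something the article attributes to the team, would be to filter the candidates against a dictionary.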

Preserving Filipino writing systems

Although still scant, interest in and research on Baybayin are slowly increasing, making the mathematicians hopeful that more Filipinos will become interested in protecting Baybayin through research. The team published their data to encourage more researchers to conduct studies on Baybayin and OCR. “We cleaned the data in such a way that researchers could use it in analyzing Baybayin through other algorithms,” Dr. Mendoza shared. “We made the data readily available for use, so researchers wouldn’t go through the difficulty we experienced in gathering data.”

The “Philippine Indigenous and Traditional Writing Systems Act” is a bill filed in the Philippine Senate (S. 1680, 19th Congress). Its primary objective is to promote, protect, and preserve traditional writing systems like Baybayin, which hold significant cultural and national importance. The bill recognizes these indigenous writing systems as representations of Filipino tradition and identity. If enacted, it would help ensure their continued recognition and appreciation, safeguarding them for future generations.

The proposed law emphasizes the use of Baybayin as a means to foster cultural development and preservation. It urges organizations and institutions to take the lead in conducting activities and projects that raise awareness and promote the significance of traditional writing systems. By encouraging the integration of Baybayin in various cultural initiatives, the aim is to ensure its continued relevance and appreciation among Filipinos and instill a sense of pride and understanding of their cultural heritage.

According to the scientists, Baybayin is living proof that we Filipinos have our own technically sophisticated traditions. While they aren’t proposing to make Baybayin the Philippines’ primary writing system, the group believes that conducting more research on Baybayin will help preserve this heritage. “This can be forgotten,” Dr. Sambayan said. “It’s important to have a record of each Baybayin character, even having digitized ones.”

Dr. Sambayan expressed concern that the number of Filipinos who can read and write Baybayin is decreasing, adding to the importance of identifying and translating Baybayin characters into Latin. “We’re hoping that through this OCR system, we could preserve and pass on the knowledge of understanding Baybayin to future Filipino generations,” she said.

Baybayin and other traditional writing systems are a part of the Philippines’ rich history. Several old Filipino documents written in Baybayin could reveal more about Filipino culture. The scientists are encouraging more Filipinos to join them in cultivating the body of knowledge the country has on Baybayin. “Kapag walang gagawa nito, sinong gagawa? [If no one does this, who will?] Even though its implication already has a bit of a niche, I think this is still a vital research venture,” Dr. Mendoza said.

Thoughts

I’m so happy that these UPD mathematicians are working hard to preserve Baybayin, especially for future generations. Yes, interested learners are not exactly studying it the way we study languages, but the Baybayin AI translator is a first step for us to understand it and eventually get better at it.

Original Article by: Eunice Jean Patron, UPD-CS SciComm

Sources:

Pino, R., Mendoza, R., & Sambayan, R. (2022). Block-level Optical Character Recognition System for Automatic Transliterations of Baybayin Texts Using Support Vector Machine. Philippine Journal of Science, 151(1), 303-315.

Philippine Indigenous and Traditional Writing Systems Act, S. 1680, 19th Cong. (2022).