Khmer spelling is a contentious issue in Cambodia. It is not unusual to see people argue over how Khmer words should be spelt, and misspelt words appear in local media, books, billboards and even government announcements. To tackle this problem, Danh Hong, a typographer and the programmer behind the standardised Khmer Unicode font, has worked with the Khmer Writers’ Association to create the first-ever Khmer spellchecking website. In an exclusive interview with Khmer Times, Mr Hong discusses his important mission.
KT: Based on your observations, what do you think about written Khmer on digital platforms today?
Mr Hong: People are making a great many mistakes, but most of them are unintentional. We officially declared that our writing would be based on Chuon Nath’s Khmer Dictionary, but people are not in the habit of consulting a dictionary while writing, or they do not own one. In the 1990s, the Khmer Dictionary was republished, but it was expensive. Copies are cheaper now, yet we are no longer in the habit of consulting it, and we simply write the way we speak. Sometimes, too, people just mistype words.
KT: Why did you create this website instead of waiting for the government to do it?
Mr Hong: Although I was born in Kampuchea Krom, I am Khmer too. Having worked with computers for years, I noticed that other languages have their own spell-checking software that corrects spelling and grammar mistakes as people type. Mistakes, no matter how small, should not be ignored. But it is very difficult to create a spell checker for the Khmer language. Why? First of all, written Khmer has no spaces between words. Existing spell-checking software was created by foreign programmers, so [it] does not handle this feature. Some developers have tried to build spell checkers for Khmer using templates made by foreigners, but these still require the user to type a “zero-space” (a zero-width space character) between words in order to work.
What I have created, on the other hand, works mainly on ordinary unspaced writing, and it can still recognise mistakes without any zero-spaces. As a professional technician, I believe we need to make things more convenient for people.
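Mr Hong does not explain how his checker finds word boundaries in unspaced text. A common textbook approach for scripts written without spaces is dictionary-based longest matching: at each position, take the longest known word that starts there, and treat leftover characters as a possible error. The Python sketch below illustrates only that general idea; it is not the algorithm behind his software, and the four-word `LEXICON` and the function name `segment` are hypothetical.

```python
# Minimal sketch: greedy longest-match segmentation of unspaced Khmer text.
# ASSUMPTION: LEXICON is a tiny hypothetical sample, not a real Khmer dictionary,
# and this is a generic technique, not the method used by Mr Hong's checker.

LEXICON = {"ខ្ញុំ", "ស្រឡាញ់", "ភាសា", "ខ្មែរ"}


def segment(text: str, lexicon: set[str]) -> list[tuple[str, bool]]:
    """Split text into (token, is_known) pairs without needing spaces.

    At each position, take the longest lexicon entry that starts there.
    Characters that start no entry are collected into one "unknown" token,
    which a spell checker could flag as a possible misspelling.
    """
    max_len = max((len(w) for w in lexicon), default=0)
    tokens: list[tuple[str, bool]] = []
    unknown = ""
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, shrinking one code point at a time.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                match = candidate
                break
        if match is not None:
            # Flush any pending unknown run before the recognised word.
            if unknown:
                tokens.append((unknown, False))
                unknown = ""
            tokens.append((match, True))
            i += len(match)
        else:
            unknown += text[i]
            i += 1
    if unknown:
        tokens.append((unknown, False))
    return tokens


if __name__ == "__main__":
    # "ភាស" is a deliberately truncated form of "ភាសា", used to show flagging.
    sample = "ខ្ញុំ" + "ស្រឡាញ់" + "ភាស" + "ខ្មែរ"
    for token, known in segment(sample, LEXICON):
        print(token, "ok" if known else "check spelling")
```

Greedy matching can mis-segment genuinely ambiguous strings, so practical segmenters often score several alternative segmentations rather than committing to the first longest match.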
KT: Have you worked on any other projects that also contribute to Khmer writing?
Mr Hong: I have worked with several other computer technicians on the first-ever Optical Character Recognition engine for the Khmer language, also known as Khmer OCR, which lets users scan printed Khmer text and convert it into digital text that can be edited on a PC. In fact, the Khmer spelling checker was part of Khmer OCR. However, without sponsors, the others stopped working on the project for a while. I was still working on it alone when I met the Khmer Writers’ Association, which was interested in my projects. This led to the creation of the Khmer spelling checker software and website, which we have already launched.
KT: Can you tell us how the Khmer spelling checker works?
Mr Hong: Instead of relying on fixed dictionary software installed on a computer, our website builds a cyber-dictionary from Khmer-language publications on the web. We then compare those words with the ones in an actual dictionary and delete the words that are misspelt. That is the stage at which we need human assistance. At the same time, the cyber-dictionary has collected a large number of new words that do not exist in older dictionaries. The longer it is used, the more reliable it will become. But that also depends on the users.
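The workflow Mr Hong describes (harvest words from web publications, check them against a reference dictionary, and route the rest to human reviewers) can be sketched roughly as follows. This is an illustrative assumption, not the website's actual code: it presumes the web text has already been split into words, and the function names `split_by_reference` and `apply_review`, the sample documents and the tiny reference set are all hypothetical.

```python
from collections import Counter

# ASSUMPTION: documents are already segmented into word lists, and the
# reference set stands in for a printed dictionary. All names are hypothetical.


def split_by_reference(
    documents: list[list[str]], reference: set[str]
) -> tuple[Counter, Counter]:
    """Count harvested words and split them into dictionary-confirmed words
    and words a human must review (possible misspellings or new words)."""
    counts = Counter(word for doc in documents for word in doc)
    confirmed = Counter({w: c for w, c in counts.items() if w in reference})
    needs_review = Counter({w: c for w, c in counts.items() if w not in reference})
    return confirmed, needs_review


def apply_review(confirmed: Counter, needs_review: Counter, accepted: set[str]) -> Counter:
    """Keep reviewer-accepted new words; drop everything else in the queue."""
    lexicon = Counter(confirmed)
    for word, count in needs_review.items():
        if word in accepted:
            lexicon[word] = count
    return lexicon


if __name__ == "__main__":
    docs = [
        ["ខ្ញុំ", "ស្រឡាញ់", "ភាសា", "ខ្មែរ"],
        ["ភាសា", "ខ្មែរ", "កុំព្យូទ័រ"],
        ["ភាស", "ខ្មែរ"],  # contains a deliberately truncated form
    ]
    reference = {"ខ្ញុំ", "ស្រឡាញ់", "ភាសា", "ខ្មែរ"}
    confirmed, queue = split_by_reference(docs, reference)
    print("for review:", queue.most_common())
    # A human reviewer accepts the new word and rejects the misspelling.
    lexicon = apply_review(confirmed, queue, accepted={"កុំព្យូទ័រ"})
    print(sorted(lexicon))
```

Sorting the review queue by frequency with `most_common()` lets reviewers handle the most widely used unconfirmed words first, whether they turn out to be common misspellings or new vocabulary.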
KT: Now that your software and website have been launched, how do you think written Khmer could be improved?
Mr Hong: The main priority should be local journalists and the Ministry of Education, because they influence how people write in Khmer. The general public reads what journalists write and copies their spelling, assuming it is correct. Meanwhile, at school, students usually write just as their teachers do, even when the teachers make mistakes. Both groups need to correct their writing. Our group also needs their feedback so that we can improve the effectiveness of our software and website. We would like them to report any incorrect words they find in our system. We are also thinking about adding a prediction function.