How Google Figured Out Khmer Translation

A screenshot of Google Translate page displaying the translation of "VOA" in both Khmer and English.

Sophat Soeung, VOA Khmer, May 24, 2013

WASHINGTON — Editor’s note: Around Cambodian New Year last month, Google launched its online translation service for the Khmer language, making it the 66th language to be translatable on its service. Google says the launch is primarily aimed at making a vast amount of non-Khmer content on the Internet more accessible to Khmer speakers. Divon Lan, product manager of Google’s Next Wave Emerging Markets program, recently spoke to VOA Khmer’s Sophat Soeung by phone to explain what it means for the average Cambodian.

You didn’t actually use translators to build this system. How did you actually build Khmer translation on Google?

Google Translate is actually machine translation. Basically, the way this works is we look at all the Khmer data that is out there, on the Web and so on. And we figure out automatically the language model. And that allows us to translate not only from English to Khmer but actually from any language to Khmer. Today there are 66 languages on Google Translate, so, for example, you can go to a Chinese website or a French website and get it translated to Khmer and understand what it says and vice versa. Foreigners from many countries can read Khmer text translated into their languages. Now bear in mind that the machine translation is still not at the level of human translation. If you use Google Translate, what we’re aiming for is that you will be able to get the general idea of what a piece of text says. It won’t be a word-to-word translation. That’s the downside.

At VOA Khmer we have two language websites, one in English and one in Khmer. Would that contribute to Google Translation because they basically have the same content? Is that the idea?

That’s the idea. We look at all the content out there of that nature, in two languages, across millions and millions of pages, and we try to figure out what the translation should be automatically, using algorithms rather than humans.
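For readers curious how matching text in two languages can yield translations without a human translator, here is a minimal sketch. It is not Google's method, which the interview does not detail. It assumes a tiny hypothetical English-French corpus of aligned sentences (`PAIRS`) and scores word pairs with a simple co-occurrence measure, the Dice coefficient. The function names are illustrative only.

```python
from collections import Counter
from itertools import product

# Hypothetical toy corpus of aligned sentence pairs (English, French).
PAIRS = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]

def build_counts(pairs):
    """Count how often each word appears, and how often word pairs co-occur."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        # Every source word co-occurs with every target word in the pair.
        pair_counts.update(product(src_words, tgt_words))
    return src_counts, tgt_counts, pair_counts

def best_translation(word, pairs):
    """Return the target word with the highest Dice score for `word`."""
    src_counts, tgt_counts, pair_counts = build_counts(pairs)

    def dice(tgt):
        return 2 * pair_counts[(word, tgt)] / (src_counts[word] + tgt_counts[tgt])

    return max(tgt_counts, key=dice)

if __name__ == "__main__":
    for w in ("house", "the", "car"):
        print(w, "->", best_translation(w, PAIRS))
```

With only three sentence pairs the guesses are fragile, which hints at why a large amount of matching two-language text matters for quality.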

A Cambodian student uses Google's new Khmer online translation service between Khmer and French. Google Translate released Khmer as its 66th language on its online translation service around Cambodian new year, 2013. (Courtesy of Divon Lan)

Khmer is the 66th language on Google Translate. It’s actually the last language of the (Lower) Mekong region, even after Lao. Is that because of the complexity of the Khmer language that it was released later on?

Yes, it’s partly because of the complexity. It’s partly because to create this machine, the “translation language model,” we need a fairly large amount of text available out there on the web. And Khmer is still, you know, the amount of text compared to other languages is still small if you compare it to other regional languages like Thai or Vietnamese. The amount of content in those languages is much bigger. We wanted to make sure that the quality meets our launch level, which is basically that you’d be able to understand more or less what an article is about although that translation is not perfect. The translation quality will improve over time. So the more people use it, the more people suggest corrections, and over time the quality will improve.

Actually the one language we launched just prior to Khmer was Lao, as you’ve mentioned. That’s also one of our most recent launches. And these languages are very similar in many of the difficulties that we face in translating. One of them is the fact that in Khmer and in Lao and in Thai, you don’t use white spaces to have a gap between words. So one of the challenges is just looking at Khmer text and figuring out where the word boundaries are, and the same challenge exists in Lao as well.
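To make the word-boundary problem concrete, here is a minimal sketch of one classic approach, greedy longest-match segmentation against a word list. It is an illustration only, not the technique Google uses, and the three-word `LEXICON` is a hypothetical stand-in for a real dictionary.

```python
# Hypothetical mini-lexicon: "I", "love", "Cambodia".
LEXICON = {"ខ្ញុំ", "ស្រឡាញ់", "កម្ពុជា"}

def segment(text, lexicon):
    """Split unspaced text by repeatedly taking the longest known word."""
    max_len = max(len(word) for word in lexicon)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            # Unknown character: emit it on its own and move on.
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens

if __name__ == "__main__":
    sentence = "ខ្ញុំ" + "ស្រឡាញ់" + "កម្ពុជា"  # written without spaces
    print(segment(sentence, LEXICON))
```

A greedy matcher like this fails on words missing from the lexicon and on ambiguous splits, which is part of why segmentation is hard for a computer.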

When you write in a language like English, you use a space between every word, whereas in Khmer the words are just stuck to each other. While a human reading Khmer can very easily tell the words apart, it’s actually quite difficult for a computer to understand where one word ends and a new word starts. One thing that makes it easier, by the way, is that Khmer has a unique script. So when we see Khmer letters in a document, we know for sure that’s Khmer, compared to Latin alphabets. Like you see a text, you’re not always sure if this is French or Italian or Spanish. It can be anything; they all use the same letters. For Khmer, if it is in Khmer, it’s Khmer. There’s no question. So that part makes it easy. And that’s the same for Lao and Thai.
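The point about the script can be sketched in a few lines. The example below assumes only that Khmer letters fall in the Unicode Khmer block (U+1780 to U+17FF). The function names and the 0.5 threshold are illustrative choices, not anything described in the interview.

```python
KHMER_START, KHMER_END = 0x1780, 0x17FF  # Unicode Khmer block

def khmer_ratio(text):
    """Fraction of non-space characters that belong to the Khmer block."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    khmer = sum(1 for ch in chars if KHMER_START <= ord(ch) <= KHMER_END)
    return khmer / len(chars)

def is_khmer(text, threshold=0.5):
    """Guess that text is Khmer when most of its characters are Khmer script."""
    return khmer_ratio(text) >= threshold

if __name__ == "__main__":
    print(is_khmer("កម្ពុជា"))
    print(is_khmer("Cambodia"))
```

The same check cannot tell French from Italian or Spanish, since those languages share one alphabet and need statistical clues instead.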

Divon Lan, product manager of Google’s Next Wave Emerging Markets program, in Phnom Penh, Cambodia. (Courtesy of Divon Lan)

How long did it actually take you to build the Khmer translation and what was the most challenging aspect of doing that?

It took us a few good months, maybe even a year. And the challenges were really getting to a good enough level of quality, given the amount of Khmer text out there on the web is still relatively small. An additional challenge is that we found that actually people write Khmer words in many different ways. So there are a lot of people that don’t use a standard dictionary way to write a word. They just write it phonetically, and then we see many variants of different words, which of course adds another interesting technical challenge for us.

Who did you envision as your audience?

Our audience is primarily Cambodian. What we’ve seen in Cambodia is that the young and educated people in Phnom Penh are all using the Internet. But if you think about that, that’s only 5 percent or so of the population of the country. You have 90 to 95 percent of the people that are not using the Internet. Now there are many reasons why people are not using the Internet, like the cost of devices and things like that. But one of the top reasons in Cambodia is Khmer. Most Cambodians speak only Khmer, and I think it’s our duty as a technology industry—Google and other companies—to provide the world’s information to Cambodians in their language. The vast majority of content on the web is not in Khmer. It’s in English or other languages, and we think it’s critically important to give Khmer-speakers access to all of the world’s information, in their language. That’s the motivation here.

Some in our audience have asked if the Google Khmer translation is available on mobile.

Yes, it is.

A screenshot of Google Maps of Cambodia, displaying in the Khmer language on Friday, May 24, 2013.

Google Khmer was launched more than two weeks ago. What feedback have you gotten so far?

I think the feedback is extremely enthusiastic. I’m following the English-language media in Cambodia, and thanks to Google Translate I now can also follow the Khmer-language media, since I’m not a Khmer-speaker. I’m very excited to see the level of excitement out there. Obviously, there are also comments about the quality, which is to be expected, but I think everybody recognizes that this is really a big step for the Internet in Cambodia.

So this is the early beginning for Khmer Translation. What is the plan? What’s coming up related to this?

This is what we call an “Alpha version,” which means it is the very very first early version of that translation. We hope the quality will improve a lot. We’re investing more and more in Khmer. For example, Google Maps already shows place names in Khmer language. This happened a few months ago. I can’t provide specific details on future plans, but I can say that we’re definitely investing in the Khmer language, because our objective is really to get the world’s information to Cambodians. So every Cambodian—doesn’t matter their background, where they are, whether or not they speak other languages—they should be able to participate in this information revolution that we are in.

On a personal note, I’ll say that my wife is actually Khmer. And when I think about our mission in Cambodia—as Google and even as the technology industry—I always think about my mother-in-law, who is an intelligent, capable woman, but who, like most Cambodians, only speaks Khmer. And because of that, she is not able to access the Internet, access the information that’s out there. And it’s my personal mission to solve that problem. My mother-in-law should be able to use the Internet just like anybody else, in her own language.

Google Now Translates Khmer

As a Cambodian new year gift, Internet search giant Google has added Khmer as the 66th language on its translation service – Google Translate.

The scarcity of Khmer-language online content and the complexity of the language itself might explain why Khmer was added later than all its neighboring counterparts in the Lower Mekong countries, including Lao.

Right back from Khmer new year, Cambodian tech enthusiasts could enjoy playing around with the new service. Many have expressed national pride for their native tongue having been included by Google.

I tried it myself, and it is not bad.

Interestingly, this is all done without using a single human translator, according to a Google employee. Here’s how it works:

As a result, there are still some funny glitches. For example, “How are you” is translated as “អ្នកមានដោយរបៀបណា?” which actually translates as “How did you get rich?”

But according to the same employee, the key to improving the service will be user-driven. That is, the more Khmer content we help make available online – alongside its matching English content – the better the quality of the translation.

Translation has never been more fun!

Historians Look at Former King’s ‘Mixed’ Legacy

Thousands of mourners gather at the gates of the Royal Palace minutes after the coffin of former king Norodom Sihanouk arrived in Phnom Penh October 17, 2012. Tens of thousands poured into Cambodia's capital to witness the procession on Wednesday.

Sophat Soeung, VOA Khmer & Victor Beattie, VOA News, October 17, 2012

WASHINGTON DC – Former king Norodom Sihanouk, who died in China on Monday, came to the throne at the age of 19. But he grew to become the most prominent national figure during decades of Cambodia’s turbulent politics.

Those who have closely watched his politics over the years say that the former king’s greatest legacy can be found in the early years of his rule and his continued role as a symbol of unity during troubled times. But they also point to some of his darker legacies, including his support for the Khmer Rouge at a critical turning point in Cambodian history.

Julio Jeldres, Sihanouk’s official biographer, told VOA the former king built modern Cambodia from what had been a feudal monarchy. But his most lasting legacy was the winning of independence for his country from colonial France in 1953.

“He is the symbol of Cambodian independence and unity,” Jeldres told VOA. “He managed to keep this country at peace, while the Vietnam War was raging next door.”

Sihanouk, who was nearly 90, died in Beijing early Monday after a heart attack. He had gradually retreated from public life after passing the throne to his son, Norodom Sihamoni, in 2004.

David Chandler, an author and prominent scholar of modern Cambodian history, said Sihanouk dominated, “or you might even say smothered,” Cambodia’s early political scene.

“He felt himself in some ways to be the embodiment of the country…that the spirit of Cambodia presided in him as king,” Chandler told VOA. “He felt he had a special endowment to represent the Cambodian people. He combined this with a deep and very sincere love of ordinary Cambodians, which is a characteristic you won’t find too frequently among the Cambodian rulers.”

Sihanouk will be ill remembered by some for his support of the Khmer Rouge insurgency, between his ouster by coup in 1970 and the regime’s overthrow of the Lon Nol government, in 1975.

“But he didn’t know what they were going to do when they came to power,” Chandler said. “And when they did come to power, they locked him up for three years, and their cruel and inhumane policies I think shocked him and upset him.”

At Harvard, Room To Write Uncensored

Note: The above video is in Khmer.

Sophat Soeung, VOA Khmer, May 29, 2012

CAMBRIDGE, MASSACHUSETTS – Although violence against writers and academics in Cambodia has decreased in recent years, many say they still face strong censorship. A literary program at Harvard University, in Massachusetts, provides a place for writers from countries like Cambodia to work uncensored.

Harvard’s Scholars at Risk program put on the “Living Magazine” in April, where written works were performed for an audience, and where writers like Keo Chanbo, who is originally from Battambang, told their stories.

“I had to flee the country for my life, but I can’t stop writing,” Keo Chanbo, who now lives in Minnesota, told the audience of about 100 students, professors and local Cambodian-Americans. She wept as she spoke. “Writing is my life.”

Kho Tararith, who is a fellow with the Scholars at Risk program, recited a poem called “Bopha,” which describes his nostalgia for Cambodia and is, like many of his poems, subtly political.

“In Cambodia, writing notwithstanding, Cambodians don’t even dare to openly discuss or read anything critical of powerful people,” he said in an interview. “This might be due to their fears—fears of imprisonment—they fear that what they say might affect politicians and the powerful and cause them trouble. This is why everyone keeps quiet.”

Cambodia’s information environment remains highly restricted. The US-based Freedom House has categorized the country as “not free” in political and civil rights, including literary freedom. Some critics also say censorship has taken hold in the country’s universities, where many topics are unofficially taboo.

At the Royal University of Law and Economics this year, academic leaders banned thesis topics that included land disputes, the Cambodian Red Cross, run by the prime minister’s wife, and the burgeoning stock exchange. Cambodian university officials have defended the practice of banning topics, giving a variety of justifications, including the prevention of plagiarism and the repetition of annual thesis topics.

Kho Tararith said such literary and academic censorship goes against the academic environment fostered by the US and other Western countries.

In the US, he said, “we can write about anything. They even give us money to write. They say, ‘All you need is to submit a proposal,’ and they never bar us from any topics.”

Aisha Down, a student of literature at Harvard who spent nine months in Cambodia, said she did not immediately realize Cambodia was a censored society, because there are no obvious censors. However, she said, she came to realize that a culture of fear and violence contributes to self-censorship.

“And because you don’t know where it comes from…there’s no way to really defend against it,” she said.

But the effects of censorship go beyond personal security. They can also affect a society over time, said Steven Pinker, a renowned professor of psychology at Harvard.

“So how was it that in the ‘killing fields’ in Cambodia, the Holocaust, it looked like all of the people were fooled all of the time?” he said. “And one of the reasons is that the people were too intimidated to say what they wanted. You might have very few people actually believing the terrible ideology, but everyone thinks everyone else believes it because no one can say the truth because they’ll immediately get killed.”

Programs like Scholars at Risk can help writers from other countries develop their ideas outside of a repressive environment.

Jane Unrue, who is on the Scholars at Risk committee and organized the Living Magazine, said she hopes the program will inspire the writers of today and tomorrow. It has helped Kho Tararith already, she said before the Living Magazine performance. “I think that people are going to, especially tonight when they hear his poetry, they are going to know that he is an important poet.”

How to Become a Successful Language Learner—A Personal Experience

Author’s note: This article was published in a 2007 edition of the IFL Prospect, the RUPP English Department’s Newsletter (volume 1, issue 3/April-June 2007). I wrote this for an audience of year one and two students. It details my personal experience with learning English. As I have received some very positive feedback on that article, I thought it might be useful for other language learners and the general public to get another perspective on learning English. The original blog post is here and has been reproduced on khmerscholar.org.

“How can I best learn English quickly and effectively?” has been the subtlest question from students I have faced in my teaching career. From my observation, however, many Cambodian students do not in fact know how to ‘learn’ a second language effectively. They seem to try to ‘study’ the language more. Learning, to me, is a more subconscious and light-hearted approach to language acquisition, mostly done away from the more intense, serious ‘study’ we do in class. Indeed, learning a second language—unlike taking a subject like Economics or Biology where serious study is required—can be enjoyable and more relaxing while at the same time effective. What, then, is central to successful language learning? This article will try to unravel some of the secrets behind effective language learners by looking at my own learning experience of English and the success stories of some of my students.

There is no one best way of learning a new language. That is, there are many factors that contribute to success in second language acquisition. However, my long history of language learning—five languages altogether—has taught me that it is more effective to learn a language naturally than intensively or analytically. In the case of learning English, Cambodian students, who face so much interference from their native language—Khmer, have to try to best create an English-speaking environment around them. From my observation, successful language learners are those who (1) know what learning style works best with them and (2) expose themselves to the language as much as possible.

Throughout my history of language learning, like many successful language learners, I have been an audio-visual learner—that is, someone who learns best by hearing things and seeing or reading things. Since this is my preferred way of learning, I have adapted suitable learning strategies and basically learned English by ‘listening to’ and ‘seeing’ English. During my early days of studying English, I believed this style worked best with me. And it did!

Being an audio-visual kind of learner helped improve my English proficiency in all the four language skills. I enjoyed exposing myself to a variety of audio-visual language materials ranging from BBC radio broadcasts, to HBO movies, to the latest English smash hits. This helped me become a good listener and speaker of English. In addition, I was a keen reader, enjoying the pleasures found in novels and story books, and often found myself buried in extensive reading. In other words, I read unspecified texts and articles of various topics of my interest (unassigned topics). Outside class, I frequented the Self-Access Center and the Internet and could make use of the abundant information the two sources provided. To my surprise, extensive reading also contributed to my interest and improvement in writing. I felt that the more I read, the more natural and well-structured my writing became.

In addition to improvements in the four language skills, my approach to acquiring the language meant that I had to be more of a communicative learner than an analytical one. This helped me learn grammar and vocabulary more effectively. From my experience, direct study and analysis of grammatical points and the memorization of vocabulary were somewhat boring and difficult, and proved to be only a minor driving force to success. Instead, it was my active use of the language both in class and outside class that really substantiated my knowledge bank of grammar and vocabulary.

In class, I remember being active in group and whole class discussions, debates, presentations, and other activities. Outside class, I tried to grasp any available opportunity to use the language. These included small talk with friends and native speakers, participation in workshops and conferences, self-study and research, listening to songs, watching movies and documentaries, writing e-mails and journal entries, and any conceivable activity that required me to use the language—English, of course.

Now looking at my own students’ learning process, I can see success stories in the making. In all of my classes, I have observed that top students do share common characteristics. Through interviews, I found out that all of them are very active in class, do extensive self-study outside class, have set goals for their language learning, and are not easily discouraged by mistakes they make in their pursuit of improved English proficiency. “I don’t mind if people laugh when I speak English,” said one of them, confidently.

Although successful language learners possess certain identifiable qualities, they may use different strategies to achieve success. In any case, they all know what works best for them. To be a better language learner, then, it is your task to identify what learning style works best for you to make your learning process more enjoyable. If my history of language learning sounds familiar to you and similar to what you practice, I am reasonably confident that it can earn you an A for most of your work—it really did for my TOEFL! If what you’ve read sounds out of your sphere of learning, and learning English has so far been difficult for you, I suggest you give my way a try! More importantly, if you have other [better] ways of learning a new language quickly and effectively, please do share your experience with all of us. After all, ‘learn’ smart, don’t ‘study’ hard!