
Language Weaver: fast in translation

How one firm quickly translates reams of data.

John Kehe/Staff

If you want to text message your Spanish-speaking neighbor, but don’t know how to say “Please turn down the radio” in that language, you could find a quick translation online at any number of websites. But, if you are, say, a large semiconductor company with customers around the globe, you are in a pickle if all your support data is written only in English.

Enter Language Weaver, a Los Angeles-based firm on the cutting edge of a rapidly growing field known as machine translation (MT). The firm took one chipmaker’s extensive database and translated it overnight into Spanish, the No. 1 tongue in demand by that company’s customers. This task, says the company’s CEO Mark Tapling, would have taken weeks to accomplish not too long ago. Instead, its software made short work of a gargantuan task.

The $100 million MT industry has the potential to grow by more than 50 times that number, some analysts estimate. “Language Weaver is a leader in this field,” says Don DePalma, chief research officer with Common Sense Advisory Inc., who specializes in the somewhat arcane world of computerized translation services.

This may seem like a yawn-producing competition among geeks, one that transpires beyond the purview of most people’s concerns. But in fact, say industry watchers, making swift, high-volume, global communication possible is quickly moving up the to-do list of those who conduct international business deals. For instance, what happens to a nuclear power firm doing business in remote parts of India with no ability to hand over documents in the proper local dialect?

“The ability to translate lots of information quickly is becoming one of the important concerns of a global economy,” says Mark Przybocki, computer technologist and MT team coordinator with the National Institute of Standards and Technology, in Gaithersburg, Md. “Especially when you consider the huge amounts of information accumulating on the Internet.... Effective machine translation is becoming more important every day.”

Just what constitutes “effective” MT is a source of lively debate among a small but growing number of linguists, mathematicians, and computer specialists who dominate the field. Since the 1980s, the MT field has consisted of three approaches: rules-based, in which programmers entered up to 20,000 grammatical rules to direct the translation; example-based, in which discrete examples serve as guides; and statistical, in which “smart” computer algorithms “learn” from previous translations and develop their own guidelines.

The first two approaches were dominant until the turn of the century because the statistical method required so much data from which to “learn,” as well as massive amounts of processing power to search and cull its protocols, and enough memory to retain the information. But the statistical approach became more viable as computing power began to accelerate and memory capacity grew more affordable.

Language Weaver grew out of what Kevin Knight, one of the company’s cofounders, calls a “watershed workshop” in 1999. His team discovered that the translation protocols developed for one language could move seamlessly to another without having to start over from scratch with each new tongue. The group’s work enabled it to nab all-important research funds, and within two years, the commercial venture began. Today, Mr. Knight sits in front of his computer looking at a translation program for Chinese that is capable of processing some 100 million directives.

But this would not be cutting-edge technology without some disputes. Chief technology officer and cofounder Daniel Marcu has T-shirts to prove it. One reads, “I lost the syntax bet”; another says, “I won.” He alternates them depending on how the arguments go. The shirts refer to a wager between his team and a former colleague who now runs the free translation service at Google. Mr. Marcu has maintained that the system will still need grammatical rules no matter how much a statistical system is able to learn from previous translations, while the other side believes that statistics alone will provide all the necessary guidance.

Friendly wagers aside, Marcu says that in the end, it won’t matter. “There is so much information on the Internet ... that these systems will absorb grammatical rules without pausing to articulate them.”

The biggest challenge MT may face is human expectation. “People think machines should be able to act like the computer on the bridge of the Star Trek’s Enterprise, or C3PO. That would be nice,” says Mr. DePalma, “but while everyone would like that fabled Babel fish in the ear [the universal translator from the sci-fi classic, “The Hitchhiker’s Guide to the Galaxy”], we are still a ways off from that.”
