TAUS - Enabling better translation

Thursday
Sep 02nd

Web MT services in troubled times

On February 2nd, Google released another seven languages on its Translate service, including Hungarian. Does this spell bad news for small companies such as MorphoLogic, Hungary's major language technology developer and operator of the Webfordítás web translation service? CEO Gábor Prószéky and László Tihanyi, Director of the Translation Business Unit, explain the strategies available to "independent" MT providers at a time when money is scarce.

Webfordítás came on stream in 2006 with a Hungarian-English pair and now handles 40 languages, including German, French and Russian, and even Danish and Ukrainian. How sustainable is your business model in the current climate?

About half of our revenue comes from adverts, and half from paid translations and similar services. Our language offering is built around our own pairs (English-Hungarian, Hungarian-English), while the rest are based on licensed solutions from the best online systems available. We were rather surprised by the very positive reaction to what was originally more of a scientific than a business undertaking. Visitor numbers have grown steadily from about 10,000 a day in 2007 to 65,000 today.

Quality is obviously central to our choice of partner systems, and we have spent a lot of time testing and comparing language pairs from different suppliers. This is in fact a dynamic process. For example, we chose a partner for Romanian, but when Google released its Romanian pair and it seemed better, we had to switch to Google as a supplier.

Now if people start using Google rather than Webfordítás for Hungarian to/from English, our ad revenue is very likely to fall. And in an economic downturn, people will not pay for something they believe to be only slightly better in quality than the free version. This is a constant risk in today's MT world.

However, the main difference between a site like ours and Google is that we are a "language" site which offers various other features - sentence parsers, dictionaries, soft keyboards, etc. - that people do not go to Google for. We also deliver better quality Hungarian output...!

You claim that, technologically speaking, Webfordítás is an "online transfer" system. How does it fit into the classic rules-versus-statistics paradigm?

We developed the system over a period of about 20 years in all, treating language items as patterns, whether in the grammar or the lexicon: a pattern that is fairly long but underspecified is a rule, and one that is short and fully specified is a lexical item. Every pattern is a source + target pair (a bit like a TM segment). Using this approach, we analyze the source as "words" or "syntactic groups", and we assemble the target at the same time as we parse.

It's rather like a human interpreter working from German: you have to wait for the verb at the end before you can output the translation, but you've done most of the other processing on the way. That's why we call it "online" transfer, rather than a classic three-phase approach. It works pretty well and we are very proud of this mix of example-based, TM and parsing-driven methods. We need a lot of data for this, but not as a direct feed to train the system as in SMT.
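To make the idea of source + target patterns concrete, here is a minimal toy sketch in Python. It is not MorphoLogic's engine: the category names, the `LEXICON` and `RULES` tables and the functions `reduce_stack` and `translate` are all assumptions made up for illustration, and the German example is chosen only because it mirrors the interpreter analogy above. Lexical items are short, fully specified patterns; rules are longer patterns over categories; the target is assembled each time a pattern is recognized, rather than in a separate generation phase.

```python
from typing import NamedTuple


class Node(NamedTuple):
    cat: str     # syntactic category of the recognized group
    target: str  # target-language text assembled so far for this group


# Lexical patterns: short and fully specified (source word -> category, target).
LEXICON = {
    "weil": ("CONJ", "because"),
    "ich": ("PRON", "I"),
    "das": ("DET", "the"),
    "buch": ("N", "book"),
    "lese": ("V", "read"),
}

# Rule patterns: longer and underspecified. The source side is a sequence of
# categories; the target side is a result category plus the output order of
# the children (this is where reordering happens).
RULES = [
    (("DET", "N"), "NP", (0, 1)),
    (("PRON", "NP", "V"), "CLAUSE", (0, 2, 1)),  # verb-final -> verb-second
    (("CONJ", "CLAUSE"), "SUBCLAUSE", (0, 1)),
]


def reduce_stack(stack: list) -> None:
    """Apply rule patterns to the top of the stack until none matches."""
    changed = True
    while changed:
        changed = False
        for source_cats, result_cat, order in RULES:
            n = len(source_cats)
            top = tuple(node.cat for node in stack[-n:])
            if len(stack) >= n and top == source_cats:
                children = stack[-n:]
                del stack[-n:]
                text = " ".join(children[i].target for i in order)
                stack.append(Node(result_cat, text))
                changed = True
                break


def translate(sentence: str) -> str:
    """Parse left to right, assembling the target as each group closes."""
    stack = []
    for word in sentence.lower().split():
        cat, target = LEXICON[word]  # unknown words raise KeyError in this toy
        stack.append(Node(cat, target))
        reduce_stack(stack)
    return " ".join(node.target for node in stack)


print(translate("weil ich das Buch lese"))  # because I read the book
```

In this sketch the noun phrase "the book" is already translated before the verb arrives; only the final clause-level pattern has to wait for "lese", which is the behaviour the interpreter comparison describes.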

What kinds of innovation are you looking at in your MT business?

Our Hungarian/English core language pair offers a number of advantages that we are trying to build on. One is to take English as the natural interlingua, and extend our language pair base by exploiting this.

For example, a partner could take our engine and tools, add their source language (Bulgarian, Latvian, Spanish, etc.) to the pair, and then work through English for translation into and out of all the other languages. We have already solved a lot of the problems, so our platform could simplify the work for other potential partners who would only need to develop their own language.

This would ultimately generate a genuine 'matrix' model in which all languages are used for input and output via English, the best-studied language from a computational point of view and clearly the most formally simple when you compare it to a morphologically rich language such as Hungarian. One advantage is that this approach to wide bandwidth MT would not depend on massive amounts of parallel data or incur the high maintenance overhead that would be encountered by the experimental SMT-driven EuroMatrix project (in which MorphoLogic is a partner).
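The arithmetic behind routing everything through English can be sketched as follows. This is a back-of-the-envelope illustration rather than a description of the Webfordítás architecture: the function names are hypothetical, and the count assumes one engine per translation direction. With n languages, direct coverage needs n(n-1) directed pairs, while a pivot needs only one engine into and one out of the pivot for each of the other n-1 languages.

```python
def direct_engines(n_languages: int) -> int:
    """Directed engines needed if every language pair is built directly."""
    return n_languages * (n_languages - 1)


def pivot_engines(n_languages: int) -> int:
    """Directed engines needed if every language only pairs with the pivot.

    n_languages includes the pivot itself (English in the interview).
    """
    return 2 * (n_languages - 1)


def translate_via_pivot(text, to_pivot, from_pivot):
    """Compose two engines: source -> pivot, then pivot -> target.

    to_pivot and from_pivot are any callables taking and returning a string;
    they stand in for real engines here.
    """
    return from_pivot(to_pivot(text))


# For the 40 languages mentioned earlier in the interview:
print(direct_engines(40))  # 1560
print(pivot_engines(40))   # 78
```

The trade-off is that every non-English pair goes through two translation steps, so errors from the first step carry into the second.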

Another new area we are interested in is dictionary building via the community. We have an Ajax-based technology which allows professional language users to add lexical data and vote on the best translations. By analyzing what they do with dictionaries and how they make choices, we see that people categorize words without realizing it. The results of this kind of contribution can then be integrated into our translation process.
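As a rough illustration of community voting on translations, the following Python sketch keeps a vote count per candidate and ranks candidates by votes. The class `CommunityDictionary` and its methods are hypothetical names invented for this example, the sample entries and vote counts are made up, and nothing here reflects how the Ajax-based tool is actually implemented.

```python
from collections import Counter, defaultdict


class CommunityDictionary:
    """Toy store of candidate translations with community vote counts."""

    def __init__(self):
        # source word -> Counter mapping candidate translation -> votes
        self._votes = defaultdict(Counter)

    def add_translation(self, source: str, target: str) -> None:
        """Register a candidate without voting for it."""
        self._votes[source][target] += 0

    def vote(self, source: str, target: str) -> None:
        """Record one vote for a candidate (registering it if new)."""
        self._votes[source][target] += 1

    def ranked(self, source: str) -> list:
        """Candidates ordered by votes (highest first), ties alphabetical."""
        candidates = self._votes.get(source, {})
        return sorted(candidates, key=lambda t: (-candidates[t], t))

    def best(self, source: str):
        """Top-ranked candidate, or None if the word has no entries."""
        ranking = self.ranked(source)
        return ranking[0] if ranking else None


# Illustrative data only.
d = CommunityDictionary()
d.add_translation("ház", "house")
d.add_translation("ház", "home")
d.vote("ház", "house")
d.vote("ház", "house")
d.vote("ház", "home")
print(d.best("ház"))  # house
```

A ranking like this is the kind of result that could then be fed back into a translation lexicon, although how contributions are weighted or moderated is left open here.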

 
