Why are more and more companies using Machine Translation software? Certainly because the technology has much improved in recent years. But mainly because the need has become paramount. Fully automatic real-time translation proves most valuable when traditional human translation is not an option:
- As part of a call deflection strategy in global customer support
- As part of a cross-lingual communications strategy on a corporate Intranet
Integration in search, content management and customer relationship management is straightforward. Customized dictionaries dramatically improve the utility of Machine Translation (MT). In these and many other cases, it is a question of MT or no translation at all.
But is there a role for MT to play in an environment where traditionally the work is done by human translators?
"No question about it", says Lou Cremers, Director of the Language Technology Group of Océ Technologies. "Just by integrating MT in our translation workflow and Translation Memory systems we have managed to in-crease our productivity by another thirty percent." Or as a translator at the Euro-pean Commission says: "If I am correcting as much as half the MT output, the other 50% of text is already generated."
As market pressure continues to rise, MT is turning into a blessing for the translation industry. Deadlines and budgets are becoming tighter, so the only way to meet the demands is by automating every possible aspect of the translation process. Every manager's dream is to have "translation out of the wall", as Tim Foster at Sun Microsystems described it in the August 2004 issue of the LISA Newsletter. "In the same way you access your bank account using an ATM, we would like to translate software by simply plugging the source files into a translation system and receiving the results immediately."
But it is not that simple. Sun Microsystems, Océ Technologies, SAP and the European Commission all started their translation automation efforts years ago. While some of the issues they dealt with have now been resolved in commercial applications and standards, there are still stumbling blocks that every company wanting to automate translation in a documentation or localization department has to overcome. In this article we share some of the lessons learned.
Machine Translation quality
When MT is introduced into a translation or localization department, the output quality of the system causes much debate. The underlying issue in this debate is that there is no clear definition of translation quality. Quality requirements vary greatly between marketing collateral and technical information. Marketing texts require a certain style and register. The quality of technical information, on the other hand, is defined by its utility: what really matters is that its terminology is accurate and consistent.
Machine Translation ‘out of the box’ will never be good enough, but the specific terminology we feed into the system will be used consistently and accurately. This highly mechanical approach to translation may actually give MT an advantage over human translation when it comes to translating very large volumes of technical information.
Océ has put tremendous effort into building user dictionaries to support its standard MT engine. The user dictionaries for French, Italian, German and Spanish contain on average 9,000 entries. The European Commission has worked on customizing its SYSTRAN engine ever since it started using it in 1976.
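The point about accurate and consistent terminology can be made concrete with a small check. What follows is only a minimal sketch in Python: the glossary entries, the example sentences and the function name `find_term_violations` are invented for illustration and are not taken from any of the dictionaries mentioned above.

```python
import re

# Hypothetical English-to-German glossary entries, invented for illustration.
GLOSSARY = {
    "fuser unit": "Fixiereinheit",
    "paper tray": "Papierfach",
}


def find_term_violations(source, target, glossary):
    """Return (term, approved translation) pairs where the term occurs in the
    source text but the approved translation is missing from the target."""
    violations = []
    for term, approved in glossary.items():
        # Whole-word, case-insensitive search for the source term.
        in_source = re.search(r"\b" + re.escape(term) + r"\b", source, re.IGNORECASE)
        if in_source and approved.lower() not in target.lower():
            violations.append((term, approved))
    return violations


source = "Open the paper tray and remove the fuser unit."
target = "Öffnen Sie das Papierfach und entfernen Sie die Fixierstation."
print(find_term_violations(source, target, GLOSSARY))
```

A check of this kind works equally well on human and machine output. It ignores inflected forms, which a real terminology checker would have to handle.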
Formats, filters and standards
It is frustrating to see how many translation automation initiatives are undermined by something as mundane as incompatible file formats. Everything is ready to go, but the specific file formats cannot be processed. In today's translation business an inordinate amount of time is spent on manually converting incoming files into a format that can be processed for translation and then back into the format needed for review and publication. That is why the translation technology team at Sun made standardization of file formats a priority in its translation automation plans. Instead of trying to force everyone to use one file format, Sun adopted the XLIFF (XML-based Localization Interchange File Format) and TMX (Translation Memory eXchange) interchange standards. Every incoming file is converted automatically into the interchange standard for translation processing. Océ is also migrating its translation process to an XML-based format to make it more open to the world. The fact is, however, that the world is still producing most of its information in proprietary formats. According to a survey we conducted in the translation industry, 47.8% of the documents processed are in Microsoft Office formats, 22.3% in HTML and only 6.6% in XML and SGML. As long as proprietary and free-style formats can enter the system, it will be impossible to switch off the lights and trust that the automated translation system can roll through the night.
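To give an idea of what conversion into an interchange format involves, here is a minimal sketch in Python that wraps plain-text segments in an XLIFF skeleton. It assumes the XLIFF 1.2 namespace and element names; the function name `segments_to_xliff` and the sample file name are hypothetical, and the sketch does not represent Sun's actual converters, which also have to deal with inline formatting and many source formats.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 (the version is an assumption made for this sketch).
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def segments_to_xliff(segments, original, source_lang, target_lang):
    """Wrap plain-text source segments in a minimal XLIFF document
    and return it as a string."""
    ET.register_namespace("", XLIFF_NS)  # serialize with a default namespace
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for number, text in enumerate(segments, start=1):
        # One trans-unit per segment; the target is filled in later by TM or MT.
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": str(number)})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return ET.tostring(root, encoding="unicode")


xliff_text = segments_to_xliff(
    ["Press the green button.", "Close the cover."],
    original="manual.txt",  # hypothetical file name
    source_lang="en",
    target_lang="fr",
)
print(xliff_text)
```

Once every incoming file has this shape, the downstream steps need only one parser, whatever the original format was.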
Workflow
Introducing MT to an Intranet or Knowledge Base system is relatively simple compared to integrating it into an existing translation process. The MT system is the single supplier of translations in an Intranet or Knowledge Base implementation. But in a traditional translation process, the supply chain contains many different actors and systems. Translations are retrieved from Translation Memory (full matches and fuzzy matches) and from Machine Translation (presented as fuzzy matches), sent to translators for post-editing, and then passed on to reviewers for validation. Manually managing the logistics of this supply chain, including file format conversions, can cost companies as much as the actual translation itself. Some companies and institutions like Sun Microsystems, Océ and the DG for Translation at the European Commission have automated steps in this process with custom-built modules. The alternative today is to buy a commercial application that is designed for translation process automation. But again, the ‘out of the box’ workflow system will not be good enough. The system will have to be tuned to the customer-specific process. The traditional dilemma looms once again in most of these implementations: will the system be adapted to the operators or will the operators have to adapt to the system? In other words: do you need a collaborative or a deterministic workflow system?
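The order in which translations are supplied in such a chain can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the similarity measure (the standard library's `difflib`), the 0.75 threshold, the function names and the placeholder `machine_translate` are all hypothetical and do not describe any of the systems named above.

```python
from difflib import SequenceMatcher

FUZZY_THRESHOLD = 0.75  # assumed cut-off for a usable fuzzy match


def machine_translate(text):
    """Placeholder standing in for a call to an MT engine (hypothetical)."""
    return f"[MT] {text}"


def route_segment(source, tm, mt=machine_translate):
    """Pick a supplier for one segment: full TM match, fuzzy TM match, or MT.

    `tm` maps previously translated source segments to their targets.
    """
    if source in tm:
        return {"origin": "tm-full", "score": 1.0, "target": tm[source]}

    # Look for the most similar source segment already in the TM.
    best_score, best_source = 0.0, None
    for candidate in tm:
        score = SequenceMatcher(None, source, candidate).ratio()
        if score > best_score:
            best_score, best_source = score, candidate

    if best_source is not None and best_score >= FUZZY_THRESHOLD:
        return {"origin": "tm-fuzzy", "score": best_score, "target": tm[best_source]}

    # Nothing usable in the TM: fall back to MT and send it for post-editing.
    return {"origin": "mt", "score": None, "target": mt(source)}


tm = {"Press the green button.": "Appuyez sur le bouton vert."}
for segment in ["Press the green button.", "Press the red button.",
                "Replace the toner cartridge."]:
    print(segment, "->", route_segment(segment, tm)["origin"])
```

The `origin` field is what allows a workflow system to send full matches straight to review while routing fuzzy matches and MT output to a translator first.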
There are tremendous opportunities for making savings in translation process automation. But the difficulties in actually achieving these savings should not be underestimated. One corporate user told me that they earned their investment back simply by automating file conversion, but they still struggled to automate the logistics.
Translation Editor and Post-editing
When MT is introduced into an existing translation process it is essential that the results from Machine Translation are fully integrated into the Translation Editor. Translators should be able to review and edit fuzzy matches from Translation Memory in the same way and via the same interface as they post-edit the results from MT. The results from MT can be presented as fuzzy matches in a different colour, indicating that they originate from the MT engine. Post-editing MT is a new skill that is only now starting to develop. In our experience, post-editing MT takes on average 50% of the time needed for full human translation, assuming that the MT system is customized for the domain in which the translator is working. Some sentences come out fully correct; others contain a verb conjugation error or a word order mistake. Experienced post-editors will develop a routine to identify and correct these mistakes quickly. Recurring terminology mistakes can be reported back to the colleague who is looking after dictionary maintenance. This seamless integration of MT and TM in one Translation Editor ensures that all post-edited and validated MT segments go straight into the TM database. In a sense, the system is ‘learning’ from this process, and over time the customized MT engine gets better and better due to the continuous input of new terms.
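The feedback loop described here, in which validated segments flow into the TM and recurring terminology corrections flow to the dictionary maintainer, might be sketched as follows. The class, its method names and the sample corrections are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter


class TranslationMemory:
    """Toy TM that stores validated segments and logs terminology corrections."""

    def __init__(self):
        self.segments = {}             # source segment -> validated target
        self.term_reports = Counter()  # (MT term, corrected term) -> count

    def commit(self, source, post_edited, term_fixes=()):
        """Store a post-edited, validated segment and record any terminology
        corrections the post-editor had to make."""
        self.segments[source] = post_edited
        self.term_reports.update(term_fixes)

    def recurring_term_issues(self, min_count=2):
        """Corrections seen at least `min_count` times, i.e. candidates
        for a new user dictionary entry."""
        return [pair for pair, count in self.term_reports.items() if count >= min_count]


memory = TranslationMemory()
memory.commit("Remove the fuser unit.", "Entfernen Sie die Fixiereinheit.",
              term_fixes=[("Fixierstation", "Fixiereinheit")])
memory.commit("Clean the fuser unit.", "Reinigen Sie die Fixiereinheit.",
              term_fixes=[("Fixierstation", "Fixiereinheit")])
print(memory.recurring_term_issues())
```

The next time either source sentence arrives, it is a full TM match and needs no MT at all, and the repeated correction points to a term that belongs in the user dictionary.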


