During 2008, Microsoft went live with its own general-purpose machine translation system covering a dozen language pairs. The system is gradually being integrated into the Office environment and could revolutionize desktop access to multilingual information for the company's huge customer base around the world. The rollout marks the culmination of two decades of R&D into natural language processing and MT that has rarely received much public exposure. TAUS has therefore completed a report on Microsoft's machine translation activity, which will be published for TAUS members in early 2009.
The Microsoft MT project came together when a group of researchers left IBM's NLP research team and joined the recently established MS R&D Group (headed by Chris Wendt), where they focused on fundamental work such as developing parsers and lexicons for a number of languages. This work led to products such as the MS Word spelling and grammar checkers, and it also laid the foundation for MT research, which started in 1999. After experimenting with an example-based system, the R&D team eventually moved to hybrid statistical engines (known as Statistically Aided Machine Translation, or SAMT) that today integrate a variety of tools and processes. When this underlying technology moved from development to production five years ago, it was first applied to knowledge base content for self-service Customer Service Support, and in the following years to the documentation localization process, with LSPs acting as post-editing service providers. At each stage, Microsoft's approach has been highly pragmatic: language pairs are tested almost continuously, and the engines are retrained so that they stay tuned to the content most relevant to the task at hand.
One crucial work track focuses on adding a syntactic, rules-based component to the statistical engine, drawing on the R&D team's multi-language parsing work, mainly to improve MT output quality. This component can also handle complex morphological processing for languages in which morphology is critical to carrying meaning. Today, Microsoft supports six language pairs in both directions and 19 in one direction only. It provides raw MT output for customer service, MT plus TM output to drive the localization process, and the online Live Translator service (12 two-way language pairs), which is now powered exclusively by SAMT. All this represents a large, multi-year R&D budget and, today, substantial server resources.
The TAUS report (due to be published in January 2009) looks closely at the work behind Microsoft's large-scale SMT deployment. We give full details of the company's quality evaluation techniques, in both training and production, and also examine how the company uses community techniques (through wikis) to extend translation beyond what its budget covers, and how it monitors consumer usage and feedback to improve quality.