MT, as usual the centerpiece of translation automation, served up some interesting new developments at this year's TAUS User Conference in Santa Clara. There were announcements of new products and applications from Microsoft and ABBYY, use cases of MT in real-time web situations, and a growing number of self-service SMT solutions.
There is now strong evidence that the formidable momentum created over the past few years by the Moses open source translation system, and by the commercial challenge of independently developed MT technology from Language Weaver, has brought the industry to a watershed.
Going self-service
A number of LSPs and large corporates are now developing complete solutions for a potentially broad range of users. As a result, the underlying engineering platform is being expanded and the training and testing cycle is increasingly being automated, leading to self-service solutions.
TAUS (and the industry’s language data repository, the TAUS Data Association) kick-started this move towards self-service MT at a TAUS Executive Forum in Copenhagen in May 2010, when it invited three language service providers (Tilde, Pangeanic and Languagelens) to train engines for three buyers (Adobe, Dell and Intel) using millions of words from the TDA database, and to deliver a solution 24 hours later. This was the first public ‘proof of concept’ that self-service MT was possible.
Today, some 18 months later, Applied Language Solutions has built and released a beta of its one-stop SmartMate solution for large buyers and individual translators alike; a consortium of companies including Tilde in the Baltic region has developed Let’s MT for people who want to build their own MT systems; and SDL Language Weaver has risen to TAUS’s somewhat tongue-in-cheek challenge of two years ago (“Let a thousand MT engines bloom”), announcing that it had indeed built 9,800 engines in the last eight months! Microsoft, for its part, has demonstrated its Collaborative Translation Framework, which provides an API for companies wishing to use Bing Translate for large-scale MT tasks.
Note that all these services depend for their linguistic fuel on the appropriate range of data resources, underlining more than ever the strategic importance of building a simple but powerful big language data repository in the cloud, as TAUS has been enabling with the TAUS Data Association.
Real-time Interactive MT
As the conference was reminded, live experiments in real-time MT go back to CompuServe’s chat room MT service over a decade ago. Intel and Asia Online reported on recent experiments in call centers and related situations that applied MT to multilingual inputs.
Their main finding was that MT can enable call center support staff to increase productivity by handling up to six concurrent callers via text messaging. To train the kind of engines that can support such dialog, they need the right kind of parallel data. In some cases existing translation databases can be repurposed to build such data resources; in others it might be possible to generate new training data by initiating translated chat around a related topic and encouraging the user community to feed corrections back to the system before going live.
In all cases, the watchword is “foster user feedback”, whether through bilingual displays, a reverse translation checker, or some other channel, so that agents can clarify meanings for end-users when a translation is faulty, and thereby improve the engine.
The same goes for speech-to-speech translation in a medical context, as Mark Seligman of Spoken Translation showed with his Converser system. In this case, the system accepts user input in the form of handwriting, keyboard entry and even images to help disambiguate terms. But as was pointed out, you may need a lot of user feedback to achieve a real improvement in the engine.
MT as a Feature
As revealed for the first time at this conference on October 7, Microsoft is now providing one-click MT on Facebook via Bing Translate for users of a small number of languages who wish to understand comments in other languages on public pages. This is part of a more general “MT featurization” dynamic described by Microsoft’s Chris Wendt and Vikram Dendi.
One benefit of placing an MT button on the receiving end of an instant messaging or Twitter-type application is that companies and organizations can leap the language barrier and quickly find out what the world is thinking about them or their products. In this scenario, MT becomes a feature added to other tools such as sentiment analysis and text analytics.
Another benefit, as in the Facebook example, is to enable people to share knowledge instantaneously via conversations in such contexts as online gaming, conferences, interactive language learning and avatar management, some of which will soon be using the richer medium of speech.
This effort to deliver MT as a built-in utility, rather than as a separate “destination” where you go to get your language work done, is supported by the Microsoft Collaborative Translation Framework. Vikram Dendi explained how the new Translator API for high volume commercial translation applications is currently being used by companies such as Trip Advisor, Harper Collins, Elsevier, Webster and eBay. It offers website features that deliver instant text translation, language detection, text-to-speech in multiple languages and various types of collaborative translation functionality.
MT as a Research Project
In another conference exclusive, ABBYY unveiled a brand new translation system called Compreno, which took the audience by surprise but testifies to the health of the MT research community, in this case largely in Russia.
In contrast to the statistical zeitgeist, Compreno is basically a rule-based engine built on a massive language-independent syntactic-semantic dictionary, and it aims to model “language understanding” rather than language conversion as we know it. It is based on 15 years of development by a team of 200, and the first language pairs (English into German, French, and Russian) are due in 2013.
Compreno will naturally be judged on its merits within specific contexts of application, but two things are worth noting here. One is that the MT research community continues to show interest in modeling languages as knowledge, even if the data-driven approach is in the commercial ascendancy today. The UNL project is an interesting if low-profile example of an attempt to build a language-independent pivot language from which vernaculars are generated via translation. There may well be opportunities in the future for some of this knowledge work to be modularized into add-ons to SMT engines, for example, for specific applications in the general area of Language Intelligence.
The other is that ABBYY has a long track record of developing optical character recognition and related applications based on a highly data-intensive approach to language, right down to analyzing the fonts and forms of different writing systems. It also has a very large number of dictionary resources. This means it has access to very large (old, paper-based) data resources from a broad range of domains, especially literature, on which to build and test Compreno. Open heritage archives such as Project Gutenberg could well benefit from a translation facility trained on or tuned to such data.
As TAUS members know, data is the rich soil in which translation automation grows. If one endgame for MT is to become a natural and reliable feature on a website or a smartphone, it will ultimately need access to all language pairs. As speakers at this conference constantly reminded us, we need better methods to build and access language resources so that the MT button will be able to address the very long tail of human tongues.
Videos of these presentations will be available on YouTube shortly.