What you don't (want to) know
The first responses to the Google Translation Toolkit are generally positive. No wonder: it makes total sense. It offers a friendly interface for translators to edit translations, do peer review, leverage previous translations and make use of a fully integrated MT engine that keeps getting better. You can even upload your own TMs and glossaries and have them hosted by Google. And, most importantly, all of this costs you nothing.
While Google Translate is used by millions of internet surfers, the Google Translation Toolkit is clearly targeted at a smaller audience of a couple of hundred thousand professional and semi-professional translators. The timing of this new offering could not be better. Translators are under unprecedented pressure to translate faster and cheaper. Translation buyers are desperately looking for a break from lock-in service models. The buzzwords at every conference and in every blog are ‘community’, ‘openness’ and ‘sharing’. The Google Translation Toolkit seems to fit the Zeitgeist perfectly.
The world is upside down. The best things come for free. Just as Google Translate picked up millions of users - including many professional translators - this new service from Google is likely to gain popularity, even among translators working for corporate and institutional customers. An industry under stress can easily be trapped by such an anomaly.
"Méfiez-vous", as the French say. There is a hidden catch in the Google Translation Toolkit offering that many of us may not be aware of, or would prefer not to think about. Google Translate thrives on data. The more data, the better the automatic translation. Data in this context means translation memories and glossaries. Translators using the Translation Toolkit ‘share' their translations with Google. If 100,000 translators start using the service, Google will be harvesting 50 billion words of good quality translation data per year to help Google improve their automatic translation engines. In addition translators may be uploading their own (or their customers') TMs. As long as Google keeps offering Google Translate and the Translation Toolkit as a free service to everyone, many people in the industry may still see this as a benevolent scenario. However, there is no guarantee that Google will continue not being evil and offering these services for free. Google is a commercial enterprise and may decide that translation is a good source for additional revenue in the future. The meter could be switched on and translators will find themselves hijacked and punished for their naivety.
The new open global society, with so many free and commercial offerings, carries responsibilities for citizens and professionals to decide what is good, both for themselves and for others, now and over the long term. The ingredients of the Translation Toolkit form a recipe for a breakthrough in translation efficiency. The concept of data sharing on an industry-wide basis is the key to badly needed automatic translation of acceptable quality. However, the problem with the Google offering is that ‘sharing’ and ‘openness’ are illusions. If some of the largest translation companies in the world, such as Lionbridge or SDL, or any other commercial enterprise were to offer an open TM sharing platform, most people would think twice before using it for exactly the same reason: lack of trust that this is a genuinely open and sharing platform.
Many other global IT companies, government institutions and translation service and technology providers have realized that a new, unique approach is needed to boost translation efficiency and respond to a rapidly growing demand for translation. In July 2008 they established the TAUS Data Association (TDA) as a non-profit, member-driven organization. In May of this year TDA launched its TM Sharing and Data Pooling services for members as well as a free public Language Search. The language data in the TDA platform are categorized by industry domain, allowing members to optimize translation quality and efficiency within their own vertical markets. One month after the public launch, the TDA platform already contains 550 million words in 80 language pairs. Translation memories come from trusted sources and are legitimately shared. All TDA members can benefit equally from TDA, now and over the long term.
See also the article ‘The unreasonable effectiveness of data’.


