TAUS - Enabling better automated translation

Friday
Jul 30th

Incredible Progress in MT

Interview with Alain Désilets

Alain Désilets is a Research Officer at the Institute for Information Technology of the National Research Council of Canada (NRC). Part of his job is to think about the future of translation technology.

"Two technologies which will drastically change the way we translate content: massive online collaboration a la Wikipedia, and Machine Translation. Shared language data repositories are central to both the collaborative and MT innovations. A year ago, I would have said that MT was still too imperfect to impact the translation industry in any significant way. But recently, progress has been incredibly rapid, even more rapid than its most optimistic proponents ever dreamt of."

About Alain Désilets

The National Research Council (NRC) is the Government of Canada's premier organisation for research and development. It has been active since 1916 and currently employs 4,000 people across Canada, doing R&D in a wide range of fields relevant to the competitiveness of Canadian industry and the public good. Some of the many innovations by NRC personnel include the artificial pacemaker, the development of canola (rapeseed) and key frame computer animation.

Alain Désilets is a Research Officer at the Institute for Information Technology of the National Research Council of Canada (NRC), and an active member of the Language Technology Research Center, a joint initiative between the NRC, Université du Québec en Outaouais, and the Translation Bureau of Canada. For more than a decade, he has been doing research on applications of human language technologies (speech recognition, machine translation, bilingual text alignment and text mining), always with a strong emphasis on meeting genuine needs of end users. He is also very active in the area of collaborative wiki tools, and was general chair for the international WikiSym 2007 conference. He is co-founder of LGPLT, a multidisciplinary group of 8 researchers from NRC and Université du Québec en Outaouais that aims at better understanding the technological needs of professional translators by observing and interviewing them in action in their normal workplace. His recent work has focused on computer-assisted translation technology, with an emphasis on tools to help translators collaborate and share knowledge within world-wide communities of practice. Last November, he gave the opening keynote at the Aslib Translating and the Computer conference, and spoke about the impact that massive online collaboration will have on the world of translation.

What do you see as the greatest innovations in translation?

I can see two technologies which will drastically change the way we translate content: massive online collaboration a la Wikipedia, and Machine Translation.

Massive online collaboration will change the game on several fronts. On the one hand, it will allow the collaborative creation of large translation resources. For example, individuals and organisations will be able to collaborate on the creation of very large, Wikipedia-like terminology databases to cover a wide range of domains and languages. It will also allow them to share corpora to create very large and versatile translation memories.

On the other hand, massive online collaboration will also enable collaborative, volunteer-based translation of worthy public content, for example school books and health manuals for third world countries. In a more commercial context, collaborative translation will allow companies to crowdsource some of their non-core translation work to communities of volunteers. This is what Facebook has done, for example, to localize its web site into German. We might also see the emergence of hybrid volunteer-paid models. For example, Larousse, the leading publisher of French dictionaries and encyclopaedias, just released an online encyclopaedia which accepts contributions from volunteers, and the company has promised to financially compensate the most "worthy" contributors. Even in completely conventional translation shops, "agile" collaborative processes may be used internally to achieve productivity gains.

It's hard to predict how these sorts of collaborative paradigms will actually play out in the marketplace. If there is one thing that the Wikipedia experience teaches us, it is that it's very difficult to predict what will and will not work when dealing with collaboration at that kind of scale. One possible scenario is that massive collaboration will lead to a "rebirth" of the freelance translator, because it will allow them to operate within the context of large communities of practice, and thus achieve the kinds of economies of scale that only larger translation offices were able to achieve up to now. It may also allow freelance translators to form temporary, "virtual" alliances with other freelancers, in order to bid on specific large contracts. But this is all speculation at the moment.

Machine Translation is another game-changing technology that's right on the horizon. A year ago, I would have said that it was still too imperfect to impact the translation industry in any significant way. But recently, progress in MT has been incredibly rapid, even more rapid than its most optimistic proponents ever dreamt of. In particular, hybrid rule-based and statistical approaches, such as the one developed here at NRC, are getting pretty close to human-quality translation, even in large, unconstrained domains. It's gotten to a point where using MT may be sensible in a number of situations beyond gisting of content (which up to now was the main poster boy for MT). For example, it seems that for the first time in history, post-editing of MT outputs by professional translators might be clearly faster than translation from scratch, without sacrificing quality. It will be interesting to see if and how this approach will be adopted in industry in the coming years, and what actual impact it will have on productivity.

In a sense, both collaborative and Machine Translation technologies are changing the relationships between professional translators, amateur translators and machines. They are indicative of a shift of the profession away from "authoring" translations and towards "revising" draft translations produced either by a machine or an amateur volunteer translator. Here at NRC, we are actively researching those technologies, how they relate to each other, and how they actually fly in authentic production environments.

Industry-shared language data repositories

TAUS is establishing an industry-platform for sharing translation memories. How does this fit in your vision of the future translation industry? How can this stimulate innovation and improvement of translation services?

I think shared language data repositories are central to both the collaborative and MT innovations. Collaboration technology is the engine that will make it possible to share such data and leverage it efficiently. Using this shared data, it will in turn be possible to generate very large sets of training data which are key to improving MT systems.

One thing I would like to see is for much of this data to be public and freely accessible to anyone who thinks they have a use for it. I know I am being idealistic here, but it seems to me that in this day and age, seamless communication across languages is vital and it is something that we should collectively strive for. And when it comes down to it, most organisations don't have much to lose by sharing translation archives with the world (or at least, the part of them that is non-confidential), because translation is not their core business.

I think organisations like TAUS can play a pivotal role in making this happen, by providing a venue and process that allows individuals and organisations to give to the community, without fearing that their gift will be unduly hijacked by some other party. In other words, TAUS could play the same role with respect to shared linguistic data as the Free Software Foundation plays with respect to Open Source software. This would be a great contribution to the world. Don't get me wrong here. I don't object to commercial use of shared linguistic data. I am just saying that use of this data (whether commercial or not) should not be the privilege of a select "club".

Translation capacity

The world counts a couple of hundred thousand professional translators. That is far from enough to translate the massive and ever-growing volume of content. How do you see amateur or volunteer translators being mobilized and made part of the process?

In the coming years, I think collaborative translation will contribute to decreasing this gap between supply and demand. Another thing that can be learned from Wikipedia is that if something is worth writing about or translating, there is probably someone out there who cares enough to do it for free. There are a number of examples of organisations like Facebook that have leveraged communities of volunteers to crowdsource non-core translation work. Collaborative translation may also make it possible to translate important content into minority languages, by giving those minority communities the tools they need to "help themselves".

An interesting question in this context is the role that professional translators can play in a volunteer-based collaborative translation effort. Collaborative communities like Wikipedia are generally "do-acracies", meaning that actions and contributions count more than diplomas and official status. This can make it difficult for professional translators to interact constructively with non-professional volunteers. It's hard, but not impossible. Colleagues of mine who tried it tell me that it requires a lot of humility and patience on the part of the professional, and a shift from acting as a "guardian of quality" to acting as an educator and facilitator on quality issues.

So yes, collaborative translation will play an important role in reducing the gap, but at the same time, I doubt that it will scale enough to fill the bottomless void between supply and demand. I can see how it may enable the translation of important documents whose life expectancy is reasonably long. But I can't see how it will scale up to include, for example, the translation of every discussion forum posting that's out there on the web. For that kind of high-volume, volatile content, the only hope is fully automated Machine Translation. Whether MT will ever be good enough to do that remains to be seen, but I can't think of an alternative. Certainly, recent progress in MT makes me more optimistic about this possibility than I was a few years ago.

 

 
