Imagine you are a small language service provider (LSP), one of the thousands of translation agencies listed in the Yellow Pages of the world. You are kind-of midlife. Business is tough. You exist because of the words you sell but your word rates are under pressure every year. You’d like to think that you are an entrepreneur: that you are free to make choices. But what choices do you really have? Margins are being squeezed. Machine translation is suddenly the ‘talk of the industry’, and non-professionals – “crowdsourcers” – are willing to compete for your jobs, at least in some industries.
You feel trapped. Your biggest customers are the large international translation companies, even though you may have some good direct clients or accounts. Large Multi-Language Vendors (MLVs) can be the meanest when it comes to rates and payments. You would like to dump them, but you can’t, because they represent the lion’s share of your revenues. Where do you go from here? Anyone want to buy your business? Not very likely, unless it is that one very large customer. On a sunny day, you decide to be brave. You take control of your own destiny.
This is what Manuel Herranz did in 2005. He could have gone under. After eight years of hard labor as the European representative for a Japanese language company, he staked his fortunes on a management buy-out to turn the company around. The HQ had gone into bankruptcy, and it was time to pack and leave… or turn this misery into something positive.
What many others in his position feared – machine translation – intrigued Manuel. He studied philology in Valencia, specializing in Latin, Greek, German, English and Catalan, but had gone on to Manchester to study mechanical engineering. He was destined for a life as a busy technical translator. And it was enjoyable in the good years as a language consultant for companies like Ford and Rolls Royce. It was never meant to be a struggle to stay alive. Perhaps it was the background in mechanical engineering that opened his eyes to the bright side of the translation industry.
In 2005 he took over B.I. Europa and started working out a plan. He attended conferences and took part in discussions about establishing an industry association for sharing language data. He studied the different approaches to machine translation (MT) technology. He began collaborating with the Polytechnic University of Valencia. And then he made up his mind: not simply to use MT, but to produce MT systems as well. So in 2007 he renamed his company Pangeanic, and the following year Pangeanic became one of the smallest founding members of the TAUS Data Association alongside giants like SDL, Lionbridge, Intel and Oracle.
Translation companies ten or a hundred times bigger than Pangeanic were not even dreaming of building their own MT engines, let alone actively considering it. Why would they? Emotionally they’d much rather keep MT outside the corporate door for as long as possible. They had also heard all the horror stories about the failures, the costs and the bad quality. But Manuel didn’t hesitate for a second: controlling your own destiny in translation clearly means mastering the technology.
Together with his partners at the Polytechnic University of Valencia, Pangeanic began to look closely at the Moses open source statistical MT technology. The scarce translation data they managed to find in the early stages were used to test the various components and understand the bottlenecks. Other challenges came in different shapes: it was clear that a parser was needed to filter out the tags and meet industry needs. Furthermore, plain-text MT output is not always good enough if you need to deliver in proprietary formats.
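To give a rough idea of what such a tag filter does, here is a minimal Python sketch. It is not Pangeanic's parser: the placeholder scheme (`__TAG0__`, `__TAG1__`, ...) and the function names `mask_tags` and `unmask_tags` are our own assumptions for illustration. Inline markup is swapped for placeholders before the text goes to the engine and put back afterwards.

```python
import re

# Matches simple inline markup such as <b>, </b> or <ph id="1"/>.
# Assumption: tags do not contain a literal ">" inside attribute values.
TAG_RE = re.compile(r"<[^>]+>")

def mask_tags(segment):
    """Replace each inline tag with a numbered placeholder token."""
    tags = []

    def _swap(match):
        tags.append(match.group(0))
        return f" __TAG{len(tags) - 1}__ "

    masked = TAG_RE.sub(_swap, segment)
    # Collapse the extra whitespace introduced around the placeholders.
    return " ".join(masked.split()), tags

def unmask_tags(segment, tags):
    """Put the original tags back in place of the placeholders."""
    for i, tag in enumerate(tags):
        segment = segment.replace(f"__TAG{i}__", tag)
    return segment

masked, tags = mask_tags("Press <b>Start</b> now")
restored = unmask_tags(masked, tags)
```

Note that this simple version leaves spaces around the restored tags; a production filter would also have to restore the original spacing and cope with an engine that reorders or drops placeholders.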
A TMX workflow had to be created to import and export MT-translated TMs. A lot of translation data are needed to learn how to optimize the results. In the summer of 2009 the TAUS Data Association (TDA) offered all founding members a period of ‘free pooling’ so they could test the data on their technologies. Pangeanic downloaded a couple of hundred million words and combined them with considerable volumes of translation memories from their customer Sony Europe. Sony Europe was only too happy to share their TMs with Pangeanic and the TDA and have customized engines built.
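The import half of a TMX workflow can be sketched in a few lines of Python using only the standard library. This is an illustration under our own assumptions, not the workflow Pangeanic built: the function names `seg_text` and `read_tmx` are hypothetical, language codes are matched exactly (after lowercasing), and text nested inside inline elements such as `<bpt>`, `<ept>` or `<ph>` is simply dropped.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def seg_text(seg):
    """Plain text of a <seg>, skipping the content of inline elements."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return "".join(parts).strip()

def read_tmx(tmx_string, src_lang, tgt_lang):
    """Return (source, target) pairs for one language pair in a TMX string."""
    root = ET.fromstring(tmx_string)
    pairs = []
    for tu in root.iter("tu"):
        by_lang = {}
        for tuv in tu.findall("tuv"):
            # Older TMX files use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                by_lang[lang] = seg_text(seg)
        if src_lang in by_lang and tgt_lang in by_lang:
            pairs.append((by_lang[src_lang], by_lang[tgt_lang]))
    return pairs

# A tiny made-up sample, not real customer data.
SAMPLE_TMX = """<tmx version="1.4"><header/><body>
<tu>
  <tuv xml:lang="en"><seg>Press <bpt i="1">&lt;b&gt;</bpt>Start<ept i="1">&lt;/b&gt;</ept> now</seg></tuv>
  <tuv xml:lang="es"><seg>Pulse Inicio ahora</seg></tuv>
</tu>
</body></tmx>"""

pairs = read_tmx(SAMPLE_TMX, "en", "es")
```

The resulting sentence pairs can then be written out as two aligned plain-text files, which is the form a statistical engine is usually trained on. Exporting back to TMX is the mirror image and is left out here.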
There are many exciting lessons to be learned from testing and customizing a statistical MT engine, especially for a mechanical engineer. Hundreds of new engines have been tried, tested and delivered since production of MT engines started at Pangeanic in 2009. Not all of them are being used, but the knowledge gathered is invaluable. Failures teach as much as successes in MT. Using a Moses engine is not just a matter of feeding it as much data as one can find. Yes, more data helps, but similar or consistent data is also important, as are style and genre. The importance of data preparation and cleaning should not be underestimated. Knowing how to tune the weighting factors for the language model and translation model training processes is equally critical. Perhaps it helps to play with the N-gram settings for specific languages. But what counts most of all is the data selection and preparation.
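What data cleaning means in practice varies from one team to the next, but a few basic filters are common before training a statistical engine. The Python sketch below shows the idea; the function name `clean_pairs` and the thresholds (80 tokens, a length ratio of 3) are assumptions chosen for illustration, not values used by Pangeanic.

```python
def clean_pairs(pairs, max_tokens=80, max_ratio=3.0):
    """Filter (source, target) sentence pairs before training.

    Drops pairs with an empty side, overly long sentences, badly
    mismatched lengths (a hint of misalignment) and exact duplicates.
    The default thresholds are illustrative assumptions.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        # Normalize whitespace so trivial variants count as duplicates.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue
        if n_src > max_tokens or n_tgt > max_tokens:
            continue
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue
        if (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept

raw = [
    ("Press Start now", "Pulse Inicio ahora"),
    ("Press  Start now", "Pulse Inicio ahora"),   # duplicate after normalization
    ("", "Texto sin origen"),                     # empty source side
    ("one", "uno dos tres cuatro cinco"),         # suspicious length ratio
]
cleaned = clean_pairs(raw)
```

Filters like these only remove the obvious noise. Selecting data that matches the customer's domain, style and genre remains a matter of judgment that no length check can replace.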
Although Moses technology is open source and available to everyone, what really makes the difference is knowledge about how to work with it and fine-tune the best features for each language combination. Perhaps it is still too early to judge whether Manuel has 20-20 foresight or was pushing the risk factor for a small LSP. But we do know MT is in demand. He plans to double his post-editing output from 30% of total business in 2010 to 60% in 2011. This will help him improve his margins on his translation business, while reducing costs and possibly winning new business. At the same time he is developing new business and revenues by building customized MT engines under the new brand PangeaMT.
If you have read this far, you may think that TAUS has had to lower its sights and publish advertorials. But the real reason we are publishing this story is that we all need an industry in which entrepreneurs take charge. We are not recommending that you use PangeaMT, or SDL/Trados/Language Weaver for that matter. We have seen the dangers of an industry that gets locked into a particular technology. Building your own MT technology means walking a path to technology independence from standard TM applications, and it is also a major challenge in an industry where even large corporations and MLVs have little choice.
We would like industry players to take control of their own destiny by embracing or developing the technology of their choice. Few still doubt that there is a role to play for machine translation. We have witnessed a massive change in mindsets in the last five years. Using or producing MT: that is the question. TAUS takes an active role in guiding and coaching LSPs to become not only users but also producers of MT. “Let a thousand MT systems bloom” was our slogan at last year’s User Conference. We now offer workshops and reports on how to implement open-source MT solutions and how to evaluate MT. Let us know if you would like to get started building your own MT system and we will be happy to help you on your way. If you decide to be ‘just’ an active user of MT, you may be interested in the TAUS reports and workshop on post-editing.
Resources
- Join one of the upcoming TAUS Round Table world tour meetings
- Related article: Imagine you are a Globalization Director