In this report for TAUS members we describe a series of English->Chinese and Chinese->English machine translation trials conducted by TAUS Labs on computer software documentation.
Statistical MT engines were trained and tested using Amazon Elastic Compute Cloud (EC2) on corpora from the TAUS Data repository.
We built and tested 16 MT engines for each direction, using varying parameters. This included a number of different strategies for data selection/preparation.
Following a break of five years, TAUS returned to China on April 23rd-25th, 2012. In cooperation with CCID and the Translators Association of China (TAC), TAUS organized the Asia Translation Summit in Beijing on April 24th and 25th. On the pre-conference day, a Moses Open-Source MT Workshop was offered to all interested parties.
TAUS received a very warm welcome in Japan from a growing community of users and researchers of automatic translation attending the Executive Forum on April 19th and 20th, 2012. Mr. Chigira, director of localization of Oracle Japan, opened the meeting by speaking about the tsunami that hit Japan in 2011 and its impact on global communications.
The Chinese market is known for low labor costs and rapid growth, and is served by thousands of translation agencies. We were unsure how well our message would resonate: that using Moses helps ensure flexibility and choice for users and fosters a healthy competitive landscape. Our aim was to raise awareness that the open source toolkit, Moses, helps to improve translation processes and capacity and to create new business opportunities.
This complimentary report is relevant for anybody interested in customized machine translation, especially open source solutions. It summarizes the results of survey research undertaken by TAUS as part of the EC-funded MosesCore project in Q1 2012 and compares them with the results of the similar Q1 2011 survey. The survey asked respondents to report on their progress and experiences with Moses, and in particular to highlight the challenges they have encountered, as well as to suggest solutions.
The report also outlines our recommendations based on the findings. These serve as a foundation for future Moses development activity and collaborative industry initiatives.
One of the most exciting applications of machine translation is for real-time chat. And while many in the language services industry will be harnessing MT in the coming years to optimize their operations, a few innovators are already attempting to deliver this very challenging service. As these solutions filter through to mass use it will be fascinating, and perhaps a little frightening at times, to observe the social and commercial impact.
Four use cases were presented at last year’s TAUS User Conference. These came from Intel, Microsoft, Asia Online and Spoken Translation. Each demonstrated application scenarios where utility is valued over eloquence as a measure of translation quality.
Sustainable Growth Series

Following a panel debate at last year’s TAUS/ProZ.com Great Translation Debate, 75% of participants voted against the motion that higher education courses prepare translators sufficiently for life in the industry. During the discussion the only academic on the panel, Catherine Way, a senior lecturer at the University of Granada, Spain, gave a clear overview of the good work being done at her school.
We invited Catherine to explore the subject of translator education further. Here’s her response to the question:
What should be included in university courses to prepare translators for life in industry?
The first of twelve TAUS Open Source Machine Translation Showcases planned for the next three years took place over the weekend in Monaco just ahead of GALA’s annual conference. These free sessions are funded by the EC as part of the MosesCore project.
Here’s a short summary of the session with links to all presentations.
I opened with a short overview of the landscape as we see it:
Sustainable Growth Series
Google, Microsoft, Baidu, Yandex and Yahoo! are either getting paid or getting ready to be paid for translation through advertising revenue. Giving access to multilingual information increases their user base and raises cash. Lots of it. They are setting an example that many others would like to emulate.
Anyone who fails to see the fundamental shift in the demand for translation from the traditional buyer to the billions of citizens, patients, taxpayers and consumers is just scratching the surface of the vast potential for the global language industries.
The European Commission tells us that each EU citizen is paying on average €2 per year to fund the one-billion-euro translation budget of the Directorate-General for Translation, by far the largest in the world.
We can rightfully say that translation is already being paid for in different ways than the word-price model.
Moses: Commodity Creates Opportunity
We are witnessing the language industry’s commoditization of one of the most significant and far-reaching innovations in translation technology - phrase-based statistical machine translation, the very same technology used by Google Translate and Bing Translator. The ready availability of the Moses MT engine under an open source license enables everybody to create statistical MT engines from parallel data with a moderate amount of effort.
This has led to a flood of Moses-based MT offerings, in addition to long-standing proprietary engines. These newer providers offer a range of options including full MT engine customization and even do-it-yourself or self-service customization by users.
The commoditization of statistical MT moves value creation into adjacent components and services. This migration of value creation also means exciting new opportunities for both MT providers and users. In this article we discuss some of these opportunities and the conditions under which they can be exploited using DIY/self-service MT solutions based on Moses.

Review of Nicholas Ostler’s book The Last Lingua Franca
The glorious future of machine translation has an avid supporter - Nicholas Ostler, historian of world languages, President of the Foundation for Endangered Languages, and author of The Last Lingua Franca.
Following his fascinating and erudite review of the rise and fall of such world languages as Sanskrit, Persian, Arabic, Greek and Latin, Ostler’s new book leaves very little hope that English will maintain its dominant position in the modern world for much longer. Not because of strong competition from another language, but because of the growing linguistic diversity of the internet.

Sustained healthy rates of economic growth in many parts of Asia are helping to swell the middle classes in the region. We can expect to see rising levels of demand for translation into and across the region’s languages for some time to come.
It’s unlikely that we humans alone will have the capacity to satisfy such demand. Machine translation, for all its inadequacies, will undoubtedly play a pivotal role in aiding communication and fueling cross-border trade.

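Many parts of Asia have seen sustained, healthy economic growth, which is swelling the region’s middle classes. Demand for translation into and across the region’s languages will keep rising as well.
It is unlikely that we humans alone can satisfy demand on this scale. Machine translation is well suited to the task, and it will undoubtedly play a vital role in aiding communication and accelerating cross-border trade.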

Sustainable Growth Series
You are the new CEO of one of the longest standing Fortune 500 companies. You had never dreamed of having this job. But for two weeks now you’ve been in charge. You had all the right cards to come in as a relative outsider and try and reinvent the company.
Not that your company is on the rocks. But you are the third CEO in five years. The company is cash rich and revenues are still growing. But it’s now an also-ran and shareholders are jittery. You don’t stand out. No problem. You have a plan – focus on the products that work, spin-off the rest. Make sure customers are at the core. You have an experienced team behind you.

We have published articles, videos and reports covering self-service MT implementation for some time. In our reports, we have covered the manager’s decision-making process, technical implementation, data selection and cleaning, quality evaluation, as well as the outcomes of research into usage and requirements.
Videos provide valuable information on lessons learnt and best practices among early adopters.

Now that collaboration and sharing are firmly on the translation industry agenda, it is imperative that the community develops user-friendly solutions to implement or work around the many technical standards that enable – yet also put the brakes on - efficiency and savings.
Quality is when the buyer or customer is satisfied. Yet quality measurement in the translation industry is not always linked to customer satisfaction, but rather is managed by quality gatekeepers on the supply and demand side who have specific evaluation models, the majority of which are based on counting errors, applying penalties and maintaining thresholds with little, if any, interaction from customers.
The centerpiece of translation automation, MT, as usual, served up some interesting new developments at the TAUS User Conference this year in Santa Clara. There were announcements of new products or applications from Microsoft and ABBYY, use cases of MT in real-time web situations, and a growing number of self-service SMT solutions.
The multilingual web has been an implicit item on the TAUS agenda since the beginning. This year’s annual User Conference in Santa Clara offered a golden opportunity to invite a keynote panel to drill down into this concept and come up with a status report on what is happening to global content, standards and language processing on the world’s largest piece of shared infrastructure and how it meshes with translation automation.

Seven predictions and a survey presented at the 19th FIT Conference, San Francisco, August 2011.
Translators in the 21st century find themselves in a difficult position. On the one hand there is a steadily growing demand for translation as a result of increasing global trade and communication generally. On the other hand it becomes harder and harder for the professional translator to meet this demand. Delivery times grow shorter and prices go down.
Technology is often thought of as an answer to this kind of pressure. But along with the technology come many new challenges. It is simply impossible for a translator who is trained in the language arts to keep up with the technology. And if she tries, frustration grows when she finds out that translation tools do not really work together very well. (See report Individual translators and data exchange standards.)
Then there are the economics. As owners of small businesses, translators must weigh the return on investment of time and money very carefully. Tools do not come for free and every new tool takes time to be mastered. What if these same tools – or machine translation – one day take over the job of human translators, as many of our colleagues fear? You might prefer to live on another planet, or at least work in another profession.
For the 19th FIT Conference held in San Francisco, 1-4 August 2011, TAUS ran a survey among the translators attending the conference. This article references a summary of the survey, and then makes seven predictions as a follow-up to the keynote I gave to close the FIT event. The conclusion: the future for translators looks bright, but they will have to reinvent the profession first.
In the aftermath of the 2008 financial crisis, sixty-four (37%) of the survey respondents reported that translation rates continue to be under pressure. There seems to be a slight decline in translation volume, while the palette of languages seems to be broadening slightly. Thirty-seven respondents (21%) see business continuing as usual, while respectively 12% and 10% of them see opportunities for automation and innovation in the currently unstable market.
Which of the following technologies and/or innovations will translators apply in the coming two years? Sixty percent of the respondents say ‘no’ to machine translation, while 19% are already using it, and 21% expect they will use MT within the next two years. The main concerns about MT are the poor quality of MT output (76%) and the poor quality of source documents (54%). Those who look at MT on the bright side see cost reduction as the greatest benefit (39%) and the possibility of real-time delivery of translation as a secondary benefit (35%).
A majority of the respondents are interested in sharing translation memories and terminology: 35% already do so and 39% expect to be sharing language data within two years. However, another much larger poll by ProZ.com of 1,000 translators indicates that 49% would not consider sharing their translation memories. Translators are concerned about ownership of TMs and their relevance to the job at hand. But they do see the benefits of terminology searches of massive TM resources and the productivity gains these bring.
Click here for a summary of the full survey.
… change is the name of the game. And reinventing the profession is extremely hard if your days are spent just getting the jobs done and trying to make a modest living. Yet, for the first time in the history of the planet, translation is a really strategic activity. Thanks to Google Translate, Yahoo! Babelfish and Microsoft Bing, every soul on our planet now knows what translation means.
Hundreds of millions of people press the translate button every day, which makes them realize how difficult it is to get a good, accurate translation. As professionals we must realize that our community is far too small (just 250,000 or so professional translators in a world of 6,000 languages?) to serve the needs of seven billion citizens.
We are only scratching the surface. As professional translators – and as a global translation industry – our mission is to help the world communicate better. (That sounds better than being a lawyer or a banker, right?) For we now have the means to deliver on that mission. We simply need to find a way to do it properly. Here is how TAUS sees the future in seven predictions.
1. MT is here to stay
Let’s face it: machine translation will never be perfect. Every speaker of a language has the right to introduce new words, give existing words new meanings and change the spelling and grammar of his language. The point is: that’s what people do every day – witness Twitter or online chat, popular songs or political revolutions.
Computers just cannot keep up with these evolving nuances and associations in hundreds of domains and linguaspheres created by speakers of just one language. Yet, MT for all its mechanical faults is here to stay. Why? For the simple reason that we humans just cannot deliver enough translations in real-time.
Two other factors will also influence the rapid growth of MT. First, MT is getting better and better as we keep feeding the engines with human-translated sentences to improve their domain knowledge and keep tweaking the rules to improve the word order and forms. Second, a new generation of users is growing up who are more forgiving and open to self-service. Users may even step in and offer better terminology and forms of expression as a way to help others and themselves.
MT is here to stay and will be called “translation”. It will be embedded on every website, mobile and car app. Translation will become a utility, just like electricity, water and Internet: a basic resource and a basic human right.
2. High-quality translation will gain recognition
As machine translation becomes so universally available, it is clear that there isn’t just one single translation of a text that fits all. To differentiate their product offerings and appeal to specific customer groups, buyers will recognize the need for high-quality translation - call it personalization, transcreation or hyper-localization. This means that machines will not replace human translators.
On the contrary, non-perfect MT output will stimulate the need for high-quality translation in a broad range of communication situations. The challenge we face as an industry is to agree on the criteria and the measurements for the level of quality that is needed for each situation. Sometimes MT is simply not an option. Sometimes MT is the only option.
3. Post-editing will come and go
Information travels fast and loses its value quickly. This is especially true for news, entertainment, online shopping and customer support content, but increasingly also for business-to-business and government information.
There is a fundamental shift from static “cast in stone” content to dynamic “on the fly” content. Instead of one or two releases per year, companies are shipping product updates on a weekly if not daily basis. And consumers, citizens and patients are increasingly sharing their reviews, tips and tricks in user blogs and social media in almost real time. Any chunk of information may be relevant and interesting to someone somewhere.
The key attraction of MT in this new information age is that it can deliver real-time translation to meet these changes. Potential cost reduction is only a secondary benefit. And the widespread fear that all human translators will soon be downgraded to mere post-editors of MT output is ungrounded.
Why? Well, in the next few years post-editing will grow quickly, but then we will see it diminish. If there is no time for translation, then there is no time for post-editing either. Real-time is real-time, right? In any case, MT technology will get better, using machine intelligence to learn from its mistakes and not make them again.
Translators who choose to work with computers will customize and personalize MT engines to specific tasks, customers and domains, rather than do stupid, repetitive error fixing. They will be promoted to ‘language quality advisors’ if you like.
4. Translators win when supply chains get shorter
More so than most other industries, the translation industry consists of a complex cascade of suppliers. There may be three or four levels between the translator and the end-user: translation agency, global multi-language vendor, corporate translation department and often an external quality reviewer or subject matter expert.
All these functions add a cost to translation but are they adding any real value in proportion to that cost? Tasks are often replicated and functions overlap. Disintermediation (i.e., ‘cutting out the middleman’) hasn’t really bitten into the translation industry yet as it has in the travel and banking industries, for example. But change is on the way, under pressure from the overarching need to translate more words into more languages.
Corporate and government buyers will analyze their supply chains to reduce their costs, and functions such as project management, quality assurance, vendor selection and translation memory management, will probably be streamlined, simplified or shared. Yet there will be no question about the critical role of the translator at the end of the chain.
Even though MT will be used to translate content streams requiring real-time translation, there will always be a need for a professional translator to tell good from bad language in the communication process.
5. The list of languages keeps growing
As global business is shifting from an export mentality to a world of open trading on a flat playing field, the nature of publishing and communications is also changing fundamentally.
In the old 20th century model the global manufacturer and publisher used to push information out to the world. They would select their markets, pick their most important language communities and translate their own instructions for use, brochures and web pages.
They would probably start with four to six languages and gradually add more languages if the markets proved to be worthwhile. In the new 21st century model, companies are realizing that their customers are not sitting there waiting for the information to be pushed out by manufacturers and publishers.
They are browsing the Internet and pulling down information wherever they find it. And if they can’t find it, they write their own reviews and comments that yet others may then translate to help their local peers. In the old world, content was owned by publishers; in the new world content is shared and earned.
In this radically changing environment, the range of languages for content is constantly growing. Successful global companies need to facilitate communications in a hundred or more languages instead of the old standard set of seven or at the most twenty.
Translators in many more countries will benefit from this “democratization” of globalization.
6. Sharing data becomes the norm
Our concept of a ‘translation memory’ is about to change. Translation memories and translation memory tools have long been cultivated as our proprietary productivity weapon, perhaps offering a competitive edge in an environment where one fifth of professional translators (according to a recent ProZ.com poll) still don’t even use translation memories.
Yet, we have now reached the limits of potential productivity gains, and, let’s face it, translation memory technology itself – in its current and mostly used form – is no longer state-of-the-art. Most translation memory tools are stuck in a technology time warp and cannot leverage the power of corpus linguistics (see article The Future is Corpus Linguistics). A new generation of translation productivity tools will emerge that allow us to leverage any length of strings of text from very large corpora of translations.
These new tools will in many respects be using features and components that emerged from statistical MT technology, except for the fact that they leave the professional translator in full control of the processes. They will unleash the translational power hidden inside very large corpora of text. They will allow us to do semantic searches and clustering, synonym identification, automatic cleaning and correction of language data, sentiment analyses and predictive translations.
In anticipation of this next generation translation technology, many translators and companies have already started consolidating their translation memory data into large, searchable repositories. Some (more than you think) are even harvesting these language data from the Internet, meaning that they have computers crawling translated web sites, aligning the sentences from these web sites, and reconstructing translation memory files.
Call them pirates if you like. But as we have seen in other industries, they are the drivers of innovation. We at TAUS truly believe that it is this kind of innovation that is needed to unleash the power of the translation industry and enable it to prosper.
The TAUS Data Association was established in 2008 as a legal, not-for-profit member-driven organization aimed at hosting and sharing translation memories for all stakeholders in the global translation industry. The publicly accessible and searchable database already contains four billion words of high-quality translation data in 350-plus language pairs.
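The last step of the harvesting workflow described in this section, rebuilding translation memory files from aligned sentences, can be pictured with a small sketch. It assumes the crawling and sentence alignment have already been done by other tools and that their result is a plain list of (source, target) pairs. The function name `build_tmx`, the header values and the sample sentence pair are all made up for illustration; the element and attribute names follow the TMX 1.4 layout, and nothing here is taken from TAUS tooling.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace; ElementTree writes it as "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="zh"):
    """Build a TMX 1.4 element tree from already-aligned sentence pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; the values are illustrative placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # skip pairs where either side is empty
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


# Hypothetical aligned pairs; the second one is dropped because its source is blank.
pairs = [
    ("Machine translation is here to stay.", "机器翻译将长期存在。"),
    ("   ", "空"),
]
tmx = build_tmx(pairs)
tmx_text = ET.tostring(tmx, encoding="unicode")
print(tmx_text)
```

In practice the hard parts are the ones this sketch skips: finding translated pages, aligning their sentences reliably, and cleaning the result before it is worth storing.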
7. Translation becomes a business of choices
The future of translation either looks bright or gloomy: it depends on whether you want to change, reinvent yourself and adapt. Admittedly, this is not an easy choice. Nor is there a lot of time to consider all the options, but at least translators now have the luxury of choosing. In the past, you became a translator and you were in it for life. Unless of course you became a literary translator, in which case none of the above applies.
Today, you can choose to be a ‘boutique’ translator, specializing in a domain and providing hyper-localization or transcreation services. In this case, you will drift away from the original concept of a translator once you start specializing in your domain. You may be asked to create local content instead of translating text written for a different culture.
You may be asked to do brand checking for new product names. Your job title may change to ‘language consultant’ or ‘communications adviser’. If what you like is linguistics and computers, you may choose to become a specialist in training domain- and customer-specific MT engines, or in translation optimization, or in new functions such as language data cleaning, data selection on the basis of semantic search, search engine optimization, or sentiment and cultural analysis using customer feedback data.
The availability of language data in so many languages will open a much larger range of choices for specialization and innovation. And yes, you can also opt for post-editing machine translation output. Not so much fun if it is not your first choice, but in many ways this option is similar to the first wave of automation our profession experienced in the 1980s with the arrival of translation memory tools.
The good news now is that the MT engines will soon learn from the corrections made by post-editors, so you will not have to make the same corrections again and again. And translators (or whatever their new title might be) will become much less solitary and grow closer to their colleagues and end customers.
Collaborative networks will bring language workers together. And buyers of translation and language-related services will eliminate one or two handovers in the supply chain and be able to connect directly with you.
Translation may, in many ways, become a commodity and a utility, but that does not spell the end of the profession. On the contrary, it will stimulate the need for differentiation, specialization and value-added services. It is up to the world’s translators to rise to the challenge, open up to these changes and reinvent their future.
Are translation automation and interoperability good for the translation profession?
Does it make sense for translators to share translation memories?

Join us at the ProZ.com & TAUS Great Translation Debate Virtual Event
on 29 September 2011 / 1pm - 5:15pm CEST
to ask questions, discuss, agree and disagree all in the spirit of gathering our collective wisdom to decide.
The most significant academic event in computational linguistics, the Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, took place in June 2011. The breakthroughs presented in this highly competitive conference often define the future of computational linguistics for years to come.
Last year, we interviewed Laurie Gerber for the article "What machines can’t translate…yet?" where she explained the current limitations of MT technology. Here I focus on two research papers from the ACL event, which stood out in this context. These try to tackle the challenges that arise when translating morphologically rich languages or language combinations where there is a great deal of word reordering needed during translation. I also review a third paper investigating a new avenue for automatically measuring MT quality. This is one of the most promising new approaches in this research area for some time. If you are interested in understanding the basics of MT quality evaluation, you may find it useful to read the article by Alon Lavie on the “Essentials of machine translation evaluation” first.
Technology is an odd thing. It’s - well, it’s technological: It contains complicated details that are often not generally understandable, but its goal is usually to make things work more easily and invisibly. Chances are that we can’t really explain how our computers work, but we all manage to use them competently for our daily work.
Some technology tools are less complex and technical than a computer, but they’re still only understandable if we pause to decipher them or have someone else explain their intricacies. I won’t take the time in this report to trace all the ins and outs of translation-related data exchange standards, but I do provide an overview of what the standards are about and why we as translators should have a vested interest in them.

Are you a pragmatist, realist or believer?
You might already know that making a decision is not a science. You get around 80% there with the facts and the rest is left to judgment. It seems a large majority have made the judgment that interoperability is not a priority for the translation industry.
It’s easy to see why when the bare facts are laid out before you day after day: There are open standards, but even with years of effort behind them they don’t meet industry needs. Actually most users don’t change tools often so they don’t experience ‘lack of interoperability’. Companies have worked out their own workarounds and it’s difficult to see an ROI for changes now...and more. The unquestionable collective wisdom has decided.
In fact, this was documented very well by research we carried out earlier this year. Jaap also spoke about these findings at the EC’s Multilingual Web Workshop in Pisa. Never mind the interoperability pragmatists, realists or believers the research alludes to; it’s clearly more a tale of apathy if ever there was one.
So, why then all the funfair of recent months? All those comments on LinkedIn. The steady flow of invitations, announcements, surveys, webinars…Is there something new? Is it just people wanting you to join them on their personal merry-go-round?
Or perhaps these guys see the facts, that famous 80%, and still knock the collective wisdom?

How do you translate ‘cloud computing’, ‘cell phone’ or ‘crowdsourcing’ into your language?
Many translators will use Google, Linguee, TAUS Search or a similar search tool and these will sieve through the data to find the answers. There’s knowledge in the data, and we have oceans of it.
Not just anonymous words indiscriminately scraped from the worldwide web, but good quality human translations from trusted sources, from government bodies and institutions, from companies large and small and from professional translators.
What could we do with it? We could transform the translation industry! Here is how we do it:
At the TAUS Executive Forum held in Barcelona, 9-10 June, the centerpiece comprised two “great debates” on topics that stand at the heart of the localization and translation industry.
How do we move forward on the thorny issue of interoperability in translation workflows?
And,
How will machine translation (MT) impact the industry now that there is fairly widespread adoption of the technology among buyers and LSPs, which is in turn attracting new offerings from developers?

The final session of the recent TAUS Executive Forum in Barcelona focused on innovative solutions using communities or data driven approaches.
Collaborative translation is a promising, new paradigm to leverage the power of community to translate and maintain dynamic content.
A series of twelve machine translation (MT) use cases were crisply showcased at the recent TAUS Executive Forum in Barcelona. Together they provided a comprehensive overview of advanced user experiences among government, large enterprises and solution providers.
As MT quality continues to improve for user interfaces and technical publications, the leading edge of the market is turning its attention to new domains and applications for MT, such as user forums and online chat.
Speakers and participants recognized the opportunity to capitalize on MT services for the public, whether in the public sector, consumer markets, or in the localization value-chain itself with some tool developers providing near plug-and-play MT engines for LSPs.
This complimentary report is relevant for anybody interested in customized machine translation, especially open source solutions. It summarizes the results of survey research undertaken by TAUS in partnership with EuroMatrixPlus, the European academic consortium responsible for developing and maintaining the Moses toolkit, in Q1 2011. The survey asked respondents to report on their progress and experiences with Moses, and in particular to highlight the challenges they have encountered, as well as to suggest solutions.
The report also outlines our recommendations based on the findings. These serve as a foundation for future Moses development activity and collaborative industry initiatives.

Born in November 2004, TAUS sought to promote the fanciful idea that machine translation served a useful function[i]. It wasn’t long before this wayward child began to expound the importance of innovation[ii], unhelpfully rocking the boat and causing upset. Next there was much noise about the dubious concept of sharing translation memories[iii], followed by arguments for systems that interoperate, as well as for collaborative working[iv]. It seemed as if TAUS lived in visions, without a foothold in reality.
A few years ago, I converted my garage into a music studio to support my musical hobbies and interests. The initial investment I made in time and money has paid off in that I have spent countless hours of creative bliss experimenting with music on my own, but although I enjoy my hobbies immensely, I always come to the same conclusion after each session—music is social; I need other musicians to help me fully experience the phenomenon of creative expression through music. I need a group. I need a band.
We are sad to have lost a translation industry ‘founding father’ – Geoffrey Kingscott – who died on Wednesday March 2, 2011.
Below are some personal memories of colleagues who so much enjoyed working with him over the years.

Geoffrey Kingscott was an example and an inspiration for us in the early eighties when we just started building up our translation business. In a way he was our conscience. He published the Language Monthly (later Language International) magazine long before Multilingual Magazine existed. He ran his own translation business. He was an authority, a keynote speaker at many conferences. We loved to hear him speaking, for the British accent, the humor, and his incredible knowledge. The industry truly lost a ‘founding father’ in Geoffrey Kingscott.
I first met Geoff when he was editor (as well as founder and chief writer) of Language Monthly, as far as I know the first attempt to put non-academic language/business issues - especially translation - into newsworthy media content, built around events and personalities. He used to come to Paris each year, hire a stand at Expolangues - a language trade show in the heady days of the late 1980s when Europe was beginning to throw money at its language question - and try and whip up interest in his magazine (he published articles in French and German in those days before the global English generation). The 'language industry' was on the rise and rise at this time, and Geoff with his practical, intuitive intelligence grasped the importance of the impact of technology, while holding on to his personal, local take on the world of translation. He understood before most of us the need to communicate, to embrace the media, to be a message guy. Yet he was always suspicious of big money, political maneuvers, and technocratic control of the world of translation. He once told me he would have loved to write a book about the mystery of language, the fact that you can 'say' X in one language but have to paraphrase it, go round the houses to say the same thing in another. He was a good writer but alas it was one dream he never fulfilled.
It has been many years since I last met Geoffrey, but I remember him best as the driving force behind the ASLIB Translating and the Computer Conference, in the years that we frequented it in the 1980s. This event was always a high point in the year, when we met up with representatives of the language business and industries from all over the world.
He impressed me by his dry wit, common sense, deep insight into translation issues, but also his vision of the bigger picture, the preservation of local languages, and his perception of the role of language as cultural identity bearer.
Geoffrey is most memorable to me for his combined interests in things European (degree in French, a pioneer in the translation industry) and a total commitment to his locality in the English Midlands. He was (along with his wife) a keen local historian and wrote books on "the shires" as well as on that notable British obsession, the local railways. He had a dry sense of humour (and a distinct regional accent) and enjoyed telling how he once overheard someone advising not to study English at Nottingham University, lest they "end up sounding like Geoff Kingscott". Modest self-confidence was his hallmark. Geoffrey was active politically, and while he was ambivalent about the politics of the "European Project" (and eventually supported the UK Independence Party after a lifetime as a Labour man), he was deeply committed to the European cultural project. He wrote position papers that helped formulate the European Translation Platform in the 1990s, an early attempt to bring the European translation industry under a single umbrella and to promote "multilingual Europe". He edited several important journals (Language Monthly, Language International, Language Today) and published George Weber's influential article on "geolinguistics" in 1995, one of the first attempts to rank languages by their importance for business and the translation industry. He made significant contributions to our understanding of translation quality and evaluation, not least through his articles for LISA. He was a pioneer in technical translation, developing one of the first formal multilingual terminologies for the automotive industry; working with the Society of Automotive Engineers he created the TOPTEC conference series dedicated to automotive translation. 
Perhaps most significantly for the TAUS community, Geoffrey was an early technology advocate at Praetorius, the translation agency he ran for 20 years, and spread the word about the possibilities for translation technology/MT from its beginnings in the 1980s.
TAUS and friends welcome comments, memories and thoughts to this site from everyone who knew Geoff Kingscott and appreciated his wide-ranging contribution to the translation industry.

Report on TAUS research into translation interoperability
Thirty-seven percent of the 111 respondents to our survey think that the lack of interoperability costs their business more than 10% of their total translation budget (or revenue, in the case of service providers). Twenty-five percent say it costs them more than 20%. Only nine percent think it costs them less than 5% of their translation budgets. Forty-three percent of the respondents don’t know exactly how much it costs them. It is clear, though, that the lack of interoperability costs the industry a fortune. We are talking about the industry’s failure to exchange translation memories and terminology in a standard format, and to integrate translation software with content and document management systems.
The localization community is a relatively small one. Many of us talk to each other regularly about current challenges. Recently, a number of customers of a well-known globalization management system (GMS) provider got together at a user meeting. Based on our estimated 300 collective years of experience in GMS software, we compiled our common wish list for most-desperately-needed GMS features and futures. We weren’t quite in time to send Santa our list, but we channeled our wishes straight to Saint Sylvester, patron saint of the New Year, to intercede on our behalf in 2011!
The open source Moses SMT decoder, developed and maintained by universities, has grown into an elaborate ecosystem far beyond its academic origins. It’s now firmly rooted in the language industry and is branching out in original directions. Will this ecosystem develop sustainably into the future? Will there be a thousand MT systems based on Moses? In this article we examine the latest developments and future directions.

Collaborative translation presents us with a rich and complex envelope of processes and technologies, whose respective impacts are still poorly understood. Determining which approach can be used in which context and to what effect is still somewhat of an art-form, and currently, trial and error is often the only way to find out.
The translation industry is growing rapidly and, according to U.S. News and World Report, translation is one of the 50 best careers to pursue in 2011. Every year, thousands of students join translation courses attracted by such encouraging predictions, hoping they are placing their time, effort and money on a safe bet, hoping the chosen course will lead to a promising career.
Imagine you are a recent graduate of one of these courses. You even achieved a cum laude. You reflect on the knowledge you have gained, and the skills you have honed and mastered.
You are now a 'seasoned' linguist, an expert in intercultural communication and a competent information miner. Exciting theories, approaches and strategies still reverberate in your head. As do lectures about text types, equivalence, cohesion and coherence. You can see lexical gaps everywhere and you know that there is no such thing as absolute synonymy.

This is where the industry is heading, according to a group of 30 of the world's largest translation buyers. Do you agree?
Singapore. Roberto B. (name withheld for security reasons) is interviewing for a job with Glott.dot, one of the big four LSPs that dominate the translation industry today. He’s a computational linguist used to working in an open source development team on community-centric language technologies, but the project folded for lack of money and he’s trying his luck in the corporate world. Glott.dot VP Global Technology Development Wim Mulder is briefing him about the kind of work he’d be doing.
You trusted your bank. You trusted your currency. You trusted your government. You trusted your translations.
So what happens now? Your certainties are being unraveled one after another. The system you trusted is leaking. It is unsettling. And even scary… But then you realize: trust is good, and knowing the facts is always the best policy.
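Trust in your bank. Trust in your currency. Trust in your government. Trust in your translations.
And now those certainties are being put at risk one after another. Cracks are appearing in the system you trusted. It is unsettling, even frightening... But then it dawns on you: trust is no bad thing, and what always matters is knowing the facts.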
You trusted your bank. You trusted your currency. You trusted your government. You trusted your translations.
So what is happening now? The ideals you believed in are collapsing one after another. The system you relied on is falling apart, which causes unease and even fear. At the same time comes the realization that trust is good, but knowing the truth is always better.
You trusted your bank. You trusted your currency. You trusted your government. You trusted your translations.
But now the truths you believed in are crumbling one by one; the system you trusted has cracks. It is unsettling. Frightening... But then you realize that trust is good, and knowing the facts is always the best policy.
Once the world finally emerged from the crippling financial crisis of the late 2000s, the introduction of tough new regulations on the financial industry spurred new optimism in business and society.
After several horrendous incidents of human suffering resulting from the privatization of data on Google and Facebook, and the unfair control of internet provision by mobile operators, the authorities decided to follow Finland’s lead and make access to the internet a right for everyone. This involved assuring absolute internet neutrality and open standards for all right across the networks. To preserve their positions, Google and Facebook agreed to follow suit.
This prompted a tsunami of innovation in terms of applications and web services along with the sudden growth of new translation requirements as every community wanted their own language to join in the boom in global conversations.
A year ago, TAUS called for a thousand MT systems to bloom at the 2009 User Conference, also held in Portland (Oregon, USA). The technology was available, and there was a clear need to find innovative solutions to the sheer volume and cost of the world’s translation needs. Since then, we have been seeing real progress in MT deployment, with a number of LSPs and translation buyers implementing MT engines to handle some of the heavy lifting.
Since the widespread adoption of CAT tools, their incompatibility has caused translators and other service providers many headaches. In this article we look at the subjects of interoperability and open standards to assess the current state of affairs. We compare a selection of CAT tools, both Free/Open Source (FOSS) and proprietary, in order to provide an at-a-glance view of the potential compatibility of a range of tools. We take this opportunity to invite you to share your experiences.
Obama had narrowly won a second term, stimulating a sustained new wave of people power worldwide, helped by a rage-against-the-machine trend following a series of massive computer virus attacks.
The inexorable rise of the Chinese economy over the previous decade led to a large volume of translation work being taken over by Asian language service providers (LSPs), riding on the back of low wages and widespread but low-quality machine translation (MT) technology.
In an effort to claw back their business, other providers began to criticize translation quality and propose as an alternative very efficient, real-time, high-quality, human networks of translation built around soft technologies.
On the R&D front, insight into the translation process in the human brain using advanced fMRI scans had led to the development of powerful programs that rapidly train people to input spoken translation to the network in near real time across hundreds of language pairs.
Avram Aznavourian is a project manager with TranslationMinister, one of the world’s big five LSPs. He’s talking to one of his occasional suppliers, Larry Hu, who runs VectorTrans, a multi-language vendor based in Ho Chi Minh City, Vietnam. Avram is trying to explain why VectorTrans’ work quality has got to improve.
The 2000s may well prove to be the most productive decade for global machine translation research since the 1950s and early 60s. Rumor had it that at that time some $20 million (over $120 million at today’s rates) were devoted to mechanical translation research in the US alone, before government funding was switched off around 1966 following the infamous ALPAC report.
Our industry needs dispassionate, unprejudiced research. Alone it could never fund the depth of inquiry and breadth of trial and error testing required to improve systems and innovate with new models. We are all dependent on this surge of activity to push through to a new generation of marketplace solutions, usually years after the researchers that first invented them have moved on to new challenges. At the same time, this R&D landscape is changing.
In terms of public funding, there are ongoing programs for statistical MT funded by DARPA in the United States, and in Europe via the Seventh Framework Technology program, whose largest project is the open-source enabled EuroMatrixPlus. There are also plenty of other academic MT research projects in many universities and research institutes from Europe to South Africa via China to India. And major IT corporations such as IBM and Microsoft continue to fund natural language processing in general and translation technology projects in particular.
Translation by Osmar Nonato Nascimento de Lima
The 2000s may well have been the most productive decade for machine translation research internationally since the 1950s and early 1960s. It was said at the time that in the US alone some $20 million (over $120 million in today's money) was devoted to machine translation research; the government stopped funding it around 1966, after the infamous ALPAC report.
Our industry needs impartial, unprejudiced research. On its own it could never fund in depth the wide range of research, trials and error testing needed to improve systems and innovate with new models. We all depend on this sudden surge of activity to drive a new generation of market solutions, which usually materialize years after the researchers who first invented them have moved on to other challenges. At the same time, the R&D landscape is changing.
As for public funding, there are several ongoing programs for statistical MT funded by DARPA in the United States, and in Europe through the Seventh Framework Technology program, the most important of which is the open-source EuroMatrixPlus project. There are also many other academic MT research projects at universities and research institutes, from Europe to South Africa and from China to India. And major IT corporations such as IBM and Microsoft continue to fund natural language processing in general and translation technology projects in particular.
It is quite possible that the first decade of the 21st century will prove to be the most productive for machine translation research since the 1950s and early 1960s. Rumor has it that in the US alone about $20 million (over $120 million in today's terms) was invested in machine translation research. In 1966 the government stopped funding this research following the infamous report of the Automatic Language Processing Advisory Committee (ALPAC).
Our industry needs objective, unprejudiced research. On its own it cannot fund the fundamental research and extensive testing needed to improve existing systems and introduce new models. Yet this is, in essence, what determines whether we can break through to a new generation of commercial solutions, a breakthrough that usually comes years after the inventors have moved on from their first discoveries to new challenges.
As for public funding, there are ongoing programs funding statistical MT under the auspices of the Defense Advanced Research Projects Agency (DARPA) in the US. Another prominent example is the Seventh Framework Technology program in Europe, whose largest project is the open-source-based EuroMatrixPlus. A great deal of academic MT research is also under way at universities and research institutes in Europe, Asia (in particular China and India) and South Africa. In addition, major IT corporations such as IBM and Microsoft continue to fund projects in natural language processing in general and translation technology in particular.
The 2000s may prove to be the most productive decade for machine translation research worldwide since the 1950s and early 1960s. Rumor had it that at that time, in the US alone, some 20 million US dollars (over $120 million at today's rates) were devoted to machine translation research; the government stopped funding it around 1966 following the infamous ALPAC report.
Our industry needs impartial, unprejudiced research. On its own it could never fund the wide range of research, trials and error testing needed to improve systems and innovate with new models. We all depend on this sudden surge of activity to drive a new generation of market solutions, which usually materialize years after the researchers who invented them have moved on to other challenges. At the same time, this R&D landscape is changing.
In terms of public funding, several statistical MT programs are under way, run by DARPA in the United States and, in Europe, through the Seventh Framework Technology program, the most important being the open-source EuroMatrixPlus project. There are also many academic MT research projects at a large number of universities and research institutes, ranging from Europe to South Africa and from China to India. And major IT companies such as IBM and Microsoft continue to fund natural language processing in general and translation technology projects in particular.
At the same time, much near-market research is leaving the traditional academic settings and large IT laboratories for the fast-moving world of industrial innovation, as shown by Google's enormous effort in statistical translation. The availability of cheaper resources and open-source tools is also making it easier for agile translation automation service partners to emerge (sometimes out of academic research departments), carrying out R&D for clients who want faster technical solutions to real-world translation problems.
Moses, an open-source toolkit for statistical MT that is being tried out across much of the sector, is in all likelihood the most significant recent outcome of this concerted activity for the translation industry, and it stands as a current symbol of the influence of the data-power paradigm on scientific research and the business world. Indeed, in English alone, the list of academic publications on statistical MT and related topics is growing by leaps and bounds, reflecting a new wave of specialization and collaboration and a welcome interest in sharing results.
Some of these research programs have set their sights on short-term prototyping for non-commercial purposes, in military intelligence (in the US) or in giving citizens easier access to information (in the EU). Although the results of these SMT projects will almost certainly help improve real-world MT processes more broadly, there is no clear model for how their benefits could reach the market in an efficient and proven way.
One key area for new research is the study of how machine acquisition of syntactic and semantic knowledge can enrich and strengthen the language models that currently underlie data-driven approaches. More researchers are likely to turn their attention back to suitable semantic annotation architectures to feed knowledge-rich translation processes.
Overall, this multiplicity of research interests bodes well for the translation industry as a whole, despite the many mirages and dead ends it inevitably leads to. The more people there are formulating hypotheses, running tests and picking a critical path through the various models of any aspect of translation, the more likely it is that, in the long run, we will all benefit from the “fittest” survivor. On the other hand, research funding is finite, so benchmarks are needed to provide a competitive environment and to test MT research results at a pre-production stage.
To see how researchers imagine the future of translation automation, we asked a number of scientists for their views on what may happen over the next decade. Here are five areas in which we can (or cannot) expect interesting developments:
A key development in the strategic role of translation in the real world will be the emergence of “language transparency”, another way of saying that (all) language content will be intrinsically “translation-ready”. Users will be able to access content in their own language regardless of its origin, and all access platforms will include automated translation by default, whether through a browser or any other application. The content translation process will be invisible, like a switch inside the infrastructure.
This in turn will mean that automated translation mainly affects “transient” content interactions such as chat, dynamic content on mobile networks and social media data streams. These translation activities will be practically free and will not require optimal quality, so they will largely take place outside the orbit of the translation services industry.
Meanwhile, in those areas that we understand to need high-quality translation more than any other (government, legal, product, strategic, high-risk, branded content), translations will continue to be produced in much the same way as today, using a combination of human translation, machine translation plus post-editing, and advanced leveraging.
The advances that will drive language transparency for text content will come not from any specific breakthrough in language technology but from advances in infrastructure, such as greater bandwidth, cloud computing resources, data sharing and intelligent data mining.
Although the TAUS Data Association (TDA) and other repositories, such as the content farms of MyMemory and Google Translate, have been accumulating an enormous quantity of parallel language data, one of the fundamental problems of the immediate future will be making these data collections available to scientists and others who need them to enrich their language models.
Another field that has attracted a great deal of interest recently, and will in all likelihood continue to do so, is that of recordings of bilingual spoken content (for example, recordings of simultaneous and consecutive interpreting at meetings and conferences), a resource that has yet to be exploited and that will help develop real-time spoken language translation. Part of the R&D agenda, both academic and industrial, will be to develop the kind of infrastructure that makes it easier to collect and contribute this material as a reliable resource for research and production.
For production systems, it will be possible to be much more selective in the use of data resources. Users will be able to know precisely when large quantities of data are needed for a given translation automation job, and when a much more restricted selection of that data will suffice. In other words, there will be a trend towards more intelligent data access and use.
The general feeling among researchers is that translators will continue to play a central role in producing high-quality translations in the future. They will also inevitably contribute to the tuning and repair of MT output as post-editors, through the feedback loops that are vital to optimizing MT systems. The gradual accumulation of post-edited texts will then become an enormous collection of training data that may prove decisive for these systems.
Naturally, ways of optimizing this symbiotic relationship within the various types of workflow will continue to be studied, with improved toolkits for post-editors. But across the industry as a whole, performance is most likely to advance only gradually. Forward-thinking technical translators can be expected to adopt the powerful new tools that emerge from this research in order to stay competitive.
The current belief is that there is a small set of problems that will pose serious difficulties for machine translation, and a larger set that can be approached with more optimism and will be solved within the next decade. The problems that require a major theoretical breakthrough (or that turn out to be intrinsically unsolvable by artificial means) concern conceptual aspects of computational linguistics rather than technological aspects of real-world engineering.
The solvable problems are already on the R&D agenda. One of them is optimizing the handling of languages with complex morphologies or non-Indo-European word orders, two factors that usually make it hard for the machine to deliver good-quality text in some language pairs. Most likely, system optimizations of this kind will involve adding annotations to existing parallel data to help the system learn more effectively.
As for the old fantasy of the perfect artificial translator, the working hypothesis is that, to systematically match (or even surpass) a human translator, a system will have to draw on “world models” (real-world knowledge) in order to clear the difficult quality hurdle. But so far it has been impossible to program a machine to understand the semantic intent of a text.
Clearly, computers can be programmed to apply linguistic knowledge, statistical fluency patterns, linguistic rules, lexical data or parallel content. But they cannot access a knowledge base that would help them decide plausibly how to disambiguate a particular expression in a particular context.
Although some scientists will continue to study ways of automating more and more of the human translation capability, everything suggests, as we have seen, that most of the effort in this new wave of MT research will focus on the practical results of automation technology.
Building on what has come to be called “the unreasonable effectiveness of data”, most MT scientists believe there is a need for much more abstract language models that can address the immense complexity of linguistic objects and their context sensitivity, and then use the available data to improve the translation process.
In other words, the data the industry has accumulated over the past thirty years will help scientists find techniques with which to build, in turn, better production translation systems. We seem to be looking at a very productive example of the culture of sharing.
We would like to thank the following scientists for contributing their views to this article:
Christian Boitet, Université Joseph Fourier, Grenoble
Daniel Hardt, Copenhagen Business School and LanguageLens
Anthony Hartley, Leeds University
Kevin Knight, Information Sciences Institute and University of Southern California
Alon Lavie, Carnegie Mellon University and Safaba Translation Solutions
Joseph Mariani, University of Paris
Andrei Popescu-Belis, Idiap Research Institute, Martigny
Mark Seligman, Spoken Translation Inc.
Khalil Simaan, University of Amsterdam
Gregor Thurmair, Linguatec
Andy Way, Dublin City University and Applied Language Solutions
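The 2000s may have been the decade in which global machine translation research produced the most results since the 1950s and early 1960s. In the 1950s and early 1960s some $20 million (over $120 million at today's rates) was poured into machine translation research in the US alone, but government funding was cut off around 1966. Rumor has it that the trigger was the notorious ALPAC report.
Our industry needs fair and unbiased research. On its own, the industry would struggle to fund the in-depth research, wide-ranging experiments and error testing that are essential for improving systems and innovating models. We all depend on a surge of activity that opens up a new era of commercial solutions. That surge usually comes several years after the researchers who first devised the solutions have moved on to their next challenge. In parallel, the R&D landscape keeps changing.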
Ten years ago, the millennium brought powerful computers at affordable prices and advances in statistical pattern-matching algorithms. The advent of these foundation technologies heralded a new age for statistical machine translation (SMT). In this new era, translation users could expect more natural machine translation output than from the soon-to-be-legacy rules-based machine translation (RbMT) technology, and with less overhead to maintain the overall machine translation cycle.
During the last five years, translation services sector analysts have been abuzz with the potential virtues of automating translation with SMT output. To date, reports have focused on SMT’s technology, translation quality requirements, and the threats to the human translator workforce.
So many conferences have been organized, associations established, lectures given, and academic papers written about standards in the translation and language industries. All of this has certainly been necessary and relevant. But at the same time it seems to be ignoring the simple truth about standards.
And in the end, if these powerful few do not embrace open industry standards, the holy grail of genuine interoperability will remain a pipe dream for some time to come. Up to now there’s clearly been a conflict in balancing the profit motive with principled calls for openness. The latter haven’t really stood a chance, as our chosen market leaders have been able to advertise interoperability and standards compliance as attractive features of their products, but have then modified and “improved” them. The result is that users find that their texts, terminology and translation memories are locked up in closed software systems.
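Many conferences have been held, associations founded, lectures given and academic papers written on standardization in the translation and localization industry. All of this has certainly been necessary and meaningful. Yet at the same time it seems to ignore a plain fact about standardization.
In the end, unless the leaders with overwhelming influence accept open industry standards, the supreme goal of genuine interoperability will remain a dream for some time to come. There has clearly been a tension over how to balance the pursuit of profit against calls for openness. So far, openness has had little chance, because the chosen industry leaders have advertised interoperability and standards compliance as attractions of their products while modifying them and making what they call "improvements". As a result, users' texts, glossaries and translation memories have been locked inside their incompatible software systems.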
Portuguese translation by Osmar Nonato Nascimento de Lima
Many conferences have been organized, associations established, and lectures and academic papers produced on standards in the language and translation market. All of this was, without doubt, necessary and relevant. But at the same time it seems to ignore the simple truth about standards.
And in the end, if this powerful minority does not embrace open industry standards, the Holy Grail of true interoperability will remain a utopia for a long time to come. Until now there has clearly been a conflict in balancing the profit motive with principled calls for openness. The latter were doomed to fail because, although market leaders advertised interoperability and standards compliance as attractive features of their products, they later modified and "improved" them. The result is that users find their texts, terminology and translation memories locked up in closed software systems.
Numerous conferences have been held, associations founded, lectures given and research papers written on standards in the translation industry. All of this work was undoubtedly necessary and important. Yet at the same time it seems that the obvious truth about translation standards has been overlooked.
And ultimately, if these few influential companies do not set about adopting open industry standards, the cherished goal of genuine interoperability will remain out of reach indefinitely. For now there is a clear conflict between the pursuit of profit and principled demands for open standards. In practice such demands have had no chance so far, because market leaders first advertised interoperability and standards compliance as attractive features of their products and then modified and "improved" those products. As a result, users find their texts, terminology and translation memories isolated in closed software systems.

Many conferences have been organized, associations established, classes taught and papers written about standards in the language and translation industry. There is no doubt that this has been necessary and relevant. But at the same time the simple truth about those standards seems to be ignored.
If that powerful minority does not embrace open industry standards, the holy grail of genuine interoperability will remain an impossible dream for quite some time. It is clear that until now there has been a conflict in balancing the profit motive with principled calls for openness. The latter were doomed to fail because, although market leaders have advertised interoperability and standards compliance as attractive features of their products, they have subsequently modified and "improved" them. As a consequence, users find that their texts, terminology and translation memories are locked up in closed software systems.
There are now compelling reasons to reverse this trend and adopt a new agenda for interoperability.
These are the drivers confirming that it will become ever harder for a company to stay competitive without at least a minimum commitment to interoperability:
We no longer have to choose between being generous and helping the world, or being bad because we put ourselves first.
Today we can be good, help the world and protect our profits at the same time. One of the noblest goals of our industry is surely to help the world communicate more effectively. We succeed best in this mission when we can ensure that our texts, terminology and translation memories move freely between tools, translators and platforms without losing their value, format or attributes.
As we build the language data sharing platform, the TAUS Data Association, our experience confirms what we already knew: the devil is in the details, and standards are the only way to solve or avoid those small but annoying problems and streamline our processes.
TDA is built on two key standards: TMX and XLIFF. The TDA repository, in TMX 1.4 format, shares and stores close to 3 billion words in 320 language pairs. But when users download TMX files and use them in their own editor, they may lose matches because the original translation tool vendors chose to implement a slightly non-standard version of TMX.
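To make the TMX reuse scenario concrete, here is a minimal Python sketch of reading source and target segments out of a TMX document. It is an illustration under stated assumptions, not TDA's or any vendor's implementation: the sample document, the `read_pairs` helper and the fallback to a plain `lang` attribute (one possible way of tolerating files that deviate slightly from TMX 1.4) are all hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-unit TMX document, for illustration only.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example-tool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="es"><seg>Guarde el archivo.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Close the window.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_pairs(tmx_text, src, tgt):
    """Return (source, target) segment pairs for the given language codes."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Assumption: accept a bare "lang" attribute when xml:lang is absent.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                # itertext() keeps the text found inside inline markup elements.
                segs[lang.lower()] = "".join(seg.itertext())
        # Skip translation units that lack either language.
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs


if __name__ == "__main__":
    for source, target in read_pairs(SAMPLE_TMX, "en", "es"):
        print(source, "=>", target)
```

A reader written this way drops units it cannot pair instead of failing, which is one place where small deviations between tools can silently turn into lost matches.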
Over the last four months our development team has worked hard to enable advanced leveraging of translations directly from the TDA repository. This new matching service is based on the XLIFF standard, so any authoring, publishing or translation tool that supports XLIFF can communicate with the TDA database and retrieve translation matches.
But once again we find that the market leaders, among others, instead of sticking to the standard metadata tagging that was intended, have used the extension features of the XLIFF standard to differentiate themselves from the competition. And there are others who do not even make the effort to comply.
TAUS's relentless drive for open translation platforms, innovation, collaboration and language data sharing leads us to a fundamental question: can we build successful service businesses and at the same time serve our industry's higher duty of benefiting the world as a whole?
We believe it is both possible and desirable. And we ask industry groups to respond to this "call of duty":
We are calling on everyone to help themselves by helping the industry and the world we serve. Imagine how much we can save in costs, how much more we can translate, and how many more people can benefit from what we do. Including the market leaders!

There’s a rapidly growing range of machine translation solutions. Lionbridge and IBM recently entered the market. SDL has strengthened its hand with the acquisition of Language Weaver. Systran, PROMT and other stalwarts have stepped up their games. And middle-tier language service providers are developing their own engines. Some, like Applied Language Solutions and Pangeanic, are already competing with the likes of IBM and SDL.
So it’s clear you can build and customize systems using vendors or even do it in-house. But how do you assess how well these alternatives perform, whether they are up to scratch, whether they improve over time due to customization and further development?
You need concrete measures to make informed decisions on investments, to calculate ROIs, and to quantify the effectiveness of the alternatives you are considering.
If you are a language service provider or enterprise executive considering MT for the first time, it’s important that you have an awareness of the main distinctions, approaches and hazards. This article is a quick primer to get you started.
So you are a translator. You have a loving, intimate relationship with words. You thrive on the challenging, mind-wrestling quest for equivalence and yes - you know the difference between a participle and a gerund. You care – not just about style, register and cultural nuances, but most of all, about the quality of your work. You are a linguist, a writer, a cultural expert and a field expert on many subjects, a researcher, an IT expert, a graphic designer... a one-man orchestra. You work autonomously and are driven by mastery.
Yes, you are a translator. You work long hours on texts that are becoming increasingly boring – be it lengthy automotive manuals that no-one ever reads or help files that no-one ever wants. You are forced to recycle words although climate change is not high on your agenda. And yes – you see the rates dropping with the speed of light and wonder when they will ask you to work for free.
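So, you are a translator. Then you must care deeply about words. You live for the challenge of racking your brain for the right word and, naturally, you know the difference between a participle and a gerund. You care about style, prescribed terminology and cultural nuance, of course, but above all about the quality of your work. You are a linguist, a writer, a cultural expert, an expert in many fields, a researcher, an IT specialist and a graphic designer. A one-person orchestra, so to speak. You work independently and are driven by the pursuit of perfection.
Yes, you are a translator. So you probably spend long hours on texts that grow more tedious the more you read. It may be a long-winded automotive manual that nobody reads, or a help file that nobody needs. You are forced to recycle sentences even if you have no interest in climate change or environmental protection. And yes, watching translation rates fall in the blink of an eye, you may be thinking it is only a matter of time before you are asked to translate for free.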
Portuguese translation by Osmar Nonato Nascimento de Lima
So you are a translator. You have an intimate, loving relationship with words. You strive to meet the challenges, racking your brain to find equivalences, and it is not only a matter of knowing the difference between a participle and a gerund. You care not only about nuances of style and cultural register but, above all, about the quality of the translation. You are a linguist, writer, researcher and graphic designer, at home in a wide range of subjects, with an excellent cultural background, an IT specialist, computer-savvy... in short, a one-man band. Working as a freelancer, what drives you is the pursuit of perfection.
Yes, you are a translator, working long hours on texts that become ever more boring (be they long owner's manuals or help files that almost nobody reads). You are forced to recycle words even though climate change is not a priority on your agenda. And you watch translation rates drop at the speed of light and ask: when will we be invited to work for free?
Russian translation by specialists at Logrus
So, you are a translator. You treat words with tenderness and care. You pursue professional success through an exciting, intense search for equivalents and, of course, you understand the difference between a participle and the genitive case. You care not only about style, register and cultural specifics but, first and foremost, about the quality of your work. You are a linguist, a writer, a cultural scholar and an expert in many fields, a researcher, an IT specialist, a designer. In a word, a one-man band. You work independently, and you are driven by the pursuit of excellence.
You are a translator. You spend long hours on texts that grow duller and duller, whether lengthy car manuals that nobody reads or help files that nobody needs. You are forced to reuse the same words over and over. And you watch rates fall at the speed of light and ask yourself when you will be asked to work for free.
So you are a translator. You have a loving, intimate relationship with words. You rise to the challenge, to that mental struggle that is the search for equivalence, and yes: you know the difference between a participle and a gerund. You care not only about nuances of style, register and culture but, above all, about the quality of your work. You are a linguist, a writer, a cultural expert and a field specialist in many subjects, a researcher, a computing expert, a graphic designer... a one-man band. You work autonomously, and what drives you is the pursuit of perfection.
Yes, you are a translator. You work long hours on texts that are becoming more and more boring (whether long automotive manuals that nobody ever reads or help files that nobody wants). You are forced to recycle words even though climate change is not a priority on your agenda. And yes: you watch rates fall at the speed of light and wonder when you will be asked to work for free.
So you are a translator. Your relationship with words is full of intimacy and feeling. You love challenges, the almost wrestling-like struggle of the mind in pursuit of equivalence, and you know exactly what the difference is between a participle and a gerund. You care not only about cultural nuances, style and tone but, above all, about the quality of your translation. You are a linguist, a writer, an expert on many matters, a cultural specialist, a researcher, an IT specialist, a graphic designer... a one-person orchestra. You work independently, and your driving force is the pursuit of perfection.
So you are a translator. You spend long hours translating ever more boring texts, whether endless car manuals that nobody ever reads or help files that nobody needs. You are forced to recycle words, although climate change is the last thing on your mind right now. At the same time you see rates falling at the speed of light and wonder when they will reach zero.
COMPLIMENTARY TAUS REPORT
You have been managing translations for your company for more than fifteen years now. You have moved from project manager to localization manager to globalization director. First you worked closely with the project manager at your vendor organization that handled the four languages of your company’s flagship product. Many lessons were learnt. The process got better, but quality remained variable.
The addition of more languages was a good occasion to try out new vendors. There were plenty of vendors who offered their services at competitive prices. By the time your product was translated into fifteen languages, you found yourself working with twelve different vendors. Your focus had shifted from quality to process and cost. The translation memories that helped you automate translation to some extent now had to be consolidated in one central place. This would finally give you full control over cost, process and also improve quality. You convinced your management to invest half a million in a Globalization Management System.
At the TAUS User Conference, a number of attendees pitched brief proposals for achieving greater power and efficiency in the translation chain through innovations involving sharing and pooling ideas and effort.
Laurie Gerber started out as a Japanese dictionary “coder” at Systran in late 1986. Fast forward twenty-five years and she is a senior industry figure, advising government and industry on major machine translation research and deployments. TAUS consultant Colin Brace caught up with Laurie to get her take on the progress made with machine translation (MT) technology in the last 25 years and the challenges that remain. We also asked Laurie to provide a detailed critique of issues with newer data-driven approaches to MT, such as those used by Asia Online, Google, IBM, Microsoft and SDL Language Weaver, amongst others.
Machine translation has arrived for good in the language industry; the influential Global Watchtower blog even called it a tidal wave earlier this year. TAUS’ recent article on Facebook, Google, IBM and Microsoft ended by promising to look at the issues highlighted by the team’s analysis. Their assessment is that we need shared services and resources in 21st century translation to grow from a $15 billion industry to a $70 billion one. I pick up the baton by proposing companies combine investments to effectively leverage open source SMT for localization.
While research in machine translation has been going on for over 50 years, recently adopted statistical methods make it easier to create customized MT systems for specific language pairs and domains. By leveraging existing translation assets, translation providers and buyers can build well-performing MT systems that, combined with post-editing, are more productive than traditional translation methods alone. This is beginning to lead to widespread adoption of statistical MT (SMT) systems throughout the industry.
Something very significant happened in Canada last week concerning, of all things, machine translation. First came the Météo system; then the Hansard bilingual training corpus; and now this... On Monday, August 2, the British Columbia division of the RCMP posted a note on its website suggesting that citizens who wanted a French-language version of the English news releases published there should use Google Translate.
A spokesman for the federal government’s police force later explained that although the BC division is the largest in the country, it only has one French translator. As a result, citizens who had been requesting French translations sometimes had to wait three to four weeks. The BC Mounties had requested funding for an additional French translator but, owing to budgetary restrictions, that request had been denied. Using Google Translate, the spokesman claimed, citizens would be able to obtain a French version of these news releases (or a Punjabi, or Cantonese, or German one) almost instantaneously, and at no cost.
It’s become a standing joke in technology crystal-ball gazing: fully automatic machine translation will be available “within five years”, a prediction made regularly since the 1980s. Well, this time it seems to be true. In 2005 TAUS began bringing industry leaders together to promote greater technology awareness in the translation field – an industry historically shy of the glint of machinery in its workflows. In 2008 we published our white paper on Language Business Innovation, identifying translation automation, crowdsourcing, and language-data sharing as key trends for a translation industry ambitious enough to embrace the global challenges of the new century. And today, just two years later, we see the green shoots of such change everywhere.
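There is a standing joke in predictions about the future of technology: fully automatic machine translation will arrive "within five years", a forecast that has been made since the 1980s. This time, though, it looks set to come true. The translation industry has traditionally been reluctant to bring machines into its workflows, but since 2005 TAUS has been bringing industry leaders together to raise awareness of technology in the translation field. In 2008 we published a report on Language Business Innovation, naming translation automation, crowdsourcing and language data sharing as the key trends for a translation industry prepared to take on the global challenges of the new century. Now, a short time later, signs of that change are appearing everywhere.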
Portuguese translation by Osmar Nonato Nascimento de Lima
Since the 1980s, the frequently made prediction that fully automatic translation would be available "in five years" has become a running joke whenever people gazed into the crystal ball of technology. Well, this time it seems to be true. In 2005 TAUS began bringing industry leaders together to promote greater technology awareness in the translation field, a sector historically notable for the absence of mechanization in its workflows. In 2008 we published our white paper on Language Business Innovation, identifying translation automation, crowdsourcing and language data sharing as the main trends for a translation market ambitious enough to embrace the global challenges of the new century. And today, just two years later, we see the green shoots of such change everywhere.
Imagine you are a small language service provider (LSP), one of the thousands of translation agencies listed in the Yellow Pages of the world. You are kind-of midlife. Business is tough. You exist because of the words you sell but your word rates are under pressure every year. You’d like to think that you are an entrepreneur: that you are free to make choices. But what choices do you really have? Margins are being squeezed. Machine translation is suddenly the ‘talk of the industry’, and non-professionals – “crowdsourcers” – are willing to compete for your jobs, at least in some industries.
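Suppose you are a small language service provider (LSP), one of the thousands of translation companies listed in the world's Yellow Pages. You are approaching middle age. Business is tough. You make your living from translation, but your rates face growing downward pressure every year. You would like to think that you are an entrepreneur, free to choose the work you want, but how many choices do you really have? Margins are being squeezed. Machine translation has suddenly become the "talk of the industry", and in some sectors you now have to compete for jobs against crowdsourcing, where work is handed to groups of non-professionals.
You feel cornered. You do have some direct clients and accounts, but your biggest clients are, after all, the large international translation companies, the multi-language vendors (MLVs). Yet these large MLVs tend to tighten their purse strings when it comes to rates and payment. You would like to cut ties, but you cannot, because you depend on them for most of your income. So what do you do? Is there anyone who would buy your business? Probably not. And even if there were, it would take very large customers to make it happen. One fine day, you decide to act with courage. You will take control of your own destiny.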
Portuguese translation by Osmar Nonato Nascimento de Lima
Imagine you are a small language service provider (LSP), one of thousands of translation agencies listed in the world's Yellow Pages. You are approaching middle age. The market is difficult and very closed. You exist because of the words you translate, but not a year goes by without pressure to lower your rates. You would like to think that you are an entrepreneur: that you are free to make choices. But what choices do you really have? Profit margins are shrinking. Suddenly machine translation is the "talk of the market", and non-professional translators (practitioners of "crowdsourcing") are willing to compete for your work, at least in some markets.
This prediction about the future of technology long ago became a standing joke: fully automatic machine translation will be possible "within the next five years", a forecast made with enviable regularity since the 1980s. Well, this time the prediction looks set to come true. In 2005 TAUS began working to bring industry leaders together with the aim of raising technology awareness in translation, an industry traditionally wary of machines entering its workflows. In 2008 we published the report "Language Business Innovation", which identified translation automation, crowdsourcing and language data sharing as key trends in the development of the translation industry, an industry ready to meet the complex global challenges of the new century. Today, two years later, we see the first signs of such change everywhere.
Suppose you own a small company providing translation services, much like thousands of other translation agencies around the world. You are in something of a midlife crisis, and business is not easy. You live off the words you are paid for, but your per-word rates fall year after year. You would like to think that you are an entrepreneur, that you are free in your choices, but is that really so? Revenues are shrinking, and machine translation has suddenly become the talk of the industry. Non-professionals ("crowdsourcers") are ready to compete with you, at least in some areas of translation.
Imagine you are a small language service provider (LSP), one of the thousands of translation agencies listed in the world's Yellow Pages. The market is very tough. You exist because of the words you sell, but not a year goes by without pressure to lower your per-word rates. You would like to think that you are an entrepreneur, that you make your decisions freely. But what options do you really have? Margins are shrinking. Suddenly the talk of the industry is machine translation, and non-professional translators (practitioners of "crowdsourcing") are willing to compete for your projects, at least in some industries.
Since the 1980s, the frequently made prediction that fully automatic translation will be available "in five years" has become a standing joke in technology crystal-ball gazing, although this time it seems to be true. In 2005 TAUS began bringing industry leaders together to promote greater technology awareness in the translation field, an industry historically notable for the absence of mechanization in its work. In 2008 we published our white paper on innovation in the language sector, which described translation automation, crowdsourcing and language data sharing as key trends for a translation industry ambitious enough to take on the global challenges of the new century. Today, just two years later, we see the green shoots of this change everywhere.
Earlier this year I wrote about how language barriers are creating a new digital health divide and I suggested that the single biggest barrier to successfully connecting patients online internationally is language. On the one hand, the Internet has broken down many boundaries and has changed the geography of healthcare, uniting patients and healthcare stakeholders all over the world so that people are not constrained by information available in their own country alone. Yet on the other hand, language has become an even greater barrier as it separates people into groups – the advantaged or the disadvantaged – based on the information they can access.
Increased use of machine translation (MT) by clients and language service providers has resulted in a greater need for posteditors; however, conflicting or missing postediting guidelines and acceptance criteria have created resistance among language specialists to providing this much-needed service.
Asia Online is the first technology supplier to have addressed the need to develop a statistical machine translation (SMT) solution in Asia for often data-poor Asian languages. Building on an initial set of technology that has its roots in the open source Moses engine, Asia Online has used this technical knowledge to develop a comprehensive commercial enterprise-class SMT-as-a-service platform.
Asia Online is also a large-scale user of machine translation in its own Asian context, enabling it to learn about and develop solutions to challenging translation and quality issues.
Overall, the company’s main thrust is to encourage the development of SMT engines by combining intensive human work with machine processing in a virtuous circle. By concentrating on this process in the initial ramp-up period, Asia Online offers higher quality, better-tuned engines in the subsequent stages of its development. It is therefore interested in building an ecosystem of partners and users to develop a broad range of high quality engines/language pairs.
Translation in the 21st Century is about sharing investments, technology, resources, about translation as a utility, about learning together to enable better translation and helping the world to communicate better.

Many businesses want to adopt MT, but face a seemingly impenetrable set of barriers when confronted with the cost of MT licenses, knowing which engines are available, understanding ease of customization, and working out how to measure ROI. The recent TAUS Executive Forum in Copenhagen helped shed light on how to break through.

The great linguistic distance of Japanese from other major commercial languages has stimulated a great deal of MT-related activity in Japan but also continues to present challenges for translation quality. One could sense at the TAUS meeting a strong interest in Japan in the emerging new generation of MT technologies and that perhaps the impetus to experiment with and implement new solutions is starting to overtake the longstanding cautious skepticism towards MT that has prevailed in Japanese companies.

Lionbridge and IBM have announced an alliance to beat their competitors and improve top lines. This marks a new era for the translation industry.
Lionbridge has accumulated billions of translated words in the past ten to twenty years. IBM has developed a very powerful statistical machine translation engine. In fact they invented this technology back in the 1980s. IBM needs billions of high quality translated words to customize and train thousands of new machine translation engines. Selling translation technology is difficult in a world where the translate button is a freebie on every search page.
As budgets are squeezed, content volumes grow at a staggering rate, and high quality remains a touchstone for paid-for translations, the industry needs new technological solutions to help translators and the wider industry rise to new heights.
Major translation buyers, language service providers and even commercial machine translation vendors are increasingly using open source MT solutions as part of their translation toolkit. By far the most widely used system is Moses, a statistical machine translation engine.
The European Commission (EC) has just announced a new round of language research & technology funding with a strong focus on the translation space. At the same time, a new raft of translation-related projects chosen from last year’s calls are about to get underway.
At a meeting held in Luxembourg on March 22 and 23, funding seekers and providers, project maestros and a number of European language technology stakeholders were given an overview of current research and tech development projects. Around €156 million will be made available between 2009 and 2013, when the current “framework” program for Information and Communication Technologies as a whole comes to an end.
The recent mobilization of translation technology and human resources to help translate in post-quake Haiti is a timely reminder of the problem facing resource-scarce languages in an age of expanding translation automation.
When the web emerged as the natural platform for communicating, trading and storing data in the 1990s, many thought that the language playing field would finally be leveled: anyone could in principle acquire linguistic real estate on the web once a few coding standards had been agreed on. This would help sustain any of the 6,000 tongues extant on planet Earth today, whether they had eighty or 800 million speakers.
This comprehensive report is the result of 5 years of TAUS monitoring of advances, best practices, issues and emerging trends in postediting. It contains insight for every reader, from those with an expert knowledge of the area to new entrants to postediting, and from organizations with complex global operations to the freelance translator.
Training machine translation engines is a big topic lately. Everyone wants MT but the quality is generally not good enough for business use. So training and customization are crucial to the success of MT. In this article we share our perspective on the trends concerning the complexity of the process and the cleanness of the data.
We invite readers to support and join the growing community of new-generation MT developers. To let a thousand MT systems bloom. To help the world communicate better.
Report by Yi Fan He and Paraic Sheridan of CNGL
The 4th annual MT Marathon was held over five days during January in Dublin, hosted by the National Centre for Language Technology and the Centre for Next Generation Localisation (CNGL) as a partner in the EuroMatrixPlus consortium, which aims to provide a major boost to MT technology by applying the most advanced MT technologies systematically to all pairs of EU languages. Previous MT Marathons were held in Edinburgh (2007), Berlin/Wandlitz (2008) and Prague (2009).
Researchers, developers, students, and users of machine translation technology from all over the world attend lectures and labs that introduce them to the latest research in the field over several days. More than 100 participants from 20 countries came to Dublin to join this year’s event.
Proposals were solicited for open-source MT projects on which developers and researchers could collaborate during the lab sessions of the Marathon. Over twenty open-source project ideas were submitted this year, of which seventeen received development support during the course of the Marathon.
SYSTRAN is the world’s longest serving commercial supplier of machine translation systems, with roots going back to the earliest days of MT research. The company recently announced a major new release (SYSTRAN Enterprise Server 7) that integrates statistical techniques into its rule-based core. This technology briefing looks at these latest innovations and how SYSTRAN intends to leverage them competitively.
TAUS members, download the report.
SYSTRAN recognizes that the growing availability of free online translation services and the deployment of services based on statistical machine translation have clearly increased competition and boosted market dynamics for MT. For the desktop segment, the challenge is to provide more value than simply “raw translation”. And in the corporate market, “free online translation services” do not really impact the need for customized translation, yet demand is growing, partly as a result of the higher visibility of MT. As a result, the enterprise segment is growing at a sustained pace.
A TAUS take on the non-proprietary MT landscape
Everyone knows that Moses is the most widely used open source MT system in the translation industry, but it is certainly not alone in the engine platform space. To ring in the New Year 2010, here is a rapid update of some of the better known open MT systems.
Looking into the future, I see a thousand MT systems blooming. I see fortune for the translation industry, and new solutions to overcome failed translations. I see a better world due to improved communications among the world’s seven billion citizens. And the reason why I am so optimistic is that the growing effectiveness of data is joining hands with the trend towards the profit of sharing.
The first is somewhat hidden from view in academic circles; the other leads a public life in the media and on the internet. One is simply science at work, steadily proving that numbers count and synergies work. The other is part of the ongoing battle between self-interest and the Zeitgeist. And the Zeitgeist is destined to win.

During the recent TAUS User Conference we explored the theme of “The Profit of Sharing” as creatively as time and participants’ patience allowed. This included the tried and tested format of presentations imparting knowledge and the sharing of expertise by leaders in the field. The focus was on tackling the barriers and taking advantage of the main opportunities highlighted in the Innovation Roadmap report.
Once an MT solution is in place, the vital strategic need for any translation user is to access good data to train an SMT system, customize resources for an RBMT engine, or achieve advanced leveraging. Hence the recently launched TAUS Data Association (TDA) initiative for pooling and sharing very large translation memory resources in an industry cloud of authoritative content. The TAUS User Conference featured instructive examples of how data sharing can drive better translation automation.

Crowd/cloud/community sourcing in the translation industry is getting much eyeball attention, but there’s a lack of detailed insight into what works and what doesn’t. The TAUS User Conference helped share useful practices by fielding use cases and a powerful overview of community dynamics that set the agenda for this very topical phenomenon.
Customer support (CS) used to be all about running call centers. It then shifted to user searches of knowledge bases, with all the complexity this means for multilingual delivery pipelines. And now it has spread to sharing information on social media. Two years ago in Berlin, TAUS launched a conversation with the CS community by joining forces with the IT industry’s Consortium for Service Innovation to explore a shared interest in localizing customer support. At the recent TAUS User Conference in Portland, attendees had an update on the CSI support model, and learnt from a successful user case deploying an MT solution.

On September 9 SWIFT hosted a Financial Translation Round Table meeting at its head offices in La Hulpe near Brussels. The initiative was taken by the TAUS Data Association with the aim of promoting innovation and interoperability in financial translation.
At the MT Summit in Ottawa (August 28) Chris Wendt from Microsoft presented the findings from a recent pilot project using translation memories from more than ten TDA members to train the Microsoft statistical machine translation engine. The main tests were performed on Chinese and German, with customization done for Sybase iAnywhere. Additional tests were run on Polish and Japanese, with customization for Adobe and Dell. BLEU scores rose consistently and significantly, with increases of between 22% and 74% compared to engines trained only on Microsoft or generally available data.
TDA members' pilot project proves the benefits of sharing translation memories

At a glance
"It's great for TDA and its members that we are already fulfilling the promise of catalyzing innovation for the industry. Sharing translation memories on the industry-owned TDA platform proves to be very beneficial in combination with scalable translation technologies."
Jaap van der Meer, director TAUS Data Association
A recent TAUS market study reports that 52 of 129 Language Service Providers (LSPs) are already using machine translation (MT) in their production environment, and 86% of the remainder said they plan to adopt MT within two years.
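The survey figures above can be turned into shares and headcounts with a few lines of arithmetic. The following is a minimal sketch, not part of the TAUS study: the function name `adoption_summary` and its parameters are hypothetical, and the headcount for planned adopters is only an estimate derived from the rounded 86% figure.

```python
def adoption_summary(users: int, respondents: int, planning_share: float) -> dict:
    """Derive simple adoption figures from the survey counts.

    users          -- respondents already using MT in production
    respondents    -- total number of respondents
    planning_share -- fraction of the remaining respondents planning to adopt MT
    """
    remainder = respondents - users
    return {
        "current_share": users / respondents,            # share already using MT
        "remainder": remainder,                          # respondents not yet using MT
        "planning_estimate": remainder * planning_share, # estimated planned adopters
    }


# Counts as reported in the text: 52 of 129 LSPs, 86% of the remainder.
summary = adoption_summary(users=52, respondents=129, planning_share=0.86)

print(f"{summary['current_share']:.1%}")    # 40.3%
print(summary["remainder"])                 # 77
print(round(summary["planning_estimate"]))  # 66
```

On these numbers, roughly 40% of respondents already use MT, and about 66 of the remaining 77 plan to adopt it.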
ProMT has recently been stepping up its marketing effort, technology resources and global visibility for its portfolio of translation tools. Grounded in a rule-based model, this strongly linguistics-driven, full-service solution is gaining customers, especially for the perceived quality of its Romance, Russian and Germanic language pairs (built on an advanced customization agenda for individual resources), and its high level of interoperability with typical user work environments, from desktops through to online services.
TAUS recently attended an inspirational event in Amsterdam organized by Aspiration Tech in partnership with Floss Manuals and Translate.org.za, and supported by Open Society Institute and Ford Foundation.
What you don't (want to) know
The first responses to the Google Translation Toolkit are generally positive. No wonder: it makes total sense. It offers a friendly interface for translators to edit translations, do peer review, leverage from previous translations and make use of an ever better MT engine that is fully integrated. You can even upload your own TMs and glossaries and have them hosted by Google. And, most importantly, all of this costs you nothing.
As part of our community translation research program, we talked to dotSUB's CEO Michael Smolens about community translation for TED.
TED Open Translation Project launched on May 13th, bringing TEDTalks beyond the English-speaking world, by offering subtitles, time-coded transcripts, and the ability for volunteers to translate any talk into any language. A year in the making, two weeks after launch, more than 512 volunteer translators have already contributed, resulting in 371 videos completed with subtitled translations, 762 more in process, in 64 languages. The project is generously sponsored by Nokia. dotSUB provides the technological backbone and project management architecture behind the initiative.
A benchmark from University of Leeds
TDA has undertaken an assessment of the effects on leveraging translated data when larger sets of translation memories from different company sources are shared. The assessment was undertaken by the Centre for Translation Studies of the University of Leeds.

The TAUS Data Association (TDA) language data exchange portal has been released for public use. Members can store and share their translation memories and terminology in a secure central database. Users around the world have free access to the data to search translations. Members can leverage the scalable shared repository of language data to boost the volume and productivity of their translation activity.
This complimentary report comes at a critical moment as two prerequisites for accelerated innovation and greater interoperability, industry-wide language data sharing and open translation platforms, have become reality. This report outlines the industry's development focus in the coming years, highlighting the types of decisions companies are making, spotlighting a few case histories and predicting how events will unfold.
Not so coincidentally an article with this telling title landed in my inbox exactly on the same day that we announced release 1 of the TDA language data exchange portal. "The unreasonable effectiveness of data" is an article written by three researchers at Google* and published by the IEEE Computer Society in their March/April journal. Why is it, the authors ask, that physics can be so neatly explained with simple mathematical formulas, while economics fails to model human behavior and grammar is suffocated by hundreds of rules and just as many exceptions?
According to the results of the March 2009 TAUS market survey, only 14% of the LSP (language service provider) respondents state that they will never use MT, while 40% already use it today. This provides strong evidence that MT is moving into the mainstream among LSPs. To launch a series of reports on LSP deployment, this report focuses on the two major LSPs which own and develop their own MT systems.
The TAUS Executive Forum in Edinburgh (March 25-27) was another milestone event. "One of the best conferences I have attended in my 15 years in business...," commented one of the attendees. The two days of meetings were attended by 50 decision makers representing the buy and the sell side of the market in equal proportions. On the agenda was the design of the localization future. The program was carefully prepared by a committee with representatives from some of the leading IT companies. They had called for proposals for truly Open Translation Platforms. Out of 15 companies submitting proposals, 8 were invited to present in Edinburgh. What made this TAUS Executive Forum so special was the genuine openness to change. "Innovation in the localization industry started right here in Edinburgh."
That is the short and easy answer the SMT developer will give, when you ask: "what can we do to improve the quality of the machine translation engine?"
But things are not always that easy in the world of Statistical Machine Translation. Even the insiders are sometimes puzzled by the effects of data training on the SMT engines. It's time to bring some clarity into this obscure and complex area of the translation industry. According to the current TAUS market survey more than 50% of the respondents expect to be using machine translation in their translation operation within the next two years. More than 50% of the respondents also expect to share language data with industry partners in order to build large enough data sets to train MT engines more effectively.
Microsoft's commitment to machine translation has been large-scale and sustained. The company has been involved in Natural Language Processing (NLP) research and development since 1991, specifically in Machine Translation (MT) research since 1999, and has used MT in a production environment for just over 5 years. It has pursued a strategy of pragmatic, step by step advance over the long term, building on strong in-house research in natural language processing in general, in parallel with its development work on product localization and other applications.
This technical guide is intended for anyone faced with preparing translation training data for statistical machine translation. It examines data preparation processes which are the catalysts that enable data and algorithms to work in unison. It explores how to define an organization's training data strategy to match overall system design, identifies potential data sources, introduces the challenges of merging multiple corpora to create large data sets and explores several methods to prepare these translation memories into SMT training data. Finally, it looks into the speech roots of SMT and introduces the concept of exception management as a context for preparing SMT training data.
Question: What keeps us from growing and makes us vulnerable in times of economic downturn?
Answer: Low IQ!
This is equally true for a whole industry as it is for individual players. Protectionist behavior and lack of interoperability kill innovation and frustrate customers. TAUS has been fighting for industry collaboration and open sharing in the translation industry for several years now. So it was very gratifying to read the article by Michael Schrage in The Financial Times (February 5) about the interoperability quotient (IQ) as a measurement for innovation. How closely this applies to the translation industry!
On February 2nd, Google released another seven languages on its Translate service, including Hungarian. Does this spell bad news for small companies such as MorphoLogic, Hungary's major language technology developer and operator of the Webfordítás web translation service? CEO Gábor Prószéky and László Tihanyi, Director of the Translation Business Unit, explain the strategies available to "independent" MT providers at a time when money is scarce.
In 2000, Symantec, the global leader in security, storage and systems management solutions for business and consumer information, decided to reorganize its global communication and publishing strategy. With rapid global expansion and product localization in over 40 languages, the company was finding it difficult to maintain brand consistency across messages published in different geographies. The global marketing team therefore decided to centralize control over all publishing processes.
In mid January, the European Commission (represented by Language Technology maestro Roberto Cencioni) brought together 250 of Europe's brightest and best translation players - academic researchers, technology providers, and LSPs - in Luxembourg to offer them some badly needed innovation funding. There was not a lot in the pot - €40 million to collaborate on new ways of making better systems - but for the first time in several years, the machine translation agenda was specifically targeted for development. Why now?
For Hans Uszkoreit, the doyen of MT researchers in Europe, one answer lies in the Gartner Hype Cycle that tracks technology visibility along a complex curve. Professor Uszkoreit is the current coordinator of the EuroMatrix project that is attempting to weave extensive collaboration around building statistical MT engines for all "official" EU languages (506 pairs), and gave a stimulating keynote at the Luxembourg meeting.
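The figure of 506 pairs is consistent with counting every translation direction separately among 23 languages, since n languages give n × (n − 1) ordered source-to-target pairs. A minimal sketch of that arithmetic follows; the count of 23 official EU languages is an assumption about the period in question and is not stated in the text, and the function name `directed_pairs` is illustrative.

```python
from itertools import permutations


def directed_pairs(n_languages: int) -> int:
    """Number of ordered (source, target) pairs among n distinct languages."""
    return n_languages * (n_languages - 1)


# Assumption: 23 official EU languages at the time (not stated in the text).
EU_OFFICIAL_LANGUAGES = 23
pair_count = directed_pairs(EU_OFFICIAL_LANGUAGES)
print(pair_count)  # 506

# Cross-check the formula by enumerating ordered pairs for a small case.
small_case = len(list(permutations(["de", "en", "fr", "pl"], 2)))
print(small_case == directed_pairs(4))  # True
```

Counting each direction separately matters for statistical MT because an English-to-German engine and a German-to-English engine are trained and evaluated as distinct systems.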

Nearly a decade ago, Symantec took a number of key decisions about how to manage its Web site translation processes when its previously-distributed localization activity was centralized. At the time, there were no exemplars of how to do this, so the team had to invent its own approach. This marked the start of a process that has led to one of the most efficient workflows for multi-stream localization. The Symantec localization hub offers services to other in-house customers, ranging from customer service through the legal department to marketing and more. Fred Hollowood has been leading Symantec's translation automation activities.
During 2008, Microsoft went live with its own general purpose machine translation system with a dozen language pairs. This solution is gradually being integrated into the Office environment, and may revolutionize desktop access to multilingual information for the company's huge customer base around the world. This technology rollout marks the culmination of two decades of R&D into natural language processing and MT which has rarely had much public exposure. TAUS has therefore completed a report on Microsoft's machine translation activity and it will be published for TAUS members in early 2009.
Danish LSP Inter-Set was involved in a unique post-editing experiment reported at this year's European MT Summit. Lisbeth Kjeldgaard Almsten (language technology) and Joan Kiehl (QA) talk about its implications for the company.
As part of its community translation research program, TAUS talked to Linden Lab's Localization Director Danica Brinton. Her first five months among the 40 million residents of the online virtual world have been an "eye-opener" on translation crowd-sourcing.
For Daniel Grasmick and Christopher Hearn, who head up Lucy Software, there are new markets emerging in large-deployment MT solutions.
As statistical MT tends to grab the headlines, how can legacy rules-based MT systems get their message over to potential customers?
This update on post-editing focuses on three evolving aspects of post-editing practice: the impact of statistical machine translation on the post-editing cycle, problems in specifying target quality for post-edited texts, and efforts to improve guidelines for human post-editing practices. Now that end users can in certain contexts choose between statistical and rule-based MT engines, the post-editing stage may become a selection criterion. Before undertaking full deployment, LSPs and end users will need to run quantifiable evaluation pilots to inventory post-editing tasks, identify recurrent modifications for automatic solutions, and base expectations about quality and pricing on objective data.
Can we design better software support for MT in general by wiring up post-editors in a lab and analyzing their behavior? Language technology researcher and founder of the Bioloom Group, Jürg Schütz, suggests why we should.
The Danish Languagelens System is a statistical MT engine that began as an academic project two years ago and now drives millions of words of English-to-Danish patent translation at the Copenhagen-based LSP Lingtech. Theoretical linguist Daniel Hardt now supervises development at Language Lens.
Language technology consultant Tom Hoar has recently made his corpus processing software available at Sourceforge. Hopefully this is the first of many new offerings in the statistical space to bring down SMT overheads.
Delivering multilingual support to customers or across support supply chains between operators still involves strategic choices. SpeakLike offers a semi-automated solution for short-form communications.
Do you set up costly local contact centers in your geographical markets to handle the language? Or do you depend on the availability of language specialists in one centralized organization? And how about the general shift to online support, where there is a rapid increase in chat, email and other ‘short-form’ communication methods for CS and supply chain management?

Go to www.tausdata.org and test the free Language Search Engine. From an idea born in Taos in March 2007 to Release 1 of the TDA portal this month... it's only two years. Thanks to the support of the 45 founding members, we have managed to unleash a wave of change. TDA is the catalyst for innovation and automation in the global translation industry.
Google's imminent launch of its free-for-all Translation Centre raises some questions. Why is Google doing this? How can it be free? Should I start using it? What are the alternatives? What's next? This article brings some answers and serves as a springboard for a discussion.
TAUS asked Robert Vandenberg, Lingotek's Vice President of Sales and Marketing, about In-Q-Tel's announced investment in the Utah-based translation automation company. Why Lingotek and why now?
Yan Yu reviews the first version of the TinyTM translation memory solution for TAUS' ongoing technology watch.
Open source is gaining traction in the localization industry. Since the spring of 2007, Forum Open Language Tools (FOLT) has been driving the Translation Memory Open Source System (TMOSS) initiative.
In April 2008, Frank Bergmann, founder of ]project-open[, released the developer version (V0.1) of an open source tool called TinyTM. In July 2008, Welocalize announced the GlobalSight Open Source initiative.
The potential attraction of open source technology for the translation industry is that the source code is not owned by any LSP: it is free to use and to distribute, thereby democratizing the benefits of TM. It also reduces the difficulty and cost associated with customizing and with integrating third party software.
What central problem does your book, The Global English Style Guide, address?
The need to communicate clearly to a global audience: an audience that includes non-native speakers of English, translators, and perhaps also machine-translation software, as well as native speakers. The Global English guidelines are based on empirical research, and the book provides much more detailed explanations of these guidelines than can be found in any other single source.
Post-editing is key to successful MT deployment. This report looks at the background conditions and current practice of post-editing machine output in organizations such as the PAHO and companies such as Symantec, with a focus on emerging tools and quality metric issues. 
TAUS and the Consortium for Service Innovation co-hosted the first joint Global Support Summit in Berlin, bringing together players working in customer support and the TAUS community of translation buyers, technology providers and translation and localization practitioners.

Statistical machine translation (SMT) researchers in Europe will be having a field day in May, with the 2nd Machine Translation Marathon in Berlin lasting ten whole days (May 10 to 20). The event is being organized by EuroMatrix, a public-funded research network project devoted to multiplying SMT development and evaluation for various pairs of European languages. At the end of the event, an invitation-only 2-day Translingual Europe conference is due to inform industry and commerce among other audiences about the "opportunities and challenges" for EU research in MT.

Amsterdam, June 30, 2008: Forty organizations active in buying and supplying translation services and technologies have jointly established a new industry association aimed at sharing parallel language data, with the objective of stimulating innovation and automation of translation activities. The TAUS Data Association (TDA) will host translation memories and glossaries in all languages structured by industry domains and company indexes. TDA will give free access to its databases for the look-up of translations of terms and phrases. Members will be able to select and pool data to increase translation efficiency and improve translation quality.
The localization industry is often proud of its role as the critical facilitator for enterprises and organizations wishing to "go global". Yet it has been surprisingly slow to accept the full strategic consequences of embracing innovation. Innovation in the localization space has often been limited to incremental steps: cost reductions and service upgrades. But the innovation really needed in the industry goes much deeper. It is very hard to do, especially if the image of who our customers really are starts to blur. The urge to change goes beyond the traditional client-supplier relationship and involves a global audience of end-users, citizens, patients and tax payers.
In our series of portraits and perspectives on Localization Business Innovation, Francis Tsang, director of globalization at Adobe Systems Inc., explains the opportunities opened by using 'crowdsourcing' as a localization solution, and the language data sharing initiative as a key force for supporting the next level of globalization.
Alain Désilets is a Research Officer at the Institute for Information Technology of the National Research Council of Canada (NRC). Part of his job is to think about the future of translation technology.
This report gives an overview of Language Weaver, the supplier of statistical machine translation technology solutions to the enterprise market. It provides background on the company, a brief account of its technology and solutions, and a set of user cases of companies that are deploying Language Weaver's products and services. This report aims to give an objective account of the key features of one of the more visible players in the language automation market today.
Imagine a model of simship localization whereby the translation actually follows the source text creation and life cycle rather than necessarily coming at the end! Heresy! Impossible! Well, not for Autodesk. Led by Senior Application Programmer / Analyst Mathieu Cresp, the design software company is testing a new process called Continuous Workflow, which aims to remove many of the old localization bottlenecks and even shift the cost model for translation from words to time spent.
Language Weaver now has Mark Tapling as its CEO, with a strong vision of where to take the company. Meanwhile Asia Online in Bangkok, helmed by Dion Wiggins, is stepping up to the starting line in the SMT stakes. Potential users of the technology will now have even more questions to ask about the real benefits and risks of the fast-growing translation industry.
The TAUS Summit III, held in Cambridge (Boston) from March 19 through March 21, 2008, focused on two themes: Localization Business Innovation and the TAUS Data Association. The sessions on Localization Business Innovation examined buyers' needs for independence and flexibility in working with different language service providers (LSPs) and community-engaged translation (called 'crowdsourcing'). The discussions on the TAUS Data Association covered its business plan, its legal structure, and the next steps. The TAUS Data Association Business Plan and a draft of the Data Provider & Pooling Agreement, Data Donation Agreement, and User Conditions were available to all participants prior to the summit.
If a $22 million transaction can shake up a $10 billion industry even a little bit, you wonder what state this industry must be in. The acquisition of Idiom by SDL stirred up discussions of awe and despair. Many people loved Idiom for what it was: independent, neutral, the Switzerland of the translation industry. WorldServer was the technology that would make translation easy and attractive for every user. But let's face it, the company was performing poorly, and had lost too much money over the years. The investors had lost their appetites. Even though WorldServer could automate, let's say, 20 of the 40 steps in the process, it was not the sort of magic solution that excited a global executive. End of story....
What would Language Weaver say about a new SMT engine delivering 440 language pairs, due to be rolled out in the next 18 months from an office in Bangkok?
This is the highly serious business plan announced by Dion Wiggins, CEO of Asia Online at the LISA Conference in Beijing this week. Currently in the process of launching a portal focussed on Thai language content, Asia Online aims to become the master content owner of South East Asian language content, and purveyor of the largest SMT engine project the world has yet known.
At the Idiom User Conference last November in Barcelona, Jessica Roland, Director of International Product Operations at EMC, referred to Idiom as the "Switzerland of the localization industry". That is history now. And yet, the separation of Infra from Lingua is a key condition for innovation in the localization industry. In our series of portraits and perspectives on Localization Business Innovation, Jessica Roland pleads for independent localization technology platforms as a condition for translation automation and innovation.
The third TAUS Executive Forum, held in Brussels on November 29 and 30, 2007, welcomed thirty senior localization, translation and information industry professionals from the United States, Asia and Europe. This Forum was devoted to the critical topic of innovating the translation business model. Stimulated by contributions from, among others, Autodesk, CNH, Medtronic and Xerox critiquing the relevance of current business models for continuous publishing, user-generated content and customer support, the Forum broke out into groups to analyze the pros and cons of word-based translation pricing, and propose alternative models. Consensus emerged around the need for an incremental shift towards managed services or trusting relationships between vendors and buyers conditioned by clear service level agreements. A number of participants, however, wondered aloud whether the current vendor model would survive at all.
In a presentation at the November 2007 TAUS Executive Forum, Gilles Martel, Director of Resources Management, Corporate Services, for the Translation Bureau of the Canadian Government, gave a sneak preview of the new vision for online translation services currently in the works for the Canadian Government. The blueprint centers around a fully-integrated infrastructure platform that will provide a palette of "pull" services for different kinds of users.
The New Generation of TMs
While machine translation (MT) has a longer history, translation memory (TM) has had a wider adoption by corporations, government agencies, and translation houses. The last ten years saw the development of new translation features that build upon and extend the capabilities of classic TMs by identifying sub-sentence repetitions. We recognize this as a new generation of translation technology and call it Advanced Leveraging (AL). This report examines shortcomings of classic TMs, introduces Advanced Leveraging, and explains why it fits within the new market environment. The report provides examples of Advanced Leveraging, a survey of several vendors offering Advanced Leveraging products and services, and case studies.
It's not MT, and it's not TM, but it certainly helps accelerate translation work. What is it? A new generation of translation tools that builds on older principles of Example Based and Statistical MT and resolves deficiencies in classic TM. We call it ‘Advanced Leveraging’. It combines statistical analysis and linguistic intelligence tools to create a new category of fuzzy matches that can lead to significant increases in translation productivity.
The TAUS Summit II, held in Belfast on October 4 and 5, 2007, was dedicated to examining the kinds of legal structure and business model that are best suited to the construction of a language data sharing co-operative. The original philosophy and vision underlying the TAUS Co-operative project can be found in part 1 of the comprehensive report of Summit I, and in the successive versions of the TAUS Prospectus, both available for Summit attendees and TAUS subscribers.
The second TAUS Executive Forum held in San Francisco welcomed 36 translation and information industry professionals from the United States and Europe. This Forum was devoted to the topic of translation automation in customer support, with contributions from some of the largest IT industry players who are now successfully moving part of their self-service customer support onto the web, using automated translation solutions.
Controlling and managing the source in any localization project is a key issue in translation automation. This report looks at the various methods and tools developed over the past decade, drawing on a recent "authoring wish-list" survey, and then examines in detail a number of projects in organizations such as Symantec, Sun and SAS who are piloting managed authoring environments.
This starter's guide, based on current user cases of MT usage, is designed to introduce newcomers to the technology basics, and some deployment best practices. It draws on information shared in TAUS Executive Forums over the past two years, offering an overview of translation automation at work.

This interim report on best practices reviews existing user cases and highlights the user experience and quality and usability measurement. Traditional localization models are converging with user-driven automated translation. The report highlights user cases at the European Patent Office, Symantec, Microsoft and Cisco.
Who talks about simship these days? "Simship" was the hot item at localization industry conferences some ten years ago. Being able to ship software products in all language versions around the world on the same day was the ultimate goal for every self-respecting localization manager. Who talks about simship today? And who does it?
The term was put out there for translators to start worrying about what their customers really wanted. A clear vision, logical and understandable: delays in the shipment of localized products cost sales. But how can it be done, if - just as logical - the translation cannot start until the English source is ready?
The Oracle Worldwide Product Translation Group (WPTG) has developed a system that allows Oracle developers to send new applications for translation and receive them back within only a few hours with more than ninety percent of the text translated into 30 languages. This report provides an in-depth analysis of Oracle's "Translation Factory" and how it is using Advanced Leveraging to boost productivity.
In this report TAUS takes a close look at the market forces that impact our industry (such as Web 2.0 collective intelligence), and at technologies (such as STM and language search) that are disruptive to the way translations have been produced until now. From the resulting turmoil, we sketch a roadmap towards embracing these changes and prospering from them. Think of it as a roadmap for managing changes and contributing to global business infrastructures that could deliver the same sort of success as automatic currency exchange in the banking industry.
This was TAUS' first Asian meeting. In addition to convincing case studies of attendees' translation automation solutions, it included presentations from China's own machine translation community.
A report on current practices.
Measuring MT quality effectively is vital to take-up for most prospective users. This report looks at the growth in automatic metrics used in R&D, and the emergence of quality standards in the translation industry. It looks at formal and linguistic dimensions of QA, and then focuses on the emerging need for "user-defined" quality in applications such as customer support, where speed and understandability are acceptable criteria.
Barcelona - June 26, 2006
Held at the Localization World Conference in Barcelona, this full-day meeting on enterprise approaches to MT was attended by 14 participants from a balanced mix of practitioners and corporate users.
Barcelona - June 2006
This report covers the collective experience of a versatile group of 33 industry participants, and offers valuable guidance for everyone who is planning a translation workflow implementation or who is reconsidering their existing approach.
With a good sense of history, the TAUS Executive Forum was held in the Key Bridge Marriott Hotel in Washington DC with a clear view of Georgetown University on the other side of the Potomac river, where just over fifty years ago the first MT experiment was performed on an IBM mainframe computer. The almost perfect translation of a Russian text into English convinced one of the project leaders that ‘all of the Soviet Union could easily be translated in a couple of hours’ and that ‘human translators would no longer be needed in a period of five years’. We opened the Forum meeting with a video of this news report from 1954, which Steve Richardson from Microsoft had kindly made available to us.
Different Approaches to Machine Translation
This comprehensive overview for business users looks at the different types of MT technology, including rules-based, statistical and hybrid solutions using translation memories, and explains the quality issues and typical usage contexts.
October 2005
At this very first roundtable under the TAUS banner, participants from user, practitioner and technology provider sectors discussed a wide range of industry issues, including the barriers to and benefits from more comprehensive translation automation.
"If anything, it is the lack of control and visibility", says Arnaud Daix, Director of Localization at Hewlett Packard ACG. "We outsource our localization activities to multiple vendors and we get translations back. But we don't know whether we have an optimal process. It is a challenge to measure efficiency and predict the quality levels. This lack of control is unsettling especially when our release cycles are shortening, our number of target languages is increasing and our volumes are growing. This is why we are investing in technology like XML and globalization management software across the content management chain: to better control our processes and deliver more value to our customers."
Surprised by the recent shake-up in the translation industry? The SDL acquisition of TRADOS and the Lionbridge acquisition of Bowne Global Solutions. No, it was about time something big happened. Wasn't it? The previous round of mergers and acquisitions did not change very much, except for the cosmetics of creating some fatter players. The same old preoccupation with translation as an artistic craft dominated the heart and soul of every language service provider, large and small. That paradigm is now breaking down very fast. The industry leaders have spoken.
Although there has been a parallel tradition of academic research in computational linguistics (using computers to analyze and explore various aspects of human languages), the real engine driving the growth of language technology has been the search for economic or geostrategic performance. Those earliest attempts at translation technology in the 1950s focused on automating the translation of Russian to English technical material related to the space race and its military implications. Later in the 1970s during the Vietnam War, there was a spurt of activity to develop an English to Vietnamese system to speed up the translation of weapons documentation. By the 1980s when the consumer market for computing started to open up, the interest switched to translating or adapting actual software products and their accompanying volumes of documentation into any language that offered a suitable market - a process we now call localization. We were on the cusp of an age of mass multilingual computing.
Why are more and more companies using Machine Translation software? Certainly because the technology has much improved in recent years. But mainly because the need has become paramount. Fully automatic real-time translation proves most valuable when traditional human translation is not an option:
Integration in search, content management and customer relationship management is straightforward. Customized dictionaries dramatically improve the utility of Machine Translation (MT). In these and many other cases, it is a question of MT or no translation at all.
The foundation for TAUS (Translation Automation User Society) was laid in a Round Table meeting on November 18, 2004 at the Localization World Conference in San Francisco. Delegates from fifteen companies - all corporate users of translation services and technologies - joined for a full day brainstorming on issues around translation automation. The day turned into a great success. The open discussions and the sharing of experiences and insights helped the participants in the meeting to be better informed and prepared for their own implementation of authoring, translation and globalization technologies.
San Francisco, November 2004
This historic first Translation Automation Roundtable launched the platform for an open exchange of experiences and insights in the issues around Translation Automation among corporate users, and resulted in the creation of TAUS.