Not just anonymous words indiscriminately scraped from the World Wide Web, but good-quality human translations from trusted sources: government bodies and institutions, companies large and small, and professional translators.
What could we do with it? We could transform the translation industry! Here is how we do it:
Terminology mining and dictionary building
Today glossaries are built by terminologists: the best-in-class language specialists. It is laborious and frustrating work. Because language keeps changing, the terminologist is always behind and the glossary is often ignored.
Imagine we have 100 billion translated words at our disposal. Terminology is harvested in real time. Synonyms and related terms are identified automatically. Part-of-speech is tagged, context is listed, sources are quoted, meanings are described. It is not rocket science. In fact, all the tools exist to do this and to do it well.
Customize automated translation
Today we use MT on the internet and accept its stupid failures, which stem from the engines’ lack of domain knowledge. Some of us go through the lengthy and costly process of customizing an engine for our company’s use.
Imagine we have 100 billion translated words at our disposal. We will do fully automatic semantic clustering to find the translations that match our own domain. We will do automatic genre identification to make sure that we use the right style. We will go deeper in advancing MT technology with syntax and concept descriptions.
Global market and customer analytics
Today translation is an isolated function, a cost center in most companies and organizations. We push translations out but we have no means to listen, learn and connect with our customers worldwide.
Imagine we have 100 billion translated words at our disposal. We will integrate our translation process and skills with text analytics and social media management. We will do multilingual sentiment analysis, search engine optimization, opinion mining, customer engagement, competitor analysis, and more. From a cost center, the translation function would become a strategic multiplier for global organizations.
Quality management
Today we struggle to deliver adequate quality in translations. We miss the local flavor, the right term or the subject knowledge. The source texts may be in bad shape, causing all kinds of trouble for the translator or the MT engine.
Imagine we have 100 billion translated words at our disposal. We will automatically clean and improve source texts for translation. We will run automatic scoring and benchmarks on quality. We will improve consistency and comprehensibility.
Interoperability
Today the lack of interoperability and compliance with standards costs a fortune. Buyers and providers of translation often lose 10% to 40% of their budgets or revenues because language resources are not stored in compatible standard formats.
Imagine we have 100 billion translated words at our disposal. Imagine that it’s common practice in the global translation industry to share most of your public translation data in a common industry repository. Very naturally, then, all vendors and translation tools are driven towards one hundred percent compatibility. Jobs and resources will travel without any loss of value. Benefits on an industry scale add up to billions of dollars and euros.
Stakes are high. Risks are low. Only fear can stop us.
A quarter of a million professional translators produce 625 million good-quality translated words every day, some 150 billion a year, of which an estimated 70% is published on the internet. We can collect and share 100 billion translated words every year. We should give unlimited access to this gigantic supercloud of translations. The translation supercloud must be not-for-profit and directed by the data contributors.
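The arithmetic behind these figures can be checked with a short sketch. It takes the daily and yearly volumes as counts of translated words, which is an assumption. The output per translator and the number of working days per year are not stated above; they are simply what the quoted figures imply. All variable names are illustrative.

```python
# Figures quoted in the text above.
translators = 250_000              # "a quarter of a million professional translators"
daily_words = 625_000_000          # quoted daily volume (assumed to be words)
yearly_words = 150_000_000_000     # "some 150 billion a year"
published_percent = 70             # "an estimated 70% is published on the internet"

# Values implied by the quoted figures (derived here, not stated in the text).
words_per_translator_per_day = daily_words // translators
working_days_per_year = yearly_words // daily_words

# Yearly volume published online, the pool the 100 billion target draws on.
published_words_per_year = yearly_words * published_percent // 100

print(f"Implied output per translator per day: {words_per_translator_per_day:,}")
print(f"Implied working days per year: {working_days_per_year}")
print(f"Published online per year: {published_words_per_year:,}")
```

Under these assumptions, the published volume comes out slightly above the 100 billion words the text proposes to collect and share each year, so the target is consistent with the quoted estimates.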
The stakes are high. The translation industry will flourish. The world will communicate better across all language barriers.
The risks are low. The only choice is whether to participate proactively or to leave it to the ‘pirates’ to change the industry.
Only fear can stop us. Fear of change and fear of losing control will be replaced by fear of being left behind. Since July 2008, many industry leaders have already been sharing their translations in the TAUS Data Association repository.





