TAUS - Translation Automation

Community Translation for TED.com

As part of our community translation research program, we talked to dotSUB's CEO Michael Smolens about community translation for TED.

The TED Open Translation Project launched on May 13th, bringing TEDTalks beyond the English-speaking world by offering subtitles, time-coded transcripts, and the ability for volunteers to translate any talk into any language. The project was a year in the making. Two weeks after launch, more than 512 volunteer translators have already contributed, resulting in 371 videos completed with subtitled translations and 762 more in process, in 64 languages. The project is generously sponsored by Nokia. dotSUB provides the technological backbone and project management architecture behind the initiative.

Why is this project so special?

"For a couple of reasons -

Firstly, TED has developed a large media business with TEDTalks, which is very well known and respected for the highest quality content, and has a large, technically savvy global audience wanting to help fulfill TED's mission, 'Ideas Worth Spreading.' TED always takes risks and is a perpetual leader in introducing new ideas because it does not have any ROI needs. So when TED's Open Translation Project announces that it trusts the crowd and is willing to put its brand behind it, this creates an opportunity to prove that the open community, using the right technology, can dramatically increase the audience for any video content.

Secondly, with video on the web emerging as a mainstream tool for storytelling, a new market for subtitled content is opening up. This market does not have clear ROI calculations and so it's difficult for many managers to justify the expenditure needed to make video available in new languages. Up to now our enterprise clients, for whom we've often used professional translators, have been big companies with small programs, smaller companies, non-profits and start-ups. We believe the exposure provided by the TED initiative, such as last Friday's Newsweek story, helps us and others like us who have taken the risk to invest in a new area."

How does the process work?

"TEDTalks in their source language, English, already enjoy more than 11 million monthly views. From TED's homepage, anyone interested in translating goes through an easy sign-up on both TED and dotSUB and is then shown a dashboard of over 400 possible language choices. Choosing one leads to thumbnails of all 450 videos available for subtitling. When a volunteer selects a video of interest, an email is automatically sent to the TED project manager, which triggers a questionnaire to be sent to the translator requesting background info. The project manager then gives those who have appropriate fluency and intentions permission to subtitle that video in that language. They have 30 days to complete the subtitling. Once they hit the 'complete' button, that video in that language becomes available for a second native speaker of that language to select for review. During the review process, only the initial translator and the reviewer have access to that video in that language, and they have each other's email addresses and are encouraged to work together.

Once the reviewer marks the review 'complete', an email is automatically sent to the project manager, who then has the ability to hit a 'publish' button, opening the API to that video in that language for viewing on the TED site. Until the 'publish' button is pushed, all translation and review are done in the dotSUB environment, with the video being freely available for viewing, but not for translation without permission. Once the video is published, both the translator and the reviewer are credited on the video page on the TED.com site for that language, and their email addresses are available so that anyone viewing the video can send them questions or comments. TED makes it clear in its directions that it welcomes open dialogue about the quality of any translation, and specifically asks anyone who sees a video that looks like it was done without the best of intentions to contact TED immediately.

Two weeks after launch, 408 translators are simultaneously working on 762 videos in 64 languages, and the process is only beginning. All of this work is being managed by an enterprise-class project management system and database developed by dotSUB to handle all character sets and fonts, as well as libraries of thousands of videos using hundreds of translators into scores of languages."
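The request, approval, translation, review and publish steps described above can be read as a small state machine. The following Python sketch is purely illustrative: dotSUB's actual system is not described in this interview, and every name in it (Status, SubtitleJob, the method names, the example users) is a hypothetical stand-in.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Tuple


class Status(Enum):
    """Hypothetical stages of one video/language subtitling job."""
    REQUESTED = auto()        # volunteer selected a video and language
    APPROVED = auto()         # project manager granted permission
    AWAITING_REVIEW = auto()  # translator hit 'complete'
    IN_REVIEW = auto()        # a second native speaker claimed the review
    REVIEWED = auto()         # reviewer marked the review 'complete'
    PUBLISHED = auto()        # project manager hit 'publish'


@dataclass
class SubtitleJob:
    video_id: str
    language: str
    translator: str
    reviewer: Optional[str] = None
    status: Status = Status.REQUESTED

    def _advance(self, expected: Status, new: Status) -> None:
        # Each step is only allowed from the stage that precedes it.
        if self.status is not expected:
            raise ValueError(
                f"cannot move from {self.status.name} to {new.name}"
            )
        self.status = new

    def approve(self) -> None:
        self._advance(Status.REQUESTED, Status.APPROVED)

    def complete_translation(self) -> None:
        self._advance(Status.APPROVED, Status.AWAITING_REVIEW)

    def claim_review(self, reviewer: str) -> None:
        # The review is done by a second person, not the translator.
        if reviewer == self.translator:
            raise ValueError("reviewer must differ from the translator")
        self._advance(Status.AWAITING_REVIEW, Status.IN_REVIEW)
        self.reviewer = reviewer

    def complete_review(self) -> None:
        self._advance(Status.IN_REVIEW, Status.REVIEWED)

    def publish(self) -> None:
        self._advance(Status.REVIEWED, Status.PUBLISHED)

    def credits(self) -> Tuple[str, Optional[str]]:
        # Translator and reviewer are credited only once published.
        if self.status is not Status.PUBLISHED:
            raise ValueError("credits are shown only after publication")
        return (self.translator, self.reviewer)


# Walk one hypothetical job through the whole workflow.
job = SubtitleJob(video_id="talk-1", language="fr", translator="alice")
job.approve()
job.complete_translation()
job.claim_review("bob")
job.complete_review()
job.publish()
print(job.status.name, job.credits())
```

The sketch leaves out the 30-day deadline, the questionnaire and the email notifications mentioned in the interview. It only captures the ordering of the steps and the rule that a second person reviews each translation.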

What sorts of challenges do you foresee?

"Having spent a year working on the project, with the normal crises, changes, deadlines, mistakes and revisions, there was a bit of nervousness prior to site launch, with the uncertainty of how our system would hold up, in spite of a well-designed beta period. Thus far, two weeks post launch, there have been no glitches. It is funny: going forward, I really do not see any major technical challenges, only opportunities for continued learning and, finally, an irrefutable proof of concept.

TED, with its powerful content, well-known global brand, and tens of thousands of passionate fans, will have thousands of crowdsourced subtitled videos within a few months. Once the global community in general becomes aware of this availability, and the blogosphere spreads it virally, there will be continual feedback as to how to improve things, and TED and dotSUB are committed to making changes and refinements with the ultimate goal of spreading TED's message globally.

The real challenge will then be to take this learning and convince traditional owners of video content of any type, in any industry, that crowdsourcing can work and can help them spread their message, engage their audience and lower their costs, while at the same time embracing openness instead of feeling threatened by new standards of quality."

What role could there be for machine translation?

"dotSUB has fully integrated Google Translate in all 41 languages into our backend, so with one click, once a video has an initial time-coded transcript in English (or another Google source language), we can create a fully subtitled video in another language. However, the quality is very inconsistent depending on subject matter, as most videos are not highly technical in nature, but much more conversational and idiomatic.

We have not yet done a full-blown test to see the time difference between subtitling from scratch and human editing of a machine translation, but we are totally open to it and are in discussions with Meedan and Asia Online, among others. If we can save time, we will do it. However, we have done full tests on time-coded captioning starting with voice-to-text conversion engines versus human captioning/transcription, and found that it actually takes longer to correct errors from machine conversion: our captionists type about 200 words/minute with headsets, and if they have to continually stop, highlight, delete, move and edit, they can never get into their rhythm.

We are currently focusing primarily on enterprise clients, and have plans in the near term to offer a consumer/SME/publisher-facing model at different levels of quality and price points. It is in this effort that we plan to fully use machine translation, integrated as video subtitles, as part of a freemium offering."

 

 
