TAUS - Enabling better automated translation


Technical Guide to SMT Training Data

This technical guide is intended for anyone who has to prepare translation training data for statistical machine translation (SMT). It examines the data preparation processes that enable data and algorithms to work in unison. It explains how to define an organization's training data strategy to match the overall system design, identifies potential data sources, introduces the challenges of merging multiple corpora into large data sets, and explores several methods for converting translation memories into SMT training data. Finally, it looks into the roots of SMT in speech processing and introduces the concept of exception management as a context for preparing SMT training data.


Full report exclusively available to TAUS members >> Technical Guide to SMT Training Data




ARTICLES ON OTHER REPORTS IN 2009

LSPs in the MT Loop: Current Practices, Future Requirements (July)
ProMT: Machine Translation Solutions Tailored to All Types of User Environment (July)
TAUS Innovation Roadmap (June) - Complimentary report
Owning MT. Lionbridge and SDL as MT users (April)
Microsoft's Machine Translation (March)
Symantec's Localization Revolution (January)

 
