This technical guide is intended for anyone who has to prepare translation training data for statistical machine translation (SMT). It examines the data preparation processes that act as catalysts, enabling data and algorithms to work in unison. It explains how to define an organization's training data strategy to match the overall system design, identifies potential data sources, introduces the challenges of merging multiple corpora into large data sets, and describes several methods for turning translation memories into SMT training data. Finally, it looks into SMT's roots in speech technology and introduces the concept of exception management as a context for preparing SMT training data.
The full report, Technical Guide to SMT Training Data, is available exclusively to TAUS members.
ARTICLES ON OTHER REPORTS IN 2009
LSPs in the MT Loop: Current Practices, Future Requirements (July)
ProMT: Machine Translation Solutions Tailored to All Types of User Environment (July)
TAUS Innovation Roadmap (June) - Complimentary report
Owning MT. Lionbridge and SDL as MT users (April)
Microsoft's Machine Translation (March)
Symantec's Localization Revolution (January)


