TAUS - Enabling better translation


Technical guide to SMT Training Data

This technical guide is intended for anyone who has to prepare translation data for training statistical machine translation (SMT) systems. It examines the data preparation processes that enable data and algorithms to work in unison. It explains how to define an organization's training data strategy to match the overall system design, identifies potential data sources, introduces the challenges of merging multiple corpora into large data sets, and explores several methods for converting translation memories into SMT training data. Finally, it looks into the speech roots of SMT and introduces the concept of exception management as a context for preparing SMT training data.


The full report is available exclusively to TAUS members: Technical Guide to SMT Training Data




ARTICLES ON OTHER REPORTS IN 2009

LSPs in the MT Loop: Current Practices, Future Requirements (July)
ProMT: Machine Translation Solutions Tailored to All Types of User Environment (July)
TAUS Innovation Roadmap (June) - Complimentary report
Owning MT. Lionbridge and SDL as MT users (April)
Microsoft's Machine Translation (March)
Symantec's Localization Revolution (January)

 

