Report by Yi Fan He and Paraic Sheridan of CNGL
The 4th annual MT Marathon was held over five days in January in Dublin. It was hosted by the National Centre for Language Technology and the Centre for Next Generation Localisation (CNGL) as a partner in the EuroMatrixPlus consortium, which aims to provide a major boost to MT technology by applying the most advanced MT technologies systematically to all pairs of EU languages. The previous MT Marathons were held in Edinburgh (2007), Berlin/Wandlitz (2008) and Prague (2009).
Researchers, developers, students, and users of machine translation technology from all over the world attend lectures and labs that introduce them, over the course of the event, to the latest research in the field. More than 100 participants from 20 countries came to Dublin to join this year’s event.
Proposals were solicited for open-source MT projects on which developers and researchers could collaborate during the lab sessions of the Marathon. Over twenty open-source project ideas were submitted this year, of which seventeen received development support during the course of the Marathon.
Lectures and Presentations
The programme of lectures served as an introduction to Machine Translation for new practitioners. The series of lectures over the five days started with an introduction to (Statistical) Machine Translation and MT Evaluation, followed by lectures on Word-Based Models and Word Alignment, Phrase-Based Models, and Syntax-Based Models, and a lecture on Rule-Based and Example-Based Machine Translation.
As well as introductory lectures, a series of research presentations was given by researchers who had submitted their papers for peer review and selection. Research presentations included studies of MT post-editing productivity, cloud-based and open-source MT systems, harvesting parallel texts from multilingual sites, and approaches to MT system combination.
Research publications accepted to the MT Marathon will be published in the Prague Bulletin of Mathematical Linguistics. Papers are also available for download online at the MT Marathon programme page.
Labs and Projects
In the afternoon sessions, beginners practised what they had learned in the morning lectures, while more advanced developers worked together on open-source projects in a hands-on lab setting.
This year’s open-source project proposals covered a wide variety of interests. Projects included implementations of algorithms and data structures (lattice-based decoding, smoothing phrase tables, dynamic suffix arrays), projects on resource acquisition (bilingual corpus acquisition, lexicon induction, and grammar extraction) and projects related to Wiki Translation, Irish-English MT, MT experiment management, language modelling and decoding, and example-based machine translation.
The open-source MT systems Moses and Joshua featured heavily in the projects, as did Marclator, a new open-source EBMT system released by the MT research group at DCU ahead of the Marathon. The open-source MT projects from the MT Marathon are being tracked online at: http://statmt.org/mtm4/
Overall, the 4th MT Marathon was considered a huge success and a very productive week of meeting, sharing, learning and open-source software development. As the EuroMatrixPlus project is funded through to February 2012, it is expected that the annual MT Marathons will continue to attract a large, diverse group of MT practitioners to share the latest MT research and advance the availability of open-source MT resources.
More information at:
Fourth MT Marathon: http://www.mtmarathon2010.info
EuroMatrixPlus: http://www.euromatrixplus.net
CNGL: http://www.cngl.ie