TAUS - Enabling better translation

Saturday
Feb 04th

Languagelens: from PhD project to dedicated patent MT service

The Danish Languagelens system is a statistical MT engine that began as an academic project two years ago and now drives millions of words of English-to-Danish patent translation at the Copenhagen-based LSP Lingtech. Theoretical linguist Daniel Hardt now supervises development at Languagelens.

"The bottom line to the rise of SMT is Moore's Law. As long as computing power grows exponentially, anything linked to it - such as vast amounts of data - will expand in the same way," says Hardt. Languagelens grew out of a research project on integrating SMT with linguistic knowledge, and has been operating commercially since 2007.

Under its current business model, Languagelens licenses the technology to Lingtech, which has a strong track record in rule-based MT, and to CorpusMT, a new language technology company, and is gradually building up a client base in partnership with them.

"Our competitive edge in this field will be in customizing the engine to the special needs of clients who already have 10 million plus words of parallel language data. As soon as you enter a subject matter domain, the data will explode as post-edited output is recycled as automatic training content. Data and quality will always improve over time."

For Daniel Hardt, the key to success with SMT is first to get above the quality level at which post-editing takes longer than translating from scratch. The built-in quality improvement cycle, in which the engine is retrained on post-edited content, will then continually reduce the post-editing effort. He notes that this is not something that the Google MT service can deliver.

"When I talk MT quality output to clients, some say, 'It takes my translator 15 minutes. If it takes less time with your system, the quality is good; otherwise it is bad.'" In the case of the company's work on patents, the capacity to leverage a 10 million word corpus means that eventually the output only needs very light post-editing.

Apart from general ingrained skepticism about MT in the public mind - and among linguists - Hardt notes that potential end users usually understand the logic of recycling their existing language data for an automated solution.

"People are understandably concerned about questions of data security. But one cannot reconstruct a complete text from the data in an SMT system, so the data remains confidential. But any effort to share data will help prime the pump for more MT throughput."

The TAUS Take:

Languagelens is a good example of the rapid commercialization of an SMT system customized to a data-rich domain. It is proving far more cost-effective and productive than the (admittedly very old) rule-based system previously implemented for the same task. The developers themselves were surprised by the speed and ease with which an MT system could be put together and harnessed to a translation process. Expect to see many more such deployments competing for new language pairs in vertical markets.
