Bayesian methods for word alignment

Project leader


Parallel texts express the same content in multiple languages, and they are becoming increasingly available in genres as diverse as parliamentary proceedings, religious texts and works of fiction. Given a parallel text, an important but difficult problem is to determine which words in the other languages a given word corresponds to, a task known as word alignment. Solving it is an essential first step in applications such as machine translation, automatic lexicon construction and linguistic annotation transfer. In this project, I explore methods based on Bayesian statistics, in particular Markov chain Monte Carlo (MCMC) methods. These methods are applied in a variety of areas, including linguistic typology, sign language studies and semi-automatic annotation of under-resourced languages.

Last updated on 2017-03-22 at 07:15