Bayesian methods for word alignment


Project leader


Description

Parallel texts express the same content in multiple languages, and they are becoming increasingly common in fields as diverse as parliamentary proceedings, religious texts and works of fiction. Given a parallel text, an important but difficult problem is word alignment: determining which words in the other languages a given word corresponds to. This is an essential first step in applications such as machine translation, automatic lexicon construction and linguistic annotation transfer. In this project, I explore word alignment methods based on Bayesian statistics, in particular Markov chain Monte Carlo (MCMC) methods. These methods are applied in a variety of areas, including linguistic typology, sign language studies and semi-automatic annotation of under-resourced languages.

Last updated on 2017-03-22 at 07:15