TSD 2012, September 3-7,
abstracts
Word Translation Disambiguation is the task of selecting the best translation(s) for
a source word in a certain context, given a set of translation candidates. Most
approaches to this problem rely on large word-aligned parallel corpora,
resources that are scarce and expensive to build. In contrast, the method
presented in this paper requires only large monolingual corpora to build vector
space models encoding sentence-level contexts of translation candidates as
feature vectors in high-dimensional word space. Experimental evaluation shows
positive contributions of the models to overall quality in German-English
translation.
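The idea of comparing a context against candidate vectors built from monolingual text can be sketched as follows. This is only an illustration, not the paper's implementation: the function names (`context_vector`, `cosine`, `rank_candidates`), the raw bag-of-words counts, and the cosine similarity are all assumptions, and the sketch further assumes that the source context has already been mapped to target-language words by some means the abstract does not specify.

```python
from collections import Counter
from math import sqrt

def context_vector(corpus, candidate):
    """Sum bag-of-words counts over all sentences that contain the candidate."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.lower().split()
        if candidate in tokens:
            vec.update(t for t in tokens if t != candidate)
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_candidates(context_words, candidates, corpus):
    """Order translation candidates by similarity to the given context words."""
    query = Counter(w.lower() for w in context_words)
    scores = {c: cosine(query, context_vector(corpus, c)) for c in candidates}
    return sorted(candidates, key=lambda c: scores[c], reverse=True)

# Toy monolingual (English) corpus, invented for illustration only.
corpus = [
    "the bank approved the loan",
    "she sat on the river bank",
    "the bench in the park was wet",
    "money in the bank earns interest",
]
ranking = rank_candidates(["park", "wet"], ["bank", "bench"], corpus)
```

A realistic system would presumably weight features (for example with tf-idf) and use a far larger corpus; the toy data here only shows the ranking mechanics.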
We describe the structure of a space-efficient phrase table for phrase-based
statistical machine translation with the Moses decoder. The new phrase table
can be used in-memory or be partially mapped on-disk. Compared to the standard
Moses on-disk phrase table implementation, a size reduction by
a factor of 6 is achieved. The focus of this work is the source
phrase index, which is implemented using minimal perfect hash functions. Two
methods are discussed that reduce the memory consumption of a baseline
implementation.
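To make the notion of a minimal perfect hash function concrete, here is a small sketch of one generic construction (hash and displace). It is not the construction used in the paper or in Moses; the names `build_mph` and `mph_index`, the bucket count, and the use of BLAKE2b as the underlying hash are assumptions chosen for illustration. The function maps n distinct source phrases to the integers 0..n-1 without collisions, so the phrases themselves need not be stored in the index.

```python
import hashlib

def _h(seed, key, mod):
    """Seeded hash of a string into the range [0, mod)."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % mod

def build_mph(keys):
    """Return per-bucket displacement seeds for a list of distinct keys."""
    n = len(keys)
    assert len(set(keys)) == n, "keys must be distinct"
    n_buckets = max(1, n // 2)
    buckets = [[] for _ in range(n_buckets)]
    for k in keys:
        buckets[_h(0, k, n_buckets)].append(k)
    displacements = [0] * n_buckets
    occupied = [False] * n
    # Place the largest buckets first, while most slots are still free.
    for b in sorted(range(n_buckets), key=lambda i: len(buckets[i]), reverse=True):
        if not buckets[b]:
            continue
        d = 1
        while True:
            slots = [_h(d, k, n) for k in buckets[b]]
            if len(set(slots)) == len(slots) and not any(occupied[s] for s in slots):
                break
            d += 1
        displacements[b] = d
        for s in slots:
            occupied[s] = True
    return displacements

def mph_index(displacements, key, n):
    """Map a key to its slot in [0, n); unknown keys land on an arbitrary slot."""
    d = displacements[_h(0, key, len(displacements))]
    return _h(d, key, n)

# Hypothetical source phrases, for illustration only.
phrases = ["das haus", "ein haus", "das ist", "ist ein", "ein kleines haus", "haus"]
disp = build_mph(phrases)
slots = sorted(mph_index(disp, p, len(phrases)) for p in phrases)
```

Because a key outside the original set still hashes to some slot, a practical index would need an additional check to reject unknown phrases; how the paper handles this is not stated in the abstract.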
In this paper, we explore several methods of improving the estimation of
translation model probabilities for phrase-based statistical machine
translation given in-domain data sparsity. We introduce a hierarchical
variant of maximum a posteriori (MAP) adaptation for domain adaptation
with an arbitrary number of
out-of-domain models. We note that domain adaptation can have a smoothing
effect, and we explore the interaction between smoothing and the incorporation
of out-of-domain data. We find that the relative contributions of smoothing and
interpolation depend on the datasets used. For both the IWSLT 2011 and WMT 2011
English-French datasets, the MAP adaptation method we present improves on a
baseline system by 1.5+ BLEU points.
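As a rough illustration of MAP adaptation of translation probabilities, the sketch below uses the standard count-based form, in which in-domain counts are combined with a prior distribution weighted by a parameter tau: p(t|s) = (c_in(s,t) + tau * p_prior(t|s)) / (c_in(s) + tau). The hierarchical chaining shown here, where each adapted model becomes the prior for the next, is an assumption about what "hierarchical" could mean and is not taken from the paper; the names `map_adapt` and `hierarchical_map` and the single shared `tau` are likewise hypothetical.

```python
def map_adapt(in_counts, prior, tau):
    """MAP estimate of p(target | source) for one fixed source phrase.

    in_counts: dict mapping target phrase -> count in the more specific data
    prior:     dict mapping target phrase -> prior probability
    tau:       weight given to the prior (larger means more trust in the prior)
    """
    total = sum(in_counts.values())
    targets = set(in_counts) | set(prior)
    return {
        t: (in_counts.get(t, 0) + tau * prior.get(t, 0.0)) / (total + tau)
        for t in targets
    }

def hierarchical_map(count_tables, tau):
    """Chain MAP adaptation over count tables ordered from the most distant
    out-of-domain data to the in-domain data (an assumed ordering)."""
    first = count_tables[0]
    total = sum(first.values())
    if total == 0:
        raise ValueError("the first count table must contain at least one count")
    prior = {t: c / total for t, c in first.items()}
    for counts in count_tables[1:]:
        prior = map_adapt(counts, prior, tau)
    return prior

# Toy counts for a single source phrase, invented for illustration.
out_of_domain = {"house": 1, "home": 1}
in_domain = {"house": 1}
adapted = hierarchical_map([out_of_domain, in_domain], tau=2.0)
```

With these toy counts the in-domain evidence shifts probability toward "house" while "home" keeps some mass from the prior, which is the smoothing effect the abstract alludes to.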
This paper presents some problems involved in the machine translation of proper
names (PNs) from English into Vietnamese. We build an English-Vietnamese
comparable corpus of texts with numerous PNs, extracted from online BBC News
and translated by four machine translation (MT) systems, and use it to classify
and analyze PN errors. Some pre-processing solutions for reducing errors are
also proposed and tested on a manually annotated corpus in order to
significantly improve MT quality.