Machine Translation, vol.22, nos.1-2, March-June 2008, pp.1-27
Regression for machine translation evaluation at the sentence level
Joshua S. Albrecht · Rebecca Hwa
Received: 9 September 2008 / Accepted: 31 October 2008 / Published online: 25 November 2008
© Springer Science+Business Media B.V. 2008
Abstract Machine learning offers a systematic framework for
developing metrics that
use multiple criteria to assess the quality of machine translation (MT).
However, learning introduces additional complexities that may affect the
resulting metric's effectiveness.
First, a learned metric is more reliable for translations that are similar to its training examples; this
calls into question whether it is as effective in evaluating translations from systems that
are not its contemporaries. Second, metrics trained from different sets of training
examples may exhibit variations in their evaluations. Third, expensive developmental resources
(such as translations that have been evaluated by humans) may be needed as
training examples. This paper investigates these concerns in the context of using
regression to develop metrics for evaluating machine-translated sentences. We track a learned
metric's reliability across a 5-year period to measure the extent to which the
learned metric can evaluate sentences produced by other systems. We compare metrics trained under different conditions to
measure their variations. Finally, we present an
alternative formulation of metric training in which the features are based on comparisons against
pseudo-references in order to reduce the demand on human-produced resources. Our
results confirm that regression is a useful approach for developing new metrics for
MT evaluation at the sentence level.
Keywords Machine translation · Evaluation metrics · Machine learning
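The regression setup summarised in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the two features (unigram precision and recall against a reference), the ordinary least-squares learner, and the toy sentences and human scores are all invented, and the abstract does not say which features or regression learner the authors use. Under the pseudo-reference formulation, the `reference` argument would be the output of another MT system rather than a human translation.

```python
from collections import Counter

def overlap_features(candidate, reference):
    """Unigram precision and recall of a candidate against one reference."""
    cand, ref = candidate.split(), reference.split()
    matches = sum((Counter(cand) & Counter(ref)).values())
    precision = matches / len(cand) if cand else 0.0
    recall = matches / len(ref) if ref else 0.0
    return [precision, recall]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(features, scores, ridge=1e-6):
    """Least-squares weights, bias first; the tiny ridge term keeps the system solvable."""
    rows = [[1.0] + list(f) for f in features]
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * s for r, s in zip(rows, scores)) for i in range(d)]
    return solve(xtx, xty)

def predict(weights, feature_vector):
    """Predicted quality score for one sentence."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_vector))

# Invented toy data: (MT output, reference, human score on a 0-1 scale).
train = [
    ("the cat sat on the mat", "the cat sat on the mat", 1.0),
    ("cat the mat sat", "the cat sat on the mat", 0.5),
    ("a dog ran", "the cat sat on the mat", 0.0),
    ("the cat on mat", "the cat sat on the mat", 0.7),
]
weights = fit([overlap_features(c, r) for c, r, _ in train],
              [s for _, _, s in train])
print(predict(weights, overlap_features("the cat sat on a mat",
                                        "the cat sat on the mat")))
```

The learned weights then act as the metric: a new sentence is scored by extracting the same features and applying `predict`, with no human judgement needed at evaluation time.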
Machine Translation, vol.22, nos.1-2, March-June 2008, pp.29-66
Using target-language information to train part-of-speech taggers for machine translation
Felipe Sanchez-Martinez · Juan Antonio Perez-Ortiz · Mikel L. Forcada
Received: 28 January 2008 / Accepted: 27 October 2008 / Published online: 25 November 2008
© Springer Science+Business Media B.V. 2008
Abstract Although corpus-based approaches to machine translation
(MT) are growing in
interest, they are not applicable when the translation involves less-resourced language pairs for
which there are no parallel corpora available; in those cases, the
rule-based approach is the only applicable solution. Most rule-based MT systems make use of
part-of-speech (PoS) taggers to resolve the PoS ambiguities in the source-language texts to be translated; those MT systems
require accurate PoS taggers to produce reliable translations
in the target language (TL). The standard statistical approach to PoS ambiguity resolution (or tagging) uses hidden
Markov models (HMM)
trained in a supervised way from hand-tagged corpora, an expensive resource not always available, or in an
unsupervised way through the Baum-Welch expectation-maximization algorithm;
both methods use information only from the language being tagged. However, when
tagging is considered as an intermediate task for the translation procedure,
that is, when the PoS tagger is to be embedded as a
module within an MT system, information from the TL can be used,
in an unsupervised way, in the training
phase to increase the translation quality of the whole MT system. This paper presents a method to train
HMM-based PoS taggers to be used in MT; the new
method uses not only
information from the source language (SL), as general-purpose methods do, but also information from the
TL and from the remaining modules of the MT system in which the PoS tagger
is to be embedded. We find that the translation quality of the MT system
embedding a PoS tagger trained in an unsupervised
manner through this
new method is clearly better than that of the same MT system embedding a PoS tagger trained through the Baum-Welch algorithm, and comparable to that
obtained by embedding a PoS tagger trained in a
supervised way from hand-tagged corpora.
Keywords Rule-based machine translation · Part-of-speech tagging · Hidden Markov models · Language modeling
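To make the tagging task in this abstract concrete, the following minimal sketch shows how a first-order HMM resolves a PoS ambiguity with Viterbi decoding. It does not implement the target-language-driven training method the paper presents, nor Baum-Welch; it only shows what a trained tagger does at translation time. The tag set and all probabilities below are invented for illustration.

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a first-order HMM."""
    # best[i][t] = (probability of the best path ending in tag t at position i,
    #               tag chosen at position i - 1 on that path)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], 0.0), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            column[t] = max(
                (best[-1][p][0] * trans_p[p][t] * emit_p[t].get(word, 0.0), p)
                for p in tags
            )
        best.append(column)
    # Backtrack from the most probable final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]

# Invented toy model: "book" is ambiguous between NOUN and VERB.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.2, "VERB": 0.2}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
emit_p = {
    "DET": {"the": 1.0},
    "NOUN": {"book": 0.6, "flight": 0.4},
    "VERB": {"book": 1.0},
}

print(viterbi(["the", "book"], tags, start_p, trans_p, emit_p))
# → ['DET', 'NOUN']
print(viterbi(["book", "the", "flight"], tags, start_p, trans_p, emit_p))
# → ['VERB', 'DET', 'NOUN']
```

The same word receives different tags depending on its context, which is why the quality of the transition and emission estimates matters for the translation produced downstream.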
Machine Translation, vol.22, nos.1-2, March-June 2008, pp.67-99
METIS-II: low resource machine translation
Michael Carl · Maite Melero · Toni Badia · Vincent Vandeghinste · Peter Dirix ·
Ineke Schuurman · Stella Markantonatou · Sokratis Sofianopoulos ·
Marina Vassiliou · Olga Yannoutsou
Received: 29 August 2008 / Accepted: 4 November 2008 / Published online: 27 November 2008
© Springer Science+Business Media B.V. 2008
Abstract METIS-II was an EU-FET MT project running from October
2004 to September
2007, which aimed at translating free text input without resorting to parallel corpora.
The idea was to use "basic" linguistic tools and representations and
to link them with patterns and statistics from the monolingual target-language
corpus. The METIS-II project had four partners, translating from their "home" languages
Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation,
the resources used, and the results obtained. It also gives examples of how METIS-II has
continued beyond its lifetime and the original scope of the project. On the basis of
the results and experiences obtained, we believe that the approach is promising and offers
the potential for development in various directions.
Keywords Low resource MT
Machine Translation, vol.22, no.3, September 2008, pp.101-152
Morphological mismatches in machine translation
Igor Mel'cuk · Leo Wanner
Received: 19 February 2008 / Accepted: 23 March 2009 / Published online: 13 May 2009
© Springer Science+Business Media B.V. 2009
Abstract This paper addresses one of the least studied, although very important, problems
of machine translation: the problem of morphological mismatches between languages
and their handling during transfer. The level at which we assume transfer to be carried out is the Deep-Syntactic
Structure (DSyntS) as proposed in the Meaning-Text Theory. DSyntS
is abstract enough to avoid all types of surface morphological divergences. For
the remaining 'genuine' divergences between grammatical significations, we
propose a morphological transfer model. To illustrate this model, we apply it to the transfer of grammemes of
definiteness and aspect for the language pairs Russian-German and German-Russian, respectively.
Keywords Machine translation · Transfer · Grammatical signification · Morphological mismatch ·
Deep-syntactic structure · Meaning-Text Theory · Definiteness · Aspect
Machine Translation, vol.22, no.3, September 2008, pp.153-173
Toward communicating simple sentences using pictorial representations
Rada Mihalcea · Chee Wee Leong
Received: 3 August 2007 / Accepted: 16 March 2009 / Published online: 9 April 2009
© Springer Science+Business Media B.V. 2009
Abstract This paper addresses and evaluates the hypothesis that
pictorial representations can be used to effectively convey simple sentences
across language barriers. The paper makes two main contributions. First, it proposes an approach
to augmenting dictionaries
with illustrative images using volunteer contributions over the Web. The paper describes the PicNet illustrated dictionary, and evaluates the quality
and quantity of the
contributions collected through several online activities. Second, starting
with this
illustrated dictionary, the paper describes a system for the automatic
construction of
pictorial representations for simple sentences. Comparative evaluations show
that a considerable
amount of understanding can be achieved using visual descriptions of information, with evaluation
figures in a range comparable to those obtained with linguistic representations
produced by an automatic machine translation system.
Keywords Text-to-picture synthesis · Illustrated dictionaries · Augmentative and alternative communication
Machine Translation, vol.22, no.4, December 2008, pp.181-203
Translating emphatic/contrastive focus from English to Mandarin Chinese
Chen-li Kuo · Allan Ramsay
Received: 25 February 2009 / Accepted: 25 September 2009 / Published online: 16 October 2009
© Springer Science+Business Media B.V. 2009
Abstract Despite the importance of intonation in spoken languages, deeper linguistic information encoded in
prosody is rarely taken into account in speech-to-speech machine translation systems.
This paper concerns the translation of spoken English into Mandarin Chinese,
paying particular attention to the emphatic/contrastive focus in questions, which is realised by
means of phonological stress in spoken English but by lexical and syntactic
devices in Mandarin. There are two main reasons to translate phonologically marked
emphatic/contrastive focus with other linguistic devices: firstly, different languages tend to use
different devices to express emphatic/contrastive focus; secondly, the production of
prosody in text-to-speech systems is far from perfect. In this paper, a translation
framework which is capable of treating emphatic/contrastive focus is outlined and focus rules
are developed. The framework has been tested on a corpus of 207 utterances in the domain of
asthma, although the focus rules are not domain-specific.
Keywords Emphatic/contrastive focus · Mandarin focus construction · Prosody · Speech translation · Rule-based MT
Machine Translation, vol.22, no.4, December 2008, pp.205-258
Generating Arabic text in multilingual speech-to-speech machine translation framework
Azza Abdel Monem · Khaled Shaalan · Ahmed Rafea · Hoda Baraka
Received: 6 December 2007 / Accepted: 2 September 2009 / Published online: 2 October 2009
© Springer Science+Business Media B.V. 2009
Abstract The interlingual approach to
machine translation (MT) is used successfully in multilingual translation. It aims to achieve the
translation task in two independent steps. First, meanings of the
source-language sentences are represented in an intermediate language-independent (Interlingua) representation. Then,
sentences of the target language are
generated from those meaning representations. Arabic natural language processing in general is still
underdeveloped and Arabic natural language generation (NLG) is even less developed. In particular, Arabic NLG from
Interlinguas has only been investigated
using template-based approaches. Moreover, tools used for other languages are not easily adaptable to Arabic due
to the language's complexity at both the
morphological and syntactic levels. In this paper, we describe a rule-based
generation approach for task-oriented Interlingua-based spoken dialogue that
transforms a relatively shallow
semantic interlingual representation, called
interchange format (IF), into Arabic text that corresponds to the intentions
underlying the speaker's utterances. This approach addresses the
problems of Arabic syntactic structure determination, and Arabic morphological and syntactic
generation within the Interlingual MT approach. The
generation approach is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken
Language in E-commerce) multilingual speech-to-speech
MT project.
The IF-to-Arabic generator is implemented in SICStus Prolog.
We conducted evaluation experiments using the input and output from the English analyzer that was
developed by the NESPOLE! team at
Keywords Machine translation · Interlingua · Rule-based text generation · Natural language generation ·
Arabic natural language processing