Machine Translation, vol.22, nos.1-2, March-June 2008, pp.1-27

Regression for machine translation evaluation at the sentence level

Joshua S. Albrecht • Rebecca Hwa

Received: 9 September 2008 / Accepted: 31 October 2008 / Published online: 25 November 2008
© Springer Science+Business Media B.V. 2008

Abstract Machine learning offers a systematic framework for developing metrics that use multiple criteria to assess the quality of machine translation (MT). However, learning introduces additional complexities that may affect the resulting metric's effectiveness. First, a learned metric is more reliable for translations that are similar to its training examples; this calls into question whether it is as effective in evaluating translations from systems that are not its contemporaries. Second, metrics trained from different sets of training examples may exhibit variations in their evaluations. Third, expensive developmental resources (such as translations that have been evaluated by humans) may be needed as training examples. This paper investigates these concerns in the context of using regression to develop metrics for evaluating machine-translated sentences. We track a learned metric's reliability across a 5-year period to measure the extent to which the learned metric can evaluate sentences produced by other systems. We compare metrics trained under different conditions to measure their variations. Finally, we present an alternative formulation of metric training in which the features are based on comparisons against pseudo-references in order to reduce the demand on human-produced resources. Our results confirm that regression is a useful approach for developing new metrics for MT evaluation at the sentence level.

Keywords Machine translation • Evaluation metrics • Machine learning

Machine Translation, vol.22, nos.1-2, March-June 2008, pp.29-66

Using target-language information to train part-of-speech taggers for machine translation

Felipe Sanchez-Martinez • Juan Antonio Perez-Ortiz • Mikel L. Forcada

Received: 28 January 2008 / Accepted: 27 October 2008 / Published online: 25 November 2008
© Springer Science+Business Media B.V. 2008

Abstract Although corpus-based approaches to machine translation (MT) are growing in interest, they are not applicable when the translation involves less-resourced language pairs for which there are no parallel corpora available; in those cases, the rule-based approach is the only applicable solution. Most rule-based MT systems make use of part-of-speech (PoS) taggers to solve the PoS ambiguities in the source-language texts to be translated; those MT systems require accurate PoS taggers to produce reliable translations in the target language (TL). The standard statistical approach to PoS ambiguity resolution (or tagging) uses hidden Markov models (HMM) trained in a supervised way from hand-tagged corpora, an expensive resource not always available, or in an unsupervised way through the Baum-Welch expectation-maximization algorithm; both methods use information only from the language being tagged. However, when tagging is considered as an intermediate task for the translation procedure, that is, when the PoS tagger is to be embedded as a module within an MT system, information from the TL can be used, in an unsupervised way, in the training phase to increase the translation quality of the whole MT system. This paper presents a method to train HMM-based PoS taggers to be used in MT; the new method uses not only information from the source language (SL), as general-purpose methods do, but also information from the TL and from the remaining modules of the MT system in which the PoS tagger is to be embedded. We find that the translation quality of the MT system embedding a PoS tagger trained in an unsupervised manner through this new method is clearly better than that of the same MT system embedding a PoS tagger trained through the Baum-Welch algorithm, and comparable to that obtained by embedding a PoS tagger trained in a supervised way from hand-tagged corpora.

Keywords Rule-based machine translation • Part-of-speech tagging • Hidden Markov models • Language modeling

Machine Translation, vol.22, nos.1-2, March-June 2008, pp.67-99

METIS-II: low resource machine translation

Michael Carl • Maite Melero • Toni Badia • Vincent Vandeghinste • Peter Dirix • Ineke Schuurman • Stella Markantonatou • Sokratis Sofianopoulos • Marina Vassiliou • Olga Yannoutsou

Received: 29 August 2008 / Accepted: 4 November 2008 / Published online: 27 November 2008
© Springer Science+Business Media B.V. 2008

Abstract METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use "basic" linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, translating from their "home" languages Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experiences obtained, we believe that the approach is promising and offers the potential for development in various directions.

Keywords Low resource MT • Statistical MT • Pattern-based MT • Shallow linguistic processing for MT

Machine Translation, vol.22, no.3, September 2008, pp.101-152

Morphological mismatches in machine translation

Igor Mel'cuk • Leo Wanner

Received: 19 February 2008 / Accepted: 23 March 2009 / Published online: 13 May 2009
© Springer Science+Business Media B.V. 2009

Abstract This paper addresses one of the least studied, although very important, problems of machine translation: the problem of morphological mismatches between languages and their handling during transfer. The level at which we assume transfer to be carried out is the Deep-Syntactic Structure (DSyntS) as proposed in the Meaning-Text Theory. DSyntS is abstract enough to avoid all types of surface morphological divergences. For the remaining 'genuine' divergences between grammatical significations, we propose a morphological transfer model. To illustrate this model, we apply it to the transfer of grammemes of definiteness and aspect for the language pairs Russian-German and German-Russian, respectively.

Keywords Machine translation • Transfer • Grammatical signification • Morphological mismatch • Deep-syntactic structure • Meaning-Text Theory • Definiteness • Aspect

Machine Translation, vol.22, no.3, September 2008, pp.153-173

Toward communicating simple sentences using pictorial representations

Rada Mihalcea • Chee Wee Leong

Received: 3 August 2007 / Accepted: 16 March 2009 / Published online: 9 April 2009
© Springer Science+Business Media B.V. 2009

Abstract This paper addresses and evaluates the hypothesis that pictorial representations can be used to effectively convey simple sentences across language barriers. The paper makes two main contributions. First, it proposes an approach to augmenting dictionaries with illustrative images using volunteer contributions over the Web. The paper describes the PicNet illustrated dictionary, and evaluates the quality and quantity of the contributions collected through several online activities. Second, starting with this illustrated dictionary, the paper describes a system for the automatic construction of pictorial representations for simple sentences. Comparative evaluations show that a considerable amount of understanding can be achieved using visual descriptions of information, with evaluation figures within a comparable range of those obtained with linguistic representations produced by an automatic machine translation system.

Keywords Text-to-picture synthesis • Illustrated dictionaries • Augmentative and alternative communication

Machine Translation, vol.22, no.4, December 2008, pp.181-203

Translating emphatic/contrastive focus from English to Mandarin Chinese

Chen-li Kuo • Allan Ramsay

Received: 25 February 2009 / Accepted: 25 September 2009 / Published online: 16 October 2009
© Springer Science+Business Media B.V. 2009

Abstract Despite the importance of intonation in spoken languages, deeper linguistic information encoded in prosody is rarely taken into account in speech-to-speech machine translation systems. This paper concerns the translation of spoken English into Mandarin Chinese, paying particular attention to the emphatic/contrastive focus in questions which is realised by means of phonological stress in spoken English but by lexical and syntactic devices in Mandarin. There are two main reasons to translate phonologically marked emphatic/contrastive focus with other linguistic devices: firstly, different languages tend to use different devices to express emphatic/contrastive focus; secondly, the production of prosody in text-to-speech systems is far from perfect. In this paper, a translation framework which is capable of treating emphatic/contrastive focus is outlined and focus rules are developed. The framework has been tested on a corpus of 207 utterances in the domain of asthma, although the focus rules are not domain-specific.

Keywords Emphatic/contrastive focus • Mandarin focus construction • Prosody • Speech translation • Rule-based MT

Machine Translation, vol.22, no.4, December 2008, pp.205-258

Generating Arabic text in multilingual speech-to-speech machine translation framework

Azza Abdel Monem • Khaled Shaalan • Ahmed Rafea • Hoda Baraka

Received: 6 December 2007 / Accepted: 2 September 2009 / Published online: 2 October 2009
© Springer Science+Business Media B.V. 2009

Abstract The interlingual approach to machine translation (MT) is used successfully in multilingual translation. It aims to achieve the translation task in two independent steps. First, meanings of the source-language sentences are represented in an intermediate language-independent (Interlingua) representation. Then, sentences of the target language are generated from those meaning representations. Arabic natural language processing in general is still underdeveloped and Arabic natural language generation (NLG) is even less developed. In particular, Arabic NLG from Interlinguas has only been investigated using template-based approaches. Moreover, tools used for other languages are not easily adaptable to Arabic due to the language's complexity at both the morphological and syntactic levels. In this paper, we describe a rule-based generation approach for task-oriented Interlingua-based spoken dialogue that transforms a relatively shallow semantic interlingual representation, called interchange format (IF), into Arabic text that corresponds to the intentions underlying the speaker's utterances. This approach addresses the problems of Arabic syntactic structure determination and of Arabic morphological and syntactic generation within the Interlingual MT approach. The generation approach is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic generator is implemented in SICStus Prolog. We conducted evaluation experiments using the input and output from the English analyzer that was developed by the NESPOLE! team at Carnegie Mellon University. The results of these experiments were promising and confirmed the ability of the rule-based approach to generate Arabic translations from the Interlingua taken from the travel and tourism domain.

Keywords Machine translation • Interlingua • Rule-based text generation • Natural language generation • Arabic natural language processing