Machine Translation, vol.19, no.1, 2005, pp.1-36
Controlled Translation in an Example-based Environment: What Do Automatic Evaluation Metrics Tell Us?
E-mail: {away, ngough}@computing.dcu.ie
Abstract. This paper presents an extended, harmonised account of our previous work on integrating controlled language data in an Example-based Machine Translation system. Gough and Way (MT Summit (2003), pp. 133-140) focused on controlling the output text in a novel manner, while Gough and Way (9th Workshop of the EAMT (2004a), pp. 73-81) sought to constrain the input strings according to controlled language specifications. Our original sub-sentential alignment algorithm could deal only with 1:1 matches, but subsequent refinements enabled n:m alignments to be captured. A direct consequence was that we were able to populate the system's databases with more than six times as many potentially useful fragments. Combined with two simple novel improvements (correcting a small number of mistranslations in the lexicon, and allowing multiple translations in the lexicon), this improves translation quality considerably. We provide detailed automatic and human evaluations of a number of experiments carried out to test the quality of the system. We observe that our system outperforms the rule-based on-line system Logomedia on a range of automatic evaluation metrics, and that the 'best' translation candidate is consistently highly ranked by our system. Finally, we note in a number of tests that the BLEU metric gives objectively different results from other automatic evaluation metrics and a manual evaluation. Despite these conflicting results, we observe a preference for controlling the source data rather than the target translations.
Key words: controlled translation, example-based MT, Marker Hypothesis, evaluation
Machine Translation, vol.19, no.1, 2005, pp.37-58
Methodologies for Measuring the Correlations between Post-Editing Effort and Machine Translatability
SHARON O'BRIEN
Centre for Translation and Textual Studies,
E-mail: sharon.obrien@dcu.ie
Abstract. Against the background of a wider research project that aims to investigate the correlation, if any, between post-editing effort and the presence of negative translatability indicators in source texts submitted to Machine Translation (MT), this paper sets out to assess the potential of two methods for measuring the effort involved in post-editing MT output. The first is based on the use of the keyboard-monitoring program Translog; the second on Choice Network Analysis (CNA). The paper reviews relevant research in both machine translatability and MT post-editing, and appraises, in particular, the suitability of think-aloud protocols in assessing post-editing effort. The combined use of Translog and CNA is proposed as a way of overcoming some of the difficulties presented by the use of think-aloud protocols in the current context. Initial results from a study conducted at
Key words: post-editing, machine translatability, think-aloud protocol, Translog, choice network analysis, controlled language
Machine Translation, vol.19, no.1, 2005, pp.59-82
Practical Word-Sense Disambiguation Using Co-occurring Concept Codes
YOUJIN CHUNG and JONG-HYEOK LEE
Div. of Electrical and Computer Engineering, POSTECH and
E-mail: {prizer, jhlee}@postech.ac.kr
Abstract. Most previous corpus-based approaches to the resolution of word-sense ambiguity have collected lexical information from the context of the word to be disambiguated. However, they suffer from the problem of data sparseness. To address this problem, this paper proposes a disambiguation method using co-occurring concept codes (CCCs). The use of concept-code features and concept-code generalization effectively alleviates the data sparseness problem and also reduces the number of features to a practical size without any loss in system performance. We demonstrate the effectiveness of the CCC features and the concept-code generalization by experimental evaluations. The proposed disambiguation method is applied to a Korean-to-Japanese MT system and tested with various machine-learning techniques. In a lexical sample evaluation, our CCC-based method achieved a precision of 82.00%, with an 11.83% improvement over the baseline. Also, it achieved a precision of 83.51% in an experiment on real text, which shows that our proposed method is very useful for practical MT systems.
Key words: word-sense disambiguation, data sparseness, automatic sense-tagging, co-occurring concept-code, concept-code generalization, Korean
Machine Translation, vol.19, no.1, 2005, pp.83-112
The Long-Term Forecast for Weather Bulletin Translation
PHILIPPE LANGLAIS, SIMONA GANDRABUR, THOMAS LEPLUS and GUY LAPALME
DIRO/RALI, Département d'informatique et de recherche opérationnelle, Université de Montréal, C.P. 6128, Montréal, H3C 3J7,
E-mail: {felipe, gandrabu, leplus, lapalme}@IRO.UMontreal.CA
Abstract. Machine Translation (MT) is the focus of extensive scientific investigations driven by regular evaluation campaigns, which are, however, mostly oriented towards one rather specific task: translating news articles into English. In this paper, we investigate how well current MT approaches deal with a real-world task. We have rationally reconstructed one of the few MT systems in daily use which produces high-quality translation: the Météo system. We show how a combination of a sentence-based memory approach, a phrase-based statistical engine and a neural-network rescorer can give results comparable to those of the current system. We also explore another possible prospect for MT technology: the translation of weather alerts, which are currently being translated manually by translators at the Canadian Translation Bureau.
Key words: corpus-based MT, Translation Memory, statistical MT, bootstrapping, rescoring, Météo
Machine Translation, vol.19, no.2, 2005, pp.113-137
Toward practical spoken language translation
Chengqing Zong • Mark Seligman
Received: 24 February 2004 / Accepted: 22 February 2006 /
Published online: 30 June 2006
© Springer Science+Business Media B.V. 2006
Abstract This paper argues that the time is now right to field practical Spoken Language Translation (SLT) systems. Several sorts of practical systems can be built over the next few years if system builders recognize that, at the present state of the art, users must cooperate and compromise with the programs. Further, SLT systems can be arranged on a scale, in terms of the degree of cooperation or compromise they require from users. In general, the broader the intended linguistic or topical coverage of a system, the more user cooperation or compromise it will presently require. The paper briefly discusses the component technologies of SLT systems as they relate to user cooperation and accommodation ("human factors engineering"), with examples from the authors' work. It describes three classes of "cooperative" SLT systems which could be put into practical use during the next few years.
Keywords Spoken Language Translation • User cooperation • Cooperative systems • Human factors engineering
Machine Translation, vol.19, no.2, 2005, pp.139-173
An NLP lexicon as a largely language-independent resource
Marjorie McShane • Sergei Nirenburg • Stephen Beale
Received: 31 August 2005 / Accepted: 27 February 2006 /
Published online: 21 June 2006
© Springer Science+Business Media B.V. 2006
Abstract This paper describes salient aspects of the OntoSem lexicon of English, a lexicon whose semantic descriptions can either be grounded in a language-independent ontology, rely on extra-ontological expressive means, or exploit a combination of the two. The variety of descriptive means, as well as the conceptual complexity of semantic description to begin with, necessitates that OntoSem lexicons be compiled primarily manually. However, once a semantic description is created for a lexeme in one language, it can be reused in others, often with little or no modification. Said differently, the challenge in building a semantic lexicon is describing semantics; once the semantics are described, it is relatively straightforward to connect given meanings to the appropriate head words in other languages. In this paper we provide a brief overview of the OntoSem lexicon and processing environment, orient our approach to lexical semantics among others in the field, and describe in more detail what we mean by the largely language-independent lexicon. Finally, we suggest reasons why our resources might be of interest to the larger community.
Keywords Lexicon • Ontological Semantics • Semantics • Language-independent resources • Knowledge-rich NLP
Machine Translation, vol.19, no.2, 2005, pp.175-192
The language translation interface: A perspective from the users
Dominique Estival
Received: 31 October 2005 / Accepted: 23 March 2006 /
Published online: 4 May 2006
© Springer Science+Business Media B.V. 2006
Abstract The Language Translation Interface (LTI) is a prototype developed for the Australian Defence Organisation. The aim is to provide a single, simple interface to a variety of MT tools and utilities for personnel who need to produce translations when they have no easy access to human translators. This paper describes the functionalities of the LTI and reports on our experience with users during development. The LTI has been demonstrated and trialled at several military exercises, and the feedback received is now leading to the development of the Language Translation Tools Suite (LTTS).
Keywords Translation tools • Multiple translations • Military users
Machine Translation, vol.19, no.3-4, 2005, pp.197-211
Example-based machine translation: a review and commentary
John Hutchins
Received: 9 May 2006 / Accepted: 16 May 2006 /
Published online: 25 July 2006
© Springer Science+Business Media B.V. 2006
Abstract In the last decade the dominant models of MT have been data-driven or corpus-based. Of the two main trends, statistical machine translation and example-based machine translation (EBMT), the latter is much less clearly defined. In a review of the recently published collection edited by Michael Carl and Andy Way, this essay surveys the basic processes, methods, main problems and tasks of EBMT, and attempts to provide a definition of the essence of EBMT in comparison with statistical MT and traditional rule-based MT.
Keywords Example-based machine translation • Statistical machine translation • Review • Survey • Definition
Machine Translation, vol.19, no.3-4, 2005, pp.213-227
MT model space: statistical versus compositional versus example-based machine translation
Dekai Wu
Received: 23 January 2006 / Accepted: 25 August 2006 /
Published online: 14 February 2007
© Springer Science+Business Media B.V. 2007
Abstract We offer a perspective on EBMT from a statistical MT standpoint, by developing a three-dimensional MT model space based on three pairs of definitions: (1) logical versus statistical MT, (2) schema-based versus example-based MT, and (3) lexical versus compositional MT. Within this space we consider the interplay of three key ideas in the evolution of transfer, example-based, and statistical approaches to MT. We depict how all translation models face these issues in one way or another, regardless of the school of thought, and suggest where the real questions for the future may lie.
Keywords
Machine Translation, vol.19, no.3-4, 2005, pp.229-249
A system-theoretical view of EBMT
Michael Carl
Received: 30 January 2006 / Accepted: 30 August 2006 /
Published online: 21 December 2006
© Springer Science+Business Media B.V. 2006
Abstract According to the system theory of von Bertalanffy (1968), a "system" is an entity that can be distinguished from its environment and that consists of several parts. System theory investigates the role of the parts, their interaction and the relation of the whole with its environment. System theory of the second order examines how an observer relates to the system. This paper traces some of the recent discussion of example-based machine translation (EBMT) and compares a number of EBMT and statistical MT systems. It is found that translation examples are linguistic systems themselves that consist of words, phrases and other constituents. Two properties of Luhmann's (2002) system theory are discussed in this context: EBMT has focussed on the properties of structures suited for translation and the design of their reentry points, and SMT develops connectivity operators which select the most likely continuations of structures. While technically the SMT and EBMT approaches complement each other, the principal distinguishing characteristic results from different sets of values which SMT and EBMT followers prefer.
Keywords Example-based machine translation • Statistical machine translation • System theory • Emergent behaviour • Statistical EBMT
Machine Translation, vol.19, no.3-4, 2005, pp.251-282
Purest ever example-based machine translation: Detailed presentation and assessment
Yves Lepage • Etienne Denoual
Received: 16 December 2005 / Accepted: 25 August 2006 /
Published online: 19 December 2006
© Springer Science+Business Media B.V. 2006
Abstract We have designed, implemented and assessed an EBMT system that can be dubbed the "purest ever built": it strictly does not make any use of variables, templates or patterns, does not have any explicit transfer component, and does not require any preprocessing or training of the aligned examples. It uses only a specific operation, proportional analogy, that implicitly neutralizes divergences between languages and captures lexical and syntactic variations along the paradigmatic and syntagmatic axes without explicitly decomposing sentences into fragments. Exactly the same genuine implementation of such a core engine was evaluated on different tasks and language pairs. To begin with, we compared our system on two tasks of a previous MT evaluation campaign to rank it among other current state-of-the-art systems. Then, we illustrated the "universality" of our system by participating in a recent MT evaluation campaign, with exactly the same core engine, for a wide variety of language pairs. Finally, we studied the influence of extra data like dictionaries and paraphrases on the system performance.
Keywords Example-based machine translation • Proportional analogies • Divergences across languages
Machine Translation, vol.19, no.3-4, 2005, pp.283-299
Inducing translation templates with type constraints
Ilyas Cicekli
Received: 14 December 2005 / Accepted: 30 August 2006 /
Published online: 11 January 2007
© Springer Science+Business Media B.V. 2006
Abstract This paper presents a generalization technique that induces translation templates from a given set of translation examples by replacing differing parts in the examples with typed variables. Since the type of each variable is inferred during the learning process, each induced template is also associated with a set of type constraints. The type constraints that are associated with a translation template restrict the usage of the translation template in certain contexts in order to avoid some of the wrong translations. The types of variables are induced using type lattices designed for both the source and target languages. The proposed generalization technique has been implemented as a part of an example-based machine translation system.
Keywords Example-based MT • Machine learning
Machine Translation, vol.19, no.3-4, 2005, pp.301-323
Hybrid data-driven models of machine translation
Declan
Received: 23 January 2006 / Accepted: 20 April 2006 /
Published online: 2 November 2006
© Springer Science+Business Media B.V. 2006
Abstract This paper presents an extended, harmonised account of our previous work on combining subsentential alignments from phrase-based statistical machine translation (SMT) and example-based MT (EBMT) systems to create novel hybrid data-driven systems capable of outperforming the baseline SMT and EBMT systems from which they were derived. In previous work, we demonstrated that while an EBMT system is capable of outperforming a phrase-based SMT (PBSMT) system constructed from freely available resources, a hybrid 'example-based' SMT system incorporating marker chunks and SMT subsentential alignments is capable of outperforming both baseline translation models for French-English translation. In this paper, we show that similar gains are to be had from constructing a hybrid 'statistical' EBMT system. Unlike the previous research, here we use the Europarl training and test sets, which are fast becoming the standard data in the field. On these data sets, while all hybrid 'statistical' EBMT variants still fall short of the quality achieved by the baseline PBSMT system, we show that adding the marker chunks to create a hybrid 'example-based' SMT system outperforms the two baseline systems from which it is derived. Furthermore, we provide further evidence in favour of hybrid systems by adding an SMT target-language model to the EBMT system, and demonstrate that this too has a positive effect on translation quality. We also show that many of the subsentential alignments derived from the Europarl corpus are created by either the PBSMT or the EBMT system, but not by both. In sum, therefore, despite the obvious convergence of the two paradigms, the crucial differences between SMT and EBMT contribute positively to the overall translation quality. 
The central thesis of this paper is that any researcher who continues to develop an MT system using either of these approaches will benefit further from integrating the advantages of the other model; dogged adherence to one approach will lead to inferior systems being developed.
Keywords Hybrid • Example-based MT •