Machine Translation
and the
Lexicon
Third International EAMT Workshop
Proceedings
Springer
(Lecture Notes in Artificial
Intelligence 898)
[ISBN 3-540-59040-4]
Abstracts
Knowledge extraction from machine-readable dictionaries: an evaluation -- Nancy Ide and Jean Véronis (Université de Provence)
Machine-readable versions of everyday
dictionaries have been seen as a likely source of information for use in
natural language processing because they contain an enormous amount of lexical
and semantic knowledge. However, after 15 years of research, the results appear
to be disappointing. No comprehensive evaluation of machine-readable
dictionaries (MRDs) as a knowledge source has been
made to date, although this is necessary to determine what, if anything, can be gained from MRD research. To this end, this
paper will first consider the postulates upon which MRD research has been based
over the past fifteen years, discuss the validity of these postulates, and
evaluate the results of this work. We will then propose possible future
directions and applications that may exploit these years of effort, in the light
of current directions in not only NLP research, but also fields such as
lexicography and electronic publishing.
Description and acquisition of multiword lexemes -- Angelika Storrer and Ulrike Schwall
This paper deals with multiword lexemes (MWLs), focussing on two types of verbal MWLs:
verbal idioms and support verb constructions. We discuss the characteristic
properties of MWLs, namely non-standard compositionality,
restricted substitutability of components, and restricted morpho-syntactic
flexibility, and we show how these properties may cause serious problems during
the analysis, generation, and transfer steps of machine translation systems. In
order to cope with these problems, MT lexicons need to provide detailed
descriptions of MWL properties. We list the types of information which we
consider the necessary minimum for successful processing of MWLs, and report on some feasibility studies aimed at the
automatic extraction of German verbal multiword lexemes from text corpora and
machine-readable dictionaries.
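As an illustration of what such feasibility studies involve, the following sketch ranks verb-noun pairs from a toy corpus by pointwise mutual information, a common association score for finding multiword-lexeme candidates; the data and the choice of score are invented here and are not the procedure used in the paper.

```python
import math
from collections import Counter

# Toy "corpus" of (verb, object-noun) pairs, standing in for parsed German text.
# The pairs and their frequencies are invented purely for illustration.
pairs = [
    ("treffen", "Entscheidung"), ("treffen", "Entscheidung"), ("treffen", "Freund"),
    ("stellen", "Frage"), ("stellen", "Frage"), ("stellen", "Antrag"),
    ("lesen", "Buch"), ("lesen", "Zeitung"),
]

pair_counts = Counter(pairs)
verb_counts = Counter(v for v, _ in pairs)
noun_counts = Counter(n for _, n in pairs)
total = len(pairs)

def pmi(verb, noun):
    """Pointwise mutual information of a verb-noun pair; higher values mean the
    pair co-occurs more often than chance, a crude signal for support verb
    constructions and other multiword-lexeme candidates."""
    p_pair = pair_counts[(verb, noun)] / total
    return math.log2(p_pair / ((verb_counts[verb] / total) * (noun_counts[noun] / total)))

# Rank candidate pairs for manual inspection by a lexicographer.
for verb, noun in sorted(pair_counts, key=lambda p: pmi(*p), reverse=True):
    print(f"{verb} + {noun}: PMI = {pmi(verb, noun):.2f}")
```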
Pragmatics of specialist terms: the acquisition and representation of terminology -- Khurshid Ahmad
The compilation of specialist terminology requires an understanding of how specialists coin and use the terms of their specialisms. We show how exploiting the pragmatic features of specialist terms helps in the semi-automatic extraction of terms and in the organisation of terms in terminology data banks.
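One widely used signal for semi-automatic term extraction is contrastive frequency: a word used far more often in a specialist corpus than in general language is a plausible term candidate. The sketch below only illustrates that idea; the word lists, counts, and smoothing constant are invented and do not come from the paper.

```python
from collections import Counter

# Invented word-frequency counts standing in for a specialist corpus and a
# general-language reference corpus (illustrative only).
specialist = Counter({"terminology": 55, "lexicon": 40, "transfer": 30, "the": 900})
general = Counter({"terminology": 1, "lexicon": 2, "transfer": 5, "the": 50000})

spec_total = sum(specialist.values())
gen_total = sum(general.values())

def contrast(word, smoothing=0.5):
    """Ratio of relative frequencies in the two corpora. Words that specialists
    use far more often than general writers score high and become term
    candidates for a terminologist to review."""
    f_spec = specialist[word] / spec_total
    f_gen = (general[word] + smoothing) / (gen_total + smoothing)
    return f_spec / f_gen

for word in sorted(specialist, key=contrast, reverse=True):
    print(f"{word:12s} {contrast(word):10.1f}")
```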
The Cambridge Language Survey
The Cambridge Language Survey is a
research project whose activities centre around the use of an Integrated
Language Database, whereby a computerised dictionary is used for intelligent
cross-reference during corpus analysis, for example searching for all the
inflections of a verb rather than just the base form. Types of grammatical
coding and semantic categorisation appropriate to such a computerised
dictionary are discussed, as are software tools for parsing, finding
collocations, and performing sense-tagging. The weighted evaluation of
semantic, grammatical, and collocational information to
discriminate between
word senses is described in some detail. Mention is made of several branches of
research including the development of parallel corpora, semantic interpretation
by sense-tagging, and the use of a Learner Corpus for the analysis of errors
made by non-native speakers. Sense-tagging is identified as an under-exploited
approach to language analysis and one for which great opportunities for product development
exist.
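The weighted combination of evidence for sense discrimination can be pictured with a small sketch: each sense lists the cues that support it, and the cue types carry different weights. The sense inventory, cues, and weights below are invented for illustration and are not those of the Cambridge Language Survey.

```python
# Each sense of "bank" lists the cues that support it; the weights per cue type,
# like the tiny sense inventory itself, are invented for illustration.
SENSES = {
    "bank/finance": {"pos": "NN", "collocates": {"account", "loan"},
                     "field": {"money", "credit"}},
    "bank/river":   {"pos": "NN", "collocates": {"river", "steep"},
                     "field": {"water", "shore"}},
}
WEIGHTS = {"pos": 1.0, "collocates": 3.0, "field": 2.0}

def score_sense(sense, pos, context_words):
    """Sum the weighted evidence for one sense, given the token's part-of-speech
    tag and the set of words in its context window."""
    info = SENSES[sense]
    score = WEIGHTS["pos"] if info["pos"] == pos else 0.0
    score += WEIGHTS["collocates"] * len(info["collocates"] & context_words)
    score += WEIGHTS["field"] * len(info["field"] & context_words)
    return score

def tag_sense(pos, context_words):
    """Pick the highest-scoring sense for one occurrence of the word."""
    return max(SENSES, key=lambda s: score_sense(s, pos, context_words))

print(tag_sense("NN", {"she", "opened", "an", "account", "at", "the"}))
# -> bank/finance
```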
Memory-based lexical acquisition and processing -- Walter Daelemans
Current approaches to computational
lexicology in language technology are knowledge-based (competence-oriented) and
try to abstract away from specific formalisms, domains, and applications. This
results in severe complexity, acquisition and reusability bottlenecks. As an
alternative, we propose a particular performance-oriented approach to Natural
Language Processing based on automatic memory-based learning of linguistic
(lexical) tasks. The consequences of the approach for computational lexicology
are discussed, and the application of the approach on a number of lexical
acquisition and disambiguation tasks in phonology, morphology and syntax is
described.
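The core of memory-based learning is simple enough to show in a few lines: store all training instances and classify a new item by its nearest stored neighbour under some similarity metric. The toy task below (guessing an English plural ending from a noun's last two letters) and its data are invented for illustration and are not the experiments reported in the paper.

```python
from collections import Counter

def features(word):
    """Very small feature vector: the noun's last two characters."""
    return (word[-2], word[-1])

# Memory-based learning keeps the training instances exactly as they are.
train = [("fox", "es"), ("boss", "es"), ("church", "es"),
         ("dog", "s"), ("cat", "s"), ("book", "s")]
memory = [(features(w), label) for w, label in train]

def distance(a, b):
    """Overlap metric: number of feature positions on which two items differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def classify(word, k=1):
    """Majority vote among the k stored instances closest to the new item."""
    neighbours = sorted(memory, key=lambda m: distance(features(word), m[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

print(classify("box"))    # -> "es" (closest stored instance is "fox")
print(classify("frog"))   # -> "s"  (closest stored instance is "dog")
```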
Typed feature formalisms as a common basis for linguistic specification -- Hans-Ulrich Krieger
Typed feature formalisms (TFF) play an
increasingly important role in NLP and, in particular, in MT. Many of these
systems are inspired by Pollard and Sag's work on Head-Driven Phrase Structure
Grammar (HPSG), which has shown that a great deal of syntax and semantics can
be neatly encoded within TFF. However, syntax and semantics are not the only
areas in which TFF can be beneficially employed. In this paper, I will show
that TFF can also be used as a means to model finite automata (FA) and to
perform certain types of logical inferencing. In
particular, I will (i) describe how FA can be defined
and processed within TFF and (ii) propose a conservative extension to HPSG,
which allows for a restricted form of semantic processing within TFF, so that
the construction of syntax and semantics can be intertwined with the
simplification of the logical form of an utterance. The approach which I
propose provides a uniform, HPSG-oriented framework for different levels of
linguistic processing, including allomorphy and morphotactics, syntax, semantics, and logical form
simplification.
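The idea of treating a finite automaton as declarative data that a general processor interprets can be pictured outside a feature formalism as well. The sketch below encodes a toy morphotactics automaton as a plain data structure and walks it over morph segments; it is a loose analogue for illustration only, not Krieger's TFF encoding, and the automaton itself is invented.

```python
# A finite automaton encoded as data rather than code, loosely analogous to
# defining it inside a feature formalism. The states and the toy morphotactics
# (stem "work" plus a suffix slot) are invented for illustration.
FA = {
    "start": "STEM",
    "final": {"END"},
    "delta": {
        ("STEM", "work"): "SUFFIX",
        ("SUFFIX", ""): "END",      # zero suffix: "work"
        ("SUFFIX", "s"): "END",     # "works"
        ("SUFFIX", "ed"): "END",    # "worked"
    },
}

def accepts(fa, segments):
    """Walk the automaton over a sequence of morph segments and report whether
    it ends in a final state."""
    state = fa["start"]
    for seg in segments:
        state = fa["delta"].get((state, seg))
        if state is None:
            return False
    return state in fa["final"]

print(accepts(FA, ["work", "ed"]))   # True
print(accepts(FA, ["work", "ing"]))  # False
```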
European efforts towards standardizing language resources -- Nicoletta Calzolari (Istituto di Linguistica Computazionale, Pisa)
This paper aims at providing a broad overview of the situation in Europe with regard to the standardization of language resources.
Machine translation and terminology database - uneasy bedfellows? -- Machine Translation Group, Katharina Koch (SAP AG)
The software company SAP translates its
documentation into more than 12 languages. To support the translation
department, SAPterm is used as a traditional
terminology database for all languages, and the machine translation system
METAL for German-to-English translation. The maintenance of the two terminology
databases in parallel, SAPterm and the METAL lexicons, requires a comparison of the entries in order to
ensure terminological consistency. However, due to the differences in the
structure of the entries in SAPterm and METAL, an
automatic comparison has not yet been implemented. The search for a solution
has led to the consideration of using another existing SAP tool, called
Proposal Pool.
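The consistency check described here amounts to lining up entries from the two stores and flagging disagreements. The sketch below shows that step on two invented dictionaries; the field layout, the records, and the German-English pairs are illustrative only and do not reproduce SAPterm, METAL, or the Proposal Pool.

```python
# Hypothetical, highly simplified snapshots of two terminology stores.
# The records are invented; the point is only to show how entries can be
# normalised to a common shape and compared for consistent English equivalents.
sapterm = {"Buchungskreis": "company code", "Beleg": "document"}
metal_lexicon = {"Buchungskreis": "company code", "Beleg": "voucher"}

def compare(db_a, db_b):
    """Report source terms whose target equivalents disagree or are missing
    in one of the two stores."""
    report = []
    for term in sorted(set(db_a) | set(db_b)):
        a, b = db_a.get(term), db_b.get(term)
        if a != b:
            report.append((term, a, b))
    return report

for term, a, b in compare(sapterm, metal_lexicon):
    print(f"{term}: first store={a!r}, second store={b!r}")
```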
TransLexis: an integrated environment for lexicon and terminology management -- Brigitte Bläser (IBM Deutschland, Heidelberg, Germany) – pp. 159-173.
The IBM lexicon and terminology management
system TransLexis provides an integrated solution for
developing and maintaining lexical and terminological data for use by humans
and computer programs. In this paper, the conceptual schema of TransLexis, its user interface, and its import and export
facilities will be described. TransLexis takes up
several ideas emerging from the reuse discussion. In particular, it strives for
a largely theory-neutral representation of multilingual lexical and
terminological data, it includes export facilities to derive lexicons for
different applications, and it includes programs to import lexical and
terminological data from existing sources.
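How one theory-neutral master entry can feed several applications is easy to picture with a sketch. The entry schema and both export targets below are invented for illustration; they are not the actual TransLexis conceptual schema or export formats.

```python
# One theory-neutral entry, exported to two hypothetical application formats.
# The schema and both target formats are invented for illustration only.
entry = {
    "lemma": "Speicher",
    "pos": "noun",
    "gender": "m",
    "translations": {"en": ["memory", "storage"]},
    "domain": "computing",
}

def export_termbank_line(e):
    """Flat, human-oriented termbank line for translators."""
    return (f"{e['lemma']} ({e['pos']}, {e['gender']}) [{e['domain']}]: "
            + ", ".join(e["translations"]["en"]))

def export_mt_lexicon(e):
    """Minimal record a machine translation transfer lexicon might need."""
    return {"src": e["lemma"], "cat": e["pos"], "tgt": e["translations"]["en"][0]}

print(export_termbank_line(entry))
print(export_mt_lexicon(entry))
```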
The use of terminological knowledge bases in software localisation -- E.A. Karkaletsis, C.D. Spyropoulos, G. Vouros (Institute of Informatics & Telecommunications)
This paper describes the work that was
undertaken in the Glossasoft project in the area of
terminology management. Some of the drawbacks of existing terminology
management systems are outlined and an alternative approach to maintaining
terminological data is proposed. The approach which we advocate relies on
knowledge-based representation techniques. These are used to model conceptual
knowledge about the terms included in the database, general knowledge about the
subject domain, application-specific knowledge, and - of course -
language-specific terminological knowledge. We consider the multifunctionality
of the proposed architecture to be one of its major advantages. To illustrate
this, we outline how the knowledge representation scheme, which we suggest,
could be drawn upon in message generation and machine-assisted translation.
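The separation of conceptual knowledge from language-specific terminological knowledge can be sketched as two linked tables: concepts with domain relations on one side, per-language terms attached to those concepts on the other. All identifiers, relations, and terms below are invented to illustrate the general idea of a terminological knowledge base, not the Glossasoft representation itself.

```python
# Concepts are language-independent and carry domain relations; terms are
# language-specific labels attached to concepts. Everything here is invented.
concepts = {
    "C_DIALOG_BOX": {"is_a": "C_WINDOW", "domain": "user-interface"},
    "C_WINDOW":     {"is_a": "C_SCREEN_OBJECT", "domain": "user-interface"},
}
terms = {
    ("C_DIALOG_BOX", "en"): "dialog box",
    ("C_DIALOG_BOX", "de"): "Dialogfenster",
    ("C_WINDOW", "en"): "window",
    ("C_WINDOW", "de"): "Fenster",
}

def broader(concept):
    """Follow is_a links upwards to show the conceptual context of a term,
    the kind of knowledge a generator or translator's aid could draw on."""
    chain = [concept]
    while concept in concepts and "is_a" in concepts[concept]:
        concept = concepts[concept]["is_a"]
        chain.append(concept)
    return chain

print(broader("C_DIALOG_BOX"))        # conceptual knowledge
print(terms[("C_DIALOG_BOX", "de")])  # language-specific knowledge
```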
Navigation through terminological databases -- Renate Mayer (Fraunhofer Institut für Arbeitswirtschaft und Organisation)
Translating technical texts may cause many
problems concerning terminology, even for the professional technical
translator. For this reason, tools such as terminological databases or termbanks have been introduced to support the user in
finding the most suitable translation. Termbanks are
a type of machine-readable dictionary and contain extensive information on
technical terms. But a termbank offers more than an electronic version of a printed dictionary. This paper describes a multilingual termbank,
which was developed within the ESPRIT project Translator's Workbench. The termbank allows the user to create, maintain, and retrieve
specialised vocabulary. In addition, it offers the user the possibility to look
up definitions, foreign language equivalents, and background knowledge. In this
paper, an introduction to the database underlying the termbank and to the user interface is given, with emphasis on those functions that introduce the user to a new subject by allowing him or her to navigate through a terminology field. It will be shown how, by clustering the term
explanation texts and by linking them to a type of semantic network, such
functions can be implemented.
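The clustering-and-linking step can be pictured with a small sketch: terms whose explanation texts share enough content words are linked, and the resulting graph is what a user would navigate. The explanation texts, stopword list, and overlap threshold below are invented for illustration and are not the Translator's Workbench implementation.

```python
# Link terms whose explanation texts share enough content words, yielding a
# small graph a user could navigate from term to related term.
explanations = {
    "router":  "device that forwards data packets between computer networks",
    "gateway": "node that connects two computer networks using different protocols",
    "lathe":   "machine tool that rotates a workpiece for cutting",
}
STOPWORDS = {"that", "a", "the", "for", "two", "between", "using"}

def content_words(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def build_links(expl, threshold=2):
    """For each term, list the other terms whose explanations overlap in at
    least `threshold` content words."""
    links = {}
    for term, text in expl.items():
        words = content_words(text)
        links[term] = [other for other, other_text in expl.items()
                       if other != term
                       and len(words & content_words(other_text)) >= threshold]
    return links

print(build_links(explanations))
# "router" and "gateway" end up linked (they share "computer" and "networks");
# "lathe" stays isolated.
```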
Types of lexical co-occurrences: descriptive parameters -- Folker Caroli (IAI, Saarbrücken) – pp. 203-218.
In this article, I will discuss different
types of lexical co-occurrences and examine the requirements for representing
them in a reusable lexical resource. I will focus the discussion on the
delimitation of a limited set of descriptive parameters rather than on an
exhaustive classification of idioms or multiword units. Descriptive parameters
will be derived from a detailed discussion of the problem of how to determine
adequate translations for such units. Criteria for determining translation
equivalences between multiword units of two languages will be: the syntactic
and the semantic structure as well as functional, pragmatic, and stylistic
properties.
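A rough picture of how such descriptive parameters could be recorded and used: each multiword unit carries its syntactic pattern, function, and register, and a candidate translation pair is accepted only if the parameters that matter agree. The parameter names, the example units, and the matching rule below are invented illustrations, not the parameter set proposed in the article.

```python
# Invented descriptive parameters for two multiword units. The comparison only
# illustrates checking that candidate translation equivalents agree on the
# recorded parameters; it is not a real equivalence-finding procedure.
mwu_de = {"form": "eine Entscheidung treffen", "pattern": "V+NP",
          "function": "support-verb construction", "register": "neutral"}
mwu_en = {"form": "make a decision", "pattern": "V+NP",
          "function": "support-verb construction", "register": "neutral"}

def plausible_equivalent(a, b, keys=("pattern", "function", "register")):
    """Keep a candidate pair only if its descriptive parameters agree;
    return the mismatching parameters otherwise."""
    mismatches = [k for k in keys if a[k] != b[k]]
    return not mismatches, mismatches

ok, diffs = plausible_equivalent(mwu_de, mwu_en)
print(ok, diffs)   # True, []
```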
Perception vocabulary in five languages - towards an analysis using frame elements -- Nicholas Ostler (Linguacubun Ltd, London, UK) – pp. 219-230.
This essay introduces the first linguistic
task of the DELIS project: to undertake a corpus-based examination of the syntactic and semantic properties of perception vocabulary in five languages: English, Danish, Dutch, French, and Italian. The theoretical background is
Fillmore's Frame Semantics. The paper reviews some of the variety of facts to
be accounted for, particularly in the specialization of sense associated with
some collocations, and the pervasive phenomenon of Intensionality.
Through this review, we aim to focus our understanding of cross-linguistic
variation in this one domain, both by noting specific differences in word-sense
correlation, and by exhibiting a general means of representation.
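A frame-semantic description of perception verbs can be sketched as a shared set of frame elements plus per-verb statements of how each element is realised syntactically. The frame element names below loosely follow Fillmore's terminology, but the entries and realisations are invented for illustration and are not the DELIS analyses.

```python
# A perception frame with shared frame elements (roles); each lexical entry
# records how its language realises those elements syntactically.
PERCEPTION_FRAME = ("Experiencer", "Percept", "Judgement")

lexicon = {
    ("en", "see"):    {"Experiencer": "subject", "Percept": "object"},
    ("en", "look"):   {"Experiencer": "subject", "Percept": "at-PP"},
    ("it", "vedere"): {"Experiencer": "soggetto", "Percept": "oggetto"},
}

def compare_realisations(word_a, word_b):
    """Show, frame element by frame element, how two verbs realise the frame;
    gaps ('-') mark elements a verb does not express."""
    a, b = lexicon[word_a], lexicon[word_b]
    for element in PERCEPTION_FRAME:
        print(f"{element:12s} {word_a[1]}: {a.get(element, '-'):10s} "
              f"{word_b[1]}: {b.get(element, '-')}")

compare_realisations(("en", "see"), ("en", "look"))
```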
Relating parallel monolingual lexicon fragments for translation purposes -- Ulrich Heid
In this paper, we introduce the
methodology for the construction of dictionary fragments under development in
DELIS. The approach advocated is corpus-based, computationally supported, and
aimed at the construction of parallel monolingual dictionary fragments which
can be linked to form translation dictionaries without many problems.
The parallelism of the monolingual fragments is achieved through a shared inventory of descriptive devices, a common representation formalism (typed feature structures) for linguistic information from all levels, and a working methodology inspired by onomasiology: all elements of a given lexical semantic field are treated at the same time, consistently, and with common descriptive devices.
It is claimed that such monolingual dictionaries are particularly easy to relate in a machine translation application. The principles of such a combination of dictionary fragments are illustrated with examples from an experimental HPSG-based interlingua-oriented machine translation prototype.
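How parallel monolingual fragments described with a shared inventory can be related is easy to picture in a sketch: if the English and French entries use the same descriptive features, translation links can be proposed wherever those features match. The entries below use plain dictionaries in place of typed feature structures, and all lemmas, features, and values are invented for illustration; this is not the DELIS prototype.

```python
# Monolingual entries described with the same inventory of features (plain
# dicts standing in for typed feature structures). Because both fragments use
# the same descriptive devices, translation links can be proposed by matching
# those shared descriptions. All entries and features are invented.
english = [
    {"lemma": "see",    "frame": "perception-passive", "percept": "NP"},
    {"lemma": "listen", "frame": "perception-active",  "percept": "PP"},
]
french = [
    {"lemma": "voir",    "frame": "perception-passive", "percept": "NP"},
    {"lemma": "écouter", "frame": "perception-active",  "percept": "NP"},
]

def link(src, tgt, keys=("frame",)):
    """Propose translation pairs for entries whose shared features match."""
    return [(s["lemma"], t["lemma"])
            for s in src for t in tgt
            if all(s[k] == t[k] for k in keys)]

print(link(english, french))
# -> [('see', 'voir'), ('listen', 'écouter')]
```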