Verbmobil: foundations of speech-to-speech translation

Ed. Wolfgang Wahlster (Berlin: Springer, 2000)

abstracts

Mobile Speech-To-Speech Translation of Spontaneous Dialogs: An Overview of The Final Verbmobil System

Wolfgang Wahlster

DFKI GmbH, Saarbrücken, Germany

Abstract. Verbmobil is a speaker-independent and bidirectional speech-to-speech translation system for spontaneous dialogs in mobile situations. It recognizes spoken input, analyses and translates it, and finally utters the translation. The multilingual system handles dialogs in three business-oriented domains, with context-sensitive translation between three languages (German, English, and Japanese). Since Verbmobil emphasizes the robust processing of spontaneous dialogs, it poses difficult challenges to human language technology, which we discuss in this paper. We present Verbmobil as a hybrid system incorporating both deep and shallow processing schemes. We describe the anatomy of Verbmobil and the functionality of its main components. We discuss Verbmobil's multi-blackboard architecture, which is based on packed representations at all processing stages. These packed representations, together with formalisms for underspecification, capture the non-determinism in each processing phase, so that the remaining uncertainties can be reduced by linguistic, discourse and domain constraints as soon as they become applicable. We present Verbmobil's multi-engine approach, e.g., its use of five concurrent translation engines: statistical translation, case-based translation, substring-based translation, dialog-act based translation, and semantic transfer. Distinguishing features like the multilingual prosody module and the generation of dialog summaries are highlighted. We conclude that Verbmobil has successfully met the project goals with more than 80% of approximately correct translations and a 90% success rate for dialog tasks.

Facts and Figures about the Verbmobil Project

Reinhard Karger and Wolfgang Wahlster

DFKI GmbH, Saarbrücken, Germany

Abstract. In this chapter the organizational and funding structure of the Verbmobil project is summarized and the major technical data about the final Verbmobil system and the Verbmobil archives are compiled.

Multilingual Speech Recognition

Alex Waibel, Hagen Soltau, Tanja Schultz, Thomas Schaaf, and Florian Metze

Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe, Germany

Abstract. The speech-to-speech translation system Verbmobil requires a multilingual setting. This consists of recognition engines in the three languages German, English and Japanese that run in one common framework together with a language identification component which is able to switch between these recognizers. This article describes the challenges of multilingual speech recognition and presents different solutions to the problem of the automatic language identification task. The combination of the described components results in a flexible and user-friendly multilingual spoken dialog system.

Robust Recognition of Spontaneous Speech

Udo Haiber¹, Helmut Mangold¹, Thilo Pfau², Peter Regel-Brietzmann¹, Günther Ruske², and Volker Schleß¹

¹DaimlerChrysler AG, Research and Technology, Ulm, Germany

²Institute for Human-Machine-Communication, Technische Universität München, Germany

Abstract. This contribution describes the challenges addressed and the progress made in Verbmobil concerning the robustness of speech recognition under various types of adverse conditions, such as channel distortion, environmental noise, and varying speaker and speaking conditions. For the channel and noise problem, both classical approaches such as cepstral bias normalization and spectral subtraction and newer methods such as parallel model combination have been improved. One major result is that an intelligent combination of various methods achieves the best results. Considerable progress has also been made in research on unsupervised speaker adaptation. Several approaches are presented to improve robustness against variations of speaking rate, speaking style and speaker characteristics. The methods described include a new estimation of the parameters for vocal tract length normalization, feature and codebook transformation methods using ML algorithms, and pronunciation adaptation of the words in the lexicon.
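As a rough illustration of the channel-compensation idea, cepstral bias (mean) normalization can be sketched as subtracting the per-utterance mean from every cepstral frame. This is the generic textbook form, not the improved variant developed in Verbmobil; the function name and the toy data are assumptions made for this example.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean of each cepstral coefficient."""
    if not frames:
        return []
    dim = len(frames[0])
    n = len(frames)
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


# Toy data (hypothetical): a constant channel offset is added to every frame.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
offset = [0.5, -1.5]
distorted = [[c + o for c, o in zip(frame, offset)] for frame in clean]
```

Because a stationary channel adds a constant bias in the cepstral domain, normalizing `clean` and `distorted` yields the same frames up to rounding.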

Fast Search for Large Vocabulary Speech Recognition

Stephan Kanthak, Achim Sixtus, Sirko Molau, Ralf Schlüter, and Hermann Ney

Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen-University of Technology, Germany

Abstract. In this article we describe methods for improving the RWTH German speech recognizer used within the Verbmobil project. In particular, we present acceleration methods for the search based on both within-word and across-word phoneme models. We also study incremental methods to reduce the response time of the online speech recognizer. Finally, we present experimental off-line results for the three Verbmobil scenarios. We report on word error rates and real-time factors for both speaker independent and speaker dependent recognition.
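The two figures of merit named in the abstract can be sketched as follows. This is a minimal illustration using the standard definitions (word-level edit distance divided by reference length, and processing time divided by audio duration); the function names and the example sentences are assumptions, not taken from the chapter.

```python
from typing import List


def word_error_rate(reference: List[str], hypothesis: List[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time relative to the duration of the audio."""
    return processing_seconds / audio_seconds


# Hypothetical example: one deletion ("uns") and one substitution.
ref = "wir treffen uns am montag".split()
hyp = "wir treffen am dienstag".split()
wer = word_error_rate(ref, hyp)
```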

Capturing Long Range Correlations Using Log-Linear Language Models

Jochen Peters and Dietrich Klakow

Philips GmbH Forschungslaboratorien, Aachen, Germany

Abstract. Written and spoken texts show long-range correlations which are valuable for speech recognition systems. Unfortunately, these dependencies cannot be properly described by the widespread backing-off language models (LMs). This paper introduces basic concepts to exploit long-range correlations for the task of language modeling. Several approaches to obtain suitable LM structures are discussed and compared. The theoretical findings are fully confirmed by experiments performed on the spontaneous speech from the Verbmobil II domain and on written text from the Wall Street Journal corpus.

Among the tested techniques for integrating the different sources of information, log-linear interpolation and the maximum entropy approach proved very effective. Perplexity reductions of 8% (compared to optimal backing-off LMs) are observed, and very compact models have been trained which still outperform the full, unpruned backing-off N-grams by 4%, at the same time reducing the LM size by 50% for trigrams and by 70% for fourgrams.
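Log-linear interpolation combines component models multiplicatively: each component probability is raised to a weight, and the product is renormalized over the vocabulary. The sketch below shows this for a single history; the component distributions, weights and function name are made-up assumptions, and all component probabilities are assumed to be strictly positive.

```python
import math
from typing import Dict, List


def log_linear_interpolate(models: List[Dict[str, float]],
                           weights: List[float]) -> Dict[str, float]:
    """Combine distributions as p(w) proportional to prod_i p_i(w) ** weight_i."""
    vocab = models[0].keys()
    scores = {
        w: math.exp(sum(lam * math.log(m[w]) for m, lam in zip(models, weights)))
        for w in vocab
    }
    z = sum(scores.values())  # normalization term for this history
    return {w: s / z for w, s in scores.items()}


# Hypothetical next-word distributions for one fixed history.
trigram = {"montag": 0.5, "dienstag": 0.3, "juli": 0.2}
long_range = {"montag": 0.2, "dienstag": 0.2, "juli": 0.6}
combined = log_linear_interpolate([trigram, long_range], [0.7, 0.3])
```

Unlike linear interpolation, the normalization term has to be recomputed for every history, which is the main practical cost of this scheme.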

Data Driven Generation of Pronunciation Dictionaries

Matthias Eichner, Matthias Wolff, and Rüdiger Hoffmann

Laboratory of Acoustics and Speech Communication, Technische Universität Dresden, Germany

Abstract. In the framework of the German Verbmobil project we developed a procedure for the automatic, data-driven generation of pronunciation dictionaries for speech recognition systems. In most recognizers only simple dictionaries containing the canonical pronunciation form are used. They represent the correct pronunciation, but in most cases the canonical pronunciation does not match the actual realization of the word. To solve this problem we chose an approach that derives pronunciation variants automatically from a speech database. The training algorithm is based on a canonical dictionary which is compiled into a graph representation in a first stage. Pronunciation variants are then learned from a training sample consisting of the speech signal and its orthographic transcription. In this paper we focus on the experimental results obtained in the Verbmobil framework and introduce methods to evaluate pronunciation dictionaries generated by the training procedure.

The Prosody Module

Anton Batliner, Jan Buckow, Heinrich Niemann, Elmar Nöth, and Volker Warnke

Lehrstuhl für Mustererkennung, Universität Erlangen-Nürnberg, Germany

Abstract. We describe the acoustic-prosodic and syntactic-prosodic annotation and classification of boundaries, accents and sentence mood integrated in the Verbmobil system for the three languages German, English, and Japanese. For the acoustic-prosodic classification, a large feature vector with normalized prosodic features is used. For the three languages, a multilingual prosody module was developed that reduces memory requirements considerably compared to three monolingual modules. For classification, neural networks and statistical language models are used.

The Recognition of Emotion

Anton Batliner¹, Richard Huber¹, Heinrich Niemann¹, Elmar Nöth¹, Jörg Spilker², and Kerstin Fischer³

¹Lehrstuhl für Mustererkennung, Universität Erlangen-Nürnberg, Germany

²Lehrstuhl für Künstliche Intelligenz, Universität Erlangen-Nürnberg, Germany

³Institut für Informatik, AB NatS, Universität Hamburg, Germany

Abstract. Detecting emotional user behavior, particularly anger, can be very useful for successful automatic dialog processing. We present databases and prosodic classifiers implemented for the recognition of emotion in Verbmobil. Using a prosodic feature vector alone is, however, not sufficient for the modelling of emotional user behavior. Therefore, a module is described that combines several knowledge sources within an integrated classification of trouble in communication.

Processing Self-Corrections in a Speech-to-Speech System

Jörg Spilker, Martin Klarner, and Günther Görz

Chair for Artificial Intelligence, Department of Computer Science, Universität Erlangen-Nürnberg, Germany

Abstract. Self-repairs are a frequent phenomenon in spontaneous speech. The ability to detect and correct those repairs is therefore indispensable for any spoken language system. We present a framework for detection and correction of speech repairs where all relevant levels of information, i.e., acoustics, lexis, syntax and semantics can be integrated. The basic idea is to reduce the search space for repairs as soon as possible by cascading filters that involve more and more features. At first an acoustic module generates hypotheses about the existence of a repair. Afterwards a stochastic model suggests a correction for every hypothesis. Highly scored corrections are inserted as new paths in the word lattice. Finally, a lattice parser decides whether the repair should be accepted or not.

Integrated Shallow Linguistic Processing

Ulrich Block and Tobias Ruland

Siemens AG, Corporate Technology, München, Germany

Abstract. This article gives an overview of the Integrated Processing module that realises the multi-parsing-engine for the Verbmobil parsing modules (HPSG, statistical parsing, and chunk parsing). The Integrated Processing module implements an A* search on a word hypotheses graph and interface functions to the different parsing approaches.
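To make the A* search on a word hypotheses graph concrete, here is a minimal sketch. The graph encoding (integer nodes, edges carrying a word and a cost), the function name and the toy graph are assumptions for illustration and do not reproduce the module's actual data structures or scoring.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

Edge = Tuple[int, str, float]  # (target node, word, cost); hypothetical encoding


def a_star_best_path(graph: Dict[int, List[Edge]], start: int, goal: int,
                     heuristic: Callable[[int], float] = lambda node: 0.0
                     ) -> Optional[Tuple[float, List[str]]]:
    """Return (cost, words) of the cheapest path from start to goal, or None.

    The heuristic must never overestimate the remaining cost.
    """
    frontier = [(heuristic(start), 0.0, start, [])]
    expanded: Dict[int, float] = {}  # cheapest cost at which a node was expanded
    while frontier:
        _, cost, node, words = heapq.heappop(frontier)
        if node == goal:
            return cost, words
        if node in expanded and expanded[node] <= cost:
            continue
        expanded[node] = cost
        for target, word, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            heapq.heappush(frontier, (new_cost + heuristic(target), new_cost,
                                      target, words + [word]))
    return None


# Toy word hypotheses graph with competing hypotheses (made-up costs).
whg = {
    0: [(1, "wir", 1.0), (1, "wie", 1.5), (2, "betreffen", 3.5)],
    1: [(2, "treffen", 2.0)],
    2: [(3, "uns", 0.5)],
}
best = a_star_best_path(whg, start=0, goal=3)
```

With the default zero heuristic the search degenerates to uniform-cost search; an informative lower bound on the remaining cost reduces the number of expanded hypotheses.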

Probabilistic LR-Parsing with Symbolic Postprocessing

Tobias Ruland

Siemens AG, Corporate Technology, München, Germany

Abstract. This article describes a novel approach to probabilistic LR-parsing of spontaneously spoken utterances developed in Verbmobil. It extends the use of context knowledge within the probabilistic model of the parser and improves its output by applying tree transformation rules learned from corpora. The parser was developed for German, English and Japanese and achieves more than 90% Labeled Recall/Precision on parsed Verbmobil utterances.
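Labeled recall and precision compare the labeled constituents of a parser's output with those of a gold-standard tree. The sketch below uses the common definition over (label, start, end) spans; the representation, the function name and the example trees are assumptions, not the chapter's evaluation code.

```python
from collections import Counter
from typing import Iterable, Tuple

Constituent = Tuple[str, int, int]  # (label, start, end); hypothetical encoding


def labeled_precision_recall(gold: Iterable[Constituent],
                             predicted: Iterable[Constituent]
                             ) -> Tuple[float, float]:
    """Return (precision, recall) over labeled constituent spans."""
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a constituent counts at most as often as in gold.
    matched = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    return precision, recall


# Made-up example: the parser mislabels one of four constituents.
gold_tree = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)]
parsed_tree = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("PP", 2, 4)]
precision, recall = labeled_precision_recall(gold_tree, parsed_tree)
```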

Robust Chunk Parsing for Spontaneous Speech

Erhard W. Hinrichs, Sandra Kübler, Valia Kordoni, and Frank H. Müller

Seminar für Sprachwissenschaft, Abteilung Computerlinguistik, Eberhard-Karls-Universität Tübingen, Germany

Abstract. Chunk parsing (see Abney, 1991, and Abney, 1996) offers a particularly promising approach for robust, partial parsing with the goal of broad data coverage. A chunk parser is particularly well suited for an application for spontaneous speech since it can deal robustly with fragmentary or ill-formed input.

In order to guarantee the functionality that the Verbmobil system requires, wide-coverage finite-state grammars for the Verbmobil scenarios had to be constructed. In addition, several extensions to the basic chunk parsing technology had to be implemented in the TüSBL (Tübingen Similarity Based Learning) system: the adaptation to input from the speech recognizers and to word incremental processing, and the construction of complete trees out of the chunk analysis.

TüSBL's tree construction algorithm relies on techniques from memory-based learning that allow similarity-based classification of a given input structure relative to a pre-stored set of tree instances from a fully annotated treebank.

Verbmobil Interface Terms (VITs)

Michael Schiehlen¹, Johan Bos², and Michael Dorna¹

¹Institute for Natural Language Processing (IMS), Universität Stuttgart, Germany

²Department of Computational Linguistics, Universität des Saarlandes, Saarbrücken, Germany

Abstract. This article describes the concepts and the contents of Verbmobil Interface Terms (VITs). In VITs all linguistic information of an utterance relevant for translation is represented. They are used to provide an interface representation between several linguistic and dialog components of the Verbmobil system. Information in VITs is encoded in a record-like data structure. The fields are variable-free lists of non-recursive terms, so-called "flat" representations. They are filled with semantic, scopal, sortal, morpho-syntactic, prosodic, and discourse information. A labelling system is used to relate different kinds of information to each other. A library package realizing an abstract data type implements construction, access, update, check, print, etc. facilities for VITs.
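The record-like, flat design described here can be pictured with a small sketch. Everything in it is hypothetical: the field names, the term vocabulary, the label names and the two helper methods are illustrative assumptions and do not reproduce the actual VIT format or its library package.

```python
from dataclasses import dataclass, field, fields
from typing import List, Tuple

# A "flat" term: a functor followed by atomic arguments (labels or constants),
# with no nested terms. All names in this sketch are hypothetical.
Term = Tuple[str, ...]


@dataclass
class Vit:
    semantics: List[Term] = field(default_factory=list)
    scope: List[Term] = field(default_factory=list)
    sorts: List[Term] = field(default_factory=list)
    syntax: List[Term] = field(default_factory=list)
    prosody: List[Term] = field(default_factory=list)
    discourse: List[Term] = field(default_factory=list)

    def check(self) -> bool:
        """True if every term in every field is flat (only string atoms)."""
        return all(isinstance(arg, str)
                   for f in fields(self)
                   for term in getattr(self, f.name)
                   for arg in term)

    def about(self, label: str) -> List[Tuple[str, Term]]:
        """All (field name, term) pairs whose arguments mention the label."""
        return [(f.name, term)
                for f in fields(self)
                for term in getattr(self, f.name)
                if label in term[1:]]


vit = Vit(
    semantics=[("meet", "l1", "i1"), ("monday", "l2", "i2")],
    sorts=[("sort", "i1", "meeting_sit"), ("sort", "i2", "time")],
    prosody=[("accent", "l2")],
)
```

Shared labels such as `i2` are what tie the semantic term to its sortal information, which is the role the abstract assigns to the labelling system.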

Semantic Construction

Michael Schiehlen

IMS, Universität Stuttgart, Germany

Abstract. This article describes the concepts and the implementation of the semantic construction module (SemCon) used in the Verbmobil system. SemCon maps trees to Verbmobil Interface Terms (VITs). A main focus lies on robustness and underspecification. A minimalistic syntax-semantics interface is defined to support modularity. Diverse repair strategies are discussed to enhance robustness. With SemCon, it is possible to process large amounts of data and build semantic representations for a good part of the Verbmobil corpus.

Deep Linguistic Analysis with HPSG

Hans Uszkoreit¹, Dan Flickinger², Walter Kasper¹, and Ivan A. Sag²

¹DFKI GmbH, Saarbrücken, Germany

²Center for the Study of Language and Information (CSLI), Stanford University, USA

Abstract. Deep linguistic analysis is based on Head-Driven Phrase Structure Grammar (HPSG) which provides an integrated approach to syntactic and semantic analysis. We present the basic concepts and ideas of HPSG, as well as of the underlying semantic representation formalism and its interface to the Verbmobil system.

HPSG Analysis of German

Stefan Müller and Walter Kasper

DFKI GmbH, Saarbrücken, Germany

Abstract. We present an overview of the HPSG grammar for the German deep analysis in Verbmobil. In particular, we discuss issues that arise when using it for spontaneous speech processing in specific application domains. In addition, extra-linguistic information such as prosody, which is absent in written language, has to be taken into account. Finally, we present an empirical evaluation of the grammar with respect to the Verbmobil corpora.

HPSG Analysis of English

Dan Flickinger, Ann Copestake, and Ivan A. Sag

Center for the Study of Language and Information (CSLI), Stanford University, USA

Abstract. In this chapter we summarize the results of the HPSG English grammar project for analysis and generation in Verbmobil, housed at CSLI, Stanford University. After providing a description of the design and implementation of the grammar, we give an overview of the linguistic phenomena encountered in the Verbmobil domains, show the results of an evaluation of the grammar measured against transcribed spoken language data, and then point to next steps in the development of the grammar.

HPSG Analysis of Japanese

Melanie Siegel

Universität des Saarlandes, Saarbrücken, Germany

Abstract. A Japanese HPSG for deep analysis and generation in the Verbmobil system was developed. The focus point of the grammar is the processing of spontaneous Japanese dialogs. Therefore, the description of phenomena of spoken Japanese is central. We present some empirical evaluation of the grammar with Verbmobil corpora.

Efficient and Robust Parsing of Word Hypotheses Graphs

Bernd Kiefer, Hans-Ulrich Krieger, and Mark-Jan Nederhof

DFKI GmbH, Saarbrücken, Germany

Abstract. This paper describes the successful metamorphosis of Page from a string-based grammar development system to an efficient run time system, operating on word hypotheses graphs (WHGs). In particular, we report on the techniques we have applied to Page, which have resulted in a speed-up in parsing time of more than an order of magnitude. We elaborate how the system is interfaced to other components: WHG search, prosody detector, and robust semantic processing. We also present measurements for string and WHG parsing. The system as described in the paper has been applied in the speech translation project Verbmobil with large HPSG grammars for English, German, and Japanese.

Speech Lexica and Consistent Multilingual Vocabularies

Dafydd Gibbon and Harald Lüngen

Universität Bielefeld, Germany

Abstract. This contribution describes the theoretical foundations and lexical engineering procedures used in developing a common, consistent, linguistically and formally well-defined lexical database for all components of the Verbmobil speech-to-speech translation system.

Combining Analyses from Various Parsers

C.J. Rupp¹, Jörg Spilker², Martin Klarner², and Karsten L. Worm¹

¹Department of Computational Linguistics, Universität des Saarlandes, Germany

²Computer Science Institute, Universität Erlangen-Nürnberg, Germany

Abstract. This chapter describes measures implemented in the semantics module to ensure that best use is made of the available linguistic analyses.

Robust Semantic Processing of Spoken Language

Manfred Pinkal, C.J. Rupp, and Karsten Worm

Department of Computational Linguistics, Universität des Saarlandes, Germany

Abstract. This chapter describes a novel strategy for the robust processing of spoken inputs in dialog translation systems. The implemented processor forms a major subcomponent of the semantics module in the Verbmobil system.

Discourse and Dialog Semantics for Translation

Johan Bos and Julia Heine

Department of Computational Linguistics, Universität des Saarlandes, Germany

Abstract. The Discourse and Dialog component in Verbmobil resolves non-local ambiguities, using knowledge provided by prosody and dialog acts, and the history of the ongoing dialog. It is a rule-based system, working on an ordered set of about 600 rules, dealing with phenomena found in English, German, and Japanese. The phenomena covered for English and German are lexical ambiguities, sentence mood determination, focus projection, and anaphora and ellipsis resolution. The disambiguation rules for Japanese include definiteness resolution, topic instantiation, and zero-anaphora resolution.

Multilingual Semantic Databases

Walter Kasper

DFKI GmbH, Saarbrücken, Germany

Abstract. To define the possible content of semantic representations for each language, semantic databases were defined which provide the same types of information in a uniform way. A prerequisite is that the information types are meaningful across the different languages involved, which requires a multilingual description system. We describe the use and structure of the databases. These provide not only interface specifications among the deep processing components in the system, but also a rich resource for describing semantic properties of lexical items in a theory- and implementation-independent way.

Semantic-Based Transfer

Martin C. Emele, Michael Dorna, Anke Lüdeling, Heike Zinsmeister, and Christian Rohrer

Universität Stuttgart, Germany

Abstract. This article presents the concepts and the implementation of the semantic-based transfer approach used in the transfer component of the machine translation system Verbmobil. The transfer component acts as a rewriting system on enriched semantic representations. We show how the transfer formalism handles translation mismatches, structural divergences and other translation problems. If necessary, ambiguities are resolved by using the local input information or by inference results provided by other Verbmobil components. A system of macros and templates facilitates rule development. The transfer component consists of different cascaded sub-modules. The application of rules within the sub-modules is ordered automatically by specificity. The efficiency of the transfer component is illustrated by performance data.

Statistical Methods for Machine Translation

Stephan Vogel, Franz Josef Och, Christof Tillmann, Sonja Nießen, Hassan Sawaf, and Hermann Ney

Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen-University of Technology, Germany

Abstract. In this article we describe the statistical approach to machine translation as implemented in the stattrans module of the Verbmobil system. The statistical translation approach uses two types of information: a translation model and a language model. The language model used is an m-gram model. The translation model comprises a stochastic lexicon and word position parameters. To capture dependencies between word groups in each of the two languages, alignment templates are used. We describe the components of the system and report results on the Verbmobil task. The experience obtained in the Verbmobil project shows that the statistical approach is very competitive with other translation approaches.
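The interplay of the two information sources can be written as the usual Bayes decision rule of statistical translation, with an m-gram factorization of the language model. The notation is an assumption made for this note: $f_1^J$ denotes the source sentence of length $J$, and $e_1^I$ a target sentence of length $I$.

```latex
\hat{e}_1^I
  = \operatorname*{argmax}_{e_1^I}
    \underbrace{\Pr(e_1^I)}_{\text{language model}}
    \cdot
    \underbrace{\Pr(f_1^J \mid e_1^I)}_{\text{translation model}},
\qquad
\Pr(e_1^I) \approx \prod_{i=1}^{I} p\bigl(e_i \mid e_{i-m+1}^{i-1}\bigr)
```

The translation model term is where the stochastic lexicon, the word position parameters and the alignment templates mentioned in the abstract enter.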

Adapting a Large Scale MT System for Spoken Language

Hans Ulrich Block, Stefanie Schachtl, and Manfred Gehrke

Siemens AG, Corporate Technology, München, Germany

Abstract. This paper describes an attempt to transform a general purpose machine translation system that had originally been designed for human aided computer translation of technical documentation into the linguistic component of a domain dependent spoken language translation system for remote PC maintenance. In the first part, the translation system is described. The second part describes the measures taken to adapt it to the spoken language task.

Example-Based Incremental Synchronous Interpretation

Hans Ulrich Block

Siemens AG, Corporate Technology, München, Germany

Abstract. This article describes a new approach to example based incremental translation for automatic interpretation systems developed in Verbmobil. The translation module is completely learned from a bilingual corpus. The training phase combines statistical word alignment with precomputation of translation "chunks" and contextual clustering of syntactic equivalence classes (word classes). The system gives incremental output for every piece of input, be it words or sequences of words. It thus tries to mimic the behaviour of a human synchronous interpreter. If a larger context leads to the need for reformulation, the system utters a correction marker like "I mean" and restarts the output from the starting position of the reformulation. The system is currently effective for German ⇔ English. German ⇔ Chinese and German ⇔ Japanese are under construction. In the Verbmobil evaluation, this approach reached 79% of approximately correct translations on speech recognition output.

Example-Based Machine Translation with Templates

Marko Auerswald

DFKI GmbH, Kaiserslautern, Germany

Abstract. This paper presents an approach for template based machine translation, in which the templates are generated in a highly automated way from large corpora of translation examples. The techniques described have been successfully used in one of the alternative translation modules within the Verbmobil speech-to-speech translation system. A crucial feature of this approach is the capability of processing word lattice input in an efficient way.

Robust Content Extraction for Translation and Dialog Processing

Norbert Reithinger and Ralf Engel

DFKI GmbH, Saarbrücken, Germany

Abstract. The design rationale guiding the development of the reductionist dialog act based translation module in Verbmobil was robustness. Even in case the speech recognition or the prosodic processing does not perform perfectly, this module extracts and translates the main intentions and facts related to the domain. In a three step approach, first the dialog act describing the intention is computed using a statistical approach. The second step is the construction of the propositional content with robust hierarchical finite state transducers. For the definition of the transducers, knowledge sources available in Verbmobil are exploited. The resulting representation of these two steps is used in a template based finite state generator to realize the target language expressions. The internal representation is also communicated to the dialog module where it plays an important part in maintaining the dialog state.

Modeling Negotiation Dialogs

Jan Alexandersson¹, Ralf Engel¹, Michael Kipp¹, Stephan Koch², Uwe Küssner², Norbert Reithinger¹, and Manfred Stede²

¹ DFKI GmbH, Saarbrücken, Germany

² Technische Universität Berlin, Germany

Abstract. For various purposes in the Verbmobil system it is necessary to build a full model of an unfolding dialog, on a suitably abstract level of representation. The basis of this model is formed by representations of the individual utterances, and we capture their content by a combination of dialog act and propositional content. Our hierarchy of dialog acts was used to annotate 21 CD-ROMs from the Verbmobil corpus, and the experience gained with the framework influenced standardization efforts in the international scientific community. On the side of propositional content, particular attention was given to the representation of temporal expressions, due to the application domains of Verbmobil.

Dialog Processing

Michael Kipp, Jan Alexandersson, Ralf Engel, and Norbert Reithinger

DFKI GmbH, Saarbrücken, Germany

Abstract. This chapter explains the major functionality of the dialog module in Verbmobil. Dialog knowledge is needed for context sensitive speech translation as well as for the automatic generation of dialog result summaries. Our component produces necessary structures for both purposes and stores them in a centrally accessible data repository—the dialog memory. The structures are based on robustly extracted shallow data which are corrected, extended and structured by our dialog processor. We use time and object completion algorithms to collect context data and compute inter-object relations to infer relevance for summarization. The resulting structures are used by the document generator for dialog minutes and summaries, and by the context evaluation module for translation disambiguation.

Contextual Disambiguation

Stephan Koch, Uwe Küssner, and Manfred Stede

Technische Universität Berlin, Germany

Abstract. Resolving ambiguities is a necessary step for machine translation aiming at high quality. In Verbmobil, with its specific conditions of speech-to-speech translation, contextual reasoning for purposes of disambiguation has to respect the particular conditions of being situated in a near-realtime system, and has to take errors in the speech recognition phase into account. This chapter describes the context evaluation module of Verbmobil's "deep processing" translation path. We characterize the linguistic phenomena that require contextual reasoning, describe the shape of our context representation, and explain how this representation is constructed during utterance interpretation, which involves performing the required disambiguations.

The Verbmobil Generation Component VM-GECO

Tilman Becker, Anne Kilger, Patrice Lopez, and Peter Poller

DFKI GmbH, Saarbrücken, Germany

Abstract. This chapter presents the Verbmobil generation component VM-GECO. The main modules of our component—microplanner and syntactic generator—are illuminated in detail focusing on the problems of real-time computation, multilinguality, dependencies among choices and the use of different representation formalisms. We discuss robustness as an important feature of large-scale systems with spontaneous and erroneous input.

The Application of HPSG-to-TAG Compilation Techniques

Tilman Becker and Patrice Lopez

DFKI GmbH, Saarbrücken, Germany

Abstract. The HPSG-to-TAG compilation algorithm proposed in Kasper et al. (1995) has been the basis of large scale experiments in Verbmobil. The results presented here concentrate on the English HPSG grammar developed at CSLI. Several non-trivial theoretical problems have been discovered by the practical application of this algorithm. This paper presents these experiments, the main shortcomings of the initial algorithm and some of the solutions we have developed in order to use the resulting compiled LTAG (Lexicalized TAG) grammar in a real world system.

Generating Multilingual Dialog Summaries and Minutes

Jan Alexandersson, Peter Poller, and Michael Kipp

DFKI GmbH, Saarbrücken, Germany

Abstract. This chapter describes the on-demand generation of dialog minutes and result summaries of dialogs. We focus on summary generation since the generation of minutes is performed using almost the same techniques. We describe how the relevant data are selected from the dialog memory, how the data are converted into sequences of VITs and, finally, we demonstrate how the existing generation module of Verbmobil was extended to generate textual documents. Multilinguality is achieved by utilizing the transfer module.

Speech Synthesis Using Multilevel Selection and Concatenation of Units from Large Speech Corpora

Karlheinz Stober¹, Petra Wagner¹, Jörg Helbig², Stefanie Köster³, David Stall⁴, Matthias Thomae⁴, Jens Blauert³, Wolfgang Hess¹, Rüdiger Hoffmann², and Helmut Mangold⁴

¹ Institut für Kommunikationsforschung und Phonetik, Universität Bonn, Germany

² Institut für Akustik und Sprachkommunikation, Technische Universität Dresden, Germany

³Institut für Kommunikationsakustik, Ruhr-Universität Bochum, Germany

⁴DaimlerChrysler AG, Research and Technology, Ulm, Germany

Abstract. This paper describes the Verbmobil speech synthesis: the segmental and prosodic transcription on the symbolic level, the construction of the synthesis corpus, the algorithm for selecting synthesis units out of this corpus, and the adaptation of the resulting synthetic speech to the relevant dialog situation and individual speaker.

Verbmobil Data Collection and Annotation

Susanne Burger¹, Karl Weilhammer², Florian Schiel³, and Hans G. Tillmann²

¹Interactive Systems Laboratories, Universität Karlsruhe, Germany, and Carnegie Mellon University, Pittsburgh, PA, USA

²Department of Phonetics, LMU München, Germany

³Bavarian Archive for Speech Signals, LMU München, Germany

Abstract. Verbmobil data collection had to satisfy the different requirements for data quality and annotation level for each project partner. This chapter describes the different user groups, their data demands and how the data collection group solved these issues.

The Tübingen Treebanks for Spoken German, English, and Japanese

Erhard W. Hinrichs, Julia Bartels, Yasuhiro Kawata, Valia Kordoni, and Heike Telljohann

Seminar für Sprachwissenschaft, Abt. Computerlinguistik, Eberhard-Karls Universität Tübingen, Germany

Abstract. The Tübingen treebanks for spoken German, English and Japanese provide linguistic annotations for the Verbmobil dialog corpus of spontaneous speech in the scenarios of appointment negotiations, travel arrangements and personal computer maintenance. The annotation schemes of the Tübingen treebanks have been developed taking into account the specific characteristics of spoken language dialogs: repetitions, hesitations, "false starts", etc.

Multilingual Verbmobil-Dialogs: Experiments, Data Collection and Data Analysis

Susanne J. Jekat and Walther v. Hahn

Computer Science Department, Natural Language Systems Division and SFB 538 Multilingualism, Universität Hamburg, Germany

Abstract. In this article we describe the collection and analysis of multilingual dialogs with a human or machine interpreter within the Verbmobil framework. As the dialogs represent very rare speech data with high acoustic quality, analysis is still in progress and further research is ongoing.

Speech Recognition Performance Assessment

Michael Malenke, Marcus Bäumler, and Erwin Paulus

Institute for Communications Technology, Technische Universität Braunschweig, Germany

Abstract. From 1998 to 2000, the performance of the three speech recognition modules of the Verbmobil system was evaluated at regular intervals. The principal concepts and main results of the evaluations are presented, with emphasis on the final evaluation in 2000.

Speech Synthesis Quality Assessment

Jochen Steffens and Erwin Paulus

Institute for Communications Technology, Braunschweig, Germany

Abstract. Category rating tests have been performed in order to compare the Verbmobil speech synthesis module to several commonly available speech synthesis techniques as well as to natural speech. The Verbmobil speech synthesis module applies a corpus-based selection and concatenation technique and, as regards the quality of synthesized utterances in German, appears to be superior to other synthesis techniques. For American English it is also among the best, but is not yet as dominant as it is for German. This seems to be due to the fact that there have been considerable efforts in tuning the German part of the corpus to the Verbmobil domain, while the American English part of the corpus at the time of the evaluation had not yet reached a comparably mature state.

From Off-line Evaluation to On-line Selection

Damir Ćavar, Uwe Küssner, and Dan Tidhar

Technische Universität Berlin, Germany

Abstract. In order to meet the challenges set by the innovative multi-engine translation architecture, an additional selection component is necessary. The selection component fulfills the task of integrating the various alternative translations that are produced for each input utterance, and comes up with exactly one optimal translation. In the center of this chapter is a learning method that was tailored to overcome the problem of incomparable confidence values delivered by the competing translation paths, thus enabling the selection component to rely on confidence values as the main selection criterion. By using off-line human feedback and applying a linear optimization heuristic, we determine a rescaling scheme that enables us to compare confidence values across modules. We also describe some additional information sources that further elaborate the selection procedure, and finally, outline some Quality of Service parameters that are supported by the selection module.
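
The selection idea in this abstract can be illustrated with a minimal sketch, assuming each translation path reports a raw confidence value that is mapped onto a common scale by a per-engine linear function before the best candidate is chosen. The abstract does not give the actual rescaling scheme or the optimization heuristic used to derive it, so the `Candidate` type, the `RESCALING` table, and all coefficients below are hypothetical placeholders rather than values from the Verbmobil system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    engine: str        # translation path that produced this candidate
    text: str          # the candidate translation
    confidence: float  # raw, engine-specific confidence value


# Hypothetical (slope, offset) pairs, one per engine. In the chapter these
# would be determined from off-line human feedback; the numbers here are
# invented purely for illustration.
RESCALING = {
    "statistical": (1.0, 0.0),
    "case_based": (0.5, 0.2),
    "semantic_transfer": (0.8, 0.1),
}


def rescale(candidate: Candidate) -> float:
    """Map a raw confidence onto the shared scale: slope * c + offset."""
    slope, offset = RESCALING[candidate.engine]
    return slope * candidate.confidence + offset


def select(candidates: list[Candidate]) -> Candidate:
    """Return the single candidate with the highest rescaled confidence."""
    if not candidates:
        raise ValueError("at least one candidate translation is required")
    return max(candidates, key=rescale)


candidates = [
    Candidate("statistical", "translation A", 0.6),
    Candidate("case_based", "translation B", 0.9),
    Candidate("semantic_transfer", "translation C", 0.7),
]
best = select(candidates)
```

In this sketch the case-based candidate has the highest raw confidence, yet the semantic transfer candidate is selected once the values are rescaled, which is the kind of cross-module comparison that raw confidence values alone do not support.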

Functional Validation of a Machine Interpretation System: Verbmobil

Lorenzo Tessiore and Walther v. Hahn

Department of Computer Science, Universität Hamburg, Germany

Abstract. Evaluation of NLP systems is on its way to a deep and detailed standardization. Methods and techniques are developed, but only for the evaluation of ready-to-sell products; the evaluation of a system that is still under development is not standardized and not even ad hoc tools are available for this purpose. The evaluation of Verbmobil required the development of an adequate evaluation technique and a tool that could deal both with the need to validate the system as a quasi-product and to produce useful feedback to the developers for further improvement of the system. This paper explains the methodological and technical choices that led to the implementation of a graphic evaluation tool (GET), discusses the GET and shows the results that have been gathered by its use. The paper includes a discussion of the complex problem of evaluating translations.

Verbmobil From a Software Engineering Point of View: System Design and Software Integration

Andreas Klüter, Alassane Ndiaye, and Heinz Kirchmann

DFKI GmbH, Kaiserslautern, Germany

Abstract. The distributed research and software development in Verbmobil resulted in an integrated speech-to-speech translation system. The size of the project, the heterogeneous environment at the various development sites and the constraint of software reuse required professional software engineering for successful integration. For this purpose, a software design and integration group was established. This article describes the software engineering strategies applied within Verbmobil. We discuss the prerequisites necessary for successful integration, describe the software framework provided by the system group, and show how modules communicate and how integrations were performed. We also discuss design decisions and show that the concepts and the integration framework are not limited to speech-to-speech translation systems, but are also applicable to any large-scale distributed software development project.

From a Stationary Prototype to Telephone Translation Services

Heinz Kirchmann, Alassane Ndiaye, and Andreas Klüter

DFKI GmbH, Kaiserslautern, Germany

Abstract. In addition to the face-to-face system, Verbmobil has been extended to offer translation services via telephone. The implementation of the telephone system required some prerequisites which influence the whole system design. Modeling the user guidance had to take into account the lack of visual feedback. This article describes the general differences between the stationary and the telephone scenario and how Verbmobil was adapted to the challenges of a telephone translation server. We also show configurations and possible applications of the speech-to-speech translation telephone server.