Government
MT Users Program
Submitted by: National Virtual Translation Center (NVTC)
Speaker: Carol Van Ess-Dykema and Flo Reeder
Topic: Paralinguist Assessment Decision Factors For
Multi-Engine Machine Translation Output
This presentation describes a study of whether Machine Translation (MT) enables translators to translate faster while at the same time producing better-quality translations than they would without MT. It examines how well developers' automatic metrics correlate with a human translator's ability to post-edit a text. The study also seeks decision factors that enable a translation professional, known as a Paralinguist, to determine whether MT output is of sufficient quality to serve as a "seed translation" for translators. Unlike developers' metrics, these decision factors must function without a reference translation.
The study consists of two investigations. The first investigation answers the question: Can we post-edit MT-produced "seed translations" while increasing translator speed and accuracy? The first step is to machine translate candidate texts, selected on the basis of subject and genre. Translators are then asked to post-edit the MT output, and their words-per-hour translation rate is measured. Next, the post-edited MT output is assessed by quality control personnel using a US Government assessment standard. Analysis then compares translator speed and accuracy under test and control conditions. Additionally, developers' MT metrics are compared with translator words per hour, translators' opinions of the post-editing activity, and the quality control score.
The second investigation answers the question: Which decision factors aid a Paralinguist in determining whether MT output is post-editable? It starts with the MT output from the first investigation and uses the scores determined there. Candidate decision factors are then analyzed for correlation with translator words per hour, translators' opinions of the post-editing activity, and the quality control score. A Paralinguist will not have the benefit of a reference translation when applying these metrics. Therefore, part of this study is a search for easily calculated metrics that do not require a reference translation yet yield indicators of a document's suitability for post-editing.
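As an illustration of the kind of easily calculated, reference-free indicators such a study might consider, the sketch below computes a few cheap signals from raw MT output alone. The indicator choices and the tiny vocabulary are illustrative assumptions, not the study's actual decision factors.

```python
# Sketch of reference-free indicators of MT output quality. These are
# illustrative signals only, not the decision factors from the study.

def reference_free_indicators(mt_output: str, known_vocab: set) -> dict:
    """Compute cheap signals from raw MT output, with no reference translation."""
    tokens = mt_output.lower().split()
    if not tokens:
        return {"oov_rate": 1.0, "avg_sentence_len": 0.0, "repetition_rate": 0.0}
    oov = sum(1 for t in tokens if t not in known_vocab)
    sentences = [s for s in mt_output.split(".") if s.strip()]
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return {
        "oov_rate": oov / len(tokens),                     # unknown-word ratio
        "avg_sentence_len": len(tokens) / max(len(sentences), 1),
        "repetition_rate": repeats / len(tokens),          # stuttering output
    }

vocab = {"the", "treaty", "was", "signed", "in", "geneva"}
print(reference_free_indicators("the treaty was signed in geneva .", vocab))
```

A Paralinguist-facing tool could threshold signals like these to flag documents unlikely to be worth post-editing.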
Submitted by: Language Weaver, SDL
Speaker: Daniel Marcu
Contributors: Dr. Kathleen Egan, Chuck Simmons, Ning-Ning
Mahlmann
Topic: Utilizing Automated Translation with Quality
Scores to Increase Productivity
Automated translation can assist with a variety of
translation needs in government, from speeding up access to information for
intelligence work to helping human translators increase their
productivity. However, government entities need to have a mechanism in
place so that they know whether or not they can trust the output from automated
translation solutions.
In this presentation, Language Weaver will present a new
capability – TrustScore – an automated scoring algorithm that communicates how
good the automated translation is, using a meaningful metric. With this
capability, each translation is automatically assigned a score from 1 to 5 –
the TrustScore. A score of 1 would indicate that the translation is
unintelligible; a score of 3 would indicate that meaning has been conveyed and
that the translated content is actionable. A score approaching 4 or
higher would indicate that meaning and nuance have been carried through.
This automatic prediction of quality has been validated by testing done across
significant numbers of data points in different companies and on different
types of content.
After outlining TrustScore and how it works, Language Weaver will discuss how a scoring mechanism like TrustScore could be used in a government translation productivity workflow to assist linguists with day-to-day translation work, enabling them to further benefit from their investments in automated translation software. Language Weaver will also share how TrustScore is used in commercial deployments to cost-effectively publish information in near real time.
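The routing idea behind a 1-to-5 quality score can be sketched as follows. The band boundaries and the confidence feature are invented for illustration; the actual TrustScore algorithm is Language Weaver's own and is not described here.

```python
# Hypothetical sketch of a 1-5 quality band in a translation workflow.
# Cut points and the confidence input are assumptions, not TrustScore itself.

def to_trust_score(model_confidence: float) -> int:
    """Map a raw model confidence in [0, 1] to a 1-5 quality band."""
    bands = [0.2, 0.4, 0.6, 0.8]          # illustrative cut points
    return 1 + sum(model_confidence > b for b in bands)

def route_translation(model_confidence: float) -> str:
    """Route output by score: publish, post-edit, or translate by hand."""
    score = to_trust_score(model_confidence)
    if score >= 4:
        return "publish"          # meaning and nuance carried through
    if score >= 3:
        return "post-edit"        # meaning conveyed, content actionable
    return "human-translate"      # unintelligible or unreliable

print(route_translation(0.85))
```

The point of such a gate is that each document is triaged automatically, so linguists spend their time only where machine output is close enough to be worth editing.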
Submitted by: United Nations Translation Services
Speaker: Li Zuo
Topic: Machine translation from English to Chinese:
A study of Google's performance with UN documents
The present study examines, from the users' perspective, the performance of Google's online translation service on United Nations documents. Since at least 2004, the United Nations has been exploring, piloting, and implementing computer-assisted translation (CAT), with Trados as the officially selected vehicle. A more recent development is the spontaneous adoption of Google translation among Chinese translators as an easy, versatile, and labor-saving tool. With machine translation becoming a practical reality for developers and end users, there is a need for a reality check on how well it serves its purpose. The current study examines Google translation and, in particular, its degree of assistance to professional Chinese translators at the United Nations. It uses a variety of UN documents to test and evaluate the performance of Google translation from English to Chinese. The sampled UN documents consist of 3 resolutions, 2 letters, 2 provisional agendas, 1 plenary verbatim record, 1 report, 1 note by the Secretariat, and 1 budget.
The results confirm Google's cutting edge in machine translation where English to Chinese is concerned, thanks to its powerful infrastructure and immense translation database. The conversion between the two languages takes only an instant, even for a fairly long piece. On top of that, Google gets terminology right more frequently and seems better able to make an intelligent guess than other translation tools such as MS Bing. But Google's Chinese output is far from intelligible, especially at the sentence level, primarily because of serious problems with word order and sentence parsing. There are also technical problems such as added or omitted words and erroneous rendering of numbers.
Nevertheless, Google translation offers translators the option of working from its rough draft, saving the time and effort of typing. The challenges of post-editing, however, may offset the time saved. Even though Google translation may not necessarily yield net speed gains when used to assist translation, it is certainly a beneficial labor saver, including of mental labor when it performs at its best.
Submitted by: National Air and Space Intelligence Center
Presenter: Chuck Simmons
Topic: Foreign Media Collaboration Framework (FMCF)
The Foreign Media Collaboration Framework (FMCF) is the latest approach by NASIC to provide a comprehensive system for processing foreign language materials. FMCF is a Service-Oriented Architecture (SOA) that provides an infrastructure to manage HLT tools, products, workflows, and services. This federated SOA solution adheres to DISA's NCES SOA Governance Model, DDMS XML for metadata capture and dissemination, and IC-ISM for security.
The FMCF provides a cutting-edge infrastructure that encapsulates multiple capabilities from multiple vendors in one place. This approach will accelerate HLT development, contain sustainment costs, minimize training, and bring MT, OCR, ASR, audio/video, entity extraction, analytic tools, and databases under one umbrella, thus reducing the total cost of ownership.
Submitted by: Technical Support Working Group
Presenter: Kathleen Egan
Topic: Cross Lingual Arabic Blog Alerting (COLABA)
Social media and tools for communication over the Internet have expanded a great deal in recent years. This expansion offers a diverse set of users a means to communicate more freely and spontaneously in mixed languages and genres (blogs, message boards, chat, texting, video, and images). Dialectal Arabic is pervasive in written social media; however, current state-of-the-art tools built for Modern Standard Arabic (MSA) fail on Arabic dialects.
COLABA enables MSA
users to interpret dialects correctly.
It helps find Arabic colloquial content that is currently not easily
searchable and accessible to MSA queries.
The COLABA team has built a suite of tools that offers users the ability to anonymously capture online unstructured media content from blogs, and to comprehend, organize, and validate content from informal and colloquial genres of online communication in MSA and a variety of Arabic dialects.
The DoD/Combating
Terrorism Technical Support Office/Technical Support Working Group (CTTSO/TSWG)
awarded the contract to Acxiom Corporation and partners from MTI/IBM, Columbia
University, Janya and Wichita State University to bring joint expertise to
address this challenge.
The suite has several use
applications:
· Support for language and cultural learning, by making colloquial Arabic intelligible to students of MSA
· Retrieval and prioritization for triage and content analysis, by finding Arabic colloquial and dialect terms that today's search engines miss; by providing appropriate interpretations of colloquial Arabic, which is opaque to current analytics approaches; and by identifying named entities, events, topics, and sentiment
· Improved translations by MSA-trained MT systems, through decreases in out-of-vocabulary terms achieved by converting colloquial terms to MSA
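The last point, converting colloquial terms to MSA so that an MSA-trained MT system sees fewer out-of-vocabulary tokens, can be sketched minimally as a lexicon substitution pass. The tiny romanized lexicon below is a made-up placeholder, not COLABA's actual resources.

```python
# Minimal sketch of colloquial-to-MSA conversion before MT, to reduce
# out-of-vocabulary tokens. The lexicon entries are invented placeholders.

DIALECT_TO_MSA = {
    "ezayak": "kayfa haluka",   # Egyptian "how are you" -> MSA (romanized)
    "shlonak": "kayfa haluka",  # Iraqi/Gulf variant
}

def normalize_to_msa(tokens: list, lexicon: dict) -> list:
    """Replace known colloquial terms so an MSA-trained MT system sees fewer OOVs."""
    out = []
    for tok in tokens:
        # A mapped term may expand to several MSA tokens; unknowns pass through.
        out.extend(lexicon.get(tok, tok).split())
    return out

print(normalize_to_msa(["ezayak", "ya", "ahmed"], DIALECT_TO_MSA))
```

In a real system the lexicon would be large, dialect-tagged, and context-sensitive, but the OOV-reduction effect on downstream MT is the same in principle.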
Submitted by: National Air and Space Intelligence Agency
Presenter: Weimin Jiang
Topic: Pre-editing for Machine Translation
It is common practice for linguists to post-edit MT output to improve translation accuracy and fluency. This presentation, however, examines the importance of pre-editing source material to improve MT. Even when a textually correct digital source file is used for MT, some factors still have a significant effect on MT accuracy and fluency.
Based on 35 examples from more than 20 professional journals and websites, this presentation describes an experiment in pre-editing source material for Chinese-English MT in the S&T domain. Pertinent examples are selected to illustrate how machine translation accuracy and fluency can be enhanced by pre-editing in four areas: providing a straightforward sentence structure, improving punctuation, using straightforward wording, and eliminating redundancy and superfluous elements.
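A toy illustration of this kind of source pre-editing is sketched below: normalizing punctuation and stripping superfluous elements before the text reaches the MT engine. The specific rules and the Chinese example are invented, not taken from the presentation.

```python
# Toy pre-editing pass for Chinese source text before MT. The rules here
# are illustrative examples of the four areas, not the presenter's own.
import re

def pre_edit(source: str) -> str:
    """Apply simple pre-editing rules to a source sentence."""
    text = source
    # Improve punctuation: collapse doubled full-width commas.
    text = text.replace("，，", "，")
    # Eliminate superfluous elements: drop bracketed asides like （见图3）.
    text = re.sub(r"（[^）]*）", "", text)
    # Tidy any whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()

print(pre_edit("该系统（见图3）已投入使用"))
```

Restructuring long, nested sentences, the most valuable of the four areas, still requires a human pre-editor; only the mechanical clean-up lends itself to rules like these.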
Submitted by: Basis Technology Inc.
Speaker: Brian Roberson
Topic: Multi-Language Desktop Suite
Professional language analysts leverage a myriad of tools
in their quest to produce accurate translations of foreign language
material. The effectiveness of these tools ultimately affects
resource allocation, information dissemination and subsequent follow-on mission
planning – all three of which are vital, time-critical components in the
intelligence cycle.
This presentation will highlight the need for interactive
tools that perform jointly in an operational environment, focusing on a dynamic
suite of foreign language tools packaged into a desktop application and serving
in a machine translation role.
Basis Technology’s Arabic/Afghan Desktop Suite (ADS)
supports DOMEX, CELLEX, and HUMINT missions while being the most powerful
Arabic, Dari and Pushto text analytic and processing software available.
The ADS translates large-scale lists of names from foreign languages into English and pinpoints place names appearing in reports with their coordinate locations on maps.
With standardized output required to be more accurate than ever, the ADS ensures conformance with USG transliteration standards for Arabic-script languages, including IC, BGN/PCGN, SATTS, and MELTS. The ADS enables optimization of your limited resources and allows your analysts and linguists to be tasked more efficiently throughout the workflow process.
Submitted by: CACI Inc. and Apptek
Presenter: Kristen Summers and Hassan Sawaf
Topic: User-Generated System for Critical Document Triage and Exploitation – Version 2011
CACI has developed and delivered systems for document
exploitation and processing to Government customers around the world. Many of these systems include advanced
language processing capabilities in order to enable rapid triage of vast
collections of foreign language documents, separating the content that requires
immediate human attention from the less immediately pressing material.
AppTek provides key patent-pending Machine Translation
technology for this critical process, rendering material in Arabic, Farsi and
other languages into an English rendition that enables both further automated
processing and rapid review by monolingual analysts, to identify the documents
that require immediate linguist attention.
Both CACI and AppTek have been working with customers to
develop capabilities that enable them, the users, to be the ones in command of
making their systems learn and continuously improve. We will describe how we put this critical
user requirement into the systems and the key role that the user’s involvement
played in this.
We will also discuss some of the key components of the system and the customer-centric evolution planned for it, including our document translation workflow, the machine translation technology within it, and our approaches to supporting the technology and sustaining its success, designed around adapting to users' needs.
Submitted by: AMTA Government Track Organizers
Panel Moderator: Judith L. Klavans
Topic: Task-based evaluation methods for machine
translation, in practice and theory
A panel of industry and government
experts will discuss ways in which they have applied task-based evaluation for
Machine Translation and other language technologies in their organizations and
share ideas for new methods that could be tried in the future. As part of
the discussion, the panelists will address some of the following points:
· What task-based evaluation means within their
organization, i.e., how task-based evaluation is defined
· How task-based evaluation impacts the use of
MT technologies in their work environment
· Whether task-based evaluation correlates with MT developers' automated metrics, and if not, how to arrive at automated metrics that do correlate with the more expensive task-based evaluation
· What "lessons-learned" resulted from
the course of performing task-based evaluation
· How task-based evaluations can be generalized
to multiple workflow environments
Presenter: Rod Holland
Topic: Exploring the AFPAK Web
In spite of low literacy levels in Afghanistan and the Tribal Areas of Pakistan, the Pashto and Dari regions of the World Wide Web manifest diverse content from authors with a broad range of viewpoints. We have used cross-language information retrieval (CLIR) with machine translation to explore this content, and we present an informal study of the principal genres we have encountered. The suitability and limitations of existing machine translation packages for these languages, as applied to the exploitation of this content, are discussed.
Submitted by: Raytheon BBN Technologies
Presenter: Sean Colbath
Topic: Terminology Management for Web Monitoring
Current state-of-the-art in speech recognition, machine
translation, and natural language processing (NLP) technologies has allowed the
development of powerful media monitoring systems that provide today’s analysts
with automatic tools for ingesting and searching through different types of
data, such as broadcast video, web pages, documents, and scanned images.
However, the core human-language technologies (HLT) in these media monitoring systems are static learners, meaning that they learn from a pool of labeled data and apply the induced knowledge to operational data in the field. To enable successful and widespread deployment and adoption of HLT, these technologies need to be able to adapt effectively to new operational domains on demand.
To provide the US Government analyst with dynamic tools that adapt to these changing domains, HLT systems must support customizable lexicons. However, lexicon customization presents a unique challenge, especially in the context of the multiple users of a typical fielded media monitoring installation. Customization requests from multiple users can be quite extensive and may conflict in orthographic representation (spelling, transliteration, or stylistic consistency) or in overall meaning. To protect against spurious and inconsistent updates, media monitoring systems need to support a central terminology management capability to collect, manage, and execute customization requests across multiple users of the system.
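The core of such central terminology management, collecting requests from many users and flagging conflicts for adjudication before they reach the MT lexicon, can be sketched as follows. The data shapes and the sample requests are assumptions for illustration, not LLB's actual interfaces.

```python
# Sketch of central terminology management: group customization requests
# and flag conflicts for a terminology manager. Data shapes are assumed.
from collections import defaultdict

def collect_requests(requests):
    """Group (user, source_term, translation) requests.

    A conflict is the same source term submitted with different translations;
    it must be adjudicated before the MT lexicon is updated.
    """
    by_term = defaultdict(set)
    for user, term, translation in requests:
        by_term[term].add(translation)
    approved = {t: next(iter(tr)) for t, tr in by_term.items() if len(tr) == 1}
    conflicts = {t: sorted(tr) for t, tr in by_term.items() if len(tr) > 1}
    return approved, conflicts

reqs = [
    ("analyst1", "قاعدة", "base"),
    ("analyst2", "قاعدة", "al-Qaeda"),   # same term, different rendering
    ("analyst3", "صاروخ", "missile"),
]
approved, conflicts = collect_requests(reqs)
print(conflicts)  # the conflicting term awaits adjudication
```

Only the approved entries would flow into the MT system's phrase translation rules; conflicting entries enter the review workflow between linguists and linguist management.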
In this talk, we will describe the integration of a
user-driven lexicon/dictionary customization and terminology management
capability in the context of the Raytheon BBN Web Monitoring System (WMS) to
allow intelligence analysts to update the Machine Translation (MT) system in
the WMS with domain- and mission-specific source-to-English phrase translation
rules. The Language Learning Broker (LLB) tool from the Technology Development
Group (TDG) is a distributed system that supports dictionary/terminology
management, personalized dictionaries, and a workflow between linguists and
linguist management.
LLB is integrated with the WMS to provide a terminology
management capability for users to submit, review, validate, and manage
customizations of the MT system through the WMS User Interface (UI). We will
also describe an ongoing experiment to measure the effectiveness of this
user-driven customization capability, in terms of increased translation utility,
through a controlled experiment conducted with the help of intelligence
analysts.
Submitted by: Defense Intelligence Agency
Presenter: Nicholas Bemish
Topic: Use of HLT tools within the US Government
In today's post-9/11 world, the need for qualified linguists to process all the foreign language materials collected and confiscated overseas and at home has grown considerably. To date, a gap exists between the number of linguists needed to process this material and the number available. To fill this gap, the government has invested in the research, development, and implementation of Human Language Technologies in the linguist workflow.
Most current DOMEX workflows incorporate HLT tools, whether Machine Translation, Named Entity Extraction, Name Normalization, or Transliteration tools. These tools aid linguists in processing and translating DOMEX material, cutting back on the amount of time needed to sift through it all.
In addition to the technologies used in workflow processes, we have also implemented tools for intelligence analysts, such as the Broadcast Monitoring System and Tripwire. These tools allow analysts without language qualifications to search through foreign language material and exploit it for intelligence value. They implement technologies such as speech-to-text and machine translation.
Part of the effort to fill this processing gap has been collaboration among members of the Intelligence Community on the research and development of tools. This type of engagement saves the government time and money by eliminating duplication of effort, and it allows government agencies to share their ideas and expertise.
Our presentation will address some of the tools currently in use throughout DoD or being considered for use, some of the challenges we face, and how we are making the best use of the HLT development and research that supports our needs.
Submitted by: National Research Council of Canada
Presenter: Alain Désilets
Topic: WeBiText: Multilingual Concordancer Built
from Public High Quality Web Content
In this paper, we describe WeBiText (www.webitext.ca) and how it is being used. WeBiText is a concordancer that allows translators to search large, high-quality multilingual web sites in order to find solutions to translation problems. After a quick overview of the system, we present results from an analysis of its logs, which provide a picture of how the tool is being used and how well it performs. We show that it is mostly used to find solutions for short, two- or three-word translation problems.
The system produces at least one hit for 58% of queries, and hits from at least five different web pages in 41% of cases. We show that 36% of the queries correspond to specialized language problems, a much higher proportion than previously reported for a similar concordancer based on the Canadian Hansard (TransSearch). We also provide a back-of-the-envelope calculation of the current economic impact of the tool, which we estimate at $1 million per year and growing rapidly.
Presenter: Stacey Bailey
Topic: Data Preparation
for Machine Translation Customization
The presentation will focus on ongoing work to develop sentence-aligned Chinese-English data for machine translation customization. Fully automatic alignment produces noisy data (e.g., containing OCR and alignment errors), and we are looking at the question of just how noisy noisy data can be while still producing translation improvements. Relatedly, data clean-up efforts are time- and labor-intensive, and we are examining whether the translation improvements justify the clean-up costs.
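One common, cheap filter for noisy automatically aligned data is to discard sentence pairs whose length ratio is implausible. The sketch below illustrates the idea; the 3:1 threshold and the token-vs-character comparison are assumptions for illustration, not the presenters' actual clean-up criteria.

```python
# Length-ratio filter for noisy sentence-aligned Chinese-English data.
# The threshold is an illustrative assumption, not the project's criterion.

def filter_noisy_pairs(pairs, max_ratio=3.0):
    """Keep (zh, en) pairs whose length ratio looks like a real alignment."""
    kept = []
    for zh, en in pairs:
        zh_len = max(len(zh), 1)           # Chinese: characters
        en_len = max(len(en.split()), 1)   # English: whitespace tokens
        ratio = max(zh_len / en_len, en_len / zh_len)
        if ratio <= max_ratio:   # wildly mismatched lengths suggest misalignment
            kept.append((zh, en))
    return kept

pairs = [
    ("他们签署了条约", "They signed the treaty ."),
    ("是", "This long sentence cannot plausibly align with one character ."),
]
print(len(filter_noisy_pairs(pairs)))  # the misaligned pair is dropped
```

Filters like this remove only the grossest alignment errors; the open question the presentation raises is how much of the subtler noise can be left in before MT customization stops improving.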
Submitted by: Northrop Grumman Corporation
Presenter: Michael Ladwig
Topic: Language NOW
Language NOW is a natural language processing (NLP) research and development (R&D) program with the goal of improving the performance of machine translation (MT) and other NLP technologies in mission-critical applications. The Language NOW R&D program has produced the following four primary advances as Government license-free technology:
- A consistent and simple user interface that allows non-technical users, regardless of language proficiency, to use NLP technology in exploiting foreign language text content. Language NOW research has produced first-of-a-kind capabilities such as detection and handling of structured data, and direct processing and visualization of foreign language data with transliterations and translations
- A highly efficient NLP integration framework, the Abstract Scalable Language Services (ASLS). ASLS offers system developers easy implementation of an efficient integrated service-oriented architecture suitable for devices ranging from handheld computers to large enterprise computer clusters
- Service wrappers integrating commercial, Government license-free, open-source, and research software that provide NLP services such as machine translation, named entity recognition, optical character recognition (OCR), transliteration, and text search
- STatistical Engines for Language Analysis (STELAE) and Maximum Entropy Extraction Pipeline (MEEP) tools that produce customized statistical machine translation and hybrid statistical/rule-based named entity recognition engines.
Submitted by: The Technology Development Group
Presenter: Mike O'Malley
Topic: The Challenges of Distributed Parallel Corpora
Parallel corpora have traditionally been created, maintained
and disseminated by translators and analysts addressing specific domains. They grow by aggregation, individual
contributions taking residence in the knowledge base. While the provenance of these new terms is
known, their validity is not; they must be vetted by domain and language
experts in order to be considered for use in the translation process. In order to address the evolving ecosphere
surrounding parallel corpora, developers and analysts need to move beyond the
data limitations of the static model.
This traditional model does not fully take advantage of the new infiltration and exfiltration data paths available in today's world of distributed knowledge bases. Incoming data are no longer simply textual: audio, imagery, and video are all critical components of corpus utility. Corpus maintainers have access to these media types through a variety of data sources, such as automated media monitoring services, the output of any number of translation environments, and translation memory exchanges (TMXs) developed by domain and language experts. These inputs are often pre-vetted and ready for automated inclusion in the parallel corpora; their content should not be reduced to the strictly textual. Unfortunately, the quality of the automated alignment and segmentation systems used in these pipelines remains a concern for the bulk preprocessing needed by downstream systems.
These data sources share a common characteristic, that of
known provenance. They are typically a
vetted source and a regular provider to the parallel corpora, whether via daily
newscasts or other means. Other data
sources are distributed in nature and thus offer distinct challenges to the
collection, vetting and exploitation processes.
One of the most exciting of these infiltration paths is crowdsourcing. A next-generation parallel corpus management system must be capable of, if not automatically incorporating crowdsourced terminology as a vetted source, at least facilitating the manual inclusion of vetted crowdsourced terminology. This terminology may be submitted at any scale and from practically any source. It may overlap or be contradictory; it will almost certainly require some degree of analysis and evaluation before inclusion. Fortunately, statistical analysis techniques are available to mitigate these concerns. One significant benefit of a crowdsourcing approach is the gain in alignment and segmentation accuracy over similar products from the automated systems mentioned above. Given the scalability of crowdsourcing methods, it is certainly a viable framework for bulk alignment and segmentation.
Another consideration for the development of distributed parallel corpus systems is their position in the translation workflow. The outputs and exfiltration paths of such a system can be used for purposes as diverse as addition to existing TMXs; refinement of existing MT applications, through either improvement of their learning processes or inclusion of parallel-corpora-generated domain-specific lexicons; creation of sentence pairs and other products for language learning systems (LLS); and support for exemplar language clips such as those developed by the State Department.
Submitted by: National Air and Space Intelligence Center
Presenter: William McIntyre
Topic: Translation of Chinese Entities in Russian
Text
This briefing addresses the development of a conversion
table that will enable a translator to render Chinese names, locations, and
nomenclature into proper Pinyin. As a
rule, Russian Machine Translation is a robust system that provides good
results. It is a mature system with
extensive glossaries and can be useful for translating documents across many
disciplines.
However, as a result of the transliteration process, Russian MT will not convert Chinese terms from Russian into the Pinyin standard, which is used by most databases and across the internet. The MT software is performing as designed, but this problem impacts the accuracy of the MT, making it almost useless for many purposes, including data retrieval.
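The conversion-table idea can be sketched as a syllable-by-syllable lookup from Russian (Palladius-style) transliterations of Chinese back to Pinyin. The table entries below are a small illustrative subset, not the briefing's actual table, and the hyphen-delimited input format is an assumption.

```python
# Sketch of a Cyrillic-to-Pinyin conversion table for Chinese names that
# appear in Russian text. Entries are an illustrative Palladius-style subset.

CYRILLIC_TO_PINYIN = {
    "чжан": "zhang",
    "цзян": "jiang",
    "сяо": "xiao",
    "бэй": "bei",
    "цзин": "jing",
}

def to_pinyin(russian_name: str, table: dict) -> str:
    """Convert a Russian-transliterated Chinese name syllable by syllable."""
    syllables = russian_name.lower().split("-")
    # Unknown syllables pass through for a human translator to resolve.
    return "".join(table.get(s, s) for s in syllables)

print(to_pinyin("Бэй-цзин", CYRILLIC_TO_PINYIN))  # the city name Beijing
```

A full table would cover every Palladius syllable and handle unsegmented input, but even this lookup shows how terms rendered in Cyrillic can be restored to the Pinyin forms that databases and search engines expect.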
Submitted by: National Virtual Translation Center
Presenter: Carol Van Ess-Dykema and Laurie Gerber
Topic: Parallel Corpus Development at NVTC
In this paper, we describe the methods used to develop an exchangeable translation memory bank of aligned Mandarin Chinese–English sentence pairs. This effort is part of a larger initiative by the National Virtual Translation Center (NVTC) to foster collaboration and the sharing of translation memory banks across the Intelligence Community and the Department of Defense.
In this paper, we describe our corpus creation process – a
largely automated process – highlighting the human interventions that are still
deemed necessary. We conclude with a brief discussion of how this work will
affect plans for NVTC’s new translation management workflow and future research
to increase the performance of the automated components of the corpus creation
process.
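The "exchangeable" part of such a memory bank typically means serializing aligned pairs in TMX, the standard translation memory exchange format. The sketch below emits a minimal TMX document from sentence pairs; the sample pair and header metadata are placeholders, not NVTC's actual data or tooling.

```python
# Minimal sketch of emitting sentence-aligned pairs as TMX 1.4, the standard
# exchange format for translation memory banks. Data and metadata are assumed.
import xml.etree.ElementTree as ET

def pairs_to_tmx(pairs) -> str:
    """Serialize (zh, en) sentence pairs into a minimal TMX document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "srclang": "zh", "adminlang": "en", "segtype": "sentence",
        "datatype": "plaintext", "o-tmf": "none",
        "creationtool": "sketch", "creationtoolversion": "0.1",
    })
    body = ET.SubElement(tmx, "body")
    for zh, en in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in (("zh", zh), ("en", en)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

print(pairs_to_tmx([("他们签署了条约", "They signed the treaty.")]))
```

Because TMX is tool-neutral, banks exported this way can be loaded into any compliant translation memory system, which is what makes cross-agency sharing practical.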