Government
MT Users Program
Submitted by: National Virtual Translation Center (NVTC)
Speaker: Carol Van Ess-Dykema and Flo Reeder
Topic: Paralinguist Assessment Decision Factors For
Multi-Engine Machine Translation Output
This presentation describes a study of whether Machine Translation (MT) enables translators to translate faster while at the same time producing better-quality translations than they would without MT. It examines how well developers' automatic metrics correlate with a human translator's ability to post-edit a text. The study also seeks decision factors that enable a translation professional, known as a Paralinguist, to determine whether MT output is of sufficient quality to serve as a "seed translation" for translators. Unlike developers' metrics, these decision factors must function without a reference translation.
The study consists of two investigations. The first investigation answers the question: Can we post-edit MT-produced "seed translations" while increasing translator speed and accuracy? The first step is to machine translate candidate texts, selected on the basis of subject and genre. Translators are then asked to post-edit the MT output, and their words-per-hour translation rate is measured. Next, the post-edited MT output is assessed by quality control personnel using a US Government assessment standard. Analysis then compares translator speed and accuracy under test and control conditions. Additionally, developers' MT metrics are compared with translator words per hour, translators' opinions of the post-editing activity, and the quality control score.
The second investigation answers the question: Which decision factors aid a Paralinguist in determining whether MT output is post-editable? It starts with the MT output from the first investigation and uses the scores determined there. Candidate decision factors are then analyzed for correlation with translator words per hour, translators' opinions of the post-editing activity, and the quality control score. A Paralinguist will not have the benefit of a reference translation when applying these metrics. Therefore, part of this study is a search for easily calculated metrics that do not require a reference translation yet yield indicators of a document's suitability for post-editing.
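As an illustration of the kind of easily calculated, reference-free indicators such a study might consider, the sketch below computes a few cheap signals from raw MT output alone. The indicator choices and the tiny vocabulary are illustrative assumptions, not the study's actual decision factors.

```python
# Sketch of reference-free indicators of MT output quality. These are
# illustrative signals only, not the decision factors from the study.

def reference_free_indicators(mt_output: str, known_vocab: set) -> dict:
    """Compute cheap signals from raw MT output, with no reference translation."""
    tokens = mt_output.lower().split()
    if not tokens:
        return {"oov_rate": 1.0, "avg_sentence_len": 0.0, "repetition_rate": 0.0}
    oov = sum(1 for t in tokens if t not in known_vocab)
    sentences = [s for s in mt_output.split(".") if s.strip()]
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return {
        "oov_rate": oov / len(tokens),                     # unknown-word ratio
        "avg_sentence_len": len(tokens) / max(len(sentences), 1),
        "repetition_rate": repeats / len(tokens),          # stuttering output
    }

vocab = {"the", "treaty", "was", "signed", "in", "geneva"}
print(reference_free_indicators("the treaty was signed in geneva .", vocab))
```

A Paralinguist-facing tool could threshold signals like these to flag documents unlikely to be worth post-editing.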
Submitted by: Language Weaver, SDL
Speaker: Daniel Marcu
Contributors: Dr. Kathleen Egan, Chuck Simmons, Ning-Ning
Mahlmann
Topic: Utilizing Automated Translation with Quality
Scores to Increase Productivity
Automated translation can assist with a variety of
translation needs in government, from speeding up access to information for
intelligence work to helping human translators increase their
productivity. However, government entities need to have a mechanism in
place so that they know whether or not they can trust the output from automated
translation solutions.
In this presentation, Language Weaver will present a new
capability – TrustScore – an automated scoring algorithm that communicates how
good the automated translation is, using a meaningful metric. With this
capability, each translation is automatically assigned a score from 1 to 5 –
the TrustScore. A score of 1 would indicate that the translation is
unintelligible; a score of 3 would indicate that meaning has been conveyed and
that the translated content is actionable. A score approaching 4 or
higher would indicate that meaning and nuance have been carried through.
This automatic prediction of quality has been validated by testing done across
significant numbers of data points in different companies and on different
types of content.
After outlining TrustScore and how it works, Language Weaver will discuss how a scoring mechanism like TrustScore could be used in a government translation productivity workflow to assist linguists with day-to-day translation work, enabling them to further benefit from their investments in automated translation software. Language Weaver will also share how TrustScore is used in commercial deployments to cost-effectively publish information in near real time.
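The routing idea behind a 1-to-5 quality score can be sketched as follows. The band boundaries and the confidence feature are invented for illustration; the actual TrustScore algorithm is Language Weaver's own and is not described here.

```python
# Hypothetical sketch of a 1-5 quality band in a translation workflow.
# Cut points and the confidence input are assumptions, not TrustScore itself.

def to_trust_score(model_confidence: float) -> int:
    """Map a raw model confidence in [0, 1] to a 1-5 quality band."""
    bands = [0.2, 0.4, 0.6, 0.8]          # illustrative cut points
    return 1 + sum(model_confidence > b for b in bands)

def route_translation(model_confidence: float) -> str:
    """Route output by score: publish, post-edit, or translate by hand."""
    score = to_trust_score(model_confidence)
    if score >= 4:
        return "publish"          # meaning and nuance carried through
    if score >= 3:
        return "post-edit"        # meaning conveyed, content actionable
    return "human-translate"      # unintelligible or unreliable

print(route_translation(0.85))
```

The point of such a gate is that each document is triaged automatically, so linguists spend their time only where machine output is close enough to be worth editing.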
Submitted by: United Nations Translation Services
Speaker: Li Zuo
Topic: Machine translation from English to Chinese:
A study of Google's performance with UN documents
The present study examines, from the users' perspective, the performance of Google's online translation service on United Nations documents. Since at least 2004, the United Nations has been exploring, piloting, and implementing computer-assisted translation (CAT), with Trados as the officially selected vehicle. A more recent development is the spontaneous adoption of Google translation among Chinese translators as an easy, versatile, and labor-saving tool. With machine translation becoming a practical reality for developers and end users, there is a need for a reality check on how well it serves its purpose. The current study examines Google translation and, in particular, its degree of assistance to professional Chinese translators at the United Nations. It uses a variety of UN documents to test and evaluate the performance of Google translation from English to Chinese. The sampled UN documents consist of 3 resolutions, 2 letters, 2 provisional agendas, 1 plenary verbatim record, 1 report, 1 note by the Secretariat, and 1 budget.
The results confirm Google's cutting edge in machine translation where English to Chinese is concerned, thanks to its powerful infrastructure and immense translation database. The conversion between the two languages takes only an instant, even for a fairly long piece. On top of that, Google gets terminology right more frequently and seems better able to make an intelligent guess than other translation tools such as MS Bing. But Google's Chinese output is far from intelligible, especially at the sentence level, primarily because of serious problems with word order and sentence parsing. There are also technical problems such as added or omitted words and erroneous rendering of numbers.
Nevertheless, Google translation offers translators the option of working from its rough draft, saving the time and effort of typing. The challenges of post-editing, however, may offset the time saved. Even though Google translation may not necessarily yield net speed gains when used to assist translation, it is certainly a beneficial labor saver, including of mental labor when it performs at its best.
Submitted by: National Air and Space Intelligence Center
Presenter: Chuck Simmons
Topic: Foreign Media Collaboration Framework (FMCF)
The Foreign Media Collaboration Framework (FMCF) is the latest approach by NASIC to provide a comprehensive system for processing foreign language materials. FMCF is a Service-Oriented Architecture (SOA) that provides an infrastructure to manage HLT tools, products, workflows, and services. This federated SOA solution adheres to DISA's NCES SOA Governance Model, DDMS XML for metadata capture and dissemination, and IC-ISM for security.
The FMCF provides a cutting-edge infrastructure that encapsulates multiple capabilities from multiple vendors in one place. This approach will accelerate HLT development, contain sustainment costs, minimize training, and bring MT, OCR, ASR, audio/video, entity extraction, analytic tools, and databases under one umbrella, thus reducing the total cost of ownership.
Submitted by: Technical Support Working Group
Presenter: Kathleen Egan
Topic: Cross Lingual Arabic Blog Alerting (COLABA)
Social media and tools for communication over the Internet have expanded a great deal in recent years. This expansion offers a diverse set of users a means to communicate more freely and spontaneously in mixed languages and genres (blogs, message boards, chat, texting, video, and images). Dialectal Arabic is pervasive in written social media; however, current state-of-the-art tools built for Modern Standard Arabic (MSA) fail on Arabic dialects.
COLABA enables MSA
users to interpret dialects correctly.
It helps find Arabic colloquial content that is currently not easily
searchable and accessible to MSA queries.
The COLABA team has built a suite of tools that offers users the ability to anonymously capture online unstructured media content from blogs, and to comprehend, organize, and validate content from informal and colloquial genres of online communication in MSA and a variety of Arabic dialects.
The DoD/Combating
Terrorism Technical Support Office/Technical Support Working Group (CTTSO/TSWG)
awarded the contract to Acxiom Corporation and partners from MTI/IBM, Columbia
University, Janya and Wichita State University to bring joint expertise to
address this challenge.
The suite has several use
applications:
· Support for language and cultural learning, by making colloquial Arabic intelligible to students of MSA
· Retrieval and prioritization for triage and content analysis, by finding Arabic colloquial and dialect terms that today's search engines miss; by providing appropriate interpretations of colloquial Arabic, which is opaque to current analytics approaches; and by identifying named entities, events, topics, and sentiment
· Improved translations by MSA-trained MT systems, through decreases in out-of-vocabulary terms achieved by converting colloquial terms to MSA
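The last point, converting colloquial terms to MSA so that an MSA-trained MT system sees fewer out-of-vocabulary tokens, can be sketched minimally as a lexicon substitution pass. The tiny romanized lexicon below is a made-up placeholder, not COLABA's actual resources.

```python
# Minimal sketch of colloquial-to-MSA conversion before MT, to reduce
# out-of-vocabulary tokens. The lexicon entries are invented placeholders.

DIALECT_TO_MSA = {
    "ezayak": "kayfa haluka",   # Egyptian "how are you" -> MSA (romanized)
    "shlonak": "kayfa haluka",  # Iraqi/Gulf variant
}

def normalize_to_msa(tokens: list, lexicon: dict) -> list:
    """Replace known colloquial terms so an MSA-trained MT system sees fewer OOVs."""
    out = []
    for tok in tokens:
        # A mapped term may expand to several MSA tokens; unknowns pass through.
        out.extend(lexicon.get(tok, tok).split())
    return out

print(normalize_to_msa(["ezayak", "ya", "ahmed"], DIALECT_TO_MSA))
```

In a real system the lexicon would be large, dialect-tagged, and context-sensitive, but the OOV-reduction effect on downstream MT is the same in principle.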
Submitted by: National Air and Space Intelligence Agency
Presenter: Weimin Jiang
Topic: Pre-editing for Machine Translation
It is common practice for linguists to post-edit MT output to improve translation accuracy and fluency. This presentation, however, examines the importance of pre-editing source material to improve MT. Even when a textually correct digital source file is used for MT, some factors still have a significant effect on MT accuracy and fluency.
Based on 35 examples from more than 20 professional journals and websites, this presentation describes an experiment in pre-editing source material for Chinese-English MT in the S&T domain. Pertinent examples are selected to illustrate how machine translation accuracy and fluency can be enhanced by pre-editing in four areas: providing a straightforward sentence structure, improving punctuation, using straightforward wording, and eliminating redundancy and superfluous elements.
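A toy illustration of this kind of source pre-editing is sketched below: normalizing punctuation and stripping superfluous elements before the text reaches the MT engine. The specific rules and the Chinese example are invented, not taken from the presentation.

```python
# Toy pre-editing pass for Chinese source text before MT. The rules here
# are illustrative examples of the four areas, not the presenter's own.
import re

def pre_edit(source: str) -> str:
    """Apply simple pre-editing rules to a source sentence."""
    text = source
    # Improve punctuation: collapse doubled full-width commas.
    text = text.replace("，，", "，")
    # Eliminate superfluous elements: drop bracketed asides like （见图3）.
    text = re.sub(r"（[^）]*）", "", text)
    # Tidy any whitespace left behind.
    return re.sub(r"\s+", " ", text).strip()

print(pre_edit("该系统（见图3）已投入使用"))
```

Restructuring long, nested sentences, the most valuable of the four areas, still requires a human pre-editor; only the mechanical clean-up lends itself to rules like these.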
Submitted by: Basis Technology Inc.
Speaker: Brian Roberson
Topic: Multi-Language Desktop Suite
Professional language analysts leverage a myriad of tools
in their quest to produce accurate translations of foreign language
material. The effectiveness of these tools ultimately affects
resource allocation, information dissemination and subsequent follow-on mission
planning – all three of which are vital, time-critical components in the
intelligence cycle.
This presentation will highlight the need for interactive
tools that perform jointly in an operational environment, focusing on a dynamic
suite of foreign language tools packaged into a desktop application and serving
in a machine translation role.
Basis Technology’s Arabic/Afghan Desktop Suite (ADS)
supports DOMEX, CELLEX, and HUMINT missions while being the most powerful
Arabic, Dari and Pushto text analytic and processing software available.
The ADS translates large-scale lists of names from foreign languages into English and pinpoints place names appearing in reports with their coordinate locations on maps.
With standardized output required to be more accurate than ever, the ADS ensures conformance with USG transliteration standards for Arabic-script languages, including IC, BGN/PCGN, SATTS, and MELTS. The ADS enables optimization of your limited resources and allows your analysts and linguists to be tasked more efficiently throughout the workflow process.
Submitted by: CACI Inc. and Apptek
Presenter: Kristen Summers and Hassan Sawaf
Topic: User-Generated System for Critical Document Triage and Exploitation – Version 2011
CACI has developed and delivered systems for document
exploitation and processing to Government customers around the world. Many of these systems include advanced
language processing capabilities in order to enable rapid triage of vast
collections of foreign language documents, separating the content that requires
immediate human attention from the less immediately pressing material.
AppTek provides key patent-pending Machine Translation
technology for this critical process, rendering material in Arabic, Farsi and
other languages into an English rendition that enables both further automated
processing and rapid review by monolingual analysts, to identify the documents
that require immediate linguist attention.
Both CACI and AppTek have been working with customers to
develop capabilities that enable them, the users, to be the ones in command of
making their systems learn and continuously improve. We will describe how we put this critical
user requirement into the systems and the key role that the user’s involvement
played in this.
We will also discuss some of the key components of the system and the customer-centric evolution planned for it, including our document translation workflow, the machine translation technology within it, and our approaches to supporting the technology and sustaining its success, designed around adapting to users' needs.
Submitted by: AMTA Government Track Organizers
Panel Moderator: Judith L. Klavans
Topic: Task-based evaluation methods for machine
translation, in practice and theory
A panel of industry and government
experts will discuss ways in which they have applied task-based evaluation for
Machine Translation and other language technologies in their organizations and
share ideas for new methods that could be tried in the future. As part of
the discussion, the panelists will address some of the following points:
· What task-based evaluation means within their
organization, i.e., how task-based evaluation is defined
· How task-based evaluation impacts the use of
MT technologies in their work environment
· Whether task-based evaluation correlates with MT developers' automated metrics, and if not, how to arrive at automated metrics that do correlate with the more expensive task-based evaluation
· What "lessons-learned" resulted from
the course of performing task-based evaluation
· How task-based evaluations can be generalized
to multiple workflow environments
Presenter: Rod Holland
Topic: Exploring the AFPAK Web
In spite of low literacy levels in Afghanistan and the Tribal Areas of Pakistan, the Pashto and Dari regions of the World Wide Web manifest diverse content from authors with a broad range of viewpoints. We have used cross-language information retrieval (CLIR) with machine translation to explore this content, and we present an informal study of the principal genres we have encountered. The suitability and limitations of existing machine translation packages for these languages, as applied to the exploitation of this content, are discussed.
Submitted by: Raytheon BBN Technologies
Presenter: Sean Colbath
Topic: Terminology Management for Web Monitoring
Current state-of-the-art in speech recognition, machine
translation, and natural language processing (NLP) technologies has allowed the
development of powerful media monitoring systems that provide today’s analysts
with automatic tools for ingesting and searching through different types of
data, such as broadcast video, web pages, documents, and scanned images.
However, the core human-language technologies (HLT) in these media monitoring systems are static learners, meaning that they learn from a pool of labeled data and apply the induced knowledge to operational data in the field. To enable successful and widespread deployment and adoption of HLT, these technologies need to be able to adapt effectively to new operational domains on demand.
To provide the US Government analyst with dynamic tools that adapt to these changing domains, HLT systems must support customizable lexicons. However, lexicon customization presents a unique challenge, especially in the context of the multiple users of a typical fielded media monitoring installation. Customization requests from multiple users can be quite extensive and may conflict in orthographic representation (spelling, transliteration, or stylistic consistency) or in overall meaning. To protect against spurious and inconsistent updates, media monitoring systems need to support a central terminology management capability to collect, manage, and execute customization requests across multiple users of the system.
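The core of such central terminology management, collecting requests from many users and flagging conflicts for adjudication before they reach the MT lexicon, can be sketched as follows. The data shapes and the sample requests are assumptions for illustration, not LLB's actual interfaces.

```python
# Sketch of central terminology management: group customization requests
# and flag conflicts for a terminology manager. Data shapes are assumed.
from collections import defaultdict

def collect_requests(requests):
    """Group (user, source_term, translation) requests.

    A conflict is the same source term submitted with different translations;
    it must be adjudicated before the MT lexicon is updated.
    """
    by_term = defaultdict(set)
    for user, term, translation in requests:
        by_term[term].add(translation)
    approved = {t: next(iter(tr)) for t, tr in by_term.items() if len(tr) == 1}
    conflicts = {t: sorted(tr) for t, tr in by_term.items() if len(tr) > 1}
    return approved, conflicts

reqs = [
    ("analyst1", "قاعدة", "base"),
    ("analyst2", "قاعدة", "al-Qaeda"),   # same term, different rendering
    ("analyst3", "صاروخ", "missile"),
]
approved, conflicts = collect_requests(reqs)
print(conflicts)  # the conflicting term awaits adjudication
```

Only the approved entries would flow into the MT system's phrase translation rules; conflicting entries enter the review workflow between linguists and linguist management.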
In this talk, we will describe the integration of a
user-driven lexicon/dictionary customization and terminology management
capability in the context of the Raytheon BBN Web Monitoring System (WMS) to
allow intelligence analysts to update the Machine Translation (MT) system in
the WMS with domain- and mission-specific source-to-English phrase translation
rules. The Language Learning Broker (LLB) tool from the Technology Development
Group (TDG) is a distributed system that supports dictionary/terminology
management, personalized dictionaries, and a workflow between linguists and
linguist management.
LLB is integrated with the WMS to provide a terminology
management capability for users to submit, review, validate, and manage
customizations of the MT system through the WMS User Interface (UI). We will
also describe an ongoing experiment to measure the effectiveness of this
user-driven customization capability, in terms of increased translation utility,
through a controlled experiment conducted with the help of intelligence
analysts.
Submitted by: Defense Intelligence Agency
Presenter: Nicholas Bemish
Topic: Use of HLT tools within the US Government
In today's post-9/11 world, the need for qualified linguists to process all the foreign language materials collected and confiscated overseas and at home has grown considerably. To date, a gap exists between the number of linguists needed to process this material and the number available. To fill this gap, the government has invested in the research, development, and implementation of Human Language Technologies in the linguist workflow.
Most current DOMEX workflows incorporate HLT tools, whether Machine Translation, Named Entity Extraction, Name Normalization, or Transliteration tools. These tools aid linguists in processing and translating DOMEX material, cutting back on the amount of time needed to sift through it all.
In addition to the technologies used in workflow processes, we have also implemented tools for intelligence analysts, such as the Broadcast Monitoring System and Tripwire. These tools allow analysts without language qualifications to search through foreign language material and exploit it for intelligence value. They implement technologies such as speech-to-text and machine translation.
Part of the effort to fill this processing gap has been collaboration among members of the Intelligence Community on the research and development of tools. This type of engagement saves the government time and money by eliminating duplication of effort, and it allows government agencies to share their ideas and expertise.
Our presentation will address some of the tools currently in use throughout DoD or being considered for use, some of the challenges we face, and how we are making the best use of the HLT development and research that supports our needs.
Submitted by: National Research Council of Canada
Presenter: Alain Désilets
Topic: WeBiText: Multilingual Concordancer Built
from Public High Quality Web Content
In this paper, we describe WeBiText (www.webitext.ca) and how it is being used. WeBiText is a concordancer that allows translators to search large, high-quality multilingual web sites in order to find solutions to translation problems. After a quick overview of the system, we present results from an analysis of its logs, which provide a picture of how the tool is being used and how well it performs. We show that it is mostly used to find solutions for short, two- or three-word translation problems.
The system produces at least one hit for 58% of queries, and hits from at least five different web pages in 41% of cases. We show that 36% of the queries correspond to specialized language problems, a much higher proportion than previously reported for a similar concordancer based on the Canadian Hansard (TransSearch). We also provide a back-of-the-envelope calculation of the current economic impact of the tool, which we estimate at $1 million per year and growing rapidly.
Presenter: Stacey Bailey
Topic: Data Preparation
for Machine Translation Customization
The presentation will focus on ongoing work to develop sentence-aligned Chinese-English data for machine translation customization. Fully automatic alignment produces noisy data (e.g., containing OCR and alignment errors), and we are looking at the question of just how noisy noisy data can be while still producing translation improvements. Relatedly, data clean-up efforts are time- and labor-intensive, and we are examining whether the translation improvements justify the clean-up costs.
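One common, cheap filter for noisy automatically aligned data is to discard sentence pairs whose length ratio is implausible. The sketch below illustrates the idea; the 3:1 threshold and the token-vs-character comparison are assumptions for illustration, not the presenters' actual clean-up criteria.

```python
# Length-ratio filter for noisy sentence-aligned Chinese-English data.
# The threshold is an illustrative assumption, not the project's criterion.

def filter_noisy_pairs(pairs, max_ratio=3.0):
    """Keep (zh, en) pairs whose length ratio looks like a real alignment."""
    kept = []
    for zh, en in pairs:
        zh_len = max(len(zh), 1)           # Chinese: characters
        en_len = max(len(en.split()), 1)   # English: whitespace tokens
        ratio = max(zh_len / en_len, en_len / zh_len)
        if ratio <= max_ratio:   # wildly mismatched lengths suggest misalignment
            kept.append((zh, en))
    return kept

pairs = [
    ("他们签署了条约", "They signed the treaty ."),
    ("是", "This long sentence cannot plausibly align with one character ."),
]
print(len(filter_noisy_pairs(pairs)))  # the misaligned pair is dropped
```

Filters like this remove only the grossest alignment errors; the open question the presentation raises is how much of the subtler noise can be left in before MT customization stops improving.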
Submitted by: Northrop Grumman Corporation
Presenter: Michael Ladwig
Topic: Language NOW
Language NOW is a natural language processing (NLP) research and development (R&D) program with the goal of improving the performance of machine translation (MT) and other NLP technologies in mission-critical applications. The Language NOW R&D program has produced the following four primary advances as Government license-free technology:
- A consistent and simple user interface that allows non-technical users, regardless of language proficiency, to use NLP technology in exploiting foreign language text content. Language NOW research has produced first-of-a-kind capabilities such as detection and handling of structured data, and direct processing and visualization of foreign language data with transliterations and translations
- A highly efficient NLP integration framework, the Abstract Scalable Language Services (ASLS). ASLS offers system developers easy implementation of an efficient integrated service-oriented architecture suitable for devices ranging from handheld computers to large enterprise computer clusters
- Service wrappers integrating commercial, Government license-free, open-source, and research software that provide NLP services such as machine translation, named entity recognition, optical character recognition (OCR), transliteration, and text search
- STatistical Engines for Language Analysis (STELAE) and Maximum Entropy Extraction Pipeline (MEEP) tools that produce customized statistical machine translation and hybrid statistical/rule-based named entity recognition engines.
Submitted by: The Technology Development Group
Presenter: Mike O'Malley
Topic: The Challenges of Distributed Parallel Corpora
Parallel corpora have traditionally been created, maintained
and disseminated by translators and analysts addressing specific domains. They grow by aggregation, individual
contributions taking residence in the knowledge base. While the provenance of these new terms is
known, their validity is not; they must be vetted by domain and language
experts in order to be considered for use in the translation process. In order to address the evolving ecosphere
surrounding parallel corpora, developers and analysts need to move beyond the
data limitations of the static model.
This traditional model does not fully take advantage of the new infiltration and exfiltration data paths available in today's world of distributed knowledge bases. Incoming data are no longer simply textual: audio, imagery, and video are all critical components of corpus utility. Corpus maintainers have access to these media types through a variety of data sources, such as automated media monitoring services, the output of any number of translation environments, and translation memory exchanges (TMXs) developed by domain and language experts. These inputs are often pre-vetted and ready for automated inclusion in the parallel corpora; their content should not be reduced to the strictly textual. Unfortunately, the quality of the automated alignment and segmentation systems used in these pipelines remains a concern for the bulk preprocessing needed by downstream systems.
These data sources share a common characteristic, that of
known provenance. They are typically a
vetted source and a regular provider to the parallel corpora, whether via daily
newscasts or other means. Other data
sources are distributed in nature and thus offer distinct challenges to the
collection, vetting and exploitation processes.
One of the most exciting of these infiltration paths is crowdsourcing. A next-generation parallel corpus management system must be capable of, if not automatically incorporating crowdsourced terminology as a vetted source, at least facilitating the manual inclusion of vetted crowdsourced terminology. This terminology may be submitted at any scale and from practically any source. It may overlap or be contradictory; it will almost certainly require some degree of analysis and evaluation before inclusion. Fortunately, statistical analysis techniques are available to mitigate these concerns. One significant benefit of a crowdsourcing approach is the gain in alignment and segmentation accuracy over similar products from the automated systems mentioned above. Given the scalability of crowdsourcing methods, it is certainly a viable framework for bulk alignment and segmentation.
Another consideration for the development of distributed parallel corpus systems is their position in the translation workflow. The outputs and exfiltration paths of such a system can be used for purposes as diverse as addition to existing TMXs; refinement of existing MT applications, through either improvement of their learning processes or inclusion of parallel-corpora-generated domain-specific lexicons; creation of sentence pairs and other products for language learning systems (LLS); and support for exemplar language clips such as those developed by the State Department.
Submitted by: National Air and Space Intelligence Center
Presenter: William McIntyre
Topic: Translation of Chinese Entities in Russian
Text
This briefing addresses the development of a conversion
table that will enable a translator to render Chinese names, locations, and
nomenclature into proper Pinyin. As a
rule, Russian Machine Translation is a robust system that provides good
results. It is a mature system with
extensive glossaries and can be useful for translating documents across many
disciplines.
However, as a result of the transliteration process, Russian MT will not convert Chinese terms from Russian into the Pinyin standard, which is used by most databases and across the internet. The MT software is performing as designed, but this problem impacts the accuracy of the MT, making it almost useless for many purposes, including data retrieval.
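The conversion-table idea can be sketched as a syllable-by-syllable lookup from Russian (Palladius-style) transliterations of Chinese back to Pinyin. The table entries below are a small illustrative subset, not the briefing's actual table, and the hyphen-delimited input format is an assumption.

```python
# Sketch of a Cyrillic-to-Pinyin conversion table for Chinese names that
# appear in Russian text. Entries are an illustrative Palladius-style subset.

CYRILLIC_TO_PINYIN = {
    "чжан": "zhang",
    "цзян": "jiang",
    "сяо": "xiao",
    "бэй": "bei",
    "цзин": "jing",
}

def to_pinyin(russian_name: str, table: dict) -> str:
    """Convert a Russian-transliterated Chinese name syllable by syllable."""
    syllables = russian_name.lower().split("-")
    # Unknown syllables pass through for a human translator to resolve.
    return "".join(table.get(s, s) for s in syllables)

print(to_pinyin("Бэй-цзин", CYRILLIC_TO_PINYIN))  # the city name Beijing
```

A full table would cover every Palladius syllable and handle unsegmented input, but even this lookup shows how terms rendered in Cyrillic can be restored to the Pinyin forms that databases and search engines expect.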
Submitted by: National Virtual Translation Center
Presenter: Carol Van Ess-Dykema and Laurie Gerber
Topic: Parallel Corpus Development at NVTC
In this paper, we describe the methods used to develop an exchangeable translation memory bank of aligned Mandarin Chinese–English sentence pairs. This effort is part of a larger initiative by the National Virtual Translation Center (NVTC) to foster collaboration and the sharing of translation memory banks across the Intelligence Community and the Department of Defense.
In this paper, we describe our corpus creation process – a
largely automated process – highlighting the human interventions that are still
deemed necessary. We conclude with a brief discussion of how this work will
affect plans for NVTC’s new translation management workflow and future research
to increase the performance of the automated components of the corpus creation
process.
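The "exchangeable" part of such a memory bank typically means serializing aligned pairs in TMX, the standard translation memory exchange format. The sketch below emits a minimal TMX document from sentence pairs; the sample pair and header metadata are placeholders, not NVTC's actual data or tooling.

```python
# Minimal sketch of emitting sentence-aligned pairs as TMX 1.4, the standard
# exchange format for translation memory banks. Data and metadata are assumed.
import xml.etree.ElementTree as ET

def pairs_to_tmx(pairs) -> str:
    """Serialize (zh, en) sentence pairs into a minimal TMX document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "srclang": "zh", "adminlang": "en", "segtype": "sentence",
        "datatype": "plaintext", "o-tmf": "none",
        "creationtool": "sketch", "creationtoolversion": "0.1",
    })
    body = ET.SubElement(tmx, "body")
    for zh, en in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in (("zh", zh), ("en", en)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

print(pairs_to_tmx([("他们签署了条约", "They signed the treaty.")]))
```

Because TMX is tool-neutral, banks exported this way can be loaded into any compliant translation memory system, which is what makes cross-agency sharing practical.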