Machine Translation

Summit XIII

 

Proceedings of the 13th Machine Translation Summit

 

Organized by:

Asia-Pacific Association for Machine Translation

Supported by:

Chinese Information Processing Society of China (CIPS)

Xiamen University (XMU)

 

September 19-23, 2011, Xiamen, China

 

Table of Contents

 

Message from the President of International Association for Machine Translation:

Professor Hitoshi Isahara ................................................................................................... ii

 

Message from the Program Committee Chair:

Dr. Hiromi Nakaiwa.........................................................................................................  iii

 

Tutorials ........................................................................................................................... 12

 

Keynote: Professor Zhendong Dong ................................................................................ 18

 

Invited Talk 1: MT everywhere: Next Steps

Dr. Mike Dillinger ............................................................................................................ 20

 

Invited Talk 2: Strategic MT Research in Europe: Themes, Approaches, Results and Plans

Professor Hans Uszkoreit................................................................................................. 21

 

Special Session on Patent Translation

 

Introductory Talk: Challenges of Patent MT -- Term and Structure Translation

Professor Jun'ichi Tsujii.................................................................................................... 22

 

Invited Talk: MT for Patent Search at KIPO

Mr. YooChan Choi ........................................................................................................... 23

 

Invited Talk: COPPA, CLIR and TAPTA: three tools to assist in overcoming the Patent language barrier at WIPO

Mr. Bruno Pouliquen......................................................................................................... 24

 

A1 Research Papers – Training (1)

A1-1 Methods for Smoothing the Optimizer Instability in SMT

Mauro Cettolo, Nicola Bertoldi and Marcello Federico ................................................... 32

A1-2 Training Machine Translation with a Second-Order Taylor Approximation of Weighted Translation Instances

Aaron Phillips and Ralf Brown.......................................................................................... 40

A1-3 Maximum Rank Correlation Training for Statistical Machine Translation

Daqi Zheng, Yifan He, Yang Liu and Qun Liu.................................................................. 48

 

B1 Research Papers – Pre-processing for MT

B1-1 POS Tagging of English Particles for Machine Translation

Jianjun Ma, Degen Huang, Haixia Liu and Wenfeng Sheng ............................................. 57

B1-2 Multi-stage Chinese Dependency Parsing Based on Dependency Direction

Wenjing Lang, Qiaoli Zhou, Guiping Zhang and Dongfeng Cai....................................... 64

B1-3 Statistic Machine Translation Boosted with Spurious Word Deletion

Shujie Liu, Chi-Ho Li and Ming Zhou .............................................................................. 72

 

C1 Research Papers – Speech Translation

C1-1 Phonetic Representation-Based Speech Translation

Jie Jiang, Zeeshan Ahmed, Julie Carson-Berndsen, Peter Cahill and Andy Way............. 81

C1-2 Unsupervised Vocabulary Selection for Domain-Independent Simultaneous Lecture Translation

Paul Maergner, Ian Lane and Alex Waibel........................................................................ 89

C1-3 Context-aware Language Modeling for Conversational Speech Translation

Avneesh Saluja, Ian Lane and Ying Zhang........................................................................ 97

 

A2 Research Papers – Training (2)

A2-1 Incremental Training and Intentional Over-fitting of Word Alignment

Qin Gao, Will Lewis, Chris Quirk and Mei-Yuh Hwang ............................................... 106

A2-2 Alignment Inference and Bayesian Adaptation for Machine Translation

Kevin Duh, Katsuhito Sudoh, Tomoharu Iwata and Hajime Tsukada............................ 114

A2-3 Multi-Strategy Approaches to Active Learning for Statistical Machine Translation

Vamshi Ambati, Stephan Vogel and Jaime Carbonell ................................................... 122

 

B2 Research Papers – Technologies Supporting MT

B2-1 Document-level Consistency Verification in Machine Translation

Tong Xiao, Jingbo Zhu, Shujie Yao and Hao Zhang ..................................................... 131

B2-2 Function Word Generation in Statistical Machine Translation Systems

Lei Cui, Dongdong Zhang, Mu Li and Ming Zhou ....................................................... 139

B2-3 Multimodal Building of Monolingual Dictionaries for Machine Translation by Non-Expert Users

Miquel Esplà-Gomis, Víctor M. Sánchez-Cartagena and Juan Antonio Pérez-Ortiz.... 147

 

C2 Research Papers – Computer Assisted Translation

C2-1 Automatic Post-Editing based on SMT and its selective application by Sentence-Level Automatic Quality Evaluation

Hirokazu Suzuki............................................................................................................. 156

C2-2 Qualitative Analysis of Post-Editing for High Quality Machine Translation

Frédéric Blain, Jean Senellart, Holger Schwenk, Mirko Plitt and Johann Roturier...... 164

C2-3 Using machine translation in computer-aided translation to suggest the target-side words to change

Miquel Esplà-Gomis, Felipe Sánchez-Martínez and Mikel L. Forcada ....................... 172

 

A3 Research Papers – Model (1)

A3-1 A Unified SMT Framework Combining MIRA and MERT

Shujie Liu, Chi-Ho Li and Ming Zhou ......................................................................... 181

A3-2 Improving Phrase Extraction via MBR Phrase Scoring and Pruning

Nan Duan, Mu Li, Ming Zhou and Lei Cui .................................................................. 189

A3-3 Phrase Segmentation Model using Collocation and Translational Entropy

Hyoung-Gyu Lee, Joo-Young Lee, Min-Jeong Kim, Hae-Chang Rim, Joong-Hwi Shin and Young-Sook Hwang ...................................................................................... 198

 

B3 Research Papers – MT Based on Linguistic Knowledge

B3-1 Singular or Plural? Exploiting Parallel Corpora for Chinese Number Prediction

Elizabeth Baran and Nianwen Xue................................................................................ 207

B3-2 Handling Multiword Expressions in Phrase-Based Statistical Machine Translation

Santanu Pal, Tanmoy Chakraborty and Sivaji Bandyopadhyay ................................... 215

B3-3 Automatic Error Analysis for Morphologically Rich Languages

Ahmed El Kholy and Nizar Habash .............................................................................. 225

 

C3 User’s Studies (1)

C3-1 MT use within the enterprise: Encouraging adoption via a unified MT API

Raymond Flournoy ....................................................................................................... 234

C3-2 Deploying MT into a Localisation Workflow: Pains and Gains

Yanli Sun, Juan Liu and Yi Li....................................................................................... 239

C3-3 Evaluation of MT Systems to Translate User Generated Content

Johann Roturier and Anthony Bensadoun..................................................................... 244

 

A4 Research Papers – Model (2)

A4-1 A Unified and Discriminative Soft Syntactic Constraint Model for Hierarchical Phrase-based Translation

Lemao Liu, Tiejun Zhao, Chao Wang and Hailong Cao ............................................... 253

A4-2 Simple but Effective Approaches to Improving Tree-to-tree Model

Feifei Zhai, Jiajun Zhang, Yu Zhou and Chengqing Zong............................................ 261

A4-3 Unpacking and Transforming Feature Functions: New Ways to Smooth Phrase Tables

Boxing Chen, Roland Kuhn, George Foster and Howard Johnson ............................... 269

 

B4 Research Papers – Domain Adaptation

B4-1 Identification and Translation of Significant Patterns for Cross-Domain SMT Applications

Han-Bin Chen, Hen-Hsen Huang, Jengwei Tjiu, Ching-Ting Tan and Hsin-Hsi Chen .. 277

B4-2 Domain Adaptation in Statistical Machine Translation of User-Forum Data using Component Level Mixture Modelling

Pratyush Banerjee, Sudip Kumar Naskar, Johann Roturier, Andy Way and Josef van Genabith ........................................................................................................................... 285

B4-3 Bagging-based System Combination for Domain Adaption

Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu.......................................................... 293

 

C4 Research Papers – Multi-path Translation

C4-1 Extracting Pre-ordering Rules from Chunk-based Dependency Trees for Japanese-to-English Translation

Xianchao Wu, Katsuhito Sudoh, Kevin Duh, Hajime Tsukada and Masaaki Nagata..... 300

C4-2 Statistical Post-Editing for a Statistical MT System

Hanna Bechara, Yanjun Ma and Josef van Genabith ...................................................... 308

C4-3 Post-ordering in Statistical Machine Translation

Katsuhito Sudoh, Xianchao Wu, Kevin Duh, Hajime Tsukada and Masaaki Nagata...... 316

 

P1A Research Papers

P1A-1 Searching Translation Memories for Paraphrases

Masao Utiyama, Graham Neubig, Takashi Onishi and Eiichiro Sumita ......................... 325

P1A-2 Are numbers good enough for you? - A linguistically meaningful MT evaluation method

Takako Aikawa and Spencer Rarrick............................................................................... 332

 

P1A-3 Marker-based Chunking for Analogy-based Translation of Chunks

Kota Takeya and Yves Lepage......................................................................................... 338

P1A-4 A Comparison of Unsupervised Bilingual Term Extraction Methods Using Phrase-Tables

Masamichi Ideue, Kazuhide Yamamoto, Masao Utiyama and Eiichiro Sumita ............. 346

P1A-5 Improving Low-Resource Statistical Machine Translation with a Novel Semantic Word Clustering Algorithm

Jeff Ma, Spyros Matsoukas and Richard Schwartz .......................................................... 352

P1A-6 Multi-granularity Word Alignment and Decoding for Agglutinative Language Translation

Zhiyang Wang, Yajuan Lü and Qun Liu ........................................................................... 360

 

P1C System Presentations

P1C-1 Word Alignment Using GIZA++ on Windows

Liang Tian, Fai Wong and Sam Chao................................................................................ 369

P1C-2 ENGtube: an Integrated Subtitle Environment for ESL

Chi-Ho Li, Shujie Liu, Chenguang Wang and Ming Zhou................................................ 373

P1C-3 Broadcast news speech-to-text translation experiments

Sylvain Raybaud, David Langlois and Kamel Smaïli........................................................ 378

 

A5 Research Papers – Model (3)

A5-1 Improving the Hierarchical Phrase-Based Translation Model

Xiaodong Shi, Xiang Zhu and Yidong Chen .................................................................... 383

A5-2 Lexical-based Reordering Model for Hierarchical Phrase-based Machine Translation

Zhongguang Zheng, Yao Meng and Hao Yu..................................................................... 389

A5-3 Effective Use of Discontinuous Phrases for Hierarchical Phrase-based Translation

Wei Wei and Bo Xu........................................................................................................... 397

 

B5 Research Papers – Corpora

B5-1 Generating Virtual Parallel Corpus: A Compatibility Centric Method

Jia Xu and Weiwei Sun ..................................................................................................... 406

B5-2 Parallel Corpus Refinement as an Outlier Detection Algorithm

Kaveh Taghipour, Shahram Khadivi and Jia Xu ............................................................... 414

B5-3 MT Detection in Web-Scraped Parallel Corpora

Spencer Rarrick, Chris Quirk and Will Lewis .................................................................. 422

 

C5 Research Papers – Grammatical Theory for MT

C5-1 On the Expressivity of Linear Transductions

Markus Saers, Dekai Wu and Chris Quirk ........................................................................ 431

C5-2 Handheld Machine Translation System Based on Constraint Synchronous Grammar

Fai Wong, Francisco Oliveira, Sam Chao and Chi-Wai Tang .......................................... 439

 

P2A Research Papers

P2A-1 A Comparison Study of Parsers for Patent Machine Translation

Isao Goto, Masao Utiyama, Takashi Onishi and Eiichiro Sumita .................................... 448

P2A-2 Rich Linguistic Features for Translation Memory-Inspired Consistent Translation

Yifan He, Yanjun Ma, Andy Way and Josef van Genabith .............................................. 456

P2A-3 Japanese-Chinese Phrase Alignment Using Common Chinese Characters Information

Chenhui Chu, Toshiaki Nakazawa and Sadao Kurohashi ................................................. 464

P2A-4 The Cultivation of a Chinese-English-Japanese Trilingual Parallel Corpus from Comparable Patents

Bin Lu, Ka Po Chow and Benjamin K. Tsou .................................................................... 472

P2A-5 Evaluation Methodology and Results for English-to-Arabic MT

Olivier Hamon and Khalid Choukri .................................................................................. 480

P2A-6 Example-Based Machine Translation for Low-Resource Language Using Chunk-String Templates

Khan Md. Anwarus Salam, Setsuo Yamada and Tetsuro Nishino ................................... 488

P2A-7 Improve SMT with Source-Side “Topic-Document” Distributions

Zhengxian Gong, Guodong Zhou and Liangyou Li .......................................................... 496

 

P2C System Presentations

P2C-1 AIR-based light clients for supporting Moses engine training

Jeffrey Rueppel, Li Jiang, Gong Yu and Ray Flournoy .................................................... 503

P2C-2 LetsMT!: Cloud-Based Platform for Building User Tailored Machine Translation Engines

Andrejs Vasiljevs, Raivis Skadiņš and Jörg Tiedemann ................................................... 507

 

A6 Research Papers – Evaluation

A6-1 Predicting Machine Translation Adequacy

Lucia Specia, Najeh Hajlaoui, Catalina Hallett and Wilker Aziz...................................... 513

A6-2 Getting Expert Quality from the Crowd for Machine Translation Evaluation

Luisa Bentivogli, Marcello Federico, Giovanni Moretti and Michael Paul....................... 521

A6-3 A Framework for Diagnostic Evaluation of MT Based on Linguistic Checkpoints

Sudip Kumar Naskar, Antonio Toral, Federico Gaspari and Andy Way........................... 529

A6-4 Comparative Evaluation of Term Informativeness Measures in Machine Translation Evaluation Metrics

Billy Wong and Chunyu Kit............................................................................................... 537

 

B6 Research Papers – System Combination

B6-1 System Combination for Machine Translation Based on Text-to-Text Generation

Wei-Yun Ma and Kathleen McKeown............................................................................... 546

B6-2 Hybrid Machine Translation Guided by a Rule–Based System

Cristina España-Bonet, Gorka Labaka, Arantza Díaz de Ilarraza, Lluís Màrquez and Kepa Sarasola.................................................................................................................... 554

B6-3 Integrating shallow-transfer rules into phrase-based statistical machine translation

Víctor M. Sánchez-Cartagena, Felipe Sánchez-Martínez and Juan Antonio Pérez-Ortiz ... 562

B6-4 Hypergraph Training and Decoding of System Combination in SMT

Yupeng Liu, Tiejun Zhao and Sheng Li............................................................................. 570

 

C6 User’s Studies (2)

C6-1 Study on the Impact Factors of the Translators' Post-editing Efficiency in a Collaborative Translation Environment

Na Ye and Guiping Zhang................................................................................................ 579

C6-2 UTX 1.11, a Simple and Open User Dictionary/Terminology Standard, and its Effectiveness with Multiple MT Systems

Seiji Okura, Yuji Yamamoto, Hajime Ito, Michael Kato, Miwako Shimazu and Francis Bond..................................................................................................................... 587

C6-3 Real-time Multi-media Translation for Healthcare: a Usability Study

Mark Seligman and Mike Dillinger................................................................................... 595

 

4th Workshop on Patent Translation