The 5th Workshop on

Building and Using Comparable Corpora

Special Theme: “Language Resources for

Machine Translation in Less-Resourced Languages and Domains”

LREC2012 Workshop

26 May 2012

Istanbul, Turkey

 

Table of Contents:

 

Reinhard Rapp, Marko Tadić, Serge Sharoff, Pierre Zweigenbaum

Preface vii

 

Philipp Petrenz, Bonnie Webber

Robust Cross-Lingual Genre Classification through Comparable Corpora 1

 

Qian Yu, François Yvon, Aurélien Max

Revisiting sentence alignment algorithms for alignment visualization and evaluation 10

 

Inguna Skadiņa

Analysis and Evaluation of Comparable Corpora for Under-Resourced Areas of Machine Translation 17

 

Andrejs Vasiļjevs

LetsMT! – Platform to Drive Development and Application of Statistical Machine Translation 20

 

Núria Bel, Vassilis Papavasiliou, Prokopis Prokopidis, Antonio Toral, Victoria Arranz

Mining and Exploiting Domain-Specific Corpora in the PANACEA Platform 24

 

Adam Kilgarriff, George Tambouratzis

The PRESEMT Project 27

 

Béatrice Daille

Building bilingual terminologies from comparable corpora: The TTC TermSuite 29

 

Aimée Lahaussois, Séverine Guillaume

A viewing and processing tool for the analysis of a comparable corpus of Kiranti mythology 33

 

Nancy Ide

MultiMASC: An Open Linguistic Infrastructure for Language Research 42

 

Elena Irimia

Experimenting with Extracting Lexical Dictionaries from Comparable Corpora for English-Romanian language pair 49

 

Iustina Ilisei, Diana Inkpen, Gloria Corpas, Ruslan Mitkov

Romanian Translational Corpora: Building Comparable Corpora for Translation Studies 56

 

Angelina Ivanova

Evaluation of a Bilingual Dictionary Extracted from Wikipedia 62

 

Quoc Hung-Ngo, Werner Winiwarter

A Visualizing Annotation Tool for Semi-Automatically Building a Bilingual Corpus 67

 

Lene Offersgaard, Dorte Haltrup Hansen

SMT systems for less-resourced languages based on domain-specific data 75

 

Magdalena Plamada, Martin Volk

Towards a Wikipedia-extracted Alpine Corpus 81

 

Sanja Štajner, Ruslan Mitkov

Using Comparable Corpora to Track Diachronic and Synchronic Changes in Lexical Density and Lexical Richness 88

 

Dan Ştefănescu

Mining for Term Translations in Comparable Corpora 98

 

George Tambouratzis, Michalis Troullinos, Sokratis Sofianopoulos, Marina Vassiliou

Accurate phrase alignment in a bilingual corpus for EBMT systems 104

 

Kateřina Veselovská, Ngụy Giang Linh, Michal Novák

Using Czech-English Parallel Corpora in Automatic Identification of It 112

 

Manuela Yapomo, Gloria Corpas, Ruslan Mitkov

CLIR- and ontology-based approach for bilingual extraction of comparable documents 121

 

Amir Hazem, Emmanuel Morin

ICA for Bilingual Lexicon Extraction from Comparable Corpora 126

 

Hiroyuki Kaji, Takashi Tsunakawa, Yoshihiro Komatsubara

Improving Compositional Translation with Comparable Corpora 134

 

Nikola Ljubešić, Špela Vintar, Darja Fišer

Multi-word term extraction from comparable corpora by combining contextual and constituent clues 143

 

Robert Remus, Mathias Bank

Textual Characteristics of Different-sized Corpora 148