The 5th Workshop on Building and Using Comparable Corpora
Special Theme: “Language Resources for Machine Translation in Less-Resourced Languages and Domains”
LREC2012 Workshop
26 May 2012
Table of Contents:
Reinhard Rapp, Marko Tadić, Serge Sharoff, Pierre Zweigenbaum
Preface vii
Philipp Petrenz, Bonnie Webber
Robust Cross-Lingual Genre Classification through Comparable Corpora 1
Qian Yu, François Yvon, Aurélien Max
Revisiting sentence alignment algorithms for alignment visualization and evaluation 10
Inguna Skadiņa
Analysis and Evaluation of Comparable Corpora for Under-Resourced Areas of Machine Translation 17
Andrejs Vasiļjevs
LetsMT! – Platform to Drive Development and Application of Statistical Machine Translation 20
Núria Bel, Vassilis Papavasiliou, Prokopis Prokopidis, Antonio Toral, Victoria Arranz
Mining and Exploiting Domain-Specific Corpora in the PANACEA Platform 24
Adam Kilgarriff, George Tambouratzis
Béatrice Daille
Building bilingual terminologies from comparable corpora: The TTC TermSuite 29
Aimée Lahaussois, Séverine Guillaume
A viewing and processing tool for the analysis of a comparable corpus of Kiranti mythology 33
Nancy Ide
MultiMASC: An Open Linguistic Infrastructure for Language Research 42
Elena Irimia
Iustina Ilisei, Diana Inkpen, Gloria Corpas, Ruslan Mitkov
Romanian Translational Corpora: Building Comparable Corpora for Translation Studies 56
Angelina Ivanova
Evaluation of a Bilingual Dictionary Extracted from Wikipedia 62
Quoc Hung-Ngo, Werner Winiwarter
A Visualizing Annotation Tool for Semi-Automatically Building a Bilingual Corpus 67
Lene Offersgaard, Dorte Haltrup Hansen
SMT systems for less-resourced languages based on domain-specific data 75
Towards a Wikipedia-extracted Alpine Corpus 81
Sanja Štajner, Ruslan Mitkov
Dan Ştefănescu
Mining for Term Translations in Comparable Corpora 98
George Tambouratzis, Michalis Troullinos, Sokratis Sofianopoulos, Marina Vassiliou
Accurate phrase alignment in a bilingual corpus for EBMT systems 104
Kateřina Veselovská, Ngụy Giang Linh, Michal Novák
Using Czech-English Parallel Corpora in Automatic Identification of It 112
Manuela Yapomo, Gloria Corpas, Ruslan Mitkov
CLIR- and ontology-based approach for bilingual extraction of comparable documents 121
Amir Hazem, Emmanuel Morin
ICA for Bilingual Lexicon Extraction from Comparable Corpora 126
Hiroyuki Kaji, Takashi Tsunakawa, Yoshihiro Komatsubara
Improving Compositional Translation with Comparable Corpora 134
Nikola Ljubešić, Špela Vintar, Darja Fišer
Multi-word term extraction from comparable corpora by combining contextual and constituent clues 143
Robert Remus, Mathias Bank