ACL 2013


Sixth Workshop on Building and Using Comparable Corpora


Proceedings of the Workshop


August 8, 2013

Sofia, Bulgaria


Table of contents


Cross-lingual WSD for translation extraction from comparable corpora.

Marianna Apidianaki, Nikola Ljubešić, and Darja Fišer… 1-10


Bilingual lexicon extraction via pivot language and word alignment tool.

Hong-Seok Kwon, Hyeong-Won Seo, and Jae-Hoon Kim… 11-15


Using WordNet and semantic similarity for bilingual terminology mining from comparable corpora.

Dhouha Bouamor, Nasredine Semmar, and Pierre Zweigenbaum… 16-23


A comparison of smoothing techniques for bilingual lexicon extraction from comparable corpora.

Amir Hazem and Emmanuel Morin… 24-33


Chinese-Japanese parallel sentence extraction from quasi-comparable corpora.

Chenhui Chu, Toshiaki Nakazawa, and Sadao Kurohashi… 34-42


A modular open-source focused crawler for mining monolingual and bilingual corpora from the web. 

Vassilis Papavassiliou, Prokopis Prokopidis, and Gregor Thurmair… 43-51


Building basic vocabulary across 40 languages.

Judit Ács, Katalin Pajkossy, and András Kornai… 52-58


Scientific registers and disciplinary diversification: a comparable corpus approach.

Elke Teich, Stefania Degaetano-Ortlieb, Hannah Kermes, and Ekaterina Lapshinova-Koltunski… 59-68


Improving MT system using extracted parallel fragments of text from comparable corpora.

Rajdeep Gupta, Santanu Pal, and Sivaji Bandyopadhyay… 69-76


VARTRA: a comparable corpus for analysis of translation variation.

Ekaterina Lapshinova-Koltunski… 77-86


Building ontologies from collaborative knowledge bases to search and interpret multilingual corpora.

Yegin Genc, Elizabeth A. Lennon, Winter Mason, and Jeffrey V. Nickerson… 87-94


Using a random forest classifier to recognise translations of biomedical terms across languages.

Georgios Kontonatsios, Ioannis Korkontzelos, Jun’ichi Tsujii, and Sophia Ananiadou… 95-104


Comparing multilingual comparable articles based on opinions.

Motaz Saad, David Langlois, and Kamel Smaili… 105-111


Mining for domain-specific text from Wikipedia.

Magdalena Plamadă and Martin Volk… 112-120


Gathering and generating paraphrases from Twitter with application to normalization.

Wei Xu, Alan Ritter, and Ralph Grishman… 121-128


Learning comparable corpora from latent semantic analysis simplified document space.

Ekaterina Stambolieva… 129-137


Finding more bilingual webpages with high credibility via link analysis.

Chengzhi Zhang, Xuchen Yao, and Chunyu Kit… 138-143