ACL 2013

 

Sixth Workshop on Building and Using Comparable Corpora

 

Proceedings of the Workshop

 

August 8, 2013

Sofia, Bulgaria

 

Table of contents

 

Cross-lingual WSD for translation extraction from comparable corpora.

Marianna Apidianaki, Nikola Ljubešić, and Darja Fišer… 1-10

 

Bilingual lexicon extraction via pivot language and word alignment tool.

Hong-Seok Kwon, Hyeong-Won Seo, and Jae-Hoon Kim… 11-15

 

Using WordNet and semantic similarity for bilingual terminology mining from comparable corpora.

Dhouha Bouamor, Nasredine Semmar, and Pierre Zweigenbaum… 16-23

 

A comparison of smoothing techniques for bilingual lexicon extraction from comparable corpora.

Amir Hazem and Emmanuel Morin… 24-33

 

Chinese-Japanese parallel sentence extraction from quasi-comparable corpora.

Chenhui Chu, Toshiaki Nakazawa, and Sadao Kurohashi… 34-42

 

A modular open-source focused crawler for mining monolingual and bilingual corpora from the web. 

Vassilis Papavassiliou, Prokopis Prokopidis, and Gregor Thurmair … 43-51

 

Building basic vocabulary across 40 languages.

Judit Ács, Katalin Pajkossy, and András Kornai … 52-58

 

Scientific registers and disciplinary diversification: a comparable corpus approach.

Elke Teich, Stefania Degaetano-Ortlieb, Hannah Kermes, and Ekaterina Lapshinova-Koltunski … 59-68

 

Improving MT system using extracted parallel fragments of text from comparable corpora.

Rajdeep Gupta, Santanu Pal, and Sivaji Bandyopadhyay… 69-76

 

VARTRA: a comparable corpus for analysis of translation variation.

Ekaterina Lapshinova-Koltunski… 77-86

 

Building ontologies from collaborative knowledge bases to search and interpret multilingual corpora.

Yegin Genc, Elizabeth A. Lennon, Winter Mason, and Jeffrey V. Nickerson… 87-94

 

Using a random forest classifier to recognise translations of biomedical terms across languages.

Georgios Kontonatsios, Ioannis Korkontzelos, Jun’ichi Tsujii, and Sophia Ananiadou… 95-104

 

Comparing multilingual comparable articles based on opinions.

Motaz Saad, David Langlois, and Kamel Smaili… 105-111

 

Mining for domain-specific text from Wikipedia.

Magdalena Plamadă and Martin Volk… 112-120

 

Gathering and generating paraphrases from Twitter with application to normalization.

Wei Xu, Alan Ritter, and Ralph Grishman… 121-128

 

Learning comparable corpora from latent semantic analysis simplified document space.

Ekaterina Stambolieva… 129-137

 

Finding more bilingual webpages with high credibility via link analysis.

Chengzhi Zhang, Xuchen Yao, and Chunyu Kit… 138-143