META-FORUM 2012
19-21 June 2012, Brussels, Belgium.
More than 200 participants from research, various industries and politics. 57 speaker contributions. 12 award winners. Two days of intense discussions about the current state and future of language technology in Europe.
META-FORUM 2012
was organised by META-NET, a Network of
Excellence consisting of 60 research centres from 34 countries. META-NET is
dedicated to building the technological foundations of a multilingual European
information society. META-NET is forging META,
the Multilingual Europe Technology Alliance, with currently more than 640
members.
This year's
META-FORUM was the third edition, after two successful events in Brussels 2010 and Budapest 2011. The
timing of META-FORUM 2012 was ideal: currently, there is a lot of discussion
about topics in upcoming, long-term research programmes. The META-FORUM event,
the Language White Paper series
and the Strategic Research Agenda
(SRA) developed within META-NET, all aim at presenting a very strong message
from the language technology community.
In what follows,
after a summary of the key messages, you will find short descriptions of all
presentations. Links are provided to videos of the presentations and to the
slides used. We also recommend watching the videos: they are short and contain much more detail.
What follows is
an analysis and synthesis of ideas brought out during META-FORUM 2012. It is very
high level, and you should watch the presentations to get a better
understanding of the points made.
Thibaut Kleiner
opened the event with a description of the challenges the language technology
community in Europe is facing.
The Europe and its languages session discussed the current situation for Europe's languages.
The Industry and
Innovation session was opened by Serge
Gladkoff. He presented the viewpoint of GALA on the efforts of META-NET and the
needs of language service providers. Tomás
Pariente brought up the topic of (big) open data. Language technology can
support the creation of high quality big data and provide useful means for its
consumption. Radu Soricut presented a
view on machine translation from an industry perspective: machine translation
needs to be part of many industrial ecosystems and big data sources like the
Web. George Wright introduced the BBC
World Service archive and how speech technology is used to foster access to
thousands of radio programmes. Florence
Beaujard showed how Airbus is using controlled language to assure high quality in the life-critical area of cockpit design. Lori Thicke discussed the future of
machine translation. The connection with other technologies like controlled
languages and the integration into industrial workflows can help to achieve
better quality and to bootstrap the creation of new machine translation
systems. A short discussion touched, among other things, on the availability of language resources and issues that need to be resolved with regard to licensing.
Language
Technologies for Europe 2020 was a joint session organised by META-NET and LT-Innovate. Georg Rehm
introduced the approach of META-NET towards the topic, followed and
complemented by Jochen Hummel and
LT-Innovate. Hans Uszkoreit summarized
the META-NET Strategic Research Agenda (SRA). The SRA encompasses three
priority themes: “Translation Cloud”, presented by Andrejs Vasiljevs, “Social
Intelligence and eParticipation”, presented by Marko Grobelnik, and “Socially Aware
Interactive Assistants”, presented by Joseph
Mariani. Seven key participants from LT-Innovate reported on the current
state of the LT-Innovate innovation themes. These themes are an important input
for framing how “innovation” is described within the SRA. The final joint
discussion focused on the needs of end users for language technology.
The first day
closed with a fireworks display of twelve winners: eleven META Seals of Recognition were
given to products in various areas of language technology: information
extraction (Jakub Zavrel); speech processing (Alessandro Tescari, Siegfried
Kunzmann, Radim Kudla and Joseph Mariani); machine translation (Heidi Depraetere, Bernardo Magnini, Radu Soricut, Kirti Vashee and Dion Wiggins, and Tony O'Dowd); and basic tools for natural language processing (Lauri
Karttunen). The META Prize was awarded to the JRC Optima Activity for
Language Technology, represented by Erik van der Goot. JRC develops the Europe
Media Monitor. EMM is a language technology enabled service that gathers
information from news portals in 60 languages and classifies the articles,
analyses news texts, issues alerts and produces visual presentations of the
news.
The second day
started with an overview of short and long term opportunities for LT Research
and Innovation on the European Level. Kimmo
Rossi introduced two new programmes that are currently being planned:
Horizon 2020 and Connecting Europe Facility (CEF). Roberto Cencioni provided details about
topics in current calls for project proposals.
The topic of
exchanging and re-using language resources was the primary focus of the
META-SHARE session. META-SHARE is an open infrastructure to foster LR/LT
sharing and re-use. Stelios Piperidis described general aspects of META-SHARE, e.g. related to licensing.
The following three presenters focused on contributions to META-SHARE: Tamás Varadi mostly for Slavic languages,
Antonio Branco for south European
languages, and Andrejs Vasiljevs
for north European languages, with a specific focus on smaller languages. In
the final discussion it became clear that, given this input, META-SHARE is
already populated with many resources that are useful for research and
industry. Now the sustainability of META-SHARE, and the availability of resources with adequate licenses, need to be assured.
The final panel
session on LT research in EU member states and regions complemented the
previous session on support at the European level. The situation was explained for Hungary (Károly Gruber), Bulgaria (Diana Popova), the Czech Republic (Karel Oliva), France (Edouard Geoffrois), the Netherlands (Alice Dijkstra), and Slovenia (Simona Bergoč). Joseph Mariani explained various
political instruments to support joint research between member states. The
panel discussion focused on benefits of coordinated programmes on the European
level and methods to create these.
The keynote
lecture of META-FORUM 2012 was given by Fernando Pereira. He talked about
language technology efforts at Google. Here the focus is on language technology
workflows that scale to the Web and inter-relate external knowledge bases with
Web content. The good news is that language technology has its role in this
workflow and achieves better results than the simple matching of patterns in
texts; however, to be able to compete in such industrial scenarios, language
technology must be robust and scalable.
Hans Uszkoreit
gave a short closing presentation, coming back to the challenges for the LT
community mentioned by Thibaut Kleiner. META-FORUM 2012 presented impressive
results that European language technology has achieved so far, and issues that
need to be addressed in the coming months. The community is in good shape to
deal with these challenges and will present the outcomes at META-FORUM 2013.
Thibaut
Kleiner, Member of the Cabinet of Neelie Kroes, Commissioner for the Digital
Agenda and Vice-President of the European Commission, EC, opened the conference
with a presentation entitled “Technological Challenges of the Multilingual
European Society”. He gave a warning message to the language technology
community: future funding for language technology in Europe is not guaranteed.
Language
technology can help to master the vast amount of information on the Web in
various languages with applications like news and opinion mining or business
intelligence. Such application areas for language technology are obvious; the question is mostly who will take the lead – Europe or other regions of the world.
Other communities, e.g. around open data, have managed to attract interest from policy makers and to generate output in many SMEs – language technology, too, needs a strong voice at the policy-maker level.
Hans Uszkoreit, from DFKI (Germany), presented the perspective of META-NET.
The three major
challenges for language technology are to preserve multilingual diversity, to
secure cross-lingual flow of information, and to give means for communication,
information and knowledge management to all language communities. In META-NET,
the European language technology community has worked for more than two years
in three lines of action (META-VISION, META-SHARE and META-RESEARCH) to address these challenges.
META-FORUM 2012
presents various outcomes and the current state of this work, like the language
white paper series, the META-VISION process involving about 100 experts and
leading to a draft SRA, and the META-SHARE repositories covering more than 1300
language resources.
Building a bridge to the opening keynote from Thibaut Kleiner, Hans Uszkoreit reminded us that the decision about future funding for language technology in Europe has not yet been made.
Algirdas Saudargas, Member of the European Parliament from Lithuania, added the political perspective.
It is important
to translate this “message” into political language, so that it will be taken up outside the language technology community. Only in this way will we be able
to convince policy makers. In this conversation, language technology should not
be described as science fiction that puts machines between humans, but as a
means to support communication between humans.
András Kornai, from the Hungarian Academy of Sciences, presented an analysis of the digital vitality of languages.
The presentation
took a look at Wikipedia. An overall analysis shows that only a small
percentage of languages are in the comfort zone. Many languages are vital in
terms of speakers, but not represented well in the digital world. We need
enabler projects for building basic tools for these languages and also for
“heritage languages”, so that they can achieve a passive Web presence of
lexicons, classical literature etc.
Bolette Sandford Pedersen, University of Copenhagen, presented the META-NET language white paper series.
A specific
effort was put into cross-lingual ranking of language technology support, with
broad categories like “excellent”, “good”, “moderate”, “fragmentary”, and “weak
or no support”. The results vary with regards to the technology area. For
example, support for voice technologies is slightly better than machine
translation. Nevertheless “excellent” support is not given for any language,
and even the level “good” is rarely available for languages other than English.
The language white papers have detected major gaps in language technology support for each European language that need to be addressed in the near future, to assure the competitiveness of these languages in the digital market.
Panel with Representatives of the EFNIL Language Communities
Gerhard Stickel, Institute for the German
Language and EFNIL (European Federation of National Institutions for Language)
president, opened the panel. He introduced EFNIL as an organization, its
history, the EFNIL conference series and collaborations with META-NET. META-NET
can contribute new ICT related developments to EFNIL; EFNIL provides needs and
use cases for language technology from the perspectives of language research,
planning and teaching. Progress in language technology still has to be made, but
there are more and more applications of language technology in EFNIL related
areas. In the future this might lead to more exchange of knowledge between the
fields, but maybe also to concrete, joint projects.
Arnfinn Muruvik Vonen, from the Language Council of Norway, spoke about the awareness of local politicians in Norway.
Ray Fabri, National Council for the Maltese Language, described the complex bilingual situation in Malta.
Peter Spyns, De Nederlandse Taalunie,
referred back to the Wikipedia analysis made by András Kornai, saying that Dutch has a good position in the digital age.
Arvi Tavast, Institute of the Estonian
Language, congratulated the authors of the white paper series on their results. He added a small warning with regard to the message for smaller languages: politicians look at the (economic) outcomes of language technology
research. To make these visible, projects are necessary that put the languages
into the “comfort zone”, in the sense of András Kornai. For these projects and
in general, a copyright law is needed that eases the re-use of language
resources.
Algirdas Saudargas, Member of the European Parliament from Lithuania, also took part in the panel.
During the panel discussion, potential additional input to
the analysis of language technology support was discussed. This includes e.g.
insights from professional translators. Additional languages like sign
languages have to be taken into account. From an end user perspective, a need
like “we want good quality speech recognition” was articulated. We also need to make clear to policy makers that such a request involves the development of the underlying language technology “food chain”, including many components for e.g. morphological and phonological analysis.
Relating the
Wikipedia analysis from András Kornai to the panel, most official European
languages were categorized as being “vital” in the digital world. However, to
achieve wide uptake of language technology, the technology also has to be
marketed in the right manner, e.g. via “cool, easy to understand Apps”.
Finally, it was emphasized that a European copyright law that eases the re-use of language resources for research purposes is urgently needed.
Serge
Gladkoff, GALA Standards Director and GALA Board member, President of Logrus
International Cooperation, opened the “Industry and Innovation – Language Technology made in Europe” session.
GALA is the umbrella for language service providers and users across the globe.
Tomás Pariente, Atos Research & Innovation, talked about language technology and (big) open data.
In these and
other areas, more and more unstructured data has to be processed. Atos is
involved in projects that deal with such information in specific domains, e.g.
finance. This is, however, only one area of big data: in the BIG project, the
aim is to analyse big data from various perspectives, e.g. technology, business
and policy, and in many different domains: health, public sector, finance & insurance, media & entertainment, manufacturing, retail, energy and transport.
BIG’s main idea is to gather the relevant stakeholders; the language technology
community now has the opportunity to be recognized as part of them and to
contribute ideas and methodologies for handling big data.
Radu
Soricut, manager of application science & engineering and senior research
scientist, SDL International, gave a talk entitled “Changing the Perspective on Machine
Translation”. In the past, the MT community was concerned with the MT technology itself, e.g. approaches towards MT (statistical vs. rule-based MT), integration with translation memories, etc. However, the end customer mostly cares about the value in a customer-specific ecosystem. Hence, MT needs to be part of many infrastructures, including e.g. CMS or ERP systems, and needs to be able to make use of the large data source that is the Web.
With MT in large
industrial ecosystems, exploitation of massive parallel data available in
translation memories and connected to the Web becomes possible. MT systems can easily be tailored to customer-relevant domains. MT engines can take various information sources, like user feedback, into account. Challenges for the future of industrial-strength MT include scalability of customization, adaptation to customers, and automatic learning from user feedback.
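As a rough illustration of this kind of ecosystem integration, here is a minimal sketch (in Python, with invented names; not SDL's actual architecture) of an MT wrapper that consults a translation memory first and feeds user post-edits back into it:

```python
# Hypothetical sketch of MT embedded in an industrial workflow: a
# translation memory (TM) is consulted before the MT engine, and user
# post-edits flow back into the TM. Names and structure are invented.

class TranslationWorkflow:
    def __init__(self, mt_engine):
        self.mt_engine = mt_engine    # callable: source text -> target text
        self.translation_memory = {}  # exact-match TM: source -> target

    def translate(self, source: str) -> str:
        # 1. Reuse a human-approved translation if the TM has one.
        if source in self.translation_memory:
            return self.translation_memory[source]
        # 2. Otherwise fall back to machine translation.
        return self.mt_engine(source)

    def record_post_edit(self, source: str, corrected: str) -> None:
        # User feedback: the post-edited segment is stored so the next
        # request reuses it; a real system would also queue it for
        # retraining the MT engine.
        self.translation_memory[source] = corrected


# Usage with a dummy "engine":
wf = TranslationWorkflow(mt_engine=lambda s: "<MT:" + s + ">")
print(wf.translate("Hello"))             # -> <MT:Hello>
wf.record_post_edit("Hello", "Bonjour")
print(wf.translate("Hello"))             # -> Bonjour
```

In a production system the recorded post-edits would also feed into periodic retraining of the engine, which is the "automatic learning from user feedback" challenge mentioned above.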
George
Wright, head of the Internet Research & Future Services Team, BBC Research,
gave a presentation entitled “Speech
analysis and Archive research at the BBC”. The background is the BBC World Service archive, which covers 70,000 radio programmes but has only sparse metadata available for accessing it. Language technology can help to
(re-)categorize the content and create links between content items and the Web.
As a result, the system can e.g. suggest topics covered in a programme or identify separate speakers. The accuracy of the results is still a challenge.
One issue is the availability of adequate language resources, e.g. tools that
can handle British English adequately. The development of these and other language resources must be supported, so that the vast volumes of multimedia content will be accessible for a global audience.
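To make the (re-)categorization step concrete, here is a toy sketch (hypothetical keyword lexicon; not the BBC's actual system) of suggesting topics for a programme from its automatic transcript:

```python
# Toy sketch of topic suggestion over a speech transcript. A real system
# would combine ASR output with statistical topic models; here a tiny
# keyword lexicon (invented) stands in for both.

from collections import Counter

TOPIC_KEYWORDS = {
    "politics": {"election", "parliament", "minister"},
    "science": {"research", "experiment", "laboratory"},
}

def suggest_topics(transcript: str, top_n: int = 2) -> list[str]:
    words = Counter(transcript.lower().split())
    scores = {
        topic: sum(words[kw] for kw in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [topic for topic in ranked[:top_n] if scores[topic] > 0]

print(suggest_topics("the minister spoke in parliament about the election"))
# -> ['politics']
```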
Florence
Beaujard, head of Linguistics and Physiology Group, Airbus, gave a talk entitled “Linguistic Activities of Airbus Design
Office”. In cockpit design, the special purpose language of pilots and many
other constraints like size of displays have to be taken into account to create
clear messages and labels. This is why Airbus has defined a controlled
language: it helps to reduce potential ambiguities, and to improve text
comprehensibility by non-native English speakers.
There are some
general principles like “one word, one meaning” or “one meaning, one word
order” underlying the controlled language. In addition, there are lexica and rules on how to write messages or labels. Collaboration with pilots and instructors is crucial for the development of the controlled language. Outcomes so far include various tools, e.g. to extract display text from the designers' specifications and to automatically check its adequacy as a message or label. A desire for the
future is to ease specification writing for system designers, via dedicated
controlled language(s) to guide the designers.
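As an illustration of what such an automatic adequacy check might look like, here is a minimal sketch (the lexicon, synonym table and length limit are invented for illustration; the actual Airbus rules are not public):

```python
# Minimal sketch of a controlled-language checker for cockpit labels.
# The approved lexicon, synonym table and length limit below are
# illustrative stand-ins, not the real Airbus specification.

APPROVED_WORDS = {"check", "fuel", "pump", "on", "off", "fault"}
SYNONYM_FIXES = {"verify": "check", "error": "fault"}  # "one meaning, one word"
MAX_WORDS = 4                                          # display-size constraint

def check_label(label: str) -> list[str]:
    """Return a list of rule violations for a candidate display label."""
    issues = []
    words = label.lower().split()
    if len(words) > MAX_WORDS:
        issues.append(f"too long: {len(words)} words (max {MAX_WORDS})")
    for word in words:
        if word in SYNONYM_FIXES:
            issues.append(f"use '{SYNONYM_FIXES[word]}' instead of '{word}'")
        elif word not in APPROVED_WORDS:
            issues.append(f"'{word}' is not in the approved lexicon")
    return issues

print(check_label("verify fuel pump"))  # -> ["use 'check' instead of 'verify'"]
```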
Lori
Thicke, CEO, Lexcelera Localization and representative of Translators without
Borders, talked about “Why Do We Need
Language Technology”. Language technology is needed to deal with a
contradictory situation: more and more content has to be translated faster, with demand for higher quality and at lower cost. Translation also plays a societal role: e.g. access to translated information in developing countries can be critical even for survival. Language technology like machine translation can also help to resolve the mismatch between the digital content available and the number of speakers in developing countries.
For the future
of machine translation, it is important to see the technology as a process,
including pre-production, the actual processing, post-editing etc. Quality in
the source content is key to delivering quality MT. The ACCEPT project is dedicated to developing controlled language rules, which will help manage content in social forums and ultimately deliver better quality machine translation. Work
areas for the future of MT include post-editing, terminology control and the
integration of MT with translation memories.
Discussion
The discussion
touched on the recurring issue of copyright and language resources. Both the
corpus created in the “Translators without Borders” project and the archives
created by the BBC are valuable resources for research purposes. But they can
only be re-used if the thin, but important line between distributing resources
freely and making them available for research is drawn.
Controlled
language was discussed also in terms of re-use. The presentation from Florence
Beaujard demonstrated that a concrete controlled language is quite specific to
application scenarios. Nevertheless there is the opportunity to re-use controlled language resources, e.g. criteria to reduce synonyms, rules to create acronyms or generate abbreviations, etc. This could be achieved by creating a standardized specification for some aspects of controlled language.
Machine
translation is facing questions like what metrics to use for its evaluation. It
was proposed that the same metrics should be used as in human translation, e.g. the LISA quality metrics. An issue that has no general solution is machine translation for languages with a limited amount of language resources. There is no silver bullet to solve this problem; in the end, human translators need to create the resources.
META-NET and
LT-Innovate started this session with a joint slot.
Georg
Rehm, META-NET and DFKI, gave a presentation entitled “Introduction and Presentation of Partnership”.
After a short history of META-NET, the focus was on the META-VISION line of
action for “building a community with a shared vision and strategic research
agenda”: as of mid 2012, META-NET has 60 members in 34 countries. Collaboration
agreements have been created with 46 other EU-funded projects.
META-NET has established META, the Multilingual Europe Technology Alliance, which currently has more than 640 members.
Jochen
Hummel, ESTeam and chairman of LT-Innovate, gave an “Introduction to LT Innovate”. LT-Innovate
aims to promote European language technology, to unify the industrial community, and to articulate its interests towards investors and policy makers.
Language
Technology is the missing piece in the puzzle of the digital single market.
LT-Innovate is creating an innovation agenda to fill this gap. This agenda
complements the META-NET strategic research agenda (SRA), with the aim to
foster adoption of research results in the market. About 150 people from the
language technology industry participated in the LT-Innovate summit that took
place just before META-FORUM. They discussed the “innovation agenda”, showcased
their language technology applications, and demonstrated a strong voice of the
European LT industry.
Hans
Uszkoreit, DFKI and META-NET, gave a presentation entitled “The META-NET Strategic Research Agenda:
Overview, Preparation, Dissemination”. Creating the Strategic Research
Agenda (SRA) is one main task of META-NET. In the SRA, on the basis of the state of the technology, a broad vision for the year 2020 and various strategic
considerations, three interconnected priority themes have been developed. These
will be accompanied by an innovation model, to be developed in close
collaboration with LT-Innovate.
Various new
topics will influence the SRA: big data, services & cloud computing, and
shared infrastructures. Language technologies are prime candidates for “sky
computing”, a new area that encompasses the federation of several clouds for
creating complex services. A sky-computing-based European language technology service platform can be the basis for uniting LT providers, language service providers, researchers, providers of other services, citizens and corporate users.
Andrejs
Vasiljevs, Tilde, presented the SRA priority theme “Translation Cloud”. Many
applications needed by EU citizens and businesses require specific or generic
translation services: eCommerce, cross-language subtitling, education etc. The
translation cloud will be a ubiquitous online platform to provide these
services, including various methods like machine translation or automatic
language checking, for usage in and delivery to many devices. This will have a huge impact, e.g. facilitating job opportunities and creating new business opportunities in the large global market of language services.
The current
state is promising: more data and tooling for machine translation are available. Nevertheless, we still need research breakthroughs in areas like high quality MT, and research needs to be organized in close integration with industry.
Marko
Grobelnik, Institut “Jožef Stefan”, presented the SRA priority theme “Social Intelligence and eParticipation”.
He started with a review of various trends, like the importance of language
related technologies in the Gartner hype cycle, increasing time spent on the
social Web, and increasing importance of content aggregators over content
creators, leading to more interlinked content and huge amounts of big data.
From this
review, various recommendations for topics in a technology and research roadmap
emerged: social influence and incentives, information tracking & dynamics,
multimodal data processing, visualization and user interaction, and algorithmic
fundamentals. An important task is now to present these topics to decision
makers and show their relevance for European citizens and eParticipation.
Joseph
Mariani, CNRS-LIMSI/IMMI, presented the SRA priority theme “Socially Aware Interactive Assistants”.
The aim is to create multilingual assistants which support human interaction,
acting naturally and in a personalized manner in various environments, in any language and
anywhere. Global abilities are needed for these assistants, like natural
interaction with agents (e.g. terminals or robots). In addition, there are
domain specific abilities like personalized training in computer aided language
learning.
The roadmap for
this priority theme encompasses these global and domain specific aspects, and
the creation of language resources and evaluation tasks. Other countries (e.g. the US and Japan) are active in this area as well.
The LT-Innovate Innovation Themes
Key participants
of LT-Innovate presented aspects of the “innovation themes” which are under
development.
Rubén Riestra, INMARK International Area,
provided a general introduction to the envisaged “LT innovation agenda”. The
aim is to produce a vision statement on how innovation should enable LT providers
to deliver value, that is: new products and services for the digital single
market. LT-Innovate has identified five main “innovation clusters”:
iEnterprise, iHealth, iHelpers, iServices, and iSkills.
Rose Lockwood, INMARK International Area,
presented the approach for writing the innovation agenda. The aim was to create
a consolidated view of the software market and the potential “LT market”. This
should also include a commercialized LT view that will influence both LT
companies and the research community. LT-Innovate has tracked LT related news
intensively, leading to the five innovation clusters.
Philippe Wacker, EMF, emphasized the importance of innovation for getting Europe out of the economic crisis.
Paul Welham, CereProc Ltd., presented
findings from a panel discussion at the LT-Innovate summit about language
technology for people with disabilities and special needs. The aging population
creates many challenges, but it also leads to many opportunities for language
technology applications. An example is the use of avatars to support communication for elderly people.
Claude de Loupy, Syllabs, presented
opportunities for user and product analysis. Language technology can create
more value in areas like eCommerce or the travel industry. The travel industry is huge in Europe.
Jochen Hummel, ESTeam, presented a further innovation theme.
Adriane Rinsche, Language Technology Centre
Ltd., presented promises of language technology in the health care market.
Language technology can help to save costs and improve services, e.g. for
patient related information management or health monitoring. There are also
multilingual aspects like medical information in tourism. Language technology tools that interface easily with each other and with medical infrastructure will lead to
excellent opportunities in this market.
Jochen Hummel, ESTeam, rounded off the series of innovation theme presentations.
Discussion
The joint
session between META-NET and LT-Innovate was wrapped up by a short discussion.
One topic was the gap between what language technology can already achieve and the needs of the end user. Some types of language technology are getting more and more uptake, e.g. speech interfaces. But widespread adoption is yet to come. The overall usability of language technology has to become a focus of efforts, or, in other words: we have solutions, but what was the problem?
Nicoletta
Calzolari, CNR, and Georg Rehm, DFKI, chaired the “LT Fireworks”
session. Georg Rehm briefly introduced the background of the META Seal of
Recognition awards and the META Prize: these awards are given annually at the
META-FORUM event, and winners are chosen by the META Technology Council: around
30 experts from the European LT landscape who provide the main input to the Strategic
Research Agenda (SRA).
Alessandro
Tescari received the seal of recognition for Pervoice. Pervoice provides speech
recognition using large vocabularies and handling multiple languages for
specific sectors. Solutions based on Pervoice include a remote transcription
system, transcription workflow and subtitling solutions.
Siegfried
Kunzmann received the seal of recognition for European Media Lab. The EML
transcription platform helps to bring automatic transcription to various
markets. One important usage scenario is the automatic transcription of
voicemails to SMS, e-mail or mobile devices.
Jakub
Zavrel received the seal of recognition for Textkernel. The Extract! and other
Textkernel products use language technologies and machine learning for the extraction of information from CVs. This saves time when entering CVs into recruitment systems and eases the aggregation of searchable information.
Heidi
Depraetere, on behalf of Paraic Sheridan, received the seal of recognition for
IPTranslator created within the PLuTO project. PLuTO is developing an online
translation solution for patent translation. It helps the patent researcher to
decide quickly whether a text in a foreign language is relevant for a given
topic.
Bernardo
Magnini, on behalf of Marcello Federico, received the seal of recognition for FBK, where the IRSTLM toolkit for statistical language models is being developed. It provides a variety of features for creating language models, is integrated e.g. into the Moses platform, and has been used in various industrial
applications.
Radu
Soricut received the seal of recognition for SDL. SDL's machine translation system
eases access to language pairs, integration with customer systems, and control over corporate terms and branding. High quality translation results can be
delivered across 30 languages via post editing.
Radim
Kudla received the seal of recognition for PHONEXIA s.r.o. PHONEXIA provides
speech technologies for identifying various pieces of information from speech,
e.g. speaker, gender, language, keywords, transcription, etc. The
technologies are applied for example in multilingual speech transcription and
keyword spotting systems.
Kirti
Vashee and Dion Wiggins received the seal of recognition for
Asia Online. Initially Asia Online focused on using machine translation for
bringing English content into Asian languages. The scope then was extended to
various domains and language pairs. Now, language pairs involving Asian and European languages are also being included.
Joseph
Mariani, on behalf of Bernard Prouts, received the seal of recognition for
Vocapia. Vocapia has created VoxSigma, a software suite with large vocabulary
speech-to-text capabilities. VoxSigma has been developed for transcribing large
quantities of audio and video. It is used in many applications like media
monitoring or speech analytics.
Tony
O’Dowd received the seal of recognition for Xcelerator. KantanMT developed by
Xcelerator is a cloud based machine translation system. It is based on the
Moses platform and provides machine translation to mid-sized language service
providers. KantanMT responds to the need for high-quality and low-cost machine translation.
Lauri
Karttunen received the seal of recognition for XFST developed within Xerox. XFST is
a finite-state toolkit for text processing, e.g. rewriting, tokenization or
morphological analysis. Since 1993, it has been used for dozens of languages and in large corporations. The source code of XFST is planned to be made available soon under an open source license.
The members of
the META Technology Council decided that the scope of the META Prize 2012
should be “Outstanding products or services supporting the European
Multilingual Information Society”. There were 19 nominations and one clear winner: the prize was given to the JRC Optima Activity for Language
Technology, represented at META-FORUM 2012 by Erik van der Goot.
JRC, the Joint
Research Centre, is an EC’s in-house science service. One major application
developed within JRC is the Europe Media Monitor (EMM). Starting in 2002, today
EMM processes 150.000 new news articles - per day and in 50 languages. The
articles are classified according to hundreds of subjects and countries.
JRC has also created language resources of enormous value, e.g. multilingual parallel corpora
in 22 languages, multilingual multi-label categorisation software, and the
multilingual named entity resource JRC-Names. These resources and EMM itself
are of high importance for multilingual information gathering.
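A toy sketch of such multi-label categorisation (invented categories and keywords; EMM's real pipeline is far more sophisticated) might look as follows:

```python
# Toy sketch of EMM-style multi-label news categorisation: each incoming
# article is tagged with every category whose keyword list it matches.
# Categories and keywords are invented for illustration.

CATEGORY_KEYWORDS = {
    "health": {"epidemic", "vaccine", "hospital"},
    "energy": {"oil", "pipeline", "electricity"},
}

def categorise(article: str) -> set[str]:
    tokens = set(article.lower().split())
    return {cat for cat, keywords in CATEGORY_KEYWORDS.items() if tokens & keywords}

for article in [
    "New vaccine trials start at the hospital",
    "Pipeline outage drives up oil prices",
]:
    print(categorise(article))  # -> {'health'}, then {'energy'}
```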
Kimmo
Rossi, the European Commission, DG for Communications Networks, Content and
Technology (CONNECT), gave the opening talk for the first session on the second
day. He presented the current state of planning for two new programs: “Horizon 2020” and “Connecting
Europe Facility (CEF)”.
In Horizon 2020,
language technology is planned to be part of the industrial leadership topic
with dedicated funding instruments for SMEs. Relevant topics are related to
content technologies and information management, e.g. the creation of tools for
handling content in any language, or the modelling, analysis and visualization of big data.
CEF, unlike Horizon 2020, is not about research or innovation but about infrastructure.
Digital service platforms in areas like eGovernment or eHealth are to be
developed. Language technology comes into play via the requirement for
multilingual access to online services. A core platform should provide basic
language technology building blocks for free, accompanied by various generic
services like machine translation.
Roberto
Cencioni, the European Commission, DG for Communications Networks, Content and
Technology (CONNECT), gave a presentation about “Final 2012/2013 calls in FP7”.
Themes in these
calls include global content processing, mining of unstructured information and
natural interaction. There are two calls: one dedicated to language, one especially for SMEs, covering the areas of language and the handling of big data.
Three research
lines are formulated in the language related call: analytics, focusing e.g. on
the interplay of text, speech, audio and video; translation, aiming at high
quality MT; and interaction, with the goal of integrating the processing of speech and additional modalities in ICT platforms. In addition there are roadmapping
actions, which should target specific sectors, common tools, data sets &
standards, integration and evaluation.
The SME call has
a focus on analytics and open data. There are project lines for the re-use of
open data, transfer and uptake of LT, and software focusing on open data and
its applications.
Stelios
Piperidis, ILSP, started the session on the open resource exchange infrastructure
META-SHARE with a presentation entitled “Overview,
Current State, Towards Version 3 of META-SHARE”.
Language
resources (LR) are needed everywhere in language related technology. META-SHARE
is a network of distributed repositories (so-called “nodes”) for sharing and
exchanging LRs, aiming to match LR providers and consumers.
In META-SHARE,
LRs are described via a dedicated metadata schema. It supports all services of
the infrastructure like storage, browsing, or metadata harvesting. The metadata
schema describes the LR itself and also provides additional information, e.g. related to licensing.
Such metadata is
important for the legal framework used in META-SHARE. Various licensing
templates are provided. They encompass a mix of open and openness-inspired models.
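To illustrate, a strongly simplified metadata record might look like the sketch below (field names are illustrative; the actual META-SHARE schema is a rich XML model):

```python
# Strongly simplified sketch of a language-resource metadata record of
# the kind META-SHARE uses to support storage, browsing and harvesting.
# Field names are illustrative; the real schema is a rich XML model.

from dataclasses import dataclass, field

@dataclass
class LanguageResource:
    name: str
    resource_type: str             # e.g. "corpus", "lexicon", "tool"
    languages: list[str]
    licence: str                   # licensing info travels with the record
    keywords: list[str] = field(default_factory=list)

    def matches(self, query: str) -> bool:
        """Minimal browse/search support over the record's fields."""
        q = query.lower()
        return q in self.name.lower() or q in [k.lower() for k in self.keywords]

lr = LanguageResource(
    name="Example Parallel Corpus",
    resource_type="corpus",
    languages=["en", "de"],
    licence="CC-BY-NC",
    keywords=["parallel", "news"],
)
print(lr.matches("parallel"))  # -> True
```

Keeping the licensing information inside the record itself is what allows services such as harvesting and search to filter resources by the terms under which they may be re-used.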
In the coming
months the META-SHARE software will be improved in various areas like search
engine optimisation or data migration. More META-SHARE nodes will be created,
and ELRA-supported initiatives will be included, to achieve full deployment of META-SHARE by ELRA and its members.
Tamás Varadi, Research Institute for Linguistics, Hungarian Academy of Sciences, presented the contribution of the CESAR project, which covers six languages of Central and South-Eastern Europe.
One major aim is
to contribute resources for these languages to META-SHARE. This encompasses monolingual corpora as well as speech corpora, lexica and language technology tools. In
addition, cross-linked resources between the six languages (e.g. multilingual
parallel corpora) have been developed. A long-term perspective behind these
efforts is important: CESAR is going to set up a META-SHARE repository/node for hosting these language resources.
Antonio Branco, University of Lisbon, presented the contribution of the METANET4U project.
The project also
contributed to the development of META-SHARE, which was the focus of the
presentation. This includes among others input to the metadata model, legal or
licensing aspects, and various technical areas.
The repositories/nodes have been populated with resources by METANET4U. Seven nodes have been set up. 100% of the resources that are available via these nodes are new, that is, they have not been available via other distribution channels before. A
future topic is the interoperability between META-SHARE and other platforms.
Andrejs
Vasiljevs, Tilde, gave a presentation entitled “The contribution of META-NORD”.
META-NORD covers the Baltic and Nordic languages (Danish, Estonian, Finnish, Icelandic, Latvian, Lithuanian, Norwegian and Swedish).
The focus of the
contribution to META-SHARE was European languages with less than 10 million
speakers. As the analysis in the language white paper series reveals, for many
of these languages the amount of high quality language resources is very limited.
META-NORD worked
on filling gaps especially in the areas of WordNets, treebanks and terminology
resources. As in the other projects that presented contributions to META-SHARE, the sustainability of the repositories is of high importance, and META-NORD has committed to providing support at least for a given time frame.
META-SHARE in
2013 and beyond – Q/A and Panel Discussion
The Q/A
and Panel Discussion first focused on concerns about the future of META-SHARE: what will happen when the underlying projects come to an end? ELRA and others involved have committed to guaranteeing support for META-SHARE for at least two years, and probably longer.
Another topic
was the role of META-SHARE with regards to high quality language resources.
META-SHARE is not a means to create these resources, which are needed by the SMEs constituting the majority of the language technology industry in Europe.
Various
questions were about licensing. META-SHARE has been set up also to become
attractive to the open source community. To this end, META-SHARE provides the
necessary licenses. Nevertheless, the language technology community itself has
expressed the need for restricted licenses. In this respect, the META-SHARE
licensing options reflect the current thinking of the community.
Presentations
Károly Gruber, Hungarian Ambassador, presented the situation in Hungary.
Diana
Popova, Senior expert, Science Directorate, Ministry of Education, Youth and
Science, presented the situation in
Bulgaria. Language technology is part of the ICT vertical research and has received funding for 20 years. Nevertheless, compared to other countries, the level of funding is still low.
Karel Oliva, member of the Council of Research, Development and Innovations of the Czech Republic, presented the situation in the Czech Republic.
Edouard
Geoffrois, Ministry of Defense and French National Research Agency, presented the situation in France. Various national
agencies cooperate to support language technology related topics. There are
large, dedicated programs like Quaero and programs run in cooperation with
other countries.
Alice
Dijkstra, The Netherlands Organisation for Scientific Research (NWO), presented the
situation in the Netherlands. A
joint Dutch and Flemish program for language technology that ran from 2005 to 2012 will have no successor. Nevertheless, language technology can be funded via an
“LT inside” approach. It can be part of other themes like the humanities or the
creative industry. In addition, funding as part of infrastructure programs can
be acquired rather easily.
Simona
Bergoč, Department for Slovene Language, Ministry of Education, Science, Culture
and Sport, presented the situation in Slovenia.
Joseph
Mariani, CNRS-LIMSI/IMMI, presented the European Commission's Collaborative Research Instruments.
Member states and the EC need more coordination. Of the various existing coordination instruments, “Article 185” seems well suited for language technology. The 2008 European Council Resolution on “European strategy
on Multilingualism” provides important arguments towards policy makers for the development of language technology in Europe.
Panel
Discussion
The panel
discussion brought up mainly two questions: what are the benefits of
coordinated programs on the European level, and what is the best approach to
create them.
As an answer to
the first question, several national projects that targeted similar goals were
mentioned. Running such projects without coordination leads to duplication of
efforts, and basic tasks like data sharing are hard to achieve. As a result, it is hard to reach critical mass compared to other regions of the world.
Pushing for dedicated funding on the national or the European level requires both a bottom-up approach, involving the leading experts in the field, and a political, top-down approach. A major argument towards politicians is that multilingualism is the crucial asset of Europe.
Fernando
Pereira, Google, gave the closing keynote of META-FORUM 2012, entitled “Low-Pass Semantics.”
At Google, a lot
of effort is put into natural language processing. Nevertheless, the aim is not to achieve sophisticated automatic processing of small pieces of content, but
to develop language technology workflows that scale to the whole Web. Here, the Web serves both as a data source and as target content.
The presentation
exemplified this approach with “Low-Pass Semantics”: its aim is to create links
between natural language text, external knowledge bases like the so-called
“knowledge graph” and other types of data.
Web pages often
contain useful pieces of information, but they are hard to identify. The external knowledge base contains keys or identifiers of concepts. In the low-pass semantics approach, these are linked to the text. This improves consistency in
interpreting Web content.
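A toy sketch of this linking step (with an invented knowledge base and alias table; Google's actual system is far more elaborate) could look like this:

```python
# Toy sketch of "low-pass semantics"-style entity linking: surface
# mentions in text are mapped to stable knowledge-base identifiers, so
# that different wordings of a concept are interpreted consistently.
# The knowledge base and alias table are invented for illustration.

KNOWLEDGE_BASE = {
    "Q1": {"name": "European Commission", "type": "organisation"},
    "Q2": {"name": "Brussels", "type": "city"},
}
ALIASES = {                      # surface form -> KB identifier
    "european commission": "Q1",
    "the commission": "Q1",
    "brussels": "Q2",
}

def link_entities(text: str) -> list[tuple[str, str]]:
    """Return (mention, kb_id) pairs found in the text, longest alias first."""
    links = []
    lowered = text.lower()
    for alias in sorted(ALIASES, key=len, reverse=True):
        if alias in lowered:
            links.append((alias, ALIASES[alias]))
            lowered = lowered.replace(alias, " ")  # avoid overlapping matches
    return links

print(link_entities("The Commission met in Brussels."))
# -> [('the commission', 'Q1'), ('brussels', 'Q2')]
```

Because both "European Commission" and "the Commission" resolve to the same identifier, downstream interpretation of the page content stays consistent, which is the point made above.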
The motivation
for the approach described is not a research topic, but a user problem: the low
precision of Web search. Methodologies from natural language processing play an
important role. Grammar parsing or named entity recognition (NER), applied in a robust and scalable manner, help to create better links to the knowledge base than the pure matching of text patterns. But language technology alone is not sufficient: at Web scale, computational power is extremely important, even more than the advancement of algorithms.
Hans
Uszkoreit, DFKI and META-NET, summarized in a brief closing session the next steps for the language technology community in Europe.
The following
months will decide about the shape of language technology, including the financial support provided in Europe.
There is a lot
of competition with other research fields – language technology is just one of
them. If the community wants to assure support in the future, it needs to
spread out widely with a positive message. In addition to the SRA, next year’s
META-FORUM 2013 will be one main instrument to convey that message to
everybody.