
The project has received funding from the Competitiveness and Innovation Framework Programme under Grant Agreement n° 271022.

Latest news

CESAR final project review meeting


On 10 April 2013, the outcomes of the CESAR project were discussed and reviewed at the final review meeting in Luxembourg. All work package (WP) leaders gave presentations on the work done in their WPs and tasks. The reviewers' questions were discussed and the strengths of the project were presented.





The Hungarian Language in the Digital Age - successful dissemination


The Hungarian Road Show of the CESAR project, ‘The Hungarian Language in the Digital Age’, was held on 18 January; its name refers to the Language White Paper. One of the aims of the conference was to acquaint political and public figures beyond the immediate professional community with the situation of the Hungarian language. We were therefore very honoured that several ministers and state secretaries gave speeches or were represented at the opening ceremony alongside the managers of CESAR, for instance the Minister of National Development, the Minister of State for Economic Planning, the Minister of State for Culture, the State Secretary, and the Ministry of Foreign Affairs.

In the morning session, the CESAR project and the state of technology support for the Hungarian language took centre stage. Besides the talks by Georg Rehm and Tamás Váradi, the most renowned representatives of language and speech technology sketched the technological support of the Hungarian language and its prospects. The afternoon session presented the Hungarian results so far, that is, the Hungarian National Corpus, the HUCOMTECH project and the Big Data project. Attendees could learn more about the research and development work in language and speech technologies currently in progress in Hungary.

Besides the presentations, various demos and posters showcased the language technology efforts in Hungary.





CESAR Road-Show in Hungary

'The Hungarian Language in the Digital Age – Hungarian Language and Speech Technologies Day' – 18th January, 2013, Budapest, Main Building of the Hungarian Academy of Sciences



The last awareness-raising road show of the project, promoting the results of the LT field, will be held by the project coordinator in Budapest on 18 January 2013. The prestigious event is dedicated to the key players of Hungarian language technology and will also promote the outcomes of the CESAR project itself.


The event will be organized under the patronage of the President of the Hungarian Academy of Sciences and will feature speeches by key players of the EC and LT projects across Europe (Márta Nagy-Rothengass, Georg Rehm, Ralf Steinberger, Steven Krauwer).

Further information about the conference, including the programme, can be found here.



CESAR @ Annual Day of the Faculty of Mathematics, UBG, Serbia


CESAR was presented at the Annual Day of the Faculty of Mathematics on 21 December 2012.







The CESAR project was presented at COLING 2012, the 24th International Conference on Computational Linguistics, organized by the Indian Institute of Technology Bombay (IITB) at IIT Bombay (Mumbai, India) from 8 to 15 December 2012. The results of the project were shown in demo presentations given by Tamás Váradi and Marko Tadić.

Marko Tadić presented the project itself, pointing out the relevance of participation in META-SHARE (Central and South-East European Resources in META-SHARE), and Tamás Váradi presented one of the most interesting tools of the project, NooJ (Open source multi-platform NooJ for NLP).







CESAR Road-Show in Croatia – 'Language Technologies Day' – 30th November, 2012, Zagreb, Hotel Sheraton


We invite you to Language Technologies Day, a conference organized under the patronage of the President of the Republic of Croatia, Professor Ivo Josipović.

The conference will bring together all relevant stakeholders in the field of language technologies in Croatia, with invited lecturers from the European Commission, the Institute for Language and Speech Processing (Athens) and the Hungarian Academy of Sciences (Budapest), as well as domestic experts, policy-makers and industrial partners. The new Language Technology EC project call will be presented at the conference, along with the results of the META-NET network of excellence and the CESAR project. The presentation of the Language White Paper 'The Croatian Language in the Digital Age' is also part of the programme.

Further information about the conference can be found here.



Serbian Road Show – news in brief


The CESAR Road Shows continued with the Language Technology Day in Belgrade (Serbia) on 29 October, an occasion dedicated to presenting the current language technology landscape in Serbia. LT professionals from Serbia and across Europe demonstrated the importance of language technology applications in the digital era and presented the state-of-the-art opportunities and challenges in the field.

The Language Technology Day was devoted to the results of the CESAR project for the Serbian language. The conference was dedicated to advancing the project and the META-NET alliance, with the aim of presenting the current situation of the field in Europe.

The presentations and demonstrations focused on the importance and broad coverage of technology support for the Serbian language, while the conference as a whole promoted knowledge about language technology and its potential in Europe.

In addition to the presentations, demonstrations of the most relevant software support for processing the Serbian language were given to promote the relevance of language technology and the importance of less widely used languages in Europe.

The Language Technology Day was an important milestone in the lifetime of the CESAR project and underlined the importance of the aims of CESAR and META-NET.





CESAR Road show in Serbia, October 29 – Belgrade, Hotel Hyatt Regency


We are pleased to invite you to the forthcoming CESAR dissemination event organized in Belgrade. The road show (Human Language Technology Day) will focus on various aspects of CESAR and on current trends and developments in Serbian language technology.

If you are interested in the presentations and demos (please see the preliminary programme for a general outline of the event), please return the registration form by 25 October 2012 to the following address:



Campaign on Language White Paper


One result of the collaborative effort of the CESAR partners was a series of reports on the situation of the covered languages, the Language White Papers (published by Springer). A summary of the work done can be found in one of the project deliverables. The promotion campaign for the series has already started, and as a result news items have been published in major online and other media.

Items from the media campaign can be read in the respective languages here:



CESAR @ FASSBL in Dubrovnik/Croatia, September 19–21, 2012


At the 8th FASSBL conference the CESAR project was presented with several papers, namely:

  1. Nikola Ljubešić: Comparing Traditional and Web Corpora of Croatian
  2. Danijela Merkler, Željko Agić, Marko Tadić: Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search
  3. Krešimir Šojat, Matea Srebačić, Vanja Štefanec: Derivational Patterns of Croatian Verbs
  4. Cvetana Krstev, Duško Vitas, Miloš Utvić: Derivational Patterns in E-Dictionaries of Serbian


This biennial conference, held from 19 to 21 September 2012 in Dubrovnik, Croatia, covers South Slavic and Balkan languages treated with formal linguistic and computational approaches. Two CESAR partners were present: FFZG and UBG.


Željko Agić presenting the paper 'Automatic Enrichment of Croatian Morphological Lexicon Using Large Corpora and Web Search'



CESAR @ Slavic Parallel Corpora Workshop in Mainz/Germany, September 10–11, 2012


CESAR was presented at the Slavic Parallel Corpora Workshop in Mainz on 10-11 September 2012 by Marko Tadić and Nikola Ljubešić. Besides the general CESAR and META-* presentation, two Croatian resources developed and enhanced within the CESAR project were shown.
Two more CESAR partner institutions, LSIL (Radovan Garabík) and IPI-PAN (Adam Przepiórkowski), also participated in the workshop.



CESAR @ TSD, Brno/Czech Republic, September 3–7, 2012


CESAR was presented at the '15th International Conference on Text, Speech and Dialogue' by Tamás Váradi. Another project financed from CESAR (Supervised Clustering of Prosodic Patterns in Spontaneous Speech) was shown as a poster by György Szaszák (BMET-TMIT) and András Beke (HASRIL).








CESAR Road show in Poland, September 27–28, 2012


I am very pleased to invite you to join the international event “Human Language Technology Days 2012” to be held on September 27-28, 2012 in Warsaw. The event is organised by the Institute of Computer Science, Polish Academy of Sciences and the University of Łódź within the framework of the ICT-PSP project CESAR (Central and South-East European Resources, part of META-NET) and is co-located with ICT Proposers’ Day 2012.

Language Technology Days 2012 intends to promote knowledge about language technology and its potential, as well as the possible threats accompanying its development, by gathering experts who will try to answer important questions on the future of language and language processing in a globalized digital information society. How does massive digitization of information, knowledge and everyday communication affect our language? Will our language change or even disappear? Will the Internet always be divided by the languages of its users?

Another important topic we intend to cover is a Road Show of European language technology, which aims to present the state of the art, directions and visions for the development of language resources and tools for the common scientific and commercial market. Particular attention will be given to language technology for Polish, with presentations of new advances in the field and their application in administration and business, as well as the involvement of the open source and research communities in the development of language resources.

I hope that you will find our proposal worthy of notice and that you will attend the conference and the follow-up discussion (please see the preliminary programme for a general outline of the event).


I sincerely invite you to register for HLT Days 2012 and look forward to seeing you in Warsaw!

With kind regards,

Prof. Jacek Koronacki
Director of the Institute of Computer Science


Slovakia Road Show - news in brief


In June 2012 Bratislava hosted a two-day scientific and informational conference organized by the Slovak National Corpus Department of the Ľ. Štúr Institute of Linguistics of the Slovak Academy of Sciences (SAS) within the scope of the CESAR project, which aims to mobilize national industry and research and to enhance support for language technologies and tools at the national level.

At the press conference, the invited guests stressed the importance of language technologies in the multilingual European society. Slovak NLP research received the full support of Ľ. Falťan (Vice President of SAS), M. Cimbáková (General Director, Science and Technology Division of the Ministry of Education, Science, Research and Sport of the Slovak Republic) and P. Žigo (Director of the Ľ. Štúr Institute of Linguistics, SAS).

The talk by G. Rehm (META-NET manager from DFKI GmbH in Berlin) was dedicated to fostering the technological foundations of a multilingual European information society. T. Váradi (CESAR project coordinator from the Research Institute for Linguistics, Hungarian Academy of Sciences in Budapest) emphasized the current size of the Slovak National Corpus on the one hand, but on the other pointed out the weak or missing support for text analysis, speech analysis and machine translation.

National, parallel and several specialized corpora were presented by F. Čermák, L. Dimitrova, R. Garabík, J. Hajič, L. Iomdin, T. Pintér, V. Stoykova, M. Šimková and M. Tadić. Applied research in language technologies was presented by NEWTON Technologies, a company which provides speech recognition services; the first catalogue search engine in the Czech Republic; and Education@Internet, an international non-profit organization supporting intercultural learning.

D. Katuščák introduced a project on a digital library and archive and the possibilities of using its digital text content for linguistic research. L. Hluchý, M. Rusko, J. Staš, D. Hládek and J. Juhár reported on computing technologies and tools for speech and text processing for Slovak. The building of large corpora and tools for computer lexicography was presented by K. Pala and P. Rychlý. Cartographic processing of the Slavic dialects was introduced by P. Žigo. J. Kravjar spoke about the national corpus of theses and its plagiarism detection system.

The international conference provided information on the current state of language technologies in Slovakia and presented new trends and visions for their development. The experts, together with stakeholders and the general public, shared the latest knowledge and voiced their most pressing requirements and ideas for cooperation in the field.



CESAR Road show in Slovakia, June 7–8 - Bratislava, Hotel Park Inn Danube

This road-show event, organized by CESAR (Central and South-East European Resources), aims to mobilise national industry and research in the field of language technologies. CESAR, together with META-NET (a Network of Excellence consisting of 57 research centres in 33 countries and funded by the European Commission), stimulates the development of and research into multilingual technologies. We believe the road show is the most appropriate means of raising awareness of human language technologies in European countries.

The meeting is held in conjunction with the international scientific and informational conference “Development of the Human Language Technologies and Resources in Slovakia and in the World”, organised on the 10th anniversary of the establishment of the Slovak National Corpus, the leading Natural Language Processing research institution in Slovakia. The conference focuses on national language corpora, specialised corpora, computational lexicography results, the development of corpus construction and usage tools, and natural language processing.

The event will bring together different communities – research, business and government, with representatives of research centres, technology, users of language technologies and policy makers responsible for supporting research and innovation.

This CESAR road show will summarize the current state of language technology in Slovakia and inform the target audience about state-of-the-art and future natural language processing technologies and tools in the cross-lingual European information society.


Programme of the event



CESAR Road show in Bulgaria, May 2, 2012 - Sofia, Hotel Sheraton

An international event on powerful Language Technologies for the multilingual European information society


During the last 60 years, Europe has become a distinct political and economic structure. Culturally and linguistically it is rich and diverse. However, from Portuguese to Polish and Italian to Icelandic, everyday communication between Europe's citizens, within business and among politicians is inevitably confronted with language barriers. ...Language technology and linguistic research can make a significant contribution to removing the linguistic borders. Combined with intelligent devices and applications, language technology will help Europeans talk and do business together even if they do not speak a common language.

META-NET Language Whitepaper Series


The meeting is organized by CESAR, part of META-NET (Multilingual Europe Technology Alliance), a Network of Excellence consisting of 57 research centres in 33 countries and funded by the European Commission. The four projects within META-NET (T4ME, CESAR, METANET4U and META-NORD) will provide access to many Language Resources and Language Tools for many European languages. The event in Sofia is the first in a series of official presentations of CESAR in a number of European countries.

The event will inform the participants and the general public about the state of Language Technologies for Bulgarian in comparison with other European countries, and about the importance of multilingual resources and their computer processing for community development, education, business and international relations.

The event will bring together three communities – research, business and policy – with representatives of research centres, small and large technology corporations, translation services and other users or producers of Language Technology, language communities and societies, and policy makers responsible for supporting research and innovation, economy and ICT.

At the CESAR META-NET road show, some of the foremost European Language Technology scientists and European Commission representatives will summarize the state of the art, disclose new breakthroughs and share success stories about European research. Representatives of large Language Technology users will speak about the benefits of language technology applications and will present their showcases and projected future needs. An industry exhibition running concurrently with the main conference will feature presentations and demonstrations from large and small businesses working across the field of language technologies, and will showcase recent R&D results from EU-funded projects that contribute to the building of a multilingual European information society.


Programme of the event






Natural Language Processing seminar in Warsaw

On January 9th, 2012, Polish parallel corpora and multilingual resources made available in the CESAR project were presented by Piotr Pęzik (University of Łódź) during the open Natural Language Processing Seminar in Warsaw, Poland, to an audience of some 40 participants. One direct outcome of this presentation was an offer from representatives of research centres attending the seminar (including the University of Warsaw) to contribute their multilingual resources to the CESAR/META-NET pool of language resources.



Call for Hungarian resources and tools


A call for Hungarian resources and tools to be distributed in META-SHARE was announced by Tamás Váradi at MSZNY2011 on December 1.

If you are interested, please fill in the online form. We particularly need the following information: a short description, the planned action (e.g. annotation cleanup, standardization), and the planned effort (in person-months).

Submission deadline: January 6, 2012





The Eighth Conference on Hungarian Computational Linguistics was held in Szeged, Hungary, on December 1-2.

The founder of the conference series and also the host of the event is the University of Szeged, Department of Informatics. The main aim of the conference is to provide a forum for the presentation of the most recent results and achievements of research and development activities conducted in the field. Apart from delivering the most up-to-date information about the work in Hungarian HLT, the event provides an excellent opportunity to build partnerships and discuss related questions with other professionals in person.

CESAR was presented at the 8th MSZNY, with a presentation by Tamás Váradi. At the same time a call for Hungarian resources and tools to be distributed in META-SHARE was announced.

Tamás Váradi at MSZNY2011




The 5th Language and Technology Conference (LTC'11), a meeting organized by the Faculty of Mathematics and Computer Science of Adam Mickiewicz University, Poznan, Poland in cooperation with the Adam Mickiewicz University Foundation, took place on November 25-27, 2011.


Since the very beginning, the meetings of the LTC series have addressed Human Language Technologies (HLT) as a challenge for computer science, linguistics and related fields. Fostering language technologies and resources remains an important mission in the dynamically changing information-saturated world.


CESAR was heavily represented at the 5th LTC, with three presentations ("Detecting Gaps in Language Resources and Tools in the Project CESAR" by Marko Tadić, "Parallel and spoken corpora in an open repository of Polish language resources" by Piotr Pęzik, and "Orwell's 1984 – the Case of Serbian Revisited" by Cvetana Krstev) and a demo session, "CESAR resources in META-SHARE repository".



Marko Tadić


Piotr Pęzik, Cvetana Krstev




Tamás Váradi (left)


Public annual report


We have prepared the annual public report of the CESAR project, which will be published on the CORDIS webpage and can also be found on our webpage.



Review about CESAR and META-NET


Tamás Váradi wrote a review of the CESAR and META-NET projects for the journal Infotheca, which is published bilingually in Serbian and English and is available online.



META-NET Network Meeting and General Assembly


On Friday and Saturday, October 21st and 22nd last, META-NETizens from all across Europe came together in Berlin for the first META-NET Network Meeting and General Assembly. In total, 93 participants attended the meetings, representing every node in the META-NET Network of Excellence, which incorporates four European projects: T4ME, CESAR, METANET4U and META-NORD.

For many this meeting was the first time they met their colleagues from other partner organisations in the Network and so, in addition to the agenda items, it proved a useful opportunity to meet each other and discuss current work and new ideas.

The agenda included updates on the current state of play and next steps in the overall META-NET initiative, each of the partner projects, as well as an update and discussion about rolling out META-SHARE. We also engaged in extensive discussions about the forthcoming publication of the Language Whitepapers. Cross language comparison and clustering with respect to tools and resources was the hot topic for discussion here. But thanks to the strong representation from all partners the issues were thrashed out and tough decisions were taken to move the papers forward. There were also some lively discussions around the visions presented in the Strategic Research Agenda being drafted by the Technology Council. Then followed some discussion on cooperation on horizontal issues across work packages and projects and some closing remarks from Hans Uszkoreit to bring proceedings to a close.

The first META-NET Network Meeting and General Assembly was a fruitful and engaging meeting for all. The many interactive sessions gave each partner a chance to get involved and help shape the work of the Network and the informal discussions amongst partners helped strengthen the close working relationships in the Network. We’re looking forward to acting on all this to further the META-NET cause.




Duško Vitas gave a talk at the 10th National Conference "New Technologies and Standards: Digitization of National Heritage", which took place in Belgrade on 22-23 September 2011. His talk was about the "Serbian Language and its Resources".





41st International Slavistic Conference


Duško Vitas gave a talk at the 41st International Slavistic Conference, which took place in Belgrade on 2-16 September. His talk was about "Language Resources". At the same event Miloš Utvić gave a presentation on "Serbian Language Corpora".



Presentation at the Translation Management Europe conference


On 29 September 2011 the CESAR project and the META-NET network were presented by Piotr Pęzik from the University of Łódź at the Translation Management Europe conference held in Warsaw, Poland. The audience included more than 150 translation/localisation industry professionals and company representatives, including SDL, ATRIL, Kilgray, Memsource and others. Piotr Pęzik's talk focused on the translation and localisation resources made available through CESAR/META-NET. A more detailed programme of the event can be found here.



European Day of Languages


26 September 2011 marks the 10th anniversary of the European Day of Languages (EDL) celebrated at the Council of Europe and throughout its 47 member states.

The general objectives of the European Day of Languages are:

1. Alerting the public to the importance of language learning and diversifying the range of languages learnt in order to increase plurilingualism and intercultural understanding;
2. Promoting the rich linguistic and cultural diversity of Europe, which must be preserved and fostered;
3. Encouraging lifelong language learning in and out of school, whether for study purposes, for professional needs, for purposes of mobility or for pleasure and exchanges.

The members of the CESAR project (in close cooperation with META-NET) published a series of Language White Papers (LWP) focusing on the social, economic and technical standing of 30 European languages.

The series of LWP can be found under



EUROLAN 2011 Summer school


From August 24 to September 4, the EUROLAN 2011 Summer School, held in Cluj-Napoca, Romania, in the heart of Transylvania, provided one week of intensive study of the natural language processing technologies currently under development to support industrial applications. Internationally known scholars and researchers (with the particular involvement of scientists from the Multilingual Europe Technology Alliance, META), as well as industry professionals involved in leading-edge work in innovative areas of natural language processing, gave lectures at the school (tutorials, hands-on labs and demos) to share their in-depth understanding and experience with students.


Tamás Váradi, the project coordinator of CESAR, gave a lecture on 'Practical settings for NOOJ'. The tutorial gave an overview of the finite-state linguistic analysis tool NooJ. The system provides a comprehensive linguistic development environment: it has integrated corpus handling facilities, coupled with a morphological lexicon and parsing through a series of cascaded local grammars. NooJ language modules (minimally, inflecting lexicons and some sample grammars) have been developed for a wide variety of languages.



META-FORUM 2011: Videos of Presentations


At the end of June 2011 our annual conference META-FORUM took place in Budapest, Hungary. Video lectures of the conference can be found at (including the presentations of the CESAR project).
