ILI RAS Institute for Linguistic Studies
Russian Academy of Sciences

Conference “Corpus-Based Approaches to the Balkan Languages and Dialects”

5–7 December 2016
Institute for Linguistic Studies of the Russian Academy of Sciences (Saint Petersburg, Russia)
Organizers: Alexander Yu. Rusakov, Maria S. Morozova, Maxim L. Kisilier


In recent decades, digital corpora of the Balkan languages have been developed by scholarly institutions and universities in the Balkans and worldwide. Balkan linguists can now make use of a range of projects, such as the Bulgarian National Corpus (developed by the Bulgarian Academy of Sciences), the BCS Gralis Corpus (developed at the University of Graz, Austria), the Croatian National Corpus (developed at Zagreb University, Croatia), and the Albanian National Corpus and the Corpus of Modern Greek (both developed by the Russian Academy of Sciences). These corpora are a valuable tool that facilitates many kinds of linguistic research for which unstructured electronic text collections do not suffice or are not available.

One of the conference’s aims is to share experience in developing electronic corpora and interactive databases of the Balkan languages and dialects (corpora of written language, parallel corpora of the Balkan languages, corpora of spoken language, dialect corpora, etc.). Participants are invited to present completed or ongoing projects, to report on the theoretical and practical challenges they have encountered or expect to encounter when developing a corpus of one or more Balkan languages, and to discuss possible solutions to these problems. Potential domains of inquiry include:

  • corpus structure (subcorpora, types of texts), selection of texts and their presentation in the corpus (transcription, translations, use of standard orthographies);
  • development of linguistic (morphological, syntactic and semantic) annotation standards and metadata descriptions.

Another major aim is to present case studies facilitated by existing corpora and interactive databases of the Balkan languages and dialects. Participants are expected to share the results of their own corpus-based research in various linguistic domains. Such investigations may include:

  • phonetic, morphosyntactic and lexical research using available corpora of the Balkan languages and dialects;
  • diachronic and synchronic studies using language corpora;
  • investigations of written language, as well as corpus-based studies of spoken language covering various aspects of spontaneous speech analysis and language acquisition.

The languages of the conference are English, French, German, and Russian.

Speaking time is 20 minutes, plus 10 minutes for discussion.

Invited speakers

  • Evangelia Adamou (French National Centre for Scientific Research, Lacito Department, Paris)
  • Ruprecht von Waldenfels (Department of Slavic Languages and Literatures, University of California, Berkeley)

©2002-2017 ILI RAS; 9 Tuchkov pereulok, Saint Petersburg, 199053; (812) 328-16-12