A new audio-visual encyclopedia which gives a comprehensive overview of Czech – its history, theory and trends, disappearing dialects and typical gestures – is being created at the Faculty of Arts of Masaryk University in Brno. A team of eighty scholars that includes linguists from prestigious foreign institutions such as MIT and Princeton will finish work on the portal at the end of 2015. The project is funded by the Czech Science Foundation and costs almost 350,000 euros.
Unique in its range, the online encyclopedia will have more than a thousand entries. “The portal, whose first version will be in Czech and second in English, is not just a better sort of Wikipedia. It will provide guaranteed information about Czech for the public, teachers, students and journalists, and it will give scholars a comprehensive picture of how Czech has been reflected in various linguistic theories,” says the director of the project, Professor Petr Karlík of Masaryk University’s Faculty of Arts.
View from outside
Work on the electronic encyclopedia has attracted an unusually high number of linguists from prestigious universities all over the world – more than a third of team members are not Czech. “Some of them play in the Premier League of linguistics. A splendid example is Professor Emonds, one of the first students of Noam Chomsky at MIT,” says Professor Karlík. European linguists involved in the project include participants from the Universities of Regensburg, Naples, Vienna, Paris and Tromsø; there are also scholars from the American universities Princeton, Brown and Tulane.
Czech is attractive to scholars because it differs typologically from English, so it can serve as the basis for many linguistic theories. “Czech declines and conjugates, and it has a relatively free word order. That’s why data from Czech open up new perspectives and stimulate the formation of linguistic theory. It is no coincidence that one of the key theories of language study, Prague Structuralism, was created from Czech data and greatly influenced the development of linguistics not only in Czechoslovakia but also on a global scale,” says Petr Karlík.
Constantly updated
Each section will be updated according to the latest findings and supplemented with more data from the ‘corpora’, which contain hundreds of millions of words and links from contemporary Czech.
The scholars’ work will not end when the project concludes at the end of 2015. “We will take care of the website so that information does not become obsolete. We are also counting on the creation of an English version, which will make Czech scholarship available online to the international academic community as a whole,” says Petr Karlík.
Unique electronic processing
The content of the encyclopedia will be handed over to experts at the Faculty of Informatics of Masaryk University, who will enter it in an electronic database which will then be accessible to users through web pages. “This online electronic portal will be based on the lexicographical DebII platform developed at the Centre for Natural Language Processing,” says Associate Professor Karel Pala of the Faculty of Informatics. The system consists of two main parts, the first being a server which stores all the data that make up the encyclopedia. “The second is the client, i.e. the web interface, through which users can ask questions and seek the information they are interested in,” says Karel Pala.
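For readers curious about what such a two-part design looks like in practice, the split between a data-holding server and a querying client can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DebII platform itself: the class names `EntryServer` and `WebClient`, their methods, and the in-memory dictionary standing in for the database are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EntryServer:
    """Server side (hypothetical): stores all entries of the encyclopedia."""

    # Maps a case-folded headword to the text of its entry.
    entries: dict[str, str] = field(default_factory=dict)

    def add_entry(self, headword: str, text: str) -> None:
        # casefold() makes lookups case-insensitive, including for "Č"/"č".
        self.entries[headword.casefold()] = text

    def lookup(self, headword: str) -> str | None:
        # Exact match on the case-folded headword, or None if absent.
        return self.entries.get(headword.casefold())

    def search(self, prefix: str) -> list[str]:
        # All stored headwords that start with the given prefix, sorted.
        folded = prefix.casefold()
        return sorted(h for h in self.entries if h.startswith(folded))


class WebClient:
    """Client side (hypothetical): the interface through which users ask."""

    def __init__(self, server: EntryServer) -> None:
        self.server = server

    def ask(self, query: str) -> str:
        # Try an exact match first, then fall back to prefix suggestions.
        text = self.server.lookup(query)
        if text is not None:
            return f"{query}: {text}"
        matches = self.server.search(query)
        if matches:
            return "Did you mean: " + ", ".join(matches)
        return "No entry found."
```

In this sketch a user would create one `EntryServer`, fill it with `add_entry`, and then call `WebClient(server).ask(...)`; a real deployment would place the two parts on different machines and connect them over the web.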
The platform is designed specifically for dictionaries and searches very quickly. A special character encoding handles the characters of Old Church Slavonic, Old Czech, and all world languages. In addition, because the scholars are developing the application directly for the university, it is not only much cheaper but can also be better adapted to the specific requirements of the linguists who have prepared the content of the encyclopedia. “The platform also contains a lexical DebDict browser that provides access to six major Czech dictionaries and other resources, such as the CIA World Factbook,” says Karel Pala.