Morphological Analysis of Hungarian in NooJ

 

Peter Vajda (Research Institute for Linguistics, Hungarian Academy of Sciences) vajda@corpus.nytud.hu

 

 

Abstract

 

Hungarian is a highly agglutinative language, which makes its morphological analysis a non-trivial problem. Unlike English or French, it has about 700 nominal and about 100 verbal forms, which makes it impractical to store every word form in a dictionary. Furthermore, some stems have variants that are "triggered" by certain of the numerous suffixes. Starting from an existing description of Hungarian morphology, we have therefore created a grammar based on morpho-phonological features. We use NooJ's morphological graphs to recognize the suffixes and to produce the corresponding lexical constraints, which are then checked against the various possible stem forms in the dictionary. We have also designed a method for recognizing derivational suffixes that can be followed by inflectional ones. The existing dictionary had to be converted into a format acceptable to NooJ by introducing the necessary morpho-phonological features.
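The constraint-checking idea can be sketched in a few lines of plain Python. This is only an illustration, not NooJ notation and not the grammar described here: the feature names (`back`, `front`, `pre_cons`, `pre_vowel`), the tags and the three toy entries are assumptions made up for the example. Each suffix states the features it requires, and a stem variant is accepted only if it carries all of them.

```python
from typing import NamedTuple


class StemVariant(NamedTuple):
    lemma: str
    features: frozenset  # hypothetical morpho-phonological features


class SuffixRule(NamedTuple):
    tag: str
    required: frozenset  # features the stem variant must carry


# Toy dictionary: surface stem form -> variants (assumed entries).
STEM_VARIANTS = {
    "ló": [StemVariant("ló", frozenset({"back", "pre_cons"}))],
    "lov": [StemVariant("ló", frozenset({"back", "pre_vowel"}))],
    "kert": [StemVariant("kert", frozenset({"front", "pre_cons", "pre_vowel"}))],
}

# Toy suffix table standing in for the suffix-recognizing graphs.
SUFFIX_RULES = {
    "": [SuffixRule("NOM", frozenset({"pre_cons"}))],
    "ak": [SuffixRule("PL", frozenset({"back", "pre_vowel"}))],
    "ek": [SuffixRule("PL", frozenset({"front", "pre_vowel"}))],
    "ban": [SuffixRule("INE", frozenset({"back", "pre_cons"}))],
    "ben": [SuffixRule("INE", frozenset({"front", "pre_cons"}))],
}


def analyze(word):
    """Return (lemma, tag) pairs for every split whose constraints hold."""
    results = []
    for cut in range(1, len(word) + 1):
        stem, suffix = word[:cut], word[cut:]
        for rule in SUFFIX_RULES.get(suffix, []):
            for variant in STEM_VARIANTS.get(stem, []):
                # The suffix's constraint must be a subset of the stem's features.
                if rule.required <= variant.features:
                    results.append((variant.lemma, rule.tag))
    return results


print(analyze("lovak"))  # accepted: variant "lov" carries pre_vowel
print(analyze("lóak"))   # rejected: variant "ló" lacks pre_vowel
```

In this sketch the plural "lovak" is analyzed through the variant "lov", while the ill-formed "lóak" is rejected because the constraint produced by the suffix does not match the features of "ló".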

We have also created an alternative solution that uses the existing paradigmatic description as it is, relying only on dictionaries and no grammar.

The resulting dictionaries contain all the possible sequences of suffixes and all the stems; the correspondence between the two is achieved by using the names of the paradigm classes as constraints.
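A minimal sketch of this dictionary-only lookup, again in plain Python rather than NooJ's dictionary format, might look as follows. The class names `CLASS_A` and `CLASS_B`, the tags and the handful of entries are hypothetical and stand in for the real paradigm classes; the point is only that a stem and a suffix sequence combine when they name the same class.

```python
# Stem dictionary: stem -> list of (lemma, paradigm class). Toy entries.
STEMS = {
    "ház": [("ház", "CLASS_A")],
    "kert": [("kert", "CLASS_B")],
}

# Suffix-sequence dictionary: whole sequence -> list of (tags, paradigm class).
SUFFIX_SEQUENCES = {
    "": [("NOM", "CLASS_A"), ("NOM", "CLASS_B")],
    "ak": [("PL", "CLASS_A")],
    "ek": [("PL", "CLASS_B")],
    "ban": [("INE", "CLASS_A")],
    "ben": [("INE", "CLASS_B")],
    "akban": [("PL+INE", "CLASS_A")],
    "ekben": [("PL+INE", "CLASS_B")],
}


def lookup(word):
    """Split the word at every position and keep splits whose classes agree."""
    analyses = []
    for cut in range(1, len(word) + 1):
        stem, suffixes = word[:cut], word[cut:]
        for lemma, stem_class in STEMS.get(stem, []):
            for tags, suffix_class in SUFFIX_SEQUENCES.get(suffixes, []):
                # The paradigm-class name acts as the only constraint.
                if stem_class == suffix_class:
                    analyses.append((lemma, tags))
    return analyses


print(lookup("házakban"))  # stem and suffix sequence share CLASS_A
print(lookup("házek"))     # classes differ, so no analysis is returned
```

Because every suffix sequence is stored whole, no grammar is needed at analysis time; the cost is a larger suffix dictionary.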