Tecling: Technologies for Linguistic Analysis
»The World is automatic
NEVEN

We present a study on the automatic detection of non-deverbal eventive nouns: nouns that designate events but are not derived from verbs, such as fiesta (‘party’) or cóctel (‘cocktail’). Because they lack the typical morphological features of deverbal nouns, such as the suffixes -ción and -miento, they are more difficult to detect. This research continues and extends the work initiated by Resnik (2010), who offers a number of cues for detecting this type of lexical unit. We apply Resnik’s ideas and add new ones, among them the inductive analysis of the words that tend to co-occur with eventive nouns in corpora, which we use as predictors of eventive status. Furthermore, we simplify the classification algorithm considerably and run the experiments on a larger corpus, the EsTenTen (Kilgarriff & Renau, 2013), which comprises more than 9 billion running words. Finally, we present the first results of the automatic extraction of eventive nouns from the corpus, among which we find many non-deverbal nouns.

Web demo: http://www.tecling.com/neven

Source code: http://www.tecling.com/neven/neven.pl


Usage:

perl neven.pl input.txt > result.htm


Before running the script, you need the contexts of occurrence (concordances) of a word, extracted from a corpus. You will also need to edit the script to set the correct path to the folder where these contexts are stored. The concordances for a word are stored in a file that bears the same name as the word's lemma. You can obtain them from any corpus using our free corpus concordancer, Kwico.
Comments in the script are currently only in Spanish.
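
The file layout described above can be illustrated with a short Python sketch. This is not code from neven.pl: the function names, the folder name "contexts", the UTF-8 encoding and the one-context-per-line format are all assumptions made only for the example.

```python
from pathlib import Path


def concordance_path(contexts_dir, lemma):
    """Build the expected path of the concordance file for a lemma.

    Assumption: the file is named exactly like the lemma, with no extension.
    """
    return Path(contexts_dir) / lemma


def read_contexts(contexts_dir, lemma, encoding="utf-8"):
    """Return the non-empty lines of the lemma's concordance file.

    Assumption: one context of occurrence per line. If the file does not
    exist, an empty list is returned instead of raising an error.
    """
    path = concordance_path(contexts_dir, lemma)
    if not path.is_file():
        return []
    with path.open(encoding=encoding) as handle:
        return [line.strip() for line in handle if line.strip()]


# Hypothetical usage: "contexts" stands for the folder configured in the script.
contexts = read_contexts("contexts", "fiesta")
print(len(contexts), "contexts found for 'fiesta'")
```

Checking that the file exists for every word in the input before running the script may save time, since a missing concordance file means there is nothing to analyse for that word.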

Pending Work: Users interested only in non-deverbal eventive nouns will need to make a few changes to the script in order to filter out nouns with deverbal morphology (e.g. -ción, -miento). An interesting property of this program is that it completely ignores such morphological cues. The morphology filter is a safe and simple method and will be ready soon.
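
Until that filter is available, the idea can be approximated as a post-processing step on the list of extracted nouns. The following Python sketch is only an illustration under stated assumptions, not the planned implementation: the function names are hypothetical, and the suffix list contains only the two suffixes mentioned above, so a real filter would probably need more.

```python
# Assumption: only the two suffixes named on this page; the list is incomplete.
DEVERBAL_SUFFIXES = ("ción", "miento")


def has_deverbal_morphology(lemma):
    """Return True if the lemma ends in one of the listed suffixes."""
    return lemma.lower().endswith(DEVERBAL_SUFFIXES)


def keep_non_deverbal(lemmas):
    """Drop every lemma that shows deverbal morphology, preserving order."""
    return [lemma for lemma in lemmas if not has_deverbal_morphology(lemma)]


# Example candidates: two non-deverbal nouns and two deverbal ones.
candidates = ["fiesta", "cóctel", "celebración", "nacimiento"]
print(keep_non_deverbal(candidates))  # ['fiesta', 'cóctel']
```

A plain suffix match is a heuristic: it also discards any noun that merely happens to end in one of these strings, so the output may still deserve a manual check.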

Funding: This research is supported by a grant from the Chilean Government: Conicyt-Fondecyt 11140686, “Inducción automática de taxonomías de sustantivos generales y especializados a partir de corpus textuales desde el enfoque de la lingüística cuantitativa” (Automatic taxonomy induction from corpora for terminology and general vocabulary using quantitative measures). Lead researcher: Rogelio Nazar.

Related publications: Nazar, R.; Soto, R.; Urrejola, K. (2017). Detección automática de nombres eventivos no deverbales en castellano: un enfoque cuantitativo basado en corpus. Revista Linguamatica, vol. 9, num. 2, pp. 21-31.

Related concepts: computational lexicography, inductive corpus analysis, non-deverbal eventive nouns

Questions or comments? Feel free to drop us a line.

 