Kind - The Taxonomy Project

Version: May 17, 2025.

What is this?

Kind is a taxonomy of nouns. The present version is derived from Wiktionaries, for the moment only in Spanish and English.
You are now on the English side. You can select the language (English or Spanish) and enter any common single-word noun (or a list of them, one per line), and the system will produce complete hypernymy chains. It does not yet work with multiword expressions, but we are currently working on that too.
You can also request a RANDOM SAMPLE of 100 entries.
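
To illustrate what a hypernymy chain is, here is a minimal sketch in Python. It is not Kind's implementation: the toy dictionary, the function names (hypernymy_chain, chains_for_input) and the data are all assumptions made for this example. It only shows the general idea of following direct hypernym links from a noun upwards, for input given one noun per line.

```python
from typing import Dict, List

# Hypothetical toy data: each noun maps to its direct hypernym.
# These entries are illustrative only and are not Kind's actual output.
TOY_HYPERNYMS: Dict[str, str] = {
    "tarantula": "spider",
    "spider": "animal",
}


def hypernymy_chain(noun: str, hypernyms: Dict[str, str]) -> List[str]:
    """Follow direct hypernym links upwards, starting from `noun`."""
    chain = [noun]
    seen = {noun}
    while chain[-1] in hypernyms:
        parent = hypernyms[chain[-1]]
        if parent in seen:
            # Stop on circular definitions instead of looping forever.
            break
        chain.append(parent)
        seen.add(parent)
    return chain


def chains_for_input(text: str, hypernyms: Dict[str, str]) -> Dict[str, List[str]]:
    """Build one chain per non-empty input line (one noun per line)."""
    nouns = [line.strip().lower() for line in text.splitlines() if line.strip()]
    return {noun: hypernymy_chain(noun, hypernyms) for noun in nouns}


if __name__ == "__main__":
    for noun, chain in chains_for_input("tarantula\nspider", TOY_HYPERNYMS).items():
        print(" -> ".join(chain))
```

A noun that is missing from the toy dictionary simply yields a chain containing only itself. A real system would also need to choose among the senses of each word, which this sketch leaves out.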

Some more context

This is the new (2025) version of a rather old project. Originally, it was a statistical algorithm for inducing taxonomies from corpora. It combined several strategies that did not involve explicit linguistic knowledge and relied instead on the computation of distributional similarity coefficients. This new version is very different. Firstly, it is simpler than its predecessors, because it is based on a single dictionary (in this case Wiktionary, although in principle it could work with any other dictionary as well). Secondly, it has something that previous versions did not have: a word sense disambiguation algorithm. This new feature gives the system a great advantage, because it helps to produce more coherent hypernymy chains.

At some point in the near future we will reinstate some of the components of the older version of Kind that were also useful for obtaining information from corpora. We will post updates on this soon.

Documentation and source code

Today is Monday, May 12, 2025. The source code and documentation are changing very rapidly as we work on the details. However, you can have a look at what we've got so far.
We will not maintain older versions. However, copies of older versions of Kind are available thanks to the great Wayback Machine (Internet Archive) at the following URL:

https://web.archive.org/web/20230926170113/http://www.tecling.com/cgi-bin/kind/2021/

That, for example, is the 2021 version, but many others are available there as well.
If you would like to send inquiries, you are welcome to do so at rogelio (dot) nazar (at) gmail (dot) com.

Funding

This project has been supported by two successive grants:

Conicyt-Fondecyt 11140686, “Inducción automática de taxonomías de sustantivos generales y especializados a partir de corpus textuales desde el enfoque de la lingüística cuantitativa” (Automatic taxonomy induction from corpora for terminology and general vocabulary using quantitative measures). Lead researcher: Rogelio Nazar. (2014 to 2017).
Ecos Sud-Conicyt Project C16H02 “Inducción automática de taxonomías del español y el francés mediante técnicas cuantitativas y estadística de corpus” (Automatic taxonomy induction from corpora for Spanish and French using quantitative corpus analysis). Lead researcher: Irene Renau. (2016-2019).

Credits

Researchers:

Rogelio Nazar
Irene Renau
Nicolás Acosta
Daniel Mora
Andrea Alcaíno

Also, the following researchers worked on previous versions of this project:

Antonio Balvet
Gabriela Ferraro
Rafael Marín

Related publications:

Nazar, R. (2021). Kind: un proyecto de inducción automática de taxonomías léxicas. Revista Anales de Lingüística, 2(7): 175-201.
Nazar, R.; Balvet, A.; Ferraro, G.; Marín, R.; Renau, I. (2020). Pruning and repopulating a lexical taxonomy: experiments in Spanish, English and French. Journal of Intelligent Systems, vol. 30 num. 1, pp. 376-394.
Nazar, R.; Obreque, J.; Renau, I. (2020). Tarántula –> araña –> animal : asignación de hiperónimos de segundo nivel basada en métodos de similitud distribucional. Procesamiento del Lenguaje Natural, núm 64, pp. 29-36.
Nazar, R. (2019). El análisis cuantitativo de la coocurrencia léxica en la lexicografía especializada. In Sanmartín Sáez, Julia y Quilis Merín, Mercedes (eds.). Retos y avances en lexicografía: los diccionarios del español en el eje de la variación lingüística. Anejo 10 de Normas. Valencia: Asociación Española de Estudios Lexicográficos.
Nazar, R., Renau, I., Marín, R. (2017). Experiments in taxonomy induction in Spanish and French. Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017), pp. 66-75.
Nazar, R., Renau, I., Marín, R. (2017). Taxonomía automatizada de sustantivos del castellano y del francés: hacia el etiquetado semántico automático multilingüe. In: Sariego López, Ignacio, Juan Gutiérrez Cuadrado y Cecilio Garriga Escribano (eds.), El diccionario en la encrucijada: de la sintaxis y la cultura al desafío digital. Santander: AELEX, 731-745.
Renau, I., Nazar, R. (2017). Verbos en contexto: una propuesta para la detección automática de patrones léxicos en corpus. In: Sariego López, Ignacio, Juan Gutiérrez Cuadrado y Cecilio Garriga Escribano (eds.), El diccionario en la encrucijada: de la sintaxis y la cultura al desafío digital. Santander: AELEX, 879-897.
Nazar, R.; Arriagada, P. (2017). POL: un nuevo sistema para la detección y clasificación de nombres propios. Procesamiento del Lenguaje Natural, n. 58, pp. 13-20.
Nazar, R.; Soto, R.; Urrejola, K. (2017). Detección automática de nombres eventivos no deverbales en castellano: un enfoque cuantitativo basado en corpus. Revista Linguamatica, vol. 9, num. 2, pp. 21-31.
Nazar, R. (2016). Distributional analysis applied to terminology extraction: example in the domain of psychiatry in Spanish. Terminology: International Journal of Theoretical and Applied Issues in Specialized Communication, 22(2):142-170.
Nazar, R., Renau, I. (2016). A Quantitative analysis of the semantics of verb-argument structures. In S. Torner and E. Bernal (eds.), Collocations and other lexical combinations in Spanish. Theoretical and Applied approaches. New York: Routledge, pp. 114-136.
Nazar, R.; Renau, I. (2016). Automatic extraction of lexico-semantic patterns from corpora. Proceedings of the XVII EURALEX International Congress: Lexicography and Linguistic Diversity. Tinatin Margalitadze and George Meladze (eds). Tbilisi, Georgia: Ivane Javakhishvili Tbilisi State University, pp. 823-830.
Nazar, R.; Renau, I. (2016). A taxonomy of Spanish nouns, a statistical algorithm to generate it and its implementation in open source code. Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC'16). European Language Resources Association (ELRA), May 2016, pp. 1485-1492.
Nazar, R.; Renau, I. (2015). Agrupación semántica de sustantivos basada en similitud distribucional: implicaciones lexicográficas. In María Pilar Garcés Gómez (ed.): Lingüística y diccionarios (Anexos Revista de Lexicografía, vol. 2: 272-295). Universidade da Coruña.
Nazar, R.; Renau, I. (2015). Ontology Population Using Corpus Statistics. Proceedings of the Joint Ontology Workshops 2015 co-located with the 24th International Joint Conference on Artificial Intelligence (IJCAI 2015). Buenos Aires, Argentina, July 25-27, 2015.