Kind - The Taxonomy Project

You are now on the French side of the taxonomy.

You can also go to the other sides:

Enter an arbitrary word or a list of words in French to classify (one per line):

Verbose
(This produces very detailed output, so do not tick it if you are entering many words; otherwise you will be flooded with text.)

Ignore cache
(This bypasses the cache, so the system ignores any previous classification it may have made of the same noun.)


You can also try these precompiled examples.
They have already been tagged by humans, so the results can be evaluated automatically.

Abstract

We designed a statistically based taxonomy induction algorithm that combines several strategies, none of which involves explicit linguistic knowledge. Although all of them are quantitative, the strategies differ in nature. Some compute distributional similarity coefficients to identify pairs of sibling words (co-hyponyms), while others rely on asymmetric co-occurrence to identify parent-child pairs (hypernym-hyponym relations). A decision-making process then combines the results of these steps and finally attaches lexical units to a basic structure containing the most general categories of the language. We are currently evaluating the latest results.
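The abstract does not say which similarity coefficient, asymmetry measure or decision rule Kind actually uses, so the following Python sketch only illustrates the two families of measures under our own assumptions. It uses cosine similarity over window-based context vectors to find co-hyponyms, and the difference P(parent | child) - P(child | parent), computed over sentence co-occurrence, to find hypernym candidates. The toy corpus `CORPUS`, the threshold `min_asym=0.5` and the voting step in `propose_parent` are hypothetical and are not the project's own method.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each sentence is reduced to its content words.
CORPUS = [
    ["chat", "animal", "dort"],
    ["chien", "animal", "dort"],
    ["chat", "animal", "mange"],
    ["chien", "animal", "mange"],
    ["animal", "sauvage", "forêt"],
    ["animal", "domestique", "maison"],
]

def context_vectors(sentences, window=2):
    """Count the words appearing within `window` positions of each word."""
    vecs = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(u, v):
    """Symmetric distributional similarity: high values suggest co-hyponyms."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cooccurrence(sentences):
    """Sentence frequency of each word and of each unordered word pair."""
    freq, pair = Counter(), Counter()
    for sent in sentences:
        types = set(sent)
        freq.update(types)
        for a, b in combinations(sorted(types), 2):
            pair[(a, b)] += 1
    return freq, pair

def cond_prob(freq, pair, given, other):
    """P(other | given): share of sentences with `given` that also contain `other`."""
    key = tuple(sorted((given, other)))
    return pair[key] / freq[given] if freq[given] else 0.0

def hypernym_candidates(freq, pair, word, min_asym=0.5):
    """Asymmetric co-occurrence: a parent shows up in most of the child's
    sentences, while the child shows up in few of the parent's."""
    scores = {}
    for p in freq:
        if p == word:
            continue
        asym = cond_prob(freq, pair, word, p) - cond_prob(freq, pair, p, word)
        if asym >= min_asym:
            scores[p] = asym
    return sorted(scores.items(), key=lambda kv: -kv[1])

def propose_parent(word, vecs, freq, pair, k=2):
    """Hypothetical decision step: sum the asymmetry scores of parent
    candidates proposed for the word and for its k most similar words."""
    sims = sorted(((cosine(vecs[word], vecs[w]), w) for w in vecs if w != word),
                  reverse=True)[:k]
    votes = Counter()
    for w in [word] + [w for _, w in sims]:
        for p, score in hypernym_candidates(freq, pair, w):
            if p != word:
                votes[p] += score
    return votes.most_common(1)[0][0] if votes else None

if __name__ == "__main__":
    vecs = context_vectors(CORPUS)
    freq, pair = cooccurrence(CORPUS)
    print(cosine(vecs["chat"], vecs["chien"]), cosine(vecs["chat"], vecs["animal"]))
    print(hypernym_candidates(freq, pair, "chat"))
    print(propose_parent("chat", vecs, freq, pair))
```

On this toy corpus, "chat" and "chien" share identical contexts and come out as the most similar pair, while "animal" appears in every sentence containing "chat" but not the reverse, which makes it the only hypernym candidate. A real system would need a much larger corpus, stopword handling and a more careful combination rule than this simple vote.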

Documentation and source code

Today is Friday, December 11, 2020. The source code and documentation are changing rapidly and we are still working on the details. However, you can look at what we have so far, which is a stable version.
We will not maintain older versions.
Inquiries are welcome at rogelio (dot) nazar (at) gmail (dot) com.

Funding


This project has been supported by two successive grants:

  1. Conicyt-Fondecyt 11140686, “Inducción automática de taxonomías de sustantivos generales y especializados a partir de corpus textuales desde el enfoque de la lingüística cuantitativa” (Automatic taxonomy induction from corpora for terminology and general vocabulary using quantitative measures). Lead researcher: Rogelio Nazar. (2014-2017).
  2. Ecos Sud-Conicyt Project C16H02 “Inducción automática de taxonomías del español y el francés mediante técnicas cuantitativas y estadística de corpus” (Automatic taxonomy induction from corpora for Spanish and French using quantitative corpus analysis). Lead researcher: Irene Renau. (2016-2019).

Credits

Researchers:

Related publications: