Tecling » The universe is not perfect, but it's working on it.
Technologies for Linguistic Analysis

September 28, 2022
We have a new paper on Spanish orthography

How lovely is the smell of a just-printed paper in the morning! Today we woke up with the news that the following article is already published (in Spanish): Renau, I., Nazar, R. y Díaz, L. (2022). La Ortografía de la lengua española (2010) y su impacto en la prensa de cinco países hispanohablantes. Normas, 12, 91-109, doi: 10.7203/Normas.v12i1.25102

Here is the abstract:

Considering that the mass media are favorable spaces for the standardization of the linguistic norm, this study analyzes the degree of acceptance of some norms that were proposed in the Ortografía de la lengua española (RAE and ASALE 2010): a) the accent of guión/guion and other similar words, b) sólo/solo and the demonstrative pronouns and c) some cases of loanwords and Latinisms (such as whisky/wiski). Two newspapers from five Spanish-speaking countries (Argentina, Chile, Colombia, Spain and Mexico) were analyzed. GoogleApi was used to collect 179,238 occurrence contexts in total, divided by year and newspaper, from 2010 to 2019. The results indicate that, at the end of the period studied, there are still, globally, many cases of forms with the old spelling in the three groups.

July 15, 2022
Six of our students presented their theses

Extremely talented young people working with us... This Friday, six of our students presented their theses, after a year's hard work. Pedro Bolbarán, Camila Pérez, Bahony Saavedra, Gabriela Cacciuttolo, Héctor Ramos and Javiera Silva are seen smiling at the camera after the defense, surrounding a very proud adviser. They were supposed to write undergraduate theses, but their work looks more like PhD theses! The manuscripts will soon be available online at the library of PUCV.cl

July 13, 2022
We presented a new paper at EURALEX 2022

Irene Renau presented the paper "Towards a multilingual dictionary of discourse markers: automatic extraction of units from parallel corpus" at the EURALEX 2022 Congress, held in Mannheim, Germany. The talk described our project Dismark, the multilingual database of discourse markers, which is now in the process of becoming a dictionary.
The paper is available here.

June 20, 2022
We just presented a paper at Terminology in the 21st century

Today, at 16:45 Central European Time (that is, 10:45 Chilean time), Rogelio Nazar and David Lindemann presented the talk "Terminology extraction using co-occurrence patterns as predictors of semantic relevance" in the workshop on Terminology in the 21st century: many faces, many places (Term 21), co-located with LREC 2022 in Marseille, France.
The paper is available here.
The program of the workshop is available here, and the full Proceedings are available as well.

June 16, 2022
We delivered an online presentation at DISROM 7

Today, Irene Renau, Rogelio Nazar and Hernán Robledo presented the talk "Automatic extraction of discourse markers from parallel corpus" at the Discourse Markers in Romance Languages Conference (DISROM 7). The conference is taking place these days (16-18 June 2022) in Craiova, Romania, but it's also broadcast online. There are many other interesting presentations that you can follow.

June 6, 2022
We have a new toy: Clusterre

This script is intended as a friendly interface to R's clustering function. It will create dendrograms from several (up to nine) lists of items, as well as the data matrix if you want to use it for something else. Each list must contain only one item per line. Any other information will be ignored, since items are treated as binary values only (present or absent in a list). The result is the matrix and the dendrogram.
Happy clustering!
What's new? You can now name the objects using the first element in each list as header. Just remember to click on the checkbox if those are your intentions.
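To give an idea of what "binary values" means here, the following is a minimal sketch in Python of how several one-item-per-line lists could be turned into a presence/absence matrix. It is not Clusterre's code: the function names, the toy lists and the choice of Jaccard distance are all assumptions made for illustration. A matrix like this is the kind of input a hierarchical clustering routine could draw a dendrogram from.

```python
from itertools import combinations

def read_items(text):
    """Return the set of non-empty, stripped lines (one item per line)."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def binary_matrix(lists):
    """Build a presence/absence matrix: one row per list, one column per item."""
    columns = sorted(set().union(*lists.values()))
    rows = {name: [1 if item in items else 0 for item in columns]
            for name, items in lists.items()}
    return columns, rows

def jaccard_distance(a, b):
    """Distance between two binary rows: 1 - |shared items| / |all items|."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0

# Hypothetical input: three short lists, named A, B and C.
lists = {
    "A": read_items("perro\ngato\ncaballo"),
    "B": read_items("perro\ngato\nvaca"),
    "C": read_items("mesa\nsilla"),
}
columns, rows = binary_matrix(lists)
distances = {(p, q): jaccard_distance(rows[p], rows[q])
             for p, q in combinations(rows, 2)}
```

In this toy case, lists A and B share two of their four distinct items, so they end up closer to each other than either is to C.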

May 7, 2022
MANDINGA is back!

Mandinga, our dear old word sense induction algorithm, is now back online, after being forgotten for many years. Given an input word, it tells you whether the unit is polysemous and, if so, it produces a list of the possible senses. Of course, it does not use any lexicographic resource. It does everything using only corpora and graph-based co-occurrence algorithms.
Update (May 9, 2022): At this moment the system is available in Spanish, English and French.
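For readers curious about what a graph-based co-occurrence approach can look like, here is a toy sketch in Python. It is not Mandinga's actual algorithm: the function name, the `min_count` threshold, the use of connected components and the example contexts are all assumptions made for illustration. The idea is that words that repeatedly appear together around the target word form separate clusters, and each cluster is read as a candidate sense.

```python
from collections import defaultdict
from itertools import combinations

def induce_senses(target, contexts, min_count=2):
    """Group the context words of `target` into candidate senses."""
    # Count how often each pair of context words co-occurs (target excluded).
    pair_counts = defaultdict(int)
    for context in contexts:
        words = sorted(set(context) - {target})
        for a, b in combinations(words, 2):
            pair_counts[(a, b)] += 1
    # Keep only the edges seen at least `min_count` times.
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    # Each connected component of the graph is read as one candidate sense.
    senses, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        senses.append(sorted(component))
    return senses

# Hypothetical tokenized contexts of the Spanish word "banco".
contexts = [
    ["banco", "dinero", "cuenta"],
    ["banco", "dinero", "cuenta", "crédito"],
    ["banco", "madera", "plaza"],
    ["banco", "plaza", "madera", "sentarse"],
]
senses = induce_senses("banco", contexts)
is_polysemous = len(senses) > 1
```

With these four made-up contexts, the financial words and the furniture words never co-occur, so the sketch reports two candidate senses and treats the word as polysemous.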

May 6, 2022
NEOPTER: identification of neologisms in a list of Spanish words

This is what happens when we have a lot of paperwork to do: we divert efforts to the creation of new demos. Now we have Neopter, a little script that takes a list of Spanish words and identifies those that had not been used prior to 2012. Enjoy (with moderation).
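The underlying idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Neopter's implementation: the dated toy corpus, the function names and the decision to also flag words that are never attested are ours.

```python
def first_attestation(dated_texts):
    """Map each word to the earliest year in which it appears."""
    first = {}
    for year, text in dated_texts:
        for word in text.lower().split():
            if word not in first or year < first[word]:
                first[word] = year
    return first

def neologism_candidates(words, first, cutoff=2012):
    """Keep the words with no attestation before `cutoff`.

    Words that never appear in the corpus are also kept (an assumption).
    """
    return [w for w in words if first.get(w.lower(), cutoff) >= cutoff]

# Hypothetical dated corpus: (year, text) pairs.
dated_texts = [
    (2005, "el blog es nuevo"),
    (2014, "un selfi en el blog"),
    (2016, "otro selfi"),
]
first = first_attestation(dated_texts)
candidates = neologism_candidates(["blog", "selfi", "criptomoneda"], first)
```

Here "blog" is discarded because it is already attested in 2005, while "selfi" (first seen in 2014) and the unattested "criptomoneda" are kept as candidates.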

April 30, 2022
GEOMOT: a script to find the distribution of Spanish words per country

We have a new product on display today. Geomot is a script that will accept one or more Spanish words and will tell you in which countries they are most frequently used. We find it useful for different types of Spanish lexicographic projects. The next step will be to do the same in Arabic, as this language shares with Spanish the phenomenon of wide geographical lexical variation. http://www.tecling.com/geomot
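As a rough illustration of the kind of computation involved, the Python sketch below ranks countries by the relative frequency of a word, so that larger corpora do not win just because of their size. It is not Geomot's code: the function name, the per-million normalization and all the numbers are invented for the example.

```python
def rank_countries(word, counts, corpus_sizes):
    """Return (country, occurrences per million tokens), highest first."""
    rates = {
        country: 1_000_000 * counts.get(country, {}).get(word, 0) / size
        for country, size in corpus_sizes.items()
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)

# Made-up corpus sizes (in tokens) and raw word counts per country.
corpus_sizes = {"Argentina": 2_000_000, "Chile": 1_000_000, "Spain": 4_000_000}
counts = {
    "Argentina": {"palta": 30, "aguacate": 2},
    "Chile": {"palta": 40, "aguacate": 1},
    "Spain": {"aguacate": 120},
}
ranking = rank_countries("palta", counts, corpus_sizes)
```

With these invented figures, "palta" comes out as most frequent in the Chilean sample, followed by the Argentinian one, and is absent from the Spanish one.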

April 23, 2022
MORFOL: a new Spanish morphological analyzer

Morfol is a brand new morphological analyzer for Spanish that we are using for the categorization of terms and neologisms. It accepts a list of terms/words/neologisms (one per line) and then proceeds to classify them. It will produce the grammatical category, the grammatical gender and also, by trying to identify prefixes and suffixes, the internal morphological structure. Give it a try and tell us what you think about it: http://www.tecling.com/morfol

Tools & demos

We have implemented different types of applications and most of them can be tested online. Take a look.

+ Compare: a simple script to compare two lists of words

+ Cryptoman: a script to generate cryptograms

+ Dismark: a multilingual taxonomy of discourse markers (new!)

+ Dsele: a model dictionary for ELE learners

+ Estilector: computer assisted writing for Spanish

+ GeNom: a program to detect the gender of proper nouns

+ HAT: a project for the treatment of polysemy in lexical taxonomies

+ Jaguar: a tool for statistical corpus analysis

+ Kind: a lexical taxonomy induction algorithm

+ Kwico: a concordancer for big corpora

+ Lealem: a reading pacer for parallel German-Spanish texts

+ Leafran: a reading pacer for parallel French-Spanish texts

+ Linguini: a language detector

+ Neven: a program to detect eventive nouns

+ POL: named entity recognition and classification

+ Poppins: a supervised text classifier

+ Porcus: an interface for various taggers and parsers for Spanish

+ pullPOS: a project for the detection of plurals in Spanish

+ Readeutsch: a reading pacer for parallel German-English texts

+ Sapo: a program to detect similarities between documents

+ Sicam: a program to split a Spanish word into syllables

+ Termout: a terminology extraction system

+ Termoutling: an automatic linguistics glossary (new!)

+ TEXT·A·GRAM: a program to analyze Spanish texts

+ Verbario: corpus pattern analysis in Spanish


This is the view from where we are located, by the Sausalito lagoon, a quiet and lovely place in Viña del Mar, Chile. Sunny days. Birds can be seen in the center of the lagoon.

As researchers, we are currently affiliated with:
Pontificia Universidad Católica de Valparaíso
Instituto de Literatura y Ciencias del Lenguaje

Av. El Bosque 1290, Viña del Mar, Chile

Upcoming Events

31 October - 4 November, 2022 (THE EVENT HAS BEEN POSTPONED BY THE ORGANIZERS: The original dates were 31-19 August, 2022): Irene Renau and Rogelio Nazar will be teaching a postgraduate course with the title "Lexicografía Basada en Corpus" (Corpus-based lexicography) at the Universidad Nacional de Cuyo (Mendoza, Argentina).

Latest ideas & research projects

We are developing new projects in computational linguistics and natural language processing:

+ Fondecyt Regular (2019-2021): "Polisemia regular de los sustantivos del español: análisis semiautomático de corpus, caracterización y tipología" (Regular polysemy of nouns in Spanish: semiautomatic corpus analysis, characterization and typology). Lead researcher: Irene Renau. Ref.: 1191204.

+ Fondecyt Regular (2019-2021): "Inducción automática de taxonomías de marcadores discursivos a partir de corpus multilingües" (Automatic induction of taxonomies of discourse markers from multilingual corpora). Lead researcher: Rogelio Nazar. Ref.: 1191481.

+ Ecos-Sud (International Project between Chile and France): "Inducción automática de taxonomías del español y el francés mediante técnicas cuantitativas y estadística de corpus" (Automatic induction of Spanish and French taxonomies using quantitative techniques and corpus statistics). Lead researcher: Irene Renau. Ref.: C16H02.

+ Fondecyt Regular: "Desarrollo de la competencia terminológica a lo largo de la inserción disciplinar" (Development of terminological competence throughout disciplinary integration). Lead researcher: Sabela Fernández. Co-researcher: Rogelio Nazar. Ref.: 11121597.

+ See more.

Recent publications

+ Renau, I.; Nazar, R. (2022). Towards a multilingual dictionary of discourse markers: automatic extraction of units from parallel corpus. In: Klosa-Kückelhaus, A.; Engelberg, S.; Möhrs, C.; Storjohann, P. Dictionaries and Society. Proceedings of the XX EURALEX International Congress, Mannheim: IDS-Verlag, pp. 262-272. PDF

+ Nazar, R.; Lindemann, D. (2022). Terminology extraction using co-occurrence patterns as predictors of semantic relevance. Proceedings of the TERM21 Workshop. Language Resources and Evaluation Conference (LREC 2022), Marseille, 20-25 June 2022, pp. 26-29. PDF

+ Nazar, R. (2021). "Inducción automática de una taxonomía multilingüe de marcadores discursivos: primeros resultados en castellano, inglés, francés, alemán y catalán". Procesamiento del Lenguaje Natural, núm 67, pp. 127-138. PDF

+ Nazar, R. (2021). "Automatic induction of a multilingual taxonomy of discourse markers". Iztok Kosem et al. (eds.) Electronic lexicography in the 21st century: post-editing lexicography. Lexical Computing CZ s.r.o., Brno, pages 440-454. PDF

+ Castro, A.; Nazar, R.; Renau, I. (2021). "New verbs and dictionaries: a method for the automatic detection of neology in Spanish verbs". International Journal of Lexicography, ...

+ Nazar, R.; Renau, I.; Acosta, N.; Robledo, H.; Soliman, H.; Zamora, S. (2021). "Corpus-Based Methods for Recognizing the Gender of Anthroponyms". Names: A Journal of Onomastics, vol. 69 num. 3. PDF

+ See more.

Solutions for text processing

It is critical for organizations to have the ability to process information automatically, and very often that information is contained in documents to be read by humans rather than machines. We have different methods for text processing depending on the goal.

We can help by teaching people how to automate their text processing routines. We can batch-process thousands of documents to extract information from them or to derive different types of statistics. We can also modify these documents, or generate databases or email correspondence based on information extracted from them. Anything that involves intelligent management of information can benefit from some degree of automation, which frees up time, effort and resources.

Tell us what your needs are and we will show you what we can do about them.