Tecling » The universe is not perfect, but it's working on it.
Technologies for Linguistic Analysis

18 September 2023
We have a new article on orientational metaphor and polarity


A new article was published today:

López Hidalgo, Benjamín; Renau, Irene & Nazar, Rogelio (2023). Correlación entre la metáfora orientacional bueno es arriba / malo es abajo y polaridad positiva/negativa en verbos del español: un estudio con estadística de corpus. Humanidades Digitales, Corpus y Tecnología del Lenguaje. University of Groningen Press, pp. 307-323.

Figure: Percentage difference between the total + or – polarity for each verb analyzed (test 1, test 2 and the group of verbs under study).


7 September 2023
We are presenting a paper at the ConTeNTS Workshop


Rogelio Nazar and Nicolás Acosta will be presenting a paper at the first Workshop on Computational Terminology in NLP and Translation Studies (ConTeNTS), held in Varna, Bulgaria, and co-located with the RANLP Conference (Recent Advances in Natural Language Processing).

The title of the paper is 'Termout: a tool for the semi-automatic creation of term databases'. The presentation is scheduled for September 7th, 2023, from 10:05 to 10:25 (GMT+03:00).

UPDATE: 11 September 2023
The proceedings of the event are now available and include our paper:
https://contents2023.kulak.kuleuven.be/proceedings


20 August 2023
We will be teaching a course at JELing (Mendoza, Argentina)


The III Jornadas Nacionales y I Internacionales de Estudios Lingüísticos (JELing), “Diálogos y dinámicas de una lengua en movimiento”, are just around the corner. The event will take place on 30 and 31 August and 1 September 2023, from 09:00 to 13:00 and from 15:00 to 19:00, at the Facultad de Filosofía y Letras of the Universidad Nacional de Cuyo (Mendoza, Argentina). It is organized by the Instituto de Lingüística Joan Corominas and the Faculty's Secretaría de Extensión Universitaria.

Irene Renau, Rogelio Nazar and Nicolás Acosta will be presenting papers and, in addition, the three of them will jointly teach a three-session course titled 'Procesamiento de corpus para lexicografía y terminología' (Corpus processing for lexicography and terminology). Besides lexicographers and terminologists, the course may interest translators, interpreters and anyone who works with the lexicon. No prior knowledge is required beyond ordinary computer use. The course combines theory and practice, with an emphasis on practice.


July 17, 2023
Ente: a new semantic tagger for Spanish


In the context of the Fondecyt Regular project (2023-2027) directed by Irene Renau, we began a very preliminary exploration of semantic tagging in Spanish and found, surprisingly, that few tools are available.
For now, our first attempt is the script Ente, already available at this URL:
http://www.tecling.com/ente
You can paste a Spanish text there, and the system will try to guess whether a given expression is the name of a person or a place. There is still a lot to be done, but this is a first step.
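As a rough illustration of the task (not of how Ente itself works, which this post does not describe), the following Python sketch labels capitalized words as person or place names by looking them up in two tiny hand-made lists. The lists, the function name `guess_entities` and the labels are all assumptions made for the example.

```python
import re

# Hypothetical mini-gazetteers, invented for this example only.
# A real tagger would need far larger resources or a statistical model.
PERSON_NAMES = {"irene", "rogelio", "nicolás", "javier"}
PLACE_NAMES = {"chile", "mendoza", "valparaíso", "argentina"}

def guess_entities(text):
    """Return (word, label) pairs for capitalized words found in the lists."""
    results = []
    # Match words that start with an uppercase letter (including accented ones).
    for word in re.findall(r"\b[A-ZÁÉÍÓÚÑ]\w*", text):
        key = word.lower()
        if key in PERSON_NAMES:
            results.append((word, "PERSON"))
        elif key in PLACE_NAMES:
            results.append((word, "PLACE"))
    return results

print(guess_entities("Irene viajó a Mendoza con Rogelio."))
```

A lookup like this cannot handle unseen names or ambiguous ones (a surname that is also a city, for instance), which is part of why the task is hard.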


3 July 2023
Javier Obreque's thesis wins the ALED-Chile competition


Today the result of the national (Chilean) thesis competition of ALED (Asociación Latinoamericana de Estudios del Discurso) was announced. The announcement reads: "As for the results, the winner, unanimously and with a perfect score, was the master's thesis by Javier Obreque, supervised by Dr. Rogelio Nazar (PUCV). On behalf of the evaluation committee and myself, we congratulate Javier and Rogelio not only on winning but also on the excellent work they have done. We hope to present them with an award at the national meeting to be held in August."
This means that Javier's work has been chosen to represent Chile in the ALED International competition, whose results will be announced in a few days.
The title of the thesis is "UNA PROPUESTA METODOLÓGICA PARA LA DETECCIÓN DE OPERADORES MODALES EN LENGUA CASTELLANA" (A methodological proposal for the detection of modal operators in Spanish), and it sits at the intersection of discourse analysis and computational linguistics.


June 23, 2023
We present a new (massive) text-mining project


We have embarked on a new adventure: a very large information extraction project on a corpus of Education Sciences, and the results are very interesting. We worked together with the editorial team of Revista Perspectiva Educacional, an education journal published by the School of Pedagogy of the Pontifical Catholic University of Valparaíso. The project covers terminology extraction and term database generation with Termout.org, as well as other tools developed specifically for this endeavor.

The project is already accessible at the following URL:
http://www.tecling.com/perseduc

A few details still need polishing, but today is the official opening.
The project is also well documented, although for now the interface is only in Spanish.

Tools & demos

We have implemented different types of applications and most of them can be tested online. Take a look.

+ Compare: a simple script to compare two lists of words

+ Cryptoman: a script to generate cryptograms

+ Dismark: a multilingual taxonomy of discourse markers

+ Dsele: a model dictionary for ELE learners

+ Estilector: computer assisted writing for Spanish

+ GeNom: a program to detect the gender of proper nouns

+ HAT: a project for the treatment of polysemy in lexical taxonomies

+ Jaguar: a tool for statistical corpus analysis

+ Kind: a lexical taxonomy induction algorithm

+ Kwico: a concordancer for big corpora

+ Lealem: a reading pacer for parallel German-Spanish texts

+ Leafran: a reading pacer for parallel French-Spanish texts

+ Linguini: a language detector

+ Neven: a program to detect eventive nouns

+ POL: named entity recognition and classification

+ Poppins: a supervised text classifier

+ Porcus: an interface for various taggers and parsers for Spanish

+ pullPOS: a project for the detection of plurals in Spanish

+ Punkt: punctuation of discourse markers in Spanish (new!)

+ Randall: a list randomizer

+ Readeutsch: a reading pacer for parallel German-English texts

+ Sapo: a program to detect similarities between documents

+ Sicam: a program to analyze Spanish poetry

+ Termout: a terminology extraction system (new version!)

+ TEXT·A·GRAM: a program to analyze Spanish texts

+ Verbario: corpus pattern analysis in Spanish

Sausalito

This is the view from where we are located, by the Sausalito lagoon, a quiet and lovely place in Viña del Mar, Chile. Sunny days. Birds can be seen in the center of the lagoon.

As researchers, we are currently affiliated with:
Pontificia Universidad Católica de Valparaíso
Instituto de Literatura y Ciencias del Lenguaje

Av. El Bosque 1290, Viña del Mar, Chile

Upcoming Events
[UPDATED: 25 September 2023]

27-30 September 2023: Irene Renau and Rogelio Nazar will be presenting two papers at CILH 2023 (Congreso Internacional de Lingüística Hispánica), at Universität Leipzig (Germany).

6 October 2023: Rogelio Nazar will be live on YouTube presenting a paper about Termout.org. The event is the III Jornadas de Lingüística y Gramática Española, hosted by the Facultad de Filosofía, Historia, Letras y Estudios Orientales, Universidad del Salvador, Buenos Aires, Argentina.

Latest ideas & research projects

We are developing new projects in computational linguistics and natural language processing:

+ Fondecyt Regular (2023-2027): "Mapa de las metáforas conceptuales en sustantivos y verbos del español: un estudio de los patrones metafóricos basado en corpus" (Map of conceptual metaphors in Spanish nouns and verbs: a corpus-based study of metaphorical patterns). Lead researcher: Irene Renau. Co-researcher: Rogelio Nazar.

+ Fondecyt Regular (2019-2021): "Polisemia regular de los sustantivos del español: análisis semiautomático de corpus, caracterización y tipología" (Regular polysemy of nouns in Spanish: semi-automatic corpus analysis, characterization and typology). Lead researcher: Irene Renau. Ref.: 1191204.

+ Fondecyt Regular (2019-2021): "Inducción automática de taxonomías de marcadores discursivos a partir de corpus multilingües" (Automatic induction of taxonomies of discourse markers from multilingual corpora). Lead researcher: Rogelio Nazar. Ref.: 1191481.

+ Ecos-Sud (International Project between Chile and France): "Inducción automática de taxonomías del español y el francés mediante técnicas cuantitativas y estadística de corpus" (Automatic induction of Spanish and French taxonomies using quantitative techniques and corpus statistics). Lead researcher: Irene Renau. Ref.: C16H02.

+ Fondecyt Regular: "Desarrollo de la competencia terminológica a lo largo de la inserción disciplinar" (Development of terminological competence throughout disciplinary integration). Lead researcher: Sabela Fernández. Co-researcher: Rogelio Nazar. Ref.: 11121597.

+ See more.

Recent publications

+ Robledo, H.; Nazar, R. (2023). A proposal for the inductive categorisation of parenthetical discourse markers in Spanish using parallel corpora. International Journal of Corpus Linguistics. http://doi.org/10.1075/ijcl.20017.rob

+ Renau, I.; Nazar, R. (2022). Towards a multilingual dictionary of discourse markers: automatic extraction of units from parallel corpus. In: Klosa-Kückelhaus, A.; Engelberg, S.; Möhrs, C.; Storjohann, P. Dictionaries and Society. Proceedings of the XX EURALEX International Congress, Mannheim: IDS-Verlag, pp. 262-272.

+ Nazar, R.; Lindemann, D. (2022). Terminology extraction using co-occurrence patterns as predictors of semantic relevance. Proceedings of the TERM21 Workshop. Language Resources and Evaluation Conference (LREC 2022), Marseille, 20-25 June 2022, pp. 26-29.

Solutions for text processing

It is critical for organizations to have the ability to process information automatically, and very often that information is contained in documents to be read by humans rather than machines. We have different methods for text processing depending on the goal.

We can teach people how to automate their text processing routines. We can batch-process thousands of documents to extract information from them or to derive different types of statistics. We can also modify these documents, or generate databases or email correspondence based on information extracted from them. Anything that involves intelligent management of information can benefit from some degree of automation, which frees up time, effort and resources.
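To give a sense of what batch processing can look like in its simplest form, here is a minimal Python sketch that reads every plain-text file in a folder and counts word frequencies across all of them. The function name `word_frequencies`, the `*.txt` pattern and the UTF-8 encoding are assumptions made for the example; it is not one of our tools.

```python
from collections import Counter
from pathlib import Path
import re

def word_frequencies(folder, pattern="*.txt"):
    """Count lowercase word forms across all files in `folder` matching `pattern`."""
    counts = Counter()
    for path in sorted(Path(folder).glob(pattern)):
        # Assumes the files are UTF-8 encoded plain text.
        text = path.read_text(encoding="utf-8")
        # \w+ matches runs of letters, digits and underscores (Unicode-aware).
        counts.update(re.findall(r"\w+", text.lower()))
    return counts
```

Calling `word_frequencies("my_corpus").most_common(20)` on a folder of your own would list the twenty most frequent word forms; real projects usually add steps such as lemmatization or stop-word filtering on top of a loop like this.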

Tell us what your needs are and we will show you what we can do.