TEXT·A·GRAM

Text analysis from the point of view of text grammars

Version: August 22, 2025
UPDATE: There is a new module for referent analysis (our semantic tagger Tatatag), available in English and Spanish. Source code, evaluation data and some documentation are available at the end of this page.


The referent analysis (semantic tagging) can also be performed in English.
The analysis of discourse markers can also be performed in English, French, Catalan and German, but development in these languages is still very limited.


Select the type of analysis:
Documentation:
Tagging 1: referents
Tagging 2: discourse markers
Tagging 3: deixis
Tagging 4: modalization



===================================

This is another product of Group Tecling.com
Hopefully, there will soon be a paper describing the semantic tagger. In any case, more documentation about it will soon appear here.
There is a relatively recent paper describing the rest of the text analyses performed by the software:
Nazar, R. (2024). Statistical modeling of discourse genres: the case of the opinion column in Spanish. SN Computer Science 5(959):1-11.

This is open-source software.


New! (August 22, 2025)
The code of the module for semantic tagging (Tatatag) is available here:
https://tecling.com/textagram/tatatag-source.zip

Please be aware that this is, in general, a work in progress, so it may change at any moment.

Evaluation dataset

The data used for evaluation consist of Wikipedia pages, 179 in Spanish and 241 in English.
https://tecling.com/textagram/evaluationData.zip

The data are presented in HTML format. They include pages about 19th-century British politicians and Argentine generals of the same period. Files are numbered, and each one is associated with an 'evaluation' file in which the results are assessed. At first we did the evaluation ourselves, but we then opted to use the Gemini API to evaluate the results. The evaluation is not perfect, but it is pretty close to human judgment. We are in the process of evaluating the evaluation. More on this soon.
You can also browse the evaluation data here.
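
For readers who download the archive, the following is a minimal sketch for taking a first look at its contents. It assumes the file has been saved locally as evaluationData.zip. The internal layout and file naming of the archive are not documented on this page, so the helper (count_by_extension, a hypothetical name) only counts files by extension and makes no assumption about how pages and evaluation files are paired.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(zip_source):
    """Count the files in a zip archive by lowercase file extension.

    zip_source may be a path or a binary file-like object.
    Directory entries are skipped.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Zip member names always use forward slashes.
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix] += 1
    return counts


if __name__ == "__main__":
    # Assumption: the archive was downloaded to the current directory.
    try:
        for suffix, n in sorted(count_by_extension("evaluationData.zip").items()):
            print(f"{suffix or '(no extension)'}: {n}")
    except FileNotFoundError:
        print("evaluationData.zip not found; download it first.")
```

The resulting counts can be compared against the figures given above (179 Spanish and 241 English pages), keeping in mind that the evaluation files are also part of the archive.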

The modules for the analysis of discourse markers, deixis and modalization are here:
https://tecling.com/textagram/text·a·gram.zip

This code allows the program to be run locally, which makes it possible to analyze many documents at the same time. The code consists of a Perl script and some CSV tables. It uses R for the graphs (the online version uses the GD Graph library instead).

Concept and development: Rogelio Nazar

Collaborators: Javier Obreque, Diego Sánchez, Hernán Robledo, Paolo Caballería, Nicolás Acosta, Scarlette Gatica, Andrea Alcaíno, Ignacio Lobos and Irene Renau.

Documentation: Andrea Alcaíno & Rogelio Nazar