TEXT·A·GRAM

Text analysis from the point of view of text grammars


Announcement (April 3, 2026): We have just launched "Wicacho", a new Wikipedia-based semantic tagger that will replace the one currently installed here. For now, the new tagger has its own website ( http://www.tecling.com/wicacho ), but we will make it available here as well very soon.

Referent analysis (semantic tagging) is also available for English.
Discourse marker analysis is also available for English, French, Catalan and German, but development for these languages is still very limited.


Select the type of analysis:
Documentation:
Tagging 1: referents
Tagging 2: discourse markers
Tagging 3: deixis
Tagging 4: modalization



===================================

This is another product of Group Tecling.com
A paper describing the new version of the semantic tagger (the one not yet installed here, but coming soon) will be published shortly:
• Nazar, R.; Renau, I. (In press). Wikipedia used as a semantic tagger: some preliminary results in Spanish. Procesamiento del Lenguaje Natural, n. 76.

These other papers describe the rest of the text analyses performed by the software:
• Nazar, R. (2024). Statistical modeling of discourse genres: the case of the opinion column in Spanish. SN Computer Science 5(959):1-11.
• Nazar, R.; Renau, I.; Robledo, H. (2024). Dismark and Text·a·Gram: Automatic identification and categorization of discourse markers in texts. In: Cecilia-Mihaela Popescu & Oana-Adriana Duță (eds.), Discourse Markers in Romance Languages. Crosslinguistic Approaches in Romance and Beyond. Berlin: Peter Lang.

This is open-source software


This is the source code of the semantic tagging module, Tatatag:
https://tecling.com/textagram/tatatag-source.zip

Please be aware that this version is already deprecated (March 7, 2026) and will soon be replaced by the new one (Wicacho).

Evaluation dataset


This is the evaluation of the old version (Tatatag). It will also be replaced soon. The data used for evaluation consist of Wikipedia pages, 179 in Spanish and 241 in English.
https://tecling.com/textagram/evaluationData.zip

The data are presented in HTML format. They include pages about 19th-century British politicians and Argentine generals of the same period. Files are numbered, and each one is paired with an 'evaluation' file in which the results are assessed. At first we did the evaluation ourselves, but we later opted to use the Gemini API to evaluate the results. The automatic evaluation is not perfect but is close to human judgment. We are in the process of evaluating the evaluation. More on this soon.
You can also browse the evaluation data here.
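
For readers who download the archive, the pairing of numbered pages with their 'evaluation' files could be checked with a short script. The following is only a sketch in Python: the file naming used here (for example "12.html" paired with "12.evaluation.html") is a hypothetical convention chosen for illustration, not the documented layout of evaluationData.zip, and the function names are ours. Adjust the two patterns to whatever the archive actually contains.

```python
import re
from pathlib import Path

# ASSUMPTION: hypothetical file naming, e.g. "12.html" and "12.evaluation.html".
# The real archive may name its files differently; edit these patterns to match.
PAGE_RE = re.compile(r"^(\d+)\.html$")
EVAL_RE = re.compile(r"^(\d+)\.evaluation\.html$")


def pair_evaluation_files(names):
    """Map each file number to a (page, evaluation) tuple; None marks a missing file."""
    pairs = {}
    for name in names:
        for slot, pattern in ((0, PAGE_RE), (1, EVAL_RE)):
            match = pattern.match(name)
            if match:
                entry = pairs.setdefault(int(match.group(1)), [None, None])
                entry[slot] = name
    # Return the pairs ordered by file number.
    return {number: tuple(pair) for number, pair in sorted(pairs.items())}


def pair_directory(folder):
    """Apply the pairing to every regular file in an unpacked folder."""
    return pair_evaluation_files(
        path.name for path in Path(folder).iterdir() if path.is_file()
    )
```

Calling pair_directory on the unpacked folder returns a dictionary keyed by file number, which makes it easy to spot pages that lack an evaluation file (the second element of the tuple is None).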

The modules for the analysis of discourse markers, deixis and modalization are here:
https://tecling.com/textagram/text·a·gram.zip

This code allows the program to be run locally, which makes it possible to analyze many documents at the same time. The code consists of a Perl script and some CSV tables. It uses R for the graphs (the online version uses the GD Graph library instead).

Concept and development: Rogelio Nazar

Collaborators: Javier Obreque, Diego Sánchez, Hernán Robledo, Paolo Caballería, Nicolás Acosta, Scarlette Gatica, Andrea Alcaíno, Ignacio Lobos and Irene Renau.

Documentation: Andrea Alcaíno & Rogelio Nazar