TEXT·A·GRAM

Text analysis from the point of view of text grammars


Announcement (April 3, 2026): We have just inaugurated "Wicacho", a new semantic tagger based on Wikipedia, which will replace the one currently in place here. For now, the new tagger has its own website ( http://www.tecling.com/wicacho ), but we will make it available here as well very soon.


ANALYSIS OF REFERENCE

This function of the programme allows you to find, classify and label references in a given text. The labels are distinguished by colour, by visual markers such as underlining and bolding, and by the category shown in brackets to the right of each reference found.

About references

The main characteristic of a genuine text is coherence, and one of its manifestations is the repetition of a specific set of names designated as the references of the text, that is, its main objects or themes (De Beaugrande and Dressler, 1997; Calsamiglia and Tusón, 1999; Cuenca, 2010). The sender constantly refers to previously mentioned concepts but also introduces new ones as the text progresses, a phenomenon known as semantic isotopy (Lozano, Peña-Marín & Abril, 1989).

Classification

As mentioned above, this TEXT·A·GRAM function seeks to label the referents found in a given text. Currently, referents are identified based on their repetition throughout the text.
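The repetition criterion can be illustrated with a minimal sketch. This is not the code used by TEXT·A·GRAM: the function name `repeated_candidates`, the `min_count` threshold and the small stopword list are all assumptions made for the sake of the example. It simply counts word forms and keeps those that occur more than once.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real system would need a much fuller, language-specific list.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "was", "is", "he", "she", "it"}


def repeated_candidates(text, min_count=2):
    """Return the words that occur at least min_count times, with their counts."""
    # Extract runs of letters (Unicode-aware), lowercased for counting.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return {w: n for w, n in counts.items() if n >= min_count}


sample = (
    "The general crossed the river. The general then ordered the army "
    "to camp near the river, and the army obeyed."
)
print(repeated_candidates(sample))
```

In this toy sample only "general", "river" and "army" are repeated, so they would be the referent candidates. The sketch ignores multiword names, inflection and pronouns, all of which a real referent detector would have to handle.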

In the case of proper names, the system classifies them as people, places or organizations. In the case of common nouns, it attempts to classify them according to the Kind taxonomy project (http://www.tecling.com/kind).

References
Calsamiglia, H. & Tusón, A. (1999). Las cosas del decir. Barcelona: Editorial Ariel.
Cuenca, M. J. (2010). Gramática del texto. Madrid: Arco Libros.
De Beaugrande, R. A. & Dressler, W. U. (1997). Introducción a la lingüística del texto. Barcelona: Editorial Ariel.
Lozano, J., Peña-Marín, C. & Abril, G. (1989). Análisis del discurso: Hacia una semiótica de la interacción textual. Madrid: Cátedra.


===================================

This is another product of Group Tecling.com
A paper describing the new version of the semantic tagger (the one not yet installed here, but coming soon) will be published shortly:
• Nazar, R.; Renau, I. (In press). Wikipedia used as a semantic tagger: some preliminary results in Spanish. Procesamiento del Lenguaje Natural, n. 76.

These other papers describe the rest of the text analyses performed by the software:
• Nazar, R. (2024). Statistical modeling of discourse genres: the case of the opinion column in Spanish. SN Computer Science 5(959):1-11.
• Nazar, R.; Renau, I.; Robledo, H. (2024). Dismark and Text·a·Gram: Automatic identification and categorization of discourse markers in texts. In: Cecilia-Mihaela Popescu & Oana-Adriana Duță (eds.), Discourse Markers in Romance Languages. Crosslinguistic Approaches in Romance and Beyond. Berlin: Peter Lang.

This is open-source software


This is the source code of the semantic tagging module, Tatatag:
https://tecling.com/textagram/tatatag-source.zip

Please be aware that this version is already deprecated (March 7, 2026) and will soon be replaced by the new one (Wicacho).

Evaluation dataset


This is the evaluation of the old version (Tatatag). It will also be replaced soon. The data used for evaluation consist of Wikipedia pages, 179 in Spanish and 241 in English.
https://tecling.com/textagram/evaluationData.zip

The data are presented in HTML format. They include pages about 19th-century British politicians and Argentine generals of the same period. Files are numbered, and each one is associated with an 'evaluation' file in which the results are evaluated. At first we did the evaluation ourselves, but we then opted to use the Gemini API to evaluate the results. The automatic evaluation is not perfect, but it is close to human judgment. We are in the process of evaluating the evaluation. More on this soon.
You can also browse the evaluation data here.
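As a rough illustration of what "evaluating the evaluation" could involve, the sketch below computes the proportion of items on which a human judgment and an automatic judgment coincide. It is an assumption for illustration only: the function name `agreement_rate` and the labels "correct" and "wrong" are hypothetical and are not taken from the evaluation files.

```python
def agreement_rate(human, automatic):
    """Return the fraction of items on which both lists give the same label."""
    if len(human) != len(automatic):
        raise ValueError("Both lists must have the same number of judgments")
    if not human:
        raise ValueError("At least one judgment is required")
    matches = sum(h == a for h, a in zip(human, automatic))
    return matches / len(human)


# Hypothetical judgments for four tagged items (not real evaluation data).
human_labels = ["correct", "correct", "wrong", "correct"]
auto_labels = ["correct", "wrong", "wrong", "correct"]
print(agreement_rate(human_labels, auto_labels))  # 0.75
```

Raw agreement is the simplest possible measure; a chance-corrected statistic would usually be preferred when one label is much more frequent than the other.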

The modules for the analysis of discourse markers, deixis and modalization are here:
https://tecling.com/textagram/text·a·gram.zip

This code lets you run the program locally, which makes it possible to analyze many documents at the same time. The code consists of a Perl script and some CSV tables. It uses R for the graphs (the online version uses the GD Graph library instead).

Concept and development: Rogelio Nazar

Collaborators: Javier Obreque, Diego Sánchez, Hernán Robledo, Paolo Caballería, Nicolás Acosta, Scarlette Gatica, Andrea Alcaíno, Ignacio Lobos and Irene Renau.

Documentation: Andrea Alcaíno & Rogelio Nazar