TEXT·A·GRAM

Text analysis from the point of view of text grammars


Announcement (March 7, 2026): We have developed a new semantic tagger (called "Wicacho") that will replace the one currently in place here. This will happen in the next few days. We will be updating...


ANALYSIS OF DEIXIS

This function is at the moment only available in Spanish.

The tagging of deixis allows you to find, classify and tag deictics (textual markers through which deixis acts) in a given text. These tags are differentiated by colour, visual markers such as underlining and bolding, and their respective categorisation in brackets to the right of the marked deictic.

Deixis

Deictics (or shifters) are elements of the text that refer to the circumstances of the utterance, i.e. the communicative context: the ‘here and now’ of the message, as well as the ‘I and you’ of the sender and the receiver (Lozano, Peña-Marín and Abril, 1989; Renkema, 1999; Cuenca, 2010). Based on these, deictics are classified as spatial, temporal and personal.

Spatial deictics are elements that indicate the space of the utterance: words whose meaning depends on the place where the text is produced. Temporal deictics are markers of temporality that refer to the time at which the discourse was uttered. Finally, personal deictics are first- and second-person markers, since they point only to the participants in the exchange (the sender and the receiver) and not to the third parties referred to.

Deictics acquire full meaning only when the when, where, and by whom of the particular occurrence to which they refer are known. In other words, deictics are ‘empty words’: personal, spatial, and temporal deictics alike rely on resources such as personal and demonstrative pronouns to refer to the specific situation.

Some examples of person or actant deixis are the markers “I” (the speaker) and “you” (the receiver). Examples of place deixis include “these”, “that”, “here” and “there”, and examples of temporal deixis include adverbs of time such as “today”, “now”, etc.
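
The tagging described above (a category shown in brackets to the right of each marked deictic) can be illustrated with a minimal lexicon-lookup sketch. This is not the code of Text·a·Gram: the word list is a hypothetical handful of Spanish counterparts of the examples just given, and the names DEICTIC_LEXICON, tag_deictics and count_by_category are assumptions made for illustration only.

```python
import re

# Hypothetical mini-lexicon (NOT the tool's actual word list): Spanish
# counterparts of the examples above, mapped to the three categories.
DEICTIC_LEXICON = {
    "yo": "personal",
    "tú": "personal",
    "estos": "spatial",
    "ese": "spatial",
    "aquí": "spatial",
    "allí": "spatial",
    "hoy": "temporal",
    "ahora": "temporal",
}

# In Python 3, \w matches accented letters in str patterns,
# so "aquí" or "tú" is captured as a single word.
WORD = re.compile(r"\w+")


def tag_deictics(text):
    """Return the text with each known deictic followed by its category in brackets."""
    def annotate(match):
        word = match.group(0)
        category = DEICTIC_LEXICON.get(word.lower())
        return f"{word} [{category}]" if category else word
    return WORD.sub(annotate, text)


def count_by_category(text):
    """Count how many deictics of each category occur in the text."""
    counts = {"personal": 0, "spatial": 0, "temporal": 0}
    for match in WORD.finditer(text):
        category = DEICTIC_LEXICON.get(match.group(0).lower())
        if category:
            counts[category] += 1
    return counts


print(tag_deictics("Yo estoy aquí hoy."))
print(count_by_category("Ahora tú estás allí"))
```

A plain word lookup like this ignores context, so it would also tag non-deictic uses of the same forms; a real tagger would presumably need additional rules to tell them apart.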

References
Calsamiglia, H. & Tusón, A. (1999). Las cosas del decir. Barcelona: Editorial Ariel.
Cuenca, M. J. (2010). Gramática del texto. Madrid: Arco Libros.
Lozano, J., Peña-Marín, C. & Abril, G. (1989). Análisis del discurso: Hacia una semiótica de la interacción textual. Madrid: Cátedra.
Renkema, J. (1999). Introducción a los estudios sobre el discurso. Barcelona: Gedisa.


===================================

This is another product of Group Tecling.com
A paper describing the new version of the semantic tagger (not yet installed here, but coming soon) will be published shortly:
• Nazar, R.; Renau, I. (In press). Wikipedia used as a semantic tagger: some preliminary results in Spanish. Procesamiento del Lenguaje Natural, n. 76.

These other papers describe the rest of the text analyses performed by the software:
• Nazar, R. (2024). Statistical modeling of discourse genres: the case of the opinion column in Spanish. SN Computer Science 5(959):1-11.
• Nazar, R.; Renau, I.; Robledo, H. (2024). Dismark and Text·a·Gram: Automatic identification and categorization of discourse markers in texts. In: Cecilia-Mihaela Popescu & Oana-Adriana Duță (eds.), Discourse Markers in Romance Languages. Crosslinguistic Approaches in Romance and Beyond. Berlin: Peter Lang.

This is open-source software


This is the source code of the semantic tagging module, Tatatag:
https://tecling.com/textagram/tatatag-source.zip

Please be aware that this version is already deprecated (March 7, 2026) and will soon be replaced by the new one (Wicacho).

Evaluation dataset


This is the evaluation of the old version (Tatatag). It will also be replaced soon. The data used for evaluation consist of Wikipedia pages, 179 in Spanish and 241 in English.
https://tecling.com/textagram/evaluationData.zip

The data are presented in HTML format. They include pages about 19th-century British politicians and Argentine generals of the same period. Files are numbered, and each one is associated with an 'evaluation' file where the results are assessed. At first we did the evaluation ourselves, but we then opted to use the Gemini API to evaluate the results. The automatic evaluation is not perfect, but it is close to human judgment. We are in the process of evaluating the evaluation. More on this soon.
You can also browse the evaluation data here.
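
One common way to "evaluate the evaluation" is to compare automatic verdicts with human verdicts on the same items. The sketch below computes raw agreement and Cohen's kappa. It is offered only as an illustration: the page does not say which measure the authors use, the labels and the toy lists are invented for the example, and the function names are hypothetical.

```python
from collections import Counter


def observed_agreement(human, automatic):
    """Proportion of items on which the two sets of verdicts coincide."""
    if len(human) != len(automatic) or not human:
        raise ValueError("Both lists must be non-empty and of equal length.")
    matches = sum(h == a for h, a in zip(human, automatic))
    return matches / len(human)


def cohen_kappa(human, automatic):
    """Agreement corrected for the agreement expected by chance."""
    n = len(human)
    p_observed = observed_agreement(human, automatic)
    human_counts = Counter(human)
    auto_counts = Counter(automatic)
    # Chance agreement: for each label, the product of the two marginal proportions.
    p_expected = sum(
        human_counts[label] * auto_counts[label] for label in human_counts
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # Both raters used one identical label throughout.
    return (p_observed - p_expected) / (1 - p_expected)


# Toy verdicts, invented for illustration (not taken from the evaluation data).
human_verdicts = ["correct", "correct", "incorrect", "correct"]
automatic_verdicts = ["correct", "incorrect", "incorrect", "correct"]

print(observed_agreement(human_verdicts, automatic_verdicts))
print(cohen_kappa(human_verdicts, automatic_verdicts))
```

Kappa is lower than raw agreement whenever part of the agreement could have arisen by chance, which is why it is often reported alongside the plain percentage.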

The modules for the analysis of discourse markers, deixis and modalization are here:
https://tecling.com/textagram/text·a·gram.zip

This code allows the program to be run locally, which makes it possible to analyze many documents at once. It consists of a Perl script and some CSV tables. It uses R for the graphs (the online version uses the GD Graph library instead).

Concept and development: Rogelio Nazar

Collaborators: Javier Obreque, Diego Sánchez, Hernán Robledo, Paolo Caballería, Nicolás Acosta, Scarlette Gatica, Andrea Alcaíno, Ignacio Lobos and Irene Renau.

Documentation: Andrea Alcaíno & Rogelio Nazar