TEXT·A·GRAM

Text analysis from the point of view of text grammars


Announcement (April 3, 2026): We have just launched "Wicacho", a new Wikipedia-based semantic tagger that will replace the one currently in place here. For now, the new tagger has its own website (http://www.tecling.com/wicacho), but we will integrate it here as well very soon.


ANALYSIS OF MODALIZATION

This function is currently available only in Spanish.

The modalization tagger is a tool that finds, classifies and tags modalizers, defined as the textual markers through which the act of modalization is expressed in a given text. The tags are distinguished by colour and by visual markers such as underlining and bold type, and each modalizer found is followed by its category in brackets.
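
As an illustration of this kind of display, here is a minimal Python sketch that wraps a modalizer in an underlined, bold, coloured HTML span and appends its category in brackets. The colour palette, the `render_modalizer` function and the use of inline HTML styles are assumptions made for illustration; this page does not document how the online tagger actually produces its markup.

```python
import html

# Hypothetical colour per category; the real tagger's palette is not documented.
CATEGORY_COLOURS = {
    "epistemic": "#1f77b4",
    "axiological": "#d62728",
    "alethic": "#2ca02c",
    "deontic": "#9467bd",
    "veridictive": "#ff7f0e",
    "volitional": "#8c564b",
}


def render_modalizer(text: str, category: str) -> str:
    """Wrap a modalizer in an underlined, bold, coloured span followed by its category in brackets."""
    colour = CATEGORY_COLOURS[category]
    return (
        f'<span style="color:{colour}; text-decoration:underline; font-weight:bold">'
        f"{html.escape(text)}</span> [{category}]"  # escape the text so it cannot inject markup
    )


print(render_modalizer("ojalá", "volitional"))
```

Escaping the modalizer text keeps the output well-formed even if the analysed document contains characters such as `<` or `&`.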

Modalization

Modalization is the attitude or position of speakers with regard to their own utterance (Kalinowski, 1976; Lozano, Peña-Marín and Abril, 1989; Kerbrat-Orecchioni, 1997; Otaola, 1988; Calsamiglia and Tusón, 1999), that is, a judgement that senders make about what they perceive and that remains materialized in the utterance.

Speakers can choose among various linguistic elements to express their relationship with their utterance, their actions, their interlocutors and the object they refer to; they also seek to express their states or attitudes toward what is being said at the moment (e.g., their knowledge, doubts or intentions).

Every statement can be divided into two planes, dictum and modus: the former is the propositional content, and the latter is the way in which the speaker comments on that proposition. Speakers modalize by selecting specific language units, which reflect their opinion on a real or imaginary referential object. Depending on these choices, a discourse becomes more “objective”, closer to the elimination of personal traces, or more “subjective”, displaying these traces explicitly or implicitly.

Given that no expression is completely objective (there is no such thing as zero modalization), modalizers are understood as the linguistic elements that speakers use to show their personal subjectivity; their use is therefore directly linked to their knowledge, beliefs, desires, demands and points of view.

Classification

The number of modalizers and their categories is not fixed, since they have not been fully inventoried or classified. The labels used to classify modalizers were taken from the names given to them in scholarly studies of the Spanish language. There are six categories:

  1. epistemic: when the author takes a position on what they know or believe they know
  2. axiological: when evaluations are expressed
  3. alethic: when the possibility or probability of an event is expressed
  4. deontic: when something is presented as mandatory or permitted
  5. veridictive: when the degree of truthfulness is indicated
  6. volitional: when desires or intentions are expressed
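
The six categories can be illustrated with a minimal lexicon-based tagger in Python. It assumes a small, hypothetical dictionary mapping Spanish expressions to categories (one example per category, chosen for illustration only and not taken from the Text·a·Gram tables) and appends the category in brackets after each match, trying longer expressions first so that multi-word modalizers are not cut short.

```python
import re

# Hypothetical mini-lexicon: one illustrative Spanish expression per category.
# The actual Text·a·Gram tables are not reproduced here.
LEXICON = {
    "creo que": "epistemic",
    "lamentablemente": "axiological",
    "es posible que": "alethic",
    "hay que": "deontic",
    "en realidad": "veridictive",
    "ojalá": "volitional",
}


def tag_modalizers(text: str, lexicon: dict) -> str:
    """Append '[category]' after every lexicon match, trying longer expressions first."""
    entries = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(e) for e in entries) + r")\b",
        re.IGNORECASE,
    )
    # Look up the category with the lower-cased match, since the lexicon keys are lower case.
    return pattern.sub(lambda m: f"{m.group(0)} [{lexicon[m.group(0).lower()]}]", text)


print(tag_modalizers("Creo que hay que salir; ojalá no llueva.", LEXICON))
```

Here the call prints `Creo que [epistemic] hay que [deontic] salir; ojalá [volitional] no llueva.` Plain string matching of this kind does not attempt to decide whether an expression is actually used with a modal meaning in context.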

The taxonomy aims to be as exhaustive as possible: it includes both negative and positive forms, with and without pronouns and prepositions, as well as the adverbial forms that some expressions can take.
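
A minimal Python sketch of how such variants could be generated from a base expression: a negated form, an optional preceding subject pronoun, and the regular Spanish -mente adverb built on the feminine form of an adjective. The functions `expand_variants` and `adverb_from_adjective` are hypothetical and cover only these regular patterns; this page does not say whether the actual tables are generated this way or compiled by hand.

```python
from typing import Optional


def adverb_from_adjective(adjective: str) -> str:
    """Regular Spanish -mente adverb: a masculine -o ending becomes feminine -a first."""
    stem = adjective[:-1] + "a" if adjective.endswith("o") else adjective
    return stem + "mente"


def expand_variants(expression: str, pronoun: Optional[str] = None) -> set:
    """Positive and negative forms of an expression, optionally preceded by a subject pronoun."""
    forms = {expression, "no " + expression}
    if pronoun:
        # The pronoun precedes the negation: "yo no creo que", not "no yo creo que".
        forms |= {f"{pronoun} {form}" for form in forms}
    return forms


print(sorted(expand_variants("creo que", "yo")))
print(adverb_from_adjective("claro"), adverb_from_adjective("posible"))
```

For instance, `expand_variants("creo que", "yo")` yields `creo que`, `no creo que`, `yo creo que` and `yo no creo que`, while `adverb_from_adjective` turns `claro` into `claramente` and `posible` into `posiblemente`.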

References
Calsamiglia, H. & Tusón, A. (1999). Las cosas del decir. Barcelona: Editorial Ariel.
Kalinowski, G. (1976). Un aperçu élémentaire des modalités déontiques. Langages, 10 (43) [Modalités : logique, linguistique, sémiotique], 10-18.
Kerbrat-Orecchioni, C. (1997). La enunciación de la subjetividad en el lenguaje. Buenos Aires: Edicial.
Lozano, J., Peña-Marín, C. & Abril, G. (1989). Análisis del discurso: Hacia una semiótica de la interacción textual. Madrid: Cátedra.
Otaola, C. (1988). La modalidad (con especial referencia a la lengua española). Revista de Filología Española, 68 (1/2), 97-117.


===================================

This is another product of the Tecling.com group.
A paper describing the new version of the semantic tagger (the one we have not yet installed here but soon will) is forthcoming:
• Nazar, R.; Renau, I. (In press). Wikipedia used as a semantic tagger: some preliminary results in Spanish. Procesamiento del Lenguaje Natural, n. 76.

These other papers describe the rest of the text analyses performed by the software:
• Nazar, R. (2024). Statistical modeling of discourse genres: the case of the opinion column in Spanish. SN Computer Science 5(959):1-11.
• Nazar, R.; Renau, I.; Robledo, H. (2024). Dismark and Text·a·Gram: Automatic identification and categorization of discourse markers in texts. In: Cecilia-Mihaela Popescu & Oana-Adriana Duță (eds.), Discourse Markers in Romance Languages. Crosslinguistic Approaches in Romance and Beyond. Berlin: Peter Lang.

This is open-source software.


This is the source code of Tatatag, the semantic tagging module:
https://tecling.com/textagram/tatatag-source.zip

Please note that this version has been deprecated since March 7, 2026 and will soon be replaced by the new one (Wicacho).

Evaluation dataset


This is the evaluation of the old version (Tatatag), which will also be replaced soon. The evaluation data consist of Wikipedia pages: 179 in Spanish and 241 in English.
https://tecling.com/textagram/evaluationData.zip

The data are in HTML format and include pages about 19th-century British politicians and Argentine generals of the same period. Files are numbered, and each one is paired with an 'evaluation' file containing the assessment of its results. At first we did the evaluation ourselves, but we then opted to use the Gemini API to evaluate the results. This evaluation is not perfect, but it comes close to human judgement. We are currently evaluating the evaluation itself; more on this soon.
You can also browse the evaluation data here.
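
One common way to evaluate an automatic evaluation is to measure how often it agrees with human judgements on the same items, corrected for chance. The Python sketch below computes Cohen's kappa for two label lists. The choice of kappa, the function name and the example labels (`correct`/`wrong`) are assumptions for illustration; this page does not describe the format of the evaluation files or the method that will be used.

```python
from collections import Counter


def cohen_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical judgements on four tagged items.
human = ["correct", "correct", "wrong", "correct"]
gemini = ["correct", "wrong", "wrong", "correct"]
print(cohen_kappa(human, gemini))  # prints 0.5
```

In this example the raters agree on 3 of 4 items (0.75), chance agreement is (3×2 + 1×2)/16 = 0.5, so kappa is (0.75 − 0.5)/(1 − 0.5) = 0.5. Unlike raw percentage agreement, kappa discounts matches that would be expected if both raters labelled at random with their own label frequencies.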

The modules for the analysis of discourse markers, deixis and modalization are here:
https://tecling.com/textagram/text·a·gram.zip

This code allows the program to be run locally, which makes it possible to analyze many documents at once. It consists of a Perl script and some CSV tables, and it uses R for the graphs (the online version uses the GD Graph library instead).
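
The local version is a Perl script; purely to illustrate the batch-processing idea, here is a Python sketch that loads a modalizer table from CSV and counts categories in every `.txt` file of a folder. The CSV layout (`expression` and `category` columns), the `.txt` extension and all function names are hypothetical; the actual tables in the archive may be organised differently.

```python
import csv
import io
import re
import tempfile
from collections import Counter
from pathlib import Path


def load_lexicon(csv_text: str) -> dict:
    """Read rows with hypothetical 'expression' and 'category' columns into a dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["expression"].strip().lower(): row["category"].strip() for row in reader}


def count_categories(text: str, lexicon: dict) -> Counter:
    """Count lexicon matches per category, trying longer expressions first."""
    if not lexicon:
        return Counter()  # an empty pattern would match everywhere
    alternatives = "|".join(re.escape(e) for e in sorted(lexicon, key=len, reverse=True))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return Counter(lexicon[m.group(0).lower()] for m in pattern.finditer(text))


def analyse_folder(folder: Path, lexicon: dict) -> dict:
    """Apply count_categories to every UTF-8 .txt file in a folder."""
    return {
        path.name: count_categories(path.read_text(encoding="utf-8"), lexicon)
        for path in sorted(folder.glob("*.txt"))
    }


if __name__ == "__main__":
    lexicon = load_lexicon("expression,category\ncreo que,epistemic\nhay que,deontic\n")
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.txt").write_text("Creo que hay que esperar.", encoding="utf-8")
        Path(tmp, "b.txt").write_text("Hay que decidir.", encoding="utf-8")
        for name, counts in analyse_folder(Path(tmp), lexicon).items():
            print(name, dict(counts))
```

Keeping the lexicon in a plain CSV table, as the original tool does with its own tables, means new modalizers can be added without touching the code.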

Concept and development: Rogelio Nazar

Collaborators: Javier Obreque, Diego Sánchez, Hernán Robledo, Paolo Caballería, Nicolás Acosta, Scarlette Gatica, Andrea Alcaíno, Ignacio Lobos and Irene Renau.

Documentation: Andrea Alcaíno & Rogelio Nazar