Text analysis from the point of view of text grammars
ANALYSIS OF DISCOURSE MARKERS
This function automatically finds, classifies and labels discourse markers in a text, following the taxonomy developed by Martín Zorraquino and Portolés (1999). Labels are distinguished by colour and by visual marks such as underlining and bold type, and the category of each marker is shown in brackets to its right.
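The labelling step can be pictured with a minimal sketch. It is not the tool's actual implementation: the three-entry lexicon, the label strings and the function name label_markers are assumptions made for illustration, and colour, underlining and bold type are left out so that only the bracketed category is shown.

```python
import re

# Hypothetical mini-lexicon (illustrative only, not the tool's real data):
# marker -> category label shown in brackets.
LEXICON = {
    "sin embargo": "connector: counter-argumentative",
    "por ejemplo": "argumentative operator: concretion",
    "es decir": "reformulator: explanatory",
}


def label_markers(text, lexicon=LEXICON):
    """Append the bracketed category to the right of each known marker."""
    # Try longer markers first so that multi-word markers are not
    # shadowed by shorter alternatives.
    alternatives = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(m) for m in alternatives) + r")\b",
        flags=re.IGNORECASE,
    )

    def annotate(match):
        marker = match.group(1)
        # Keep the original casing of the marker; look it up in lower case.
        return f"{marker} [{lexicon[marker.lower()]}]"

    return pattern.sub(annotate, text)


if __name__ == "__main__":
    print(label_markers("Es caro. Sin embargo, funciona; es decir, sirve."))
```

A plain lexicon lookup like this cannot tell whether an expression is actually being used as a discourse marker in a given context, so it should be read only as a picture of the output format.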
Discourse markers
Discourse markers are among the elements a text needs in order to maintain cohesion and coherence (Calsamiglia & Tusón, 1999; Fraser, 1999). Their definition and classification vary, since the terminology and criteria change with the approach and the context in which they are used. According to Martín Zorraquino and Portolés (1999), discourse markers are units that can have discursive uses, emphatic uses and expressive values, among others. They include word classes such as prepositions, adverbs and conjunctions, as well as other grammatical elements that can perform these functions even when these are not their usual ones.
Discourse markers are invariable linguistic units, grammaticalized in the language and used in both conversational and written contexts. They do not perform a syntactic function within the predication of the sentence; their purpose is to guide the reader through the text, maintaining coherence between ideas.
Classification
As mentioned above, this tool uses the classification described by Martín Zorraquino and Portolés (1999), who divide discourse markers into five main types. Since the tool focuses on written text, the fifth type, conversational markers, is not included, which leaves the four types below.
a) Connectors: discourse markers that link two segments of the discourse semantically and pragmatically, guiding the reader to understand the relationship between them. This type has three subclassifications: additives, which link two elements pointing in the same direction; consecutives, which express cause-and-effect relationships between two segments; and counter-argumentative markers, which cancel conclusions that might be drawn from the preceding segment.
b) Information structurers: these indicate the discursive organisation of the text, so they have no argumentative meaning and bear only on the structure of the writing. They are also divided into three subcategories: commentators, which introduce new information as a comment on what has been said previously; organisers, which highlight the order of the elements of the text; and digressors, which introduce a comment that is lateral to or separate from the main line of the text.
c) Argumentative operators: these are markers that condition the argumentative possibilities of the segment in which they appear in relation to the previous one. Unlike the other types, these have only two subclassifications: argumentative reinforcement, which strengthens the argument expressed over other possible ones, and concretion, which introduces an example to support what has been said.
d) Reformulators: these present the following statement as a reformulation of the previous one, that is, they express the same idea in other words. They are divided into four subclassifications: explanatory, which introduce an explanation of the previous statement; rectifying, which correct the element referred to or define it more appropriately; distancing, which take distance from the commitment made earlier in the text; and recapitulative, which conclude or summarise what has been expressed.
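The four types and their subclassifications can be written down as a small data structure. The sketch below is illustrative and assumes a number of things not stated in this documentation: the names TAXONOMY, EXAMPLE_MARKERS and classify are hypothetical, and the Spanish markers are examples chosen for illustration, not the tool's actual lexicon.

```python
# The four types used by the tool, each with its subclassifications
# as listed in the text above.
TAXONOMY = {
    "connectors": ["additive", "consecutive", "counter-argumentative"],
    "information structurers": ["commentator", "organiser", "digressor"],
    "argumentative operators": ["argumentative reinforcement", "concretion"],
    "reformulators": ["explanatory", "rectifying", "distancing", "recapitulative"],
}

# Hypothetical example markers (one per subclassification), for illustration.
EXAMPLE_MARKERS = {
    "además": ("connectors", "additive"),
    "por tanto": ("connectors", "consecutive"),
    "sin embargo": ("connectors", "counter-argumentative"),
    "pues bien": ("information structurers", "commentator"),
    "en primer lugar": ("information structurers", "organiser"),
    "por cierto": ("information structurers", "digressor"),
    "en realidad": ("argumentative operators", "argumentative reinforcement"),
    "por ejemplo": ("argumentative operators", "concretion"),
    "es decir": ("reformulators", "explanatory"),
    "mejor dicho": ("reformulators", "rectifying"),
    "en cualquier caso": ("reformulators", "distancing"),
    "en conclusión": ("reformulators", "recapitulative"),
}


def classify(marker):
    """Return (type, subclassification) for a listed marker, else None."""
    entry = EXAMPLE_MARKERS.get(marker.strip().lower())
    if entry is None:
        return None
    main_type, subtype = entry
    # Guard against typos in the illustrative table.
    if subtype not in TAXONOMY.get(main_type, ()):
        raise ValueError(f"{marker!r} has an unknown category: {entry}")
    return entry


if __name__ == "__main__":
    for marker in ("Por tanto", "mejor dicho", "hola"):
        print(marker, "->", classify(marker))
```

Keeping the taxonomy separate from the marker list means the categories can be checked independently of whatever lexicon is plugged in.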
References
===================================
This is another product of Group Tecling.com
A paper describing the new version of the semantic tagger (not yet installed here, but it will be soon) is forthcoming:
• Nazar, R.; Renau, I. (In press). Wikipedia used as a semantic tagger: some preliminary results in Spanish. Procesamiento del Lenguaje Natural, n. 76.
These other papers describe the rest of the text analyses performed by the software:
• Nazar, R. (2024). Statistical modeling of discourse genres: the case of the opinion column in Spanish. SN Computer Science 5(959):1-11.
• Nazar, R.; Renau, I.; Robledo, H. (2024). Dismark and Text·a·Gram: Automatic identification and categorization of discourse markers in texts. In: Cecilia-Mihaela Popescu & Oana-Adriana Duță (eds.), Discourse Markers in Romance Languages. Crosslinguistic Approaches in Romance and Beyond. Berlin: Peter Lang.
Concept and development: Rogelio Nazar
Collaborators: Javier Obreque, Diego Sánchez, Hernán Robledo, Paolo Caballería, Nicolás Acosta, Scarlette Gatica, Andrea Alcaíno, Ignacio Lobos and Irene Renau.
Documentation: Andrea Alcaíno & Rogelio Nazar