Dismark

Official website of Project Fondecyt 1191481

AUTOMATIC INDUCTION OF TAXONOMIES OF DISCOURSE MARKERS FROM MULTILINGUAL CORPORA

Current version: July 30, 2024
Last week we carried out some long-awaited maintenance on the hardware hosting this website. Everything went smoothly and we have not encountered any bugs so far. If you notice a problem, please send an email to rogelio dot nazar at pucv dot cl.


This website offers the following content:
  1. Documentation
  2. A multilingual taxonomy resulting from the project
  3. An automatic classifier of discourse markers

3. Automatic classifier of discourse markers

Paste a list of expressions here:

Language:

(It is faster if the language is selected manually)

This demo of the algorithm takes as input a list of one or more expressions (one per line) and then carries out the following tasks:
  1. It identifies the language of each expression (among the languages listed above).
  2. It decides, in each case, whether the expression is a discourse marker or not.
  3. If it is, it assigns a category to it.
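
The three steps above can be sketched as a simple control flow. This is only an illustrative skeleton, not the project's implementation: the names `Result`, `classify_all`, `detect_language`, `is_discourse_marker` and `assign_category` are hypothetical, and the actual classification logic is passed in as functions because it is not described on this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    expression: str
    language: str
    is_marker: bool
    category: Optional[str]  # None when the expression is not a discourse marker


def classify_all(
    expressions: list[str],
    detect_language: Callable[[str], str],            # hypothetical step 1
    is_discourse_marker: Callable[[str, str], bool],  # hypothetical step 2
    assign_category: Callable[[str, str], str],       # hypothetical step 3
    language: Optional[str] = None,
) -> list[Result]:
    """Run the three steps on each expression, in order."""
    results = []
    for expr in expressions:
        # Step 1: skip language detection when the user selected a language.
        lang = language if language is not None else detect_language(expr)
        # Step 2: decide whether the expression is a discourse marker.
        marker = is_discourse_marker(expr, lang)
        # Step 3: only discourse markers receive a category.
        category = assign_category(expr, lang) if marker else None
        results.append(Result(expr, lang, marker, category))
    return results
```

In this sketch, passing `language` explicitly bypasses the detection step, which mirrors the note that processing is faster when the language is selected manually.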

At the moment, the demo processes at most 50 expressions at once because the algorithm is not yet optimized for large-scale data processing. We plan to address this limitation in a future version.
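
For longer lists, one possible workaround is to split the input into batches of at most 50 expressions before pasting each batch into the form. The following is a minimal sketch under that assumption; the helper names `read_expressions` and `batches` are hypothetical and not part of the project.

```python
from typing import Iterator

MAX_EXPRESSIONS = 50  # limit stated for the demo


def read_expressions(text: str) -> list[str]:
    """Split raw text into expressions, one per line, dropping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def batches(expressions: list[str], size: int = MAX_EXPRESSIONS) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` expressions."""
    for start in range(0, len(expressions), size):
        yield expressions[start:start + size]
```

Each batch can then be joined with `"\n".join(batch)` to obtain text in the one-expression-per-line format the form expects.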