Tecling: Technologies for Linguistic Analysis

Bifid: Parallel corpus alignment at the document, sentence and vocabulary levels

Bifid is a program for parallel corpus alignment.

Web demo: http://www.bifidalign.com/

State of this project on January 17, 2021:

Last year we had to interrupt this service because of security
issues detected on the server and our lack of time to solve them. We had to take
the server down until we had time for a complete overhaul of that piece of machinery.

In the meantime, we have also been planning to improve Bifid's software,
making it less computationally expensive and easier to install on other hardware.
Until now, Bifid has been too dependent on the Jaguar Project, which has problems of its own.
So we are integrating parts of Jaguar's code into Bifid and making
some other major changes, including the addition of preloaded information about different languages.
This is a significant departure from the original project, explained in these publications:
Nazar, R. (2011). Parallel corpus alignment at the document, sentence and vocabulary levels.
Procesamiento del Lenguaje Natural, n. 47.
Nazar, R. (2012). Bifid: un alineador de corpus paralelo a nivel de documento, oración y vocabulario.
Linguamatica, vol. 4, no. 2.
One of the interesting features of the original proposal was its aim of total
linguistic agnosticism. Ideally, we will try to maintain some functionality for
languages unknown to the system. But from a practical point of view,
it could be argued that there is no need for such agnosticism in the case
of well-known languages like English, Spanish, French, German and others.
Preloaded knowledge of these languages would help Bifid make better decisions, and make them faster.
The situation on the ground today is the following:
We have considerably improved our ability to detect sentences, and we have
a new prototype to do just that:
Segusmund
We have also developed a language detection algorithm that detects
fragments written in languages other than the main one. We call it
Linguini
In the coming days (or, more probably, weeks!) we will be working on integrating all this into the
new version of Bifid.

If you have questions, feel free to send an email to: rogelio dot nazar at pucv dot cl

References:
Nazar, R. (2011). "Parallel corpus alignment at the document, sentence and vocabulary levels". Procesamiento del Lenguaje Natural, n. 47.
Nazar, R. (2012). "Bifid: un alineador de corpus paralelo a nivel de documento, oración y vocabulario". Linguamatica, vol. 4, no. 2.

Contact: rogelio.nazar at gmail.com
Related concepts: Parallel Corpus Alignment, Bilingual Vocabulary Extraction, Machine Translation, Computational Linguistics, Computational Lexicography

