Technologies for Linguistic Analysis
Bifid: Parallel corpus alignment at the document, sentence and vocabulary levels

Bifid is a program for parallel corpus alignment.

Web demo: http://www.bifidalign.com/

Update July 21, 2024: We are updating the server

We are carrying out some long-awaited maintenance on the hardware hosting this website. Please be patient; it should take only a few minutes.

Bifid is a program that takes a set of documents with their translations
and performs several functions:
  1. It separates the set of documents into the two languages
  2. It aligns each document with its translation
  3. It aligns the sentences in each pair of documents
  4. It extracts a bilingual vocabulary from the aligned sentences
  5. It exports results in CSV and TMX formats
  6. It imports TMX documents, in case you already have your corpus
    aligned at the sentence level and only want to obtain a bilingual vocabulary.
  7. The bilingual vocabulary includes multi-word expressions.
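
To give a rough idea of what step 4 involves, here is a minimal sketch in Python. It is not Bifid's actual algorithm: it assumes a simple sentence-level co-occurrence measure (the Dice coefficient), handles single words only (no multi-word expressions), and the function name extract_vocabulary and the toy sentence pairs are made up for illustration.

```python
from collections import Counter
from itertools import product


def extract_vocabulary(pairs, min_count=2):
    """Map each source word to its best-scoring target word.

    Illustrative only: scores candidates with the Dice coefficient over
    sentence-level co-occurrence, which is an assumption of this sketch.
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        # Count each word once per sentence.
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_freq.update(src_words)
        tgt_freq.update(tgt_words)
        joint.update(product(src_words, tgt_words))

    best = {}
    for (s, t), n in joint.items():
        if n < min_count:
            continue  # skip pairs seen together too rarely
        dice = 2 * n / (src_freq[s] + tgt_freq[t])
        if s not in best or dice > best[s][1]:
            best[s] = (t, dice)
    return best


# Hypothetical aligned sentences (English, Spanish).
pairs = [
    ("the patient sleeps", "el paciente duerme"),
    ("the patient eats", "el paciente come"),
    ("the doctor sleeps", "el médico duerme"),
]
vocab = extract_vocabulary(pairs)
```

With these three toy pairs, "patient" is matched with "paciente" and "sleeps" with "duerme", while words that appear only once (such as "eats") are left out by the min_count threshold. A real corpus needs far more data and proper tokenization.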

Give it a try:
Here is a nice little parallel corpus in English
and Spanish, extracted from
Revista Chilena de Neuropsiquiatría.
Download the zip file and then upload it to your account.

You can also upload a TMX file if you already have one,
and in this way bypass the document and sentence alignment.
Here is an example file from the OPUS corpus:
emea.tmx.zip (warning: this is a large file
and it takes time to process).
Lastly, if you want to try with a different pair of languages, here is a
subset of the Canadian Hansards, with English and French.
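
If you are curious about what is inside a TMX file, the following Python sketch reads one and returns the aligned sentence pairs. It relies only on the general layout of the TMX format (tu elements holding one tuv per language, each with a seg) and on the standard library; it does not reflect how Bifid itself reads these files. The function name read_tmx_pairs and the tiny sample document are made up for illustration, and the sample header omits the attributes that a complete TMX file would carry.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(source, src_lang, tgt_lang):
    """Yield (source, target) sentence pairs from a TMX file or file object."""
    src_lang, tgt_lang = src_lang.lower(), tgt_lang.lower()
    tree = ET.parse(source)
    for tu in tree.getroot().iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # Fall back to a plain "lang" attribute if xml:lang is missing.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested in inline markup.
                segments[lang.lower()] = "".join(seg.itertext()).strip()
        if src_lang in segments and tgt_lang in segments:
            yield segments[src_lang], segments[tgt_lang]


# Hypothetical two-sentence TMX document, reduced to the bare minimum.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>The patient sleeps.</seg></tuv>
      <tuv xml:lang="es"><seg>El paciente duerme.</seg></tuv>
    </tu>
  </body>
</tmx>"""

tmx_pairs = list(read_tmx_pairs(io.BytesIO(sample), "en", "es"))
```

Language codes are compared exactly after lowercasing, so a file that tags its segments as "en-US" would need that same code passed in. For a file as large as the EMEA example, a streaming parser such as ET.iterparse would probably be a better choice than loading the whole tree.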

Bifid has been online since 2004 (yes, it is about to turn 20),
but its server recently went down and it was neglected for a while.
But here it is again, restored to its former glory!
We are planning some kind of celebration for its 20th birthday (no one
remembers the actual date, so we will celebrate the whole year).
We will post updates on this soon.


Some (old) publications on the project:
Nazar, R. (2011). Parallel corpus alignment at the document, sentence and vocabulary levels.
Procesamiento del Lenguaje Natural, n. 47.

Nazar, R. (2012). Bifid: un alineador de corpus paralelo a nivel de documento, oración y vocabulario.
Linguamatica, vol. 4, no. 2.


If you have questions, feel free to send an email: rogelio dot nazar at pucv dot cl


Contact: rogelio.nazar at gmail.com
Related concepts: Parallel Corpus Alignment, Bilingual Vocabulary Extraction, Machine Translation, Computational Linguistics, Computational Lexicography