Bifid: Parallel corpus alignment at the document, sentence and vocabulary levels
Bifid is a program for parallel corpus alignment.
State of this project on January 17, 2021:

Last year we had to interrupt this service because of security issues detected on the server and our lack of time to solve them. We had to take the server down until we had time for a complete overhaul of that piece of machinery. In the meantime, we have also been planning to improve Bifid's software, making it less computationally expensive and easier to install on other hardware. Until now, Bifid was too dependent on the Jaguar Project, which has problems of its own. So we are integrating parts of Jaguar's code into Bifid and making some other major changes, including the addition of preloaded information about different languages.

This is a significant departure from the original project, explained in these publications:

Nazar, R. (2011). Parallel corpus alignment at the document, sentence and vocabulary levels. Procesamiento del Lenguaje Natural, n. 47.
Nazar, R. (2012). Bifid: un alineador de corpus paralelo a nivel de documento, oración y vocabulario. Linguamatica, vol. 4, no. 2.

One of the interesting features of the original proposal was its aim of total linguistic agnosticism. Ideally, we will try to maintain some functionality for languages unknown to the system. But from a practical point of view, it could be argued that there is no need for such agnosticism in the case of well-known languages like English, Spanish, French, German and others. Knowledge of those languages would help Bifid make better decisions, and faster.

The situation on the ground today is the following:

We have considerably improved our ability to detect sentences, and we have a new prototype to do just that: Segusmund.
We also developed a language detection algorithm that also detects fragments written in languages other than the main one. We call it Linguini.

In the coming days (or, probably, weeks!) we will be working on integrating all this into the new version of Bifid.
If you have questions, feel free to send an email: rogelio dot nazar at pucv dot cl

Contact: rogelio.nazar at gmail.com

Related concepts: Parallel Corpus Alignment, Bilingual Vocabulary Extraction, Machine Translation, Computational Linguistics, Computational Lexicography