Description
PORCUS - Parsing de Oraciones Realizado por Computadora, Unificado y
Serial (Unified and Serialized Computer-based Sentence Parsing)
This software provides routines to run POS taggers with a single command.
The goal is to keep every POS tagger you use in one place, in a simple
way. PORCUS also measures how long each tagger takes to run, so you can
compare their performance.
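To illustrate the timing comparison described above, here is a minimal sketch in Python (3.7 or later). It is not PORCUS' own code: the function names time_command and compare are hypothetical, and the two demo commands are placeholders standing in for real tagger invocations. It only shows one way to measure the wall-clock time of several command-line tools fed the same input.

```python
import subprocess
import sys
import time


def time_command(argv, input_text):
    """Run one command with input_text on stdin; return (seconds, stdout)."""
    start = time.perf_counter()
    result = subprocess.run(
        argv,
        input=input_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    elapsed = time.perf_counter() - start
    return elapsed, result.stdout


def compare(commands, input_text):
    """Time every command and return (name, seconds) pairs, fastest first."""
    timings = []
    for name, argv in commands.items():
        elapsed, _ = time_command(argv, input_text)
        timings.append((name, elapsed))
    return sorted(timings, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Placeholder commands (hypothetical), not real tagger invocations.
    demo = {
        "upper": [sys.executable, "-c",
                  "import sys; print(sys.stdin.read().upper())"],
        "plain": [sys.executable, "-c",
                  "import sys; print(sys.stdin.read())"],
    }
    for name, seconds in compare(demo, "The cat sat on the mat."):
        print(f"{name}: {seconds:.3f} s")
```

Wall-clock time includes process start-up and model loading, so a single run on a short sentence mostly measures start-up cost. Timing a larger input, or averaging several runs, gives a fairer comparison.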
For now, PORCUS supports only English and Spanish, and we provide six
POS taggers as separate downloads: TreeTagger, UDPipe, spaCy, IXA-pipes,
Stanford CoreNLP and DepPattern. You can also install more taggers on
your own system in the ./taggers directory. See the module files for
examples of how to create your own.
We have a web demo that shows PORCUS' performance: http://www.tecling.com/porcus
The demo also includes SyntaxNet from Google, LSTM-parser, FreeLing and
Connexor. For more information, run perl /path/to/porcus.pl --help on
the command line.
System Requirements
* UNIX-like operating system (we have tested PORCUS on Ubuntu, Arch
Linux and Ubuntu on Windows Subsystem for Linux)
* Perl 6
* Python >= 2.7
* Java 8 (>= 1.8.0)
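A quick way to confirm that the interpreters listed above are reachable is to look them up on the PATH. The sketch below is only an illustration, not part of PORCUS: the function name missing_tools is hypothetical, and the executable names perl, python and java are assumptions (on some systems Python is installed as python3, for example). It checks only that each executable is present, not its version.

```python
import shutil

# Assumed executable names; adjust them to match your system.
REQUIRED = ("perl", "python", "java")


def missing_tools(names=REQUIRED, which=shutil.which):
    """Return the executables from names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```

Version numbers still need a manual check, for instance with perl -v or java -version.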
Usage
Run perl porcus.pl --help in the PORCUS directory.
License
PORCUS - Parsing de Oraciones Realizado por Computadora, Unificado y
Serial (Unified and Serialized Computer-based Sentence Parsing)
Nicolás Acosta - www.tecling.com/acosta & Rogelio Nazar -
www.tecling.com/nazar
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
You should have received a copy of the GNU General Public License along
with this program. If not, see https://www.gnu.org/licenses/.
Contact
Thanks for using PORCUS!
Nicolás Acosta - nmac.1996@gmail.com Universidad Nacional de Cuyo,
Mendoza, Argentina
Rogelio Nazar - rogelio.nazar@gmail.com Pontificia Universidad Católica
de Valparaíso
http://www.tecling.com/
References
Connexor
Tapanainen, P.; Järvinen, T. (1997). A non-projective dependency parser. In Proceedings of the fifth conference on Applied natural language processing (ANLC '97). Association for Computational Linguistics, Stroudsburg, PA, USA, 64-71.
DepPattern
Gamallo, P.; González, I. (2011). A Grammatical Formalism Based on Patterns of Part-of-Speech Tags, International Journal of Corpus Linguistics, 16(1), 45-71. ISSN: 1384-6655
Gamallo, P. (2015). Dependency Parsing with Compression Rules, The 14th International Conference on Parsing Technologies (IWPT-2015) p. 107-117, Bilbao. ISBN 978-1-941643-98-3
Gamallo, P.; González, I. (2012). DepPattern: A Multilingual Dependency Parser, Demo Session of the International Conference on Computational Processing of the Portuguese Language (PROPOR 2012), April 17-20, Coimbra, Portugal.
FreeLing
Atserias, J.; Casas, B.; Comelles, E.; González, M.; Padró, L.; Padró, M. (2006). FreeLing 1.3: Syntactic and semantic services in an open-source NLP library. Proceedings of the fifth international conference on Language Resources and Evaluation (LREC 2006), ELRA. Genoa, Italy. May, 2006.
Carrera, J.; Castellón, I.; Lloberes, M.; Padró, L.; Tinkova, N. (2008). Dependency Grammars in FreeLing. Procesamiento del Lenguaje Natural n. 41, pg. 21--28. September, 2008.
Lloberes, M.; Castellón, I.; Padró, L. (2010). Spanish FreeLing Dependency Grammar. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta, 2010.
IXA-pipes
Agerri, R.; Bermudez, J.; Rigau, G. (2014): IXA pipes: Efficient and Ready to Use Multilingual NLP tools. In Proceedings of the 9th Language Resources and Evaluation Conference (LREC2014), 26-31 May, 2014, Reykjavik, Iceland.
LSTM-parser
Dyer, C.; Ballesteros, M.; Ling, W.; Matthews, A.; Smith, N. (2015). Transition-based Dependency Parsing with Stack Long Short-Term Memory. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing.
spaCy
See the spaCy website for documentation.
Stanford CoreNLP
Manning, C. (2011). Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? Proceedings of 12th International Conference, CICLing 2011, Tokyo, Japan, February 20-26, 2011: 171-189.
SyntaxNet
Weiss, D.; Alberti, C.; Collins, M.; Petrov, S. (2015). Structured Training for Neural Network Transition-Based Parsing. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing.
TreeTagger
Schmid, H. (1995): Improvements in Part-of-Speech Tagging with an Application to German. Proceedings of the ACL SIGDAT-Workshop. Dublin, Ireland.
Schmid, H. (1994): Probabilistic Part-of-Speech Tagging Using Decision Trees. Proceedings of International Conference on New Methods in Language Processing, Manchester, UK.
UDPipe
Straka, M.; Straková, J. (2017). Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Vancouver, Canada, August 2017.
Straka, M.; Hajič, J.; Straková, J. (2016). UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 2016.