LX-Suite

Developed at the University of Lisbon, Dept. of Informatics, by the NLX-Natural Language and Speech Group.



LX-Suite

LX-Suite (beta 2 version) is a freely available online service for the shallow processing of Portuguese. It was developed and is maintained by the NLX-Natural Language and Speech Group at the University of Lisbon, Department of Informatics.

Version history:

You may also be interested in our LX-Conjugator and LX-Lemmatizer online services for the conjugation and lemmatization of verbs, and in our LX-Inflector online service for the inflection of nominal classes.

Features and Evaluation

LX-Suite is composed of a set of shallow processing tools:

These tools work in a pipeline scheme, where each tool takes as input the output of the previous tool.
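
The pipeline scheme can be illustrated with a minimal sketch. The three stages here (`split_sentences`, `tokenize`, `tag`) are hypothetical toy functions, not the actual LX-Suite tools, and `"UNK"` is a placeholder rather than a tag from the LX-Suite tagset. The only point shown is the chaining: each stage receives the output of the previous one.

```python
from typing import Any, Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Toy stage (hypothetical): split raw text into sentences at full stops."""
    return [s.strip() for s in text.split(".") if s.strip()]


def tokenize(sentences: List[str]) -> List[List[str]]:
    """Toy stage (hypothetical): split each sentence on whitespace."""
    return [s.split() for s in sentences]


def tag(tokenized: List[List[str]]) -> List[List[Tuple[str, str]]]:
    """Toy stage (hypothetical): attach a placeholder tag to every token."""
    return [[(tok, "UNK") for tok in sent] for sent in tokenized]


def run_pipeline(data: Any, stages: List[Callable[[Any], Any]]) -> Any:
    """Feed the output of each stage to the next one, in order."""
    for stage in stages:
        data = stage(data)
    return data


result = run_pipeline("Olá mundo. Bom dia.", [split_sentences, tokenize, tag])
print(result)
```

Keeping every stage as a plain function from one input to one output means a stage can be replaced or tested in isolation, as long as its output still matches what the next stage expects.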

Authorship

LX-Suite is being developed by António Branco and João Silva, with the key contribution of Filipe Nunes (verbal lemmatizer), and the help of Francisco Costa, Catarina Ribeiro and Ricardo Santos at the NLX—Natural Language and Speech Group at the University of Lisbon, Department of Informatics.

Acknowledgments

The development of a state-of-the-art, complete suite of shallow processing tools for Portuguese was supported by FCT-Fundação para a Ciência e Tecnologia under contract POSI/PLP/47058/2002 for the TagShare project and contract POSI/PLP/61490/2004 for the QueXting project, and by the European Commission under contract FP6/STREP/27391 for the LT4eL project.

This project was developed in cooperation with CLUL—Centro de Linguística da Universidade de Lisboa. The training and test corpora prepared for the development of this demo evolved from a corpus provided by CLUL.

This demo includes a part-of-speech tagger developed with Thorsten Brants' TnT software with his written permission.

White Papers

Branco, António and João Silva, 2006. Dedicated Nominal Featurization of Portuguese. In Proceedings of the VII Encontro para o Processamento Computacional da Língua Portuguesa Escrita e Falada (PROPOR'06).

Barreto, Florbela, António Branco, Eduardo Ferreira, Amália Mendes, Maria Fernanda Bacelar do Nascimento, Filipe Nunes and João Silva, 2006. Open Resources and Tools for the Shallow Processing of Portuguese: The TagShare Project. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'06).

Branco, António and João Silva, 2006. A Suite of Shallow Processing Tools for Portuguese: LX-Suite. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL'06).

Branco, António, Filipe Nunes and João Silva, 2006. Verb Analysis in an Inflective Language: Simpler is better. Internal report, University of Lisbon, Department of Informatics, NLX-Natural Language and Speech Group.

Branco, António and João Silva, 2005. Accurate Annotation: an Efficiency Metric. In Nicolas Nicolov, Kalina Bontcheva, Galia Angelova and Ruslan Mitkov (eds.), Recent Advances in Natural Language Processing III, Amsterdam, John Benjamins, pp. 173-182.

Branco, António and João Silva, 2004. Swift Development of State of the Art Taggers for Portuguese. In António Branco, Amália Mendes and Ricardo Ribeiro (eds.), Language Technology for Portuguese: Shallow Processing Tools and Resources. Lisbon, Edições Colibri, pp. 29-46.

Branco, António and João Silva, 2004. Evaluating Solutions for the Rapid Development of State-of-the-Art POS Taggers for Portuguese. In Maria Teresa Lino, Maria Francisca Xavier, Fátima Ferreira, Rute Costa and Raquel Silva (eds.), Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC'04), Paris, ELRA, pp. 507-510.

Branco, António, Amália Mendes and Ricardo Ribeiro (eds.), 2003. Tagging and Shallow Processing of Portuguese: Workshop Notes of TASHA'2003. Lisbon, University of Lisbon, Faculty of Sciences, Department of Informatics, Technical Report TR-2003-28.

Branco, António and João Silva, 2003. Portuguese-specific Issues in the Rapid Development of State of the Art Taggers. In António Branco, Amália Mendes and Ricardo Ribeiro (eds.), 2003, pp. 7-10.

Mendes, Amália, Raquel Amaro and M. Fernanda Bacelar do Nascimento, 2004. Morphological Tagging of a Spoken Portuguese Corpus Using Available Resources. In António Branco, Amália Mendes and Ricardo Ribeiro (eds.), Language Technology for Portuguese: Shallow Processing Tools and Resources. Lisbon, Edições Colibri, pp. 47-62.

Mendes, Amália, Raquel Amaro and M. Fernanda Bacelar do Nascimento, 2003. Reusing Available Resources for Tagging a Spoken Portuguese Corpus. In António Branco, Amália Mendes and Ricardo Ribeiro (eds.), 2003, pp. 25-28.

TagShare, 2004. Manual de Etiquetação e Convenções. Internal report, University of Lisbon, Department of Informatics, NLX-Natural Language and Speech Group.

Contact Us

Contact us using the following email address: 'nlxgroup' concatenated with 'at' concatenated with 'di.fc.ul.pt'.

Why LX-Suite?

LX, because LX is the "code" name that Lisboners like to use for their hometown.

Tagset: POS

Tag | Category | Examples
ADJ | Adjectives | bom, brilhante, eficaz, …
ADV | Adverbs | hoje, já, sim, felizmente, …
CARD | Cardinals | zero, dez, cem, mil, …
CJ | Conjunctions | e, ou, tal como, …
CL | Clitics | o, lhe, se, …
CN | Common Nouns | computador, cidade, ideia, …
DA | Definite Articles | o, os, …
DEM | Demonstratives | este, esses, aquele, …
DFR | Denominators of Fractions | meio, terço, décimo, %, …
DGTR | Roman Numerals | VI, LX, MMIII, MCMXCIX, …
DGT | Digits | 0, 1, 42, 12345, 67890, …
DM | Discourse Marker | olá, …
EADR | Electronic Addresses | http://www.di.fc.ul.pt, …
EOE | End of Enumeration | etc
EXC | Exclamative | ah, ei, etc.
GER | Gerunds | sendo, afirmando, vivendo, …
GERAUX | Gerund "ter"/"haver" in compound tenses | tendo, havendo …
IA | Indefinite Articles | uns, umas, …
IND | Indefinites | tudo, alguém, ninguém, …
INF | Infinitive | ser, afirmar, viver, …
INFAUX | Infinitive "ter"/"haver" in compound tenses | ter, haver …
INT | Interrogatives | quem, como, quando, …
ITJ | Interjection | bolas, caramba, …
LTR | Letters | a, b, c, …
MGT | Magnitude Classes | unidade, dezena, dúzia, resma, …
MTH | Months | Janeiro, Dezembro, …
NP | Noun Phrases | idem, …
ORD | Ordinals | primeiro, centésimo, penúltimo, …
PADR | Part of Address | Rua, av., rot., …
PNM | Part of Name | Lisboa, António, João, …
PNT | Punctuation Marks | ., ?, (, …
POSS | Possessives | meu, teu, seu, …
PPA | Past Participles not in compound tenses | afirmados, vivida, …
PP | Prepositional Phrases | algures, …
PPT | Past Participle in compound tenses | sido, afirmado, vivido, …
PREP | Prepositions | de, para, em redor de, …
PRS | Personals | eu, tu, ele, …
QNT | Quantifiers | todos, muitos, nenhum, …
REL | Relatives | que, cujo, tal que, …
STT | Social Titles | Presidente, drª., prof., …
SYB | Symbols | @, #, &, …
TERMN | Optional Terminations | (s), (as), …
UM | "um" or "uma" | um, uma
UNIT | Abbreviated Measurement Units | kg., km., …
VAUX | Finite "ter" or "haver" in compound tenses | temos, haveriam, …
V | Verbs (other than PPA, PPT, INF or GER) | falou, falaria, …
WD | Week Days | segunda, terça-feira, sábado, …

Multi-Word Expressions

Tag | Category | Examples
LADV1…LADVn | Multi-Word Adverbs | de facto, em suma, um pouco, …
LCJ1…LCJn | Multi-Word Conjunctions | assim como, já que, …
LDEM1…LDEMn | Multi-Word Demonstratives | o mesmo, …
LDFR1…LDFRn | Multi-Word Denominators of Fractions | por cento
LDM1…LDMn | Multi-Word Discourse Markers | pois não, até logo, …
LITJ1…LITJn | Multi-Word Interjections | meu Deus
LPRS1…LPRSn | Multi-Word Personals | a gente, si mesmo, V. Exa., …
LPREP1…LPREPn | Multi-Word Prepositions | através de, a partir de, …
LQD1…LQDn | Multi-Word Quantifiers | uns quantos, …
LREL1…LRELn | Multi-Word Relatives | tal como, …
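
The numbered multi-word tags can be illustrated with a short sketch. It assumes that the numeric suffix in a pattern such as LPREP1…LPREPn marks the position of each token within the expression; this is one reading of the notation, and it is not spelled out in this section. The function name `tag_multiword` is hypothetical and is not part of LX-Suite.

```python
from typing import List, Tuple


def tag_multiword(tokens: List[str], base_tag: str) -> List[Tuple[str, str]]:
    """Pair each token of a multi-word expression with base_tag plus its
    1-based position (assumed reading of the 'TAG1…TAGn' notation)."""
    return [(tok, f"{base_tag}{i}") for i, tok in enumerate(tokens, start=1)]


# "a partir de" is listed among the multi-word prepositions.
print(tag_multiword(["a", "partir", "de"], "LPREP"))
# [('a', 'LPREP1'), ('partir', 'LPREP2'), ('de', 'LPREP3')]
```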

Tagset: Other tags

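Tag | Description
m | Masculine
f | Feminine
s | Singular
p | Plural
dim | Diminutive
sup | Superlative
comp | Comparative
1 | First Person
2 | Second Person
3 | Third Person
pi | Present Indicative (Presente do Indicativo)
ppi | Preterite Indicative (Pretérito Perfeito do Indicativo)
ii | Imperfect Indicative (Pretérito Imperfeito do Indicativo)
mpi | Pluperfect Indicative (Pretérito Mais que Perfeito do Indicativo)
fi | Future Indicative (Futuro do Indicativo)
c | Conditional (Condicional)
pc | Present Subjunctive (Presente do Conjuntivo)
ic | Imperfect Subjunctive (Pretérito Imperfeito do Conjuntivo)
fc | Future Subjunctive (Futuro do Conjuntivo)
imp | Imperative (Imperativo)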