On the utility of word embeddings for enriching OpenWordNet-PT
Hugo Gonçalo Oliveira, Fredson Silva de Souza Aguiar, et al.
LDK 2021
One of the prerequisites for many natural language processing technologies is the availability of large lexical resources. This paper reports on MorphoBr, an ongoing project aiming at building a comprehensive full-form lexicon for morphological analysis of Portuguese. A first version of the resource is already freely available online under an open source, free software license. MorphoBr combines analogous free resources, correcting several thousand errors and gaps, and systematically adding new entries. In comparison to the integrated resources, lexical entries in MorphoBr follow a more user-friendly format, which can be straightforwardly compiled into finite-state transducers for morphological analysis, e.g., in the context of syntactic parsing with a grammar in the LFG formalism using the XLE system. MorphoBr results from a combination of computational techniques. Errors and the more obvious gaps in the integrated resources were automatically corrected with scripts. However, MorphoBr's main contribution is the expansion of the inventory of nouns and adjectives. This was carried out by systematically modeling diminutive formation in the paradigm of finite-state morphology, which allowed MorphoBr to significantly outperform analogous resources in the coverage of diminutives. The first evaluation results show MorphoBr to be a promising initiative that will directly contribute to the development of more robust natural language processing tools and applications that depend on wide-coverage morphological analysis.
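The abstract says MorphoBr expanded its noun inventory by modeling diminutive formation and storing the results as full-form entries. As a rough illustration only, the Python sketch below derives singular diminutives with a few hand-written spelling rules (-inho/-inha after dropping an unaccented final -a/-o, -zinho/-zinha otherwise) and emits tab-separated full-form lines. It is a plain rule function plus a dictionary, not a finite-state transducer and not MorphoBr's actual grammar. The helper names (`diminutive`, `full_form_entries`, `build_lexicon`) and the `lemma+N+DIM+GENDER+SG` tag layout are hypothetical. Plurals and lemmas ending in -s, -x or -z are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Acute, circumflex and grave accents are dropped because stress shifts to the
# suffix (café -> cafezinho); the tilde is kept (mãe -> mãezinha).
_STRIP_STRESS = str.maketrans("áéíóúâêôà", "aeiouaeoa")
_VOWELS = set("aeiouáéíóúâêôãõà")


def diminutive(lemma: str, gender: str) -> Optional[str]:
    """Return a singular diminutive of a noun lemma, or None if not covered.

    `gender` is "M" or "F" and only affects the -zinho/-zinha branch.
    Hypothetical helper: a toy rule set, not MorphoBr's grammar.
    """
    if not lemma or lemma[-1] in "sxz":
        return None  # e.g. lápis, cartaz: out of scope for this sketch
    last = lemma[-1]
    prev = lemma[-2] if len(lemma) > 1 else ""
    if last in "ao" and prev and prev not in _VOWELS:
        # Unaccented final -a/-o after a consonant: casa -> casinha,
        # problema -> probleminha (the suffix vowel copies the lemma's).
        stem = lemma[:-1].translate(_STRIP_STRESS)
        # Keep the consonant sound before -i-: c -> qu, g -> gu, ç -> c.
        if stem.endswith("c"):
            stem = stem[:-1] + "qu"
        elif stem.endswith("g"):
            stem = stem[:-1] + "gu"
        elif stem.endswith("ç"):
            stem = stem[:-1] + "c"
        return stem + "inh" + last
    # Everything else takes -zinho/-zinha by gender: flor -> florzinha.
    base = lemma.translate(_STRIP_STRESS)
    if base.endswith("m"):
        base = base[:-1] + "n"  # m becomes n before z: homem -> homenzinho
    return base + "zinh" + ("o" if gender == "M" else "a")


def full_form_entries(lemma: str, gender: str) -> List[str]:
    """Emit 'form<TAB>lemma+N+DIM+GENDER+SG' lines (tag layout assumed)."""
    form = diminutive(lemma, gender)
    return [] if form is None else [f"{form}\t{lemma}+N+DIM+{gender}+SG"]


def build_lexicon(lines: List[str]) -> Dict[str, List[str]]:
    """Map each surface form to its analyses; a dict stands in for an FST."""
    lexicon: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        form, analysis = line.split("\t", 1)
        lexicon[form].append(analysis)
    return dict(lexicon)


if __name__ == "__main__":
    nouns = [("casa", "F"), ("barco", "M"), ("café", "M"), ("homem", "M"), ("cão", "M")]
    lines = [entry for lemma, g in nouns for entry in full_form_entries(lemma, g)]
    lex = build_lexicon(lines)
    print(lex["barquinho"])  # ['barco+N+DIM+M+SG']
```

Replacing the dictionary with a compiled transducer (for example with a finite-state toolkit such as foma or HFST) is what would make analysis and generation run from a single compact structure; that step is not part of this sketch.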
Pedro Delfino, Bruno Cuconato, et al.
GWC 2018
Fabricio Chalub, Livy Real, et al.
LREC 2016
Bernardo Alkmim, Alexandre Rademaker, et al.
XAILA 2018