• Media type: Text; Electronic Conference Proceeding; E-Article
  • Title: Reasoning with Portuguese Word Embeddings
  • Contributor: Cunha, Luís Filipe [Author]; Almeida, J. João [Author]; Simões, Alberto [Author]
  • Imprint: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2022
  • Language: English
  • DOI: https://doi.org/10.4230/OASIcs.SLATE.2022.17
  • Keywords: Evaluation Methods ; Word2Vec ; Word Embeddings
  • Footnote: This data source also contains holdings records that do not lead to a full text.
  • Description: Representing words with semantic distributions to build machine learning models is a widely used technique for natural language processing tasks. In this paper, we trained word embedding models on different types of Portuguese corpora, analyzing the influence of model parameterization, corpus size, and domain. We then validated each model with the classical evaluation methods available: four-word analogies and similarity measurement for pairs of words. In addition to these methods, we proposed new alternative techniques for validating word embedding models and presented new resources for this purpose. Finally, we discussed the results obtained and argued about some limitations of the evaluation methods for word embedding models.
  • Access State: Open Access