Published:
Springer Science and Business Media LLC, 2023
Published in:
Datenbank-Spektrum, 23 (2023) 1, pages 15-25
Language:
English
DOI:
10.1007/s13222-023-00438-1
ISSN:
1618-2162; 1610-1995
Description:
Abstract: Abusive language detection has become an integral part of research in this field, as reflected in numerous publications and several shared tasks conducted in recent years. It has been shown that the resulting models perform well on the datasets on which they were trained, but have difficulty generalizing to other datasets. This work also focuses on model generalization, but, in contrast to previous work, we use homogeneous datasets for our experiments, assuming that these yield higher generalizability. We want to find out how similar datasets have to be for trained models to generalize, and whether generalizability depends on the method used to obtain a model. To this end, we selected four German datasets from popular shared tasks, three of which come from consecutive GermEval shared tasks. Furthermore, we evaluate two deep learning methods and three traditional machine learning methods to derive generalizability trends from the results. Our experiments show that the models generalize only partially, even though the annotation schemes for these datasets are almost identical. Our findings additionally show that generalizability depends solely on the (combinations of) training sets and is consistent regardless of the underlying method.
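
To make the described setup concrete, the following is a minimal sketch of a cross-dataset generalization experiment, assuming a traditional machine learning baseline (TF-IDF features with logistic regression). The dataset names, file names, and column layout are hypothetical placeholders, and the baseline merely stands in for the methods compared in the paper; this is not the authors' implementation.

# Illustrative sketch (not the authors' code): train on one dataset,
# evaluate on every other one, and collect a train/test score matrix.
from itertools import permutations

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Hypothetical placeholders for the four German shared-task datasets.
DATASETS = {
    "germeval_a": "germeval_a.csv",
    "germeval_b": "germeval_b.csv",
    "germeval_c": "germeval_c.csv",
    "other_task": "other_task.csv",
}

def load(path):
    # Each file is assumed to hold a 'text' column and a binary 'label' column.
    df = pd.read_csv(path)
    return df["text"], df["label"]

results = {}
for train_name, test_name in permutations(DATASETS, 2):
    X_train, y_train = load(DATASETS[train_name])
    X_test, y_test = load(DATASETS[test_name])

    # TF-IDF + logistic regression stands in for the "traditional" methods.
    model = make_pipeline(
        TfidfVectorizer(min_df=2),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)

    # Macro-F1 on the held-out dataset measures cross-dataset generalization.
    results[(train_name, test_name)] = f1_score(
        y_test, model.predict(X_test), average="macro"
    )

for (tr, te), score in sorted(results.items()):
    print(f"train={tr:12s} test={te:12s} macro-F1={score:.3f}")

Scoring every ordered train/test pair in this way yields the kind of cross-dataset matrix from which generalizability trends across training sets and methods can be read off.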