Media type:
E-article
Title:
Automated occupation coding with hierarchical features: a data-centric approach to classification with pre-trained language models
Contributors:
Safikhani, Parisa;
Avetisyan, Hayastan;
Föste-Eggers, Dennis;
Broneske, David
Published:
Springer Science and Business Media LLC, 2023
Published in: Discover Artificial Intelligence
Language:
English
DOI:
10.1007/s44163-023-00050-y
ISSN:
2731-0809
Description:
Abstract
Occupation coding is the classification of information on occupation that is collected in the context of demographic variables. Occupation coding is an important but tedious task for researchers in social science and official statistics, and it calls for automation. Due to the complexity of the task, researchers currently carry out hand-coding or computer-assisted coding. However, we argue that, with the rise of transformer-based language models, hand-coding can be displaced by models such as BERT or GPT-3. Hence, we compare these models with state-of-the-art encoding approaches, showing that language models not only have a clear advantage in Cohen's kappa over related approaches but also allow for flexible fine-grained coding of single digits. Taking into account the hierarchical structure of the occupational groups, we also develop an approach that achieves better performance for the classification of different single-digit combinations.
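The abstract reports agreement as Cohen's kappa and refers to coding single digits of a hierarchical occupation code. The following is a minimal sketch, not the authors' method, of how kappa could be computed at different levels of such a hierarchy using the standard definition kappa = (p_o - p_e) / (1 - p_e). The five-digit codes, the example data, and the helper names `cohens_kappa`, `truncate`, and `select_digits` are all hypothetical and do not come from the paper.

```python
from collections import Counter


def cohens_kappa(gold, pred):
    """Cohen's kappa between two equally long label sequences."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(gold)
    # Observed agreement: share of items with identical labels.
    p_o = sum(g == p for g, p in zip(gold, pred)) / n
    # Chance agreement from each rater's marginal label frequencies.
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    p_e = sum(gold_counts[label] * pred_counts[label] for label in gold_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both raters always use the same single label.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def truncate(codes, digits):
    """Keep only the first `digits` positions of each hierarchical code."""
    return [c[:digits] for c in codes]


def select_digits(codes, positions):
    """Build a label from chosen digit positions (a single-digit combination)."""
    return ["".join(c[i] for i in positions) for c in codes]


if __name__ == "__main__":
    # Hypothetical five-digit occupation codes (gold standard vs. model output).
    gold = ["81302", "81302", "43112", "43412", "62102"]
    pred = ["81302", "81312", "43112", "43112", "62102"]

    print(round(cohens_kappa(gold, pred), 3))                  # full code: 0.5
    print(cohens_kappa(truncate(gold, 2), truncate(pred, 2)))  # first two digits: 1.0
    print(cohens_kappa(select_digits(gold, (0, 4)), select_digits(pred, (0, 4))))
```

In this toy data the model disagrees only on the fourth digit, so agreement is perfect once the codes are cut to a coarser level of the hierarchy. Evaluating digit combinations separately in this way is one simple means of seeing where in a hierarchical code a classifier tends to err.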