• Media type: E-article
  • Titel: Automated occupation coding with hierarchical features: a data-centric approach to classification with pre-trained language models
  • Contributors: Safikhani, Parisa; Avetisyan, Hayastan; Föste-Eggers, Dennis; Broneske, David
  • Published: Springer Science and Business Media LLC, 2023
  • Published in: Discover Artificial Intelligence
  • Language: English
  • DOI: 10.1007/s44163-023-00050-y
  • ISSN: 2731-0809
  • Description (abstract): Occupation coding is the classification of information on occupation that is collected in the context of demographic variables. Occupation coding is an important but tedious task for researchers in social science and official statistics, and it calls for automation. Because of the complexity of the task, researchers currently carry out hand-coding or computer-assisted coding. However, we argue that, with the rise of transformer-based language models, hand-coding can be replaced by models such as BERT or GPT-3. We therefore compare these models with state-of-the-art encoding approaches. The comparison shows that language models not only have a clear advantage over related approaches in Cohen's kappa but also allow flexible, fine-grained coding of single digits. Taking into account the hierarchical structure of occupational groups, we also develop an approach that achieves better performance in classifying different combinations of single digits.
  • Access status: Open access
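The abstract reports agreement as Cohen's kappa and evaluates predictions at the level of individual digits of a hierarchical occupation code. The following is a minimal sketch under stated assumptions and is not the authors' implementation. It computes kappa = (p_o - p_e) / (1 - p_e) after projecting each code onto selected digit positions.

The sketch assumes 5-digit string codes whose prefixes denote coarser occupational groups, in the style of the German KldB 2010. The helper names project and kappa_at and the toy gold/predicted labels are hypothetical and are used only for illustration.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: share of items that both coders label identically.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, estimated from each coder's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[c] * cb[c] for c in ca) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both coders used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


def project(code: str, positions: Sequence[int]) -> str:
    """Hypothetical helper: keep only the digits at the given 0-based positions."""
    return "".join(code[i] for i in positions)


def kappa_at(gold: Sequence[str], pred: Sequence[str], positions: Sequence[int]) -> float:
    """Hypothetical helper: kappa after projecting both code lists onto a digit combination."""
    return cohen_kappa([project(c, positions) for c in gold],
                       [project(c, positions) for c in pred])


# Toy, made-up 5-digit codes (assumed KldB-like), not data from the paper.
gold = ["43104", "43104", "71302", "81302", "52122", "71302"]
pred = ["43102", "43104", "71302", "81202", "52122", "71402"]

print(round(kappa_at(gold, pred, range(5)), 3))   # full code     -> 0.419
print(round(kappa_at(gold, pred, (0,)), 3))       # first digit   -> 1.0
print(round(kappa_at(gold, pred, (0, 1, 2)), 3))  # three digits  -> 0.586
print(round(kappa_at(gold, pred, (4,)), 3))       # fifth digit   -> 0.571
```

On this toy data, agreement is perfect at the coarsest level and drops as finer digits are included. This illustrates why digit-level (and digit-combination) evaluation can be more informative than a single full-code score.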