• Media type: E-Article
  • Title: A cooperative crowdsourcing framework for knowledge extraction in digital humanities – cases on Tang poetry
  • Contributor: Hong, Liang; Hou, Wenjun; Wu, Zonghui; Han, Huijie
  • Published: Emerald, 2020
  • Published in: Aslib Journal of Information Management, 72 (2020) 2, pp. 243-261
  • Language: English
  • DOI: 10.1108/ajim-07-2019-0192
  • ISSN: 2050-3806
  • Description: Purpose: The purpose of this paper is to propose a knowledge extraction framework to extract knowledge, including entities and the relationships between them, from unstructured texts in digital humanities (DH). Design/methodology/approach: The proposed cooperative crowdsourcing framework (CCF) uses both human–computer cooperation and crowdsourcing to achieve high-quality and scalable knowledge extraction. CCF integrates active learning with a novel category-based crowdsourcing mechanism to help domain experts label and verify extracted knowledge. Findings: The case study shows that CCF can effectively and efficiently extract knowledge from multi-sourced heterogeneous data in the field of Tang poetry. Specifically, CCF achieves higher knowledge extraction accuracy than state-of-the-art methods; the active learning mechanism maximizes the contribution of feedback to the training model; and the proposed category-based crowdsourcing mechanism scales up effective human–computer collaboration by considering the specialization of workers in different categories of tasks. Research limitations/implications: This research proposes CCF to enable high-quality and scalable knowledge extraction in the field of Tang poetry. CCF can be generalized to other fields of DH by introducing domain knowledge and experts. Practical implications: The extracted knowledge is machine-understandable and can support research on Tang poetry and knowledge-driven intelligent applications in DH. Originality/value: CCF is the first human-in-the-loop knowledge extraction framework that integrates active learning and crowdsourcing mechanisms; the human–computer cooperation method uses the feedback of domain experts through the active learning mechanism; the category-based crowdsourcing mechanism considers the matching of categories of DH data and especially of domain experts.