• Media type: E-Article
  • Title: How do control tokens affect natural language generation tasks like text simplification
  • Contributor: Li, Zihao; Shardlow, Matthew
  • Published: Cambridge University Press (CUP), 2024
  • Published in: Natural Language Engineering (2024), pp. 1-28
  • Language: English
  • DOI: 10.1017/s1351324923000566
  • ISSN: 1351-3249; 1469-8110
  • Keywords: Artificial Intelligence ; Linguistics and Language ; Language and Linguistics ; Software
  • Description: Abstract: Recent work on text simplification has focused on the use of control tokens to further the state of the art. However, further improvement is difficult without an in-depth understanding of the mechanisms underlying control tokens. One previously unexplored factor is the tokenization strategy, which we also examine. In this paper, we (1) reimplemented AudienCe-CEntric Sentence Simplification, (2) explored the effects and interactions of varying control tokens, (3) tested the influence of different tokenization strategies, (4) demonstrated how separate control tokens affect performance and (5) proposed new methods to predict the value of control tokens. We show how performance varies for each of the four control tokens separately. We also uncover how the design of control tokens can influence performance and offer suggestions for designing them. We show that the newly proposed method achieves higher performance in both SARI (a common scoring metric in text simplification) and BERTScore (a score derived from the BERT language model) and has potential in real applications.