• Media type: E-Article
  • Title: Benchmarking table recognition performance on biomedical literature on neurological disorders
  • Contributor: Adams, Tim; Namysl, Marcin; Kodamullil, Alpha Tom; Behnke, Sven; Jacobs, Marc
  • Published: Oxford University Press (OUP), 2022
  • Published in: Bioinformatics, 38 (2022) 6, pages 1624-1630
  • Language: English
  • DOI: 10.1093/bioinformatics/btab843
  • ISSN: 1367-4803; 1367-4811
  • Keywords: Computational Mathematics; Computational Theory and Mathematics; Computer Science Applications; Molecular Biology; Biochemistry; Statistics and Probability
  • Description: Abstract. Motivation: Table recognition systems are widely used to extract and structure quantitative information from the vast number of documents that are increasingly available from different open sources. While many systems already perform well on tables with a simple layout, tables in the biomedical domain are often much more complex. Benchmark and training data for such tables are, however, very limited. Results: To address this issue, we present a novel, highly curated benchmark dataset based on a hand-curated literature corpus on neurological disorders, which can be used to tune and evaluate table extraction applications for this challenging domain. We evaluate several state-of-the-art table extraction systems on our proposed benchmark and discuss challenges that emerged during benchmark creation as well as factors that can impact the performance of recognition methods. For the evaluation procedure, we propose a new metric as well as several improvements that result in a better performance evaluation. Availability and implementation: The resulting benchmark dataset (https://zenodo.org/record/5549977) and the source code of our novel evaluation approach are openly accessible. Supplementary information: Supplementary data are available at Bioinformatics online.
  • Access State: Open Access