• Media type: E-Article
  • Title: The archives are half-empty: an assessment of the availability of microbial community sequencing data
  • Contributor: Jurburg, Stephanie D.; Konzack, Maximilian; Eisenhauer, Nico; Heintz-Buschart, Anna
  • Imprint: Springer Science and Business Media LLC, 2020
  • Published in: Communications Biology
  • Language: English
  • DOI: 10.1038/s42003-020-01204-9
  • ISSN: 2399-3642
  • Description: Abstract: As DNA sequencing has become more popular, the public genetic repositories where sequences are archived have experienced explosive growth. These repositories now hold invaluable collections of sequences, e.g., for microbial ecology, but whether these data are reusable has not been evaluated. We assessed the availability and state of 16S rRNA gene amplicon sequences archived in public genetic repositories (SRA, EBI, and DDJ). We screened 26,927 publications in 17 microbiology journals, identifying 2015 16S rRNA gene sequencing studies. Of these, 7.2% had not made their data public at the time of analysis. Among a subset of 635 studies sequencing the same gene region, 40.3% contained data which was not available or not reusable, and an additional 25.5% contained faults in data formatting or data labeling, creating obstacles for data reuse. Our study reveals gaps in data availability, identifies major contributors to data loss, and offers suggestions for improving data archiving practices.
  • Access State: Open Access