• Media type: E-Article
  • Title: COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems
  • Contributor: Mayo Yanes, Eduardo; Chakraborty, Sabyasachi; Gershoni-Poranne, Renana
  • Imprint: Springer Science and Business Media LLC, 2024
  • Published in: Scientific Data
  • Language: English
  • DOI: 10.1038/s41597-024-02927-8
  • ISSN: 2052-4463
  • Keywords: Library and Information Sciences ; Statistics, Probability and Uncertainty ; Computer Science Applications ; Education ; Information Systems ; Statistics and Probability
  • Description: Abstract. Polycyclic aromatic systems are highly important to numerous applications, in particular to organic electronics and optoelectronics. High-throughput screening and generative models that can help to identify new molecules to advance these technologies require large amounts of high-quality data, which is expensive to generate. In this report, we present the largest freely available dataset of geometries and properties of cata-condensed poly(hetero)cyclic aromatic molecules calculated to date. Our dataset contains ~500k molecules comprising 11 types of aromatic and antiaromatic building blocks calculated at the GFN1-xTB level and is representative of a highly diverse chemical space. We detail the structure enumeration process and the methods used to provide various electronic properties (including HOMO-LUMO gap, adiabatic ionization potential, and adiabatic electron affinity). Additionally, we benchmark against a ~50k dataset calculated at the CAM-B3LYP-D3BJ/def2-SVP level and develop a fitting scheme to correct the xTB values to higher accuracy. These new datasets represent the second installment in the COMputational database of Polycyclic Aromatic Systems (COMPAS) Project.
  • Access State: Open Access