• Medientyp: E-Artikel
  • Titel: Approaches to Measure Chemical Similarity – a Review
  • Beteiligte: Nikolova, Nina; Jaworska, Joanna
  • Erschienen: Wiley, 2003
  • Erschienen in: QSAR & Combinatorial Science
  • Sprache: Englisch
  • DOI: 10.1002/qsar.200330831
  • ISSN: 1611-020X; 1611-0218
  • Schlagwörter: Organic Chemistry ; Computer Science Applications ; Drug Discovery
  • Entstehung:
  • Anmerkungen:
  • Beschreibung: <jats:title>Abstract</jats:title><jats:p>Although the concept of similarity is a convenient for humans, a formal definition of similarity between chemical compounds is needed to enable automatic decision‐making. The objective of similarity measures in toxicology and drug design is to allow assessment of chemical activities. The ideal similarity measure should be relevant to the activity of interest. The relevance could be established by exploiting the knowledge about fundamental chemical and biological processes responsible for the activity. Unfortunately, this knowledge is rarely available and therefore different approximations have been developed based on similarity between structures or descriptor values. Various methods are reviewed, ranging from two‐dimensional, three‐dimensional and field approaches to recent methods based on “Atoms in Molecules” theory. All these methods attempt to describe chemical compounds by a set of numerical values and define some means for comparison between them. The review provides analysis of potential pitfalls of this methodology – loss of information in the representations of molecular structures – the relevance of a particular representation and chosen similarity measure to the activity. A brief review of known methods for descriptor selection is also provided. The popular “neighborhood behavior” principle is criticized, since proximity with respect to descriptors does not necessarily mean proximity with respect to activity. Structural similarity should also be used with care, as it does not always imply similar activity, as shown by examples. We remind that similarity measures and classification techniques based on distances rely on certain data distribution assumptions. If these assumptions are not satisfied for a given dataset, the results could be misleading. A discussion on similarity in descriptor space in the context of applicability domain assessment of QSAR models is also provided. Finally, it is shown that descriptor based similarity analysis is prone to errors if the relationship between the activity and the descriptors has not been previously established. A justification for the usage of a particular similarity measure should be provided for every specific activity by expert knowledge or derived by data modeling techniques.</jats:p>