Methods for Processing and Analyzing Protein Structure Collections for Data-Driven Structure-Property Relationship Modeling

Media type: Electronic university publication; Dissertation; E-book

Title: Methods for Processing and Analyzing Protein Structure Collections for Data-Driven Structure-Property Relationship Modeling

Contributors: Sieg, Jochen [Author]

Published: Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2023-12

Language: English

DOI: https://doi.org/10.1021/acs.jcim.8b00712; https://doi.org/10.1002/prot.26337; https://doi.org/10.1021/acs.jcim.3c00100; https://doi.org/10.1093/bib/bbad357; https://doi.org/10.1093/nar/gkac305

Keywords: Drug design; Algorithm; Computational chemistry; virtual screening; Bioinformatics; Protein design; Machine learning; protein property prediction; mutation prediction; protein ligand interaction prediction

Notes: This data source also contains holdings records that do not link to a full text.

Description: Effective prediction of the properties of biomolecules could answer crucial research questions: Which biomolecule would be an effective drug for a particular disease? Will a mutation in a patient be pathogenic? Which biomolecule can break down materials such as plastics? The structure-property relationship paradigm is a central concept stating that a biomolecule's structure determines its properties. For proteins in particular, the so-called building blocks of life, the amount of high-quality three-dimensional structure data has increased tremendously in recent years. Data-driven prediction methods, such as machine learning, are a promising way to predict properties from these structure data. However, such methods are subject to data limitations and require protein representations that suit the nature and properties of proteins. In this work, methods were developed to analyze and process data sets in order to improve data-driven property prediction. First, a machine learning-based interpretability method was developed to analyze the predictive features of a data set for a given property-prediction task. The technique was first applied to analyze unbiasing strategies in benchmark data sets for structure-based virtual screening in drug discovery. It was then extended with the Shapley value framework and used to interpret stabilizing protein adaptations for protein engineering. Besides revealing important domain-specific trends, the analyses demonstrated that data limitations are a profound bottleneck in structure-property modeling. Obtaining more data is often not possible; an effective alternative can be to process the existing data to derive better protein representations for the task at hand. Two processing methods were developed that describe relevant protein variability using structure ensembles. The first method enumerates alternative conformations from AltLoc annotations to represent the inherent flexibility of proteins. The second method constructs structure ensembles through the similarity of residue 3D micro-environments to represent ...
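The following minimal Python sketch illustrates the idea of enumerating alternative conformations from AltLoc annotations. It is not the thesis's implementation. It assumes PDB-format ATOM/HETATM records, where the alternate location indicator is in column 17, the chain ID is in column 22, and the residue number plus insertion code occupy columns 23 to 27. It also makes the simplifying assumption that each residue's alternate locations are independent, so the ensemble becomes the Cartesian product of the per-residue choices. The function name `altloc_ensemble` is hypothetical. The sketch ignores mmCIF input and any dependencies between alternate locations of neighboring residues that a complete method may need to handle.

```python
from collections import defaultdict
from itertools import product


def altloc_ensemble(atom_lines):
    """Enumerate alternative conformations from PDB ATOM/HETATM records.

    Hypothetical sketch: assumes altloc choices of different residues are
    independent, so every combination of per-residue altlocs is one model.
    """
    common = []  # atoms without an altloc indicator, shared by all models
    # (chain, residue number + insertion code) -> altloc ID -> atom lines
    per_residue = defaultdict(lambda: defaultdict(list))
    for line in atom_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, HEADER, etc.
        altloc = line[16]                  # column 17: alternate location indicator
        res_key = (line[21], line[22:27])  # column 22: chain; columns 23-27: resSeq + iCode
        if altloc == " ":
            common.append(line)
        else:
            per_residue[res_key][altloc].append(line)

    residues = sorted(per_residue)
    choices = [sorted(per_residue[r]) for r in residues]  # altloc IDs per residue
    ensemble = []
    # product() of an empty list yields one empty tuple, so a structure
    # without altlocs still gives exactly one model.
    for combo in product(*choices):
        model = list(common)  # note: original atom order is not preserved
        for res, alt in zip(residues, combo):
            model.extend(per_residue[res][alt])
        ensemble.append(model)
    return ensemble
```

Suppose a structure has two residues that each carry alternate locations A and B. The sketch then yields four models. The count grows exponentially with the number of residues that have alternate locations, so a practical tool would likely need to group or restrict them rather than take the full product.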

Access status: Open access

Rights/usage notes: Attribution (CC BY)