• Media type: Electronic Thesis; Doctoral Thesis; E-Book
  • Title: Causality in Unsupervised Learning: Methods and Applications in Cancer Genomics
  • Contributor: Bayer, Fritz [Author]
  • Imprint: ETH Zurich, 2024
  • Language: English
  • DOI: https://doi.org/20.500.11850/658760; https://doi.org/10.3929/ethz-b-000658760
  • Keywords: Clustering ; Bayesian networks ; Mathematics ; Computer science ; Genomics ; Causality ; Unsupervised learning ; Cancer ; Data processing
  • Footnote: This data source also contains holdings records that do not lead to a full text.
  • Description: Unsupervised learning deciphers the beautifully complex patterns embedded within vast amounts of data. As one of the main branches of machine learning, it seeks to discover hidden patterns in unlabelled data. One of the prevalent techniques within unsupervised learning is clustering, which groups data into distinct subsets of shared characteristics. In cancer research, clustering offers promising avenues to stratify patients based on their unique genomic and clinical characteristics, which is crucial for developing personalised treatment strategies and improving prognostic evaluations. However, the increasing complexity of the acquired data presents several challenges for unsupervised learning, ranging from the integration of different data types to the prevalence of incomplete datasets. Furthermore, as these computational tools increasingly affect many areas of life, it is essential that they are developed with care to prevent discrimination against any protected groups. This thesis presents novel methods that enhance the efficiency and fairness of unsupervised learning by exploiting causal knowledge about the data. The main contributions are detailed in three separate studies, each presenting a novel methodological approach. The first study introduces a novel network-based clustering method that enables the stratification of cancer patients based on their individual genomic and clinical characteristics. This approach leverages the causal relationships inherent in the data to effectively integrate genomic and clinical information. When applied to myeloid malignancies, a group of aggressive cancers with overlapping genomic and clinical characteristics, this method identified novel cancer subgroups that are highly predictive of survival and reveal distinct genomic and clinical patterns. This novel clustering approach sheds light on the interconnected landscape of the genomic and clinical features across myeloid malignancies and paves the way for improved patient stratification. The second study presents a ...
  • Access State: Open Access
  • Rights information: In Copyright - Non-commercial Use Permitted