Discriminative Learning for Probabilistic Sequence Analysis ; Diskriminatives Lernen in der probabilistischen Sequenzanalyse

Media type: E-Book; Doctoral Thesis; Electronic Thesis

Title: Discriminative Learning for Probabilistic Sequence Analysis ; Diskriminatives Lernen in der probabilistischen Sequenzanalyse

Contributor: Maaskola, Jonas [Author]

Imprint: Freie Universität Berlin: Refubium (FU Berlin), 2015

Extent: XVIII, 301 pp.

Language: English

DOI: https://doi.org/10.17169/refubium-8314

Keywords: Motif Discovery ; Hidden Markov Model ; Motif Searching ; Nucleic Acids ; RNA ; HMM ; Discriminative Learning ; DNA

Footnote: This data source also contains holdings records that do not link to a full text.

Description: This dissertation studies discriminative learning techniques for probabilistic sequence analysis, with application to the discovery of binding-site patterns in nucleic acid sequences. Sets of positive and negative example sequences define contrasts that are mined for sequence motifs whose occurrence frequency differs between the sets. We describe a discriminative motif discovery method based on hidden Markov models (HMMs) that supports a choice of objective functions, two of which are used here for the first time for motif finding with HMMs: mutual information of condition and motif occurrence (MICO), and the Matthews correlation coefficient. We perform an extensive, systematic comparison of the motif discovery performance of our method against numerous published tools. Using MICO or several of the other implemented objective functions, our method outperforms all other tools in the comparison. MICO is also the most generally useful discriminative objective function: it applies to both probabilistic and discrete binding motif models, can leverage contrasts of more than two conditions, and provides natural extensions for quantifying conditional association, which are used to build models of multiple motifs. The investigation concludes with several case studies comprising 30 datasets from transcriptome-scale technologies (ChIP-Seq, RIP-ChIP, and PAR-CLIP) of embryonic stem cell transcription factors and of RNA-binding proteins. The case studies demonstrate the practicality and utility of the method and validate it by reproducing the motifs of well-studied proteins. In addition, they provide novel insights by connecting previously known splicing-relevant motifs to an alternative splicing regulator. The presented motif discovery method scales to large data sizes, makes use of available replicate experiments for increased statistical power, and, beyond binary contrasts, can also exploit more complex data configurations.
It is implemented in the open-source software Discrover (portmanteau of ...
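The description names two discriminative objective functions, MICO and the Matthews correlation coefficient. As a minimal sketch, assuming motif occurrence is reduced to a binary present/absent call per sequence and tallied per condition, both can be computed from counts as shown below. The function names `mico` and `mcc` are hypothetical and are not Discrover's interface; the thesis also scores probabilistic (HMM-based) motif occurrence, which this count-based plug-in estimate does not cover, and its exact formulation may differ.

```python
import math


def mico(present, totals):
    """Plug-in mutual information (bits) between condition and motif occurrence.

    present[i] -- number of sequences in condition i that contain the motif
    totals[i]  -- total number of sequences in condition i
    Works for any number of conditions (two or more).
    """
    n = sum(totals)
    p_motif = sum(present) / n  # marginal P(motif present)
    mi = 0.0
    for k, t in zip(present, totals):
        p_cond = t / n  # marginal P(condition i)
        # Two joint cells per condition: motif present (k) and absent (t - k).
        for count, p_m in ((k, p_motif), (t - k, 1.0 - p_motif)):
            if count > 0:  # empty cells contribute 0 to the sum
                p_joint = count / n
                mi += p_joint * math.log2(p_joint / (p_cond * p_m))
    return mi


def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient for a binary positive/negative contrast.

    tp/fn -- positive sequences with/without the motif
    fp/tn -- negative sequences with/without the motif
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Returning 0.0 for a degenerate table is a common convention, assumed here.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


# Hypothetical counts: motif in 60 of 100 positive and 20 of 100 negative sequences.
print(round(mico([60, 20], [100, 100]), 4))  # 0.1245
print(round(mcc(60, 40, 20, 80), 3))  # 0.408
```

Because `mico` takes one count pair per condition, the same function scores contrasts of more than two conditions, which matches the generality the description attributes to MICO; MCC, by contrast, is defined only for a binary positive/negative split.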

Access State: Open Access