• Media type: Doctoral Thesis; Electronic Thesis; E-Book
  • Title: AHRD: Automatically Annotate Proteins with Human Readable Descriptions and Gene Ontology Terms
  • Contributor: Boecker, Florian [Author]
  • Published: Universitäts- und Landesbibliothek Bonn, 2021-10-05
  • Language: English
  • DOI: https://doi.org/20.500.11811/9344
  • Keywords: Protein ; Genomics ; Bioinformatics ; Proteomics ; Function Prediction
  • Footnote: This data source also contains holdings records that do not lead to a full text.
  • Description: In the postgenomic era, the majority of new proteins can only be annotated with computational methods. Our tool AHRD automatically annotates proteins with human-readable descriptions and Gene Ontology (GO) terms on a genomic scale. It does so by performing a lexical analysis modeled on the decision process of a human curator investigating the protein descriptions of homologous proteins found by sequence similarity. The central questions of this thesis are how GO annotations can be accurately evaluated and how the annotation performance of AHRD can be increased. To this end, we first generated an unbiased ground truth set of high-quality protein annotations with minimal redundancy. It contains many proteins that are difficult to annotate and thus facilitates contrasting annotation methods. Second, we implemented and tested three evaluation metrics for the congruence of GO term annotations. The third metric, which uses the structure of the Gene Ontology and the commonness of GO terms to determine the semantic similarity of GO annotations, performs the most nuanced and consistent evaluation. In addition to a preexisting simulated annealing-based approach, a genetic algorithm-based machine learning method was implemented that uses the aforementioned evaluation metrics to optimize AHRD's input parameters. Although the genetic algorithm provided only small improvements, they were statistically significant, and parameter optimization proved necessary to achieve optimal annotation performance. In the style of the lexical analysis of candidate descriptions, a new GO term-based analysis for candidate annotations was created. This improved AHRD's GO annotation performance and also enabled the incorporation of new quality indicators such as GO term information content and annotation evidence codes, which improved the performance further. It also facilitated the annotation with newly combined sets of GO terms instead of only fixed sets obtained from reference ...
  • Access State: Open Access
  • Rights information: In Copyright
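
The description mentions an evaluation metric that combines the structure of the Gene Ontology with the commonness of GO terms. The record does not say which formula the thesis uses, so the following is only a minimal sketch of one well-known measure of this kind (Lin similarity based on information content). The toy ontology, the term names, the annotation counts, and the function names are all hypothetical and are not taken from AHRD.

```python
import math

# Hypothetical toy ontology (child -> parents); these are not real GO terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}

# Hypothetical number of proteins annotated directly with each term.
DIRECT_COUNTS = {
    "root": 0,
    "binding": 2,
    "catalysis": 4,
    "dna_binding": 1,
    "rna_binding": 1,
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """Rare terms get a high information content, common terms a low one."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = {term: 0 for term in PARENTS}
    # An annotation with a term also counts for every ancestor of that term.
    for term, count in DIRECT_COUNTS.items():
        for ancestor in ancestors(term):
            cumulative[ancestor] += count
    return {term: math.log(total / count) for term, count in cumulative.items()}


IC = information_content()


def lin_similarity(a, b):
    """Lin similarity: shared information relative to the terms' own information."""
    if a == b:
        return 1.0
    shared = ancestors(a) & ancestors(b)
    # Information content of the most informative common ancestor.
    mica = max(IC[term] for term in shared)
    denominator = IC[a] + IC[b]
    return 2 * mica / denominator if denominator > 0 else 0.0


print(lin_similarity("dna_binding", "rna_binding"))
print(lin_similarity("dna_binding", "catalysis"))
```

In this toy setup, two sibling terms under a fairly common parent receive partial credit, while terms that share only the root receive none. Scoring a predicted annotation set against a reference set would additionally require aggregating over term pairs, which this sketch leaves out.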