• Media type: E-Book
  • Title: Recognition of functional relationships between biomedical concepts in the scientific literature using text mining and machine learning
  • Contributor: Qaseem, Ammar [Author]; Günther, Stefan [Academic supervisor]; Backofen, Rolf [Other]; Bechthold, Andreas [Other]
  • Corporation: Albert-Ludwigs-Universität Freiburg, Pharmazeutische Bioinformatik ; Albert-Ludwigs-Universität Freiburg, Institut für Pharmazeutische Wissenschaften ; Albert-Ludwigs-Universität Freiburg, Fakultät für Chemie und Pharmazie
  • Imprint: Freiburg: Universität, 2023
  • Extent: Online resource
  • Language: English
  • DOI: 10.6094/UNIFR/237508
  • Keywords: Machine learning ; Data mining ; Artificial intelligence ; Bioinformatics ; Neural network ; (local)doctoralThesis
  • University thesis: Dissertation, Universität Freiburg, 2023
  • Description: Abstract: A tremendous amount of electronic research data is freely available online as open-source published literature and is growing rapidly. This huge body of unstructured data contains a wealth of valuable information that is hidden and difficult to access; for example, it can be difficult for scientists to identify specific articles of interest. Artificial intelligence-based text mining and machine learning approaches are being used to process and analyze such large amounts of data in order to identify and extract relevant information. Relevant information can be concepts as well as relationships between those concepts that answer questions of interest. Identifying biomedical concepts (e.g. compounds, proteins, diseases) and the functional relationships between them is one of the important domains in text mining and forms a key component of life science research. In the drug discovery field, knowledge of how small molecules associate with proteins plays a fundamental role in understanding how drugs or metabolites can affect cells, tissues, and human metabolism.
    This dissertation focuses on the automated identification of functional compound-protein relationships in biomedical and life sciences literature using text mining and machine learning techniques. A new benchmark dataset of 2,613 sentences was created, consisting of 5,562 small molecule and protein pairs that had previously been annotated with the help of text mining tools. The pairs were subsequently classified manually as functional or non-functional. Three machine learning approaches, the shallow linguistic kernel (SL), the all-paths graph kernel (APG), and BioBERT, were evaluated for classifying these relationships between small molecules and proteins. Furthermore, the benefit of the presence of interaction verbs in sentences that include functionally related compound-protein pairs was evaluated.
    On the benchmark dataset, the BioBERT machine learning approach achieved the best performance, with an F1-score of 86.0%, a precision of 85.2%, and a recall of 86.8%. Moreover, the trained model was applied to all titles and abstracts of the articles stored in the PubMed database. The results were processed and included in a new web server for literature research (CPRiL). The data allows novel query options, such as the calculation of the shortest relation path between any two biomolecules. Currently, CPRiL contains ~2.5 million unique functionally related compound-protein pairs, with ~460,000 unique names and synonyms of small molecules and ~90,000 unique proteins.
  • Access State: Open Access