• Media type: Electronic Conference Proceeding; Text; E-Article
  • Title: EMMA: Adding Sequences into a Constraint Alignment with High Accuracy and Scalability (Abstract)
  • Contributor: Shen, Chengze [Author]; Liu, Baqiao [Author]; Williams, Kelly P. [Author]; Warnow, Tandy [Author]
  • Imprint: Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2023
  • Language: English
  • DOI: https://doi.org/10.4230/LIPIcs.WABI.2023.2
  • Keywords: MAFFT ; Multiple sequence alignment ; constraint alignment
  • Footnote: This data source also contains holdings records that do not lead to a full text.
  • Description: Multiple sequence alignment (MSA) is a crucial precursor to many downstream biological analyses, such as phylogeny estimation [Morrison, 2006], RNA structure prediction [Shapiro et al., 2007], and protein structure prediction [Jumper et al., 2021]. Obtaining an accurate MSA can be challenging, especially when the dataset is large (i.e., more than 1000 sequences). A key technique for large-scale MSA estimation is to add sequences into an existing alignment. For example, biological knowledge can be used to form a reference alignment on a subset of the sequences, and then the remaining sequences can be added to the reference alignment. A second case arises when new sequences or genomes are added to databases, creating the opportunity to add the new sequences for each gene in the genome into a growing alignment. A third case is de novo multiple sequence alignment, where a subset of the sequences is selected and aligned, and then the remaining sequences are added into this "backbone alignment" [Nguyen et al., 2015; Park et al., 2023; Shen et al., 2022; Liu and Warnow, 2023; Park and Warnow, 2023; Yamada et al., 2016]. Thus, adding sequences into existing alignments is a natural problem with multiple applications to biological sequence analysis. A few methods have been developed to add sequences into an existing alignment, with MAFFT--add [Katoh and Frith, 2012] perhaps the best known. In addition, several multiple sequence alignment methods that operate in two steps (first extract and align the backbone sequences, then add the remaining sequences into this backbone alignment) also provide utilities for adding sequences into a user-provided alignment. We present EMMA, a new approach for adding "query" sequences into an existing "constraint" alignment. By construction, EMMA never changes the constraint alignment, except through the introduction of additional sites to represent homologies between the query sequences. EMMA uses a divide-and-conquer technique ...
  • Access State: Open Access