• Media type: E-article
  • Title: High‐throughput complement component 4 genomic sequence analysis with C4Investigator
  • Contributors: Marin, Wesley M.; Augusto, Danillo G.; Wade, Kristen J.; Hollenbach, Jill A.
  • Published: Wiley, 2024
  • Published in: HLA
  • Language: English
  • DOI: 10.1111/tan.15273
  • ISSN: 2059-2302; 2059-2310
  • Keywords: Genetics ; Immunology ; Immunology and Allergy
  • Origin:
  • Notes:
  • Description: <jats:p>The complement component 4 gene locus, composed of the <jats:italic>C4A</jats:italic> and <jats:italic>C4B</jats:italic> genes and located on chromosome 6, encodes the complement component 4 (C4) proteins, key intermediates in the classical and lectin pathways of the complement system. The complement system is an important modulator of immune system activity and is also involved in the clearance of immune complexes and cellular debris. The <jats:italic>C4A</jats:italic> and <jats:italic>C4B</jats:italic> genes exhibit copy number variation, with each gene varying between 0 and 5 copies per haplotype. Both genes also vary in size depending on the presence of a human endogenous retrovirus (HERV) in intron 9, which affects expression and is found in both <jats:italic>C4A</jats:italic> and <jats:italic>C4B</jats:italic>; the long form is denoted <jats:italic>C4(L)</jats:italic> and the short form <jats:italic>C4(S)</jats:italic>. Additionally, the human blood group antigens Rodgers and Chido are located on the C4 protein, with the Rodgers epitope generally found on the C4A protein and the Chido epitope generally found on the C4B protein. <jats:italic>C4A</jats:italic> and <jats:italic>C4B</jats:italic> copy number variation has been implicated in numerous autoimmune and pathogenic diseases. Despite the central role of C4 in immune function and regulation, high‐throughput genomic sequence analysis of <jats:italic>C4A</jats:italic> and <jats:italic>C4B</jats:italic> variants has been impeded by the high degree of sequence similarity and the complex genetic variation exhibited by these genes. To investigate C4 variation using genomic sequencing data, we have developed C4Investigator, a novel bioinformatic pipeline for comprehensive, high‐throughput characterization of human <jats:italic>C4A</jats:italic> and <jats:italic>C4B</jats:italic> sequences from short‐read sequencing data.
Using paired‐end targeted or whole genome sequence data as input, C4Investigator determines the overall C4 gene copy number, as well as the copy numbers of <jats:italic>C4A</jats:italic>, <jats:italic>C4B</jats:italic>, <jats:italic>C4(Rodger)</jats:italic>, <jats:italic>C4(Ch)</jats:italic>, <jats:italic>C4(L)</jats:italic>, and <jats:italic>C4(S)</jats:italic>. Additionally, C4Investigator reports the full aligned <jats:italic>C4A</jats:italic> and <jats:italic>C4B</jats:italic> sequences, enabling nucleotide‐level analysis. To demonstrate the utility of this workflow, we have analyzed <jats:italic>C4A</jats:italic> and <jats:italic>C4B</jats:italic> variation in the 1000 Genomes Project data set, showing that these genes are highly poly‐allelic, with many variants that have the potential to impact C4 protein function.</jats:p>