Description:
In many analyses of high-throughput data in systems biology, there is a need to quantify the activity of a set of genes in individual samples. In cancer, the same pathway can be affected by defects in different individual genes in different patients, and applying gene set approaches to genomic data can help capture biological information that is otherwise undetectable when focusing on individual genes. We present here ROMA (Representation and quantification Of Module Activities), software designed for fast and robust computation of the activity of gene sets (or modules) with coordinated expression. ROMA activity quantification is based on the simplest uni-factor linear model of gene regulation, which approximates the expression data of the gene set by its first principal component. The proposed algorithm implements novel functionalities: it identifies which genes contribute most to the activity of the module; it provides several alternative methods for principal component computation, including weighted and centered versions of principal component analysis; it distinguishes overdispersed modules (based on the variance explained by the first principal component) and coordinated modules (based on the significance of the spectral gap); finally, it computes the statistical significance of the estimated module overdispersion. ROMA can be applied in many contexts, from estimating differential activities of transcription factors to finding overdispersed pathways in single-cell transcriptomics data. We present here the principles of ROMA and provide a practical example of its use: we applied it to compare distinct subtypes of medulloblastoma in terms of activated/inactivated signalling pathways and transcriptional programs.
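The quantities named above (first principal component, explained variance, spectral gap, and significance of overdispersion) can be illustrated with a minimal NumPy sketch. This is not the ROMA implementation: the function names `module_activity` and `overdispersion_pvalue`, the genes-by-samples input layout, the per-gene centering, the use of the ratio of the first two eigenvalues as the spectral gap, and the null distribution built from random gene sets of the same size are all assumptions made for illustration, and the data are synthetic.

```python
import numpy as np


def module_activity(expr):
    """Rank-1 (uni-factor) summary of one gene set.

    expr: genes x samples array (assumed layout, hypothetical helper).
    Returns (l1, gap, gene_weights, sample_scores).
    """
    # Assumption: center each gene across samples before the decomposition.
    centered = expr - expr.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2                      # proportional to the PCA eigenvalues
    l1 = var[0] / var.sum()           # variance explained by the first component
    gap = var[0] / var[1]             # assumed spectral-gap measure (first/second)
    gene_weights = u[:, 0]            # contribution of each gene to the module
    sample_scores = s[0] * vt[0]      # module activity in each sample
    # The sign of a principal component is arbitrary; orient it so that
    # the gene weights sum to a non-negative value.
    if gene_weights.sum() < 0:
        gene_weights, sample_scores = -gene_weights, -sample_scores
    return l1, gap, gene_weights, sample_scores


def overdispersion_pvalue(data, gene_idx, n_random=200, seed=0):
    """Empirical p-value for l1 against random gene sets of equal size.

    Assumption: the null is built by sampling genes uniformly from `data`.
    """
    rng = np.random.default_rng(seed)
    observed = module_activity(data[gene_idx])[0]
    null = np.empty(n_random)
    for k in range(n_random):
        idx = rng.choice(data.shape[0], size=len(gene_idx), replace=False)
        null[k] = module_activity(data[idx])[0]
    # Add-one correction keeps the estimate strictly positive.
    p = (1 + np.sum(null >= observed)) / (1 + n_random)
    return observed, p


# Synthetic example: 200 genes x 30 samples of noise, with the first
# 10 genes driven by one shared factor (a "coordinated module").
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 30))
factor = rng.normal(size=30)
module = np.arange(10)
data[module] += 3.0 * factor

l1, gap, weights, scores = module_activity(data[module])
observed, p = overdispersion_pvalue(data, module)
print(f"L1 = {l1:.2f}, gap = {gap:.1f}, p = {p:.3f}")
```

In this sketch, `weights` indicates which genes contribute most to the module and `scores` gives one activity value per sample. The outer product of the two is the rank-1 approximation of the centered expression matrix, which is the sense in which the first principal component acts as a uni-factor linear model.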