Omix - Get started • Omix

Overview

The goal of Omix is to provide tools in R to build a complete analysis workflow for integrative analysis of data generated from multi-omics platform.

Generate a multi-omics object using MultiAssayExperiment.
Quality control of single-omics data.
Formatting, normalisation, denoising of single-omics data.
Separate single-omics analyses.
Integration of multi-omics data for combined analysis.
Publication quality plots and interactive analysis reports based of shinyApp.

Currently, Omix supports the integration of bulk transcriptomics and bulk proteomics.

Running Omix

The Omix pipeline requires the following input:

rawdata_rna: A data-frame containing raw RNA count with rownames as gene and colnames as samples.
rawdata_protein: A data-frame containing protein abundance with rownames as proteins and colnames as samples.
map_rna: A data-frame of two columns named primary and colname where primary should contain unique sample name with a link to sample metadata and colname is the column names of the rawdata_rna data-frame.
map_protein: A data-frame of two columns named primary and colname where primary should contain unique sample name with a link to sample metadata and colname is the column names of the rawdata_protein data-frame.
metadata_rna: A data-frame containing rna assay specific metadata where rownames are same as the colnames of rawdata_rna data.frame.
metadata_protein: A data-frame containing protein assay specific metadata where rownames are same as the colnames of rawdata_protein data.frame.
individual_metadata: A data-frame containing individual level metadata for both omics assays.

Call required packages for vignette:

library(Omix)
library(purrr)
library(ggplot2)
library(dplyr)
library(tibble)
library(stats)
library(synapser)

First we must load the data from Omix for the vignette:

outputDir <- tempdir()
ctd_fp <-file.path(outputDir, "ctd.rds")
ensembl_fp  <- file.path(outputDir, "ensembl.rds")
GO_fp <- file.path(outputDir, "GO_Biological_Process_2021.txt")
TF_fp <- file.path(outputDir, "Enrichr_Queries.gmt.txt")

download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/GO_Biological_Process_2021.txt",destfile=GO_fp)

download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/ctd-2.rds",destfile=ctd_fp)

download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/ensembl_mappings_human.tsv",destfile=ensembl_fp)

download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/Enrichr_Queries.gmt.txt",destfile=TF_fp)

synapser::synLogin(email=Sys.getenv("SYNAPSE_ID"), authToken=Sys.getenv("AUTHTOKEN"))

Welcome, ems2817!NULL

rawdata_rna <- read.csv(synapser::synGet('syn51533734')$path, header=T, stringsAsFactors = F, row.names=1)
rawdata_protein <- read.csv(synapser::synGet('syn51533735')$path, header=T, stringsAsFactors = F, row.names=1)
map_rna <-read.csv(synapser::synGet('syn51533731')$path, header=T, stringsAsFactors = F, row.names=1)
map_prot <- read.csv(synapser::synGet('syn51533730')$path, header=T, stringsAsFactors = F, row.names=1)
metadata_rna<-read.csv(synapser::synGet('syn51533733')$path, header=T, stringsAsFactors = F, row.names=1)
metadata_prot <- read.csv(synapser::synGet('syn51533732')$path, header=T, stringsAsFactors = F, row.names=1)
individual_metadata <-read.csv(synapser::synGet('syn51533736')$path, header=T, stringsAsFactors = F, row.names=1)

ctd <-  readRDS(paste0(outputDir,'/ctd.rds'))
ensembl <-read.delim(file=paste0(outputDir,'/ensembl.rds'), sep = '\t', header = TRUE)
rna_qc_data_matrix <- NULL

Sanity check

all(rownames(metadata_rna) == colnames(rawdata_rna))

## [1] TRUE

all(rownames(metadata_prot) == colnames(rawdata_protein))

## [1] TRUE

Raw rna counts

rawdata_rna is a data-frame of raw counts, with features as rows and samples as columns

print(rawdata_rna[1:5,1:5])

##                 IGF117756 IGF117759 IGF117764 IGF117772 IGF117778
## ENSG00000223972         1         1         1         0         0
## ENSG00000227232        40        30        18        15        35
## ENSG00000278267         1         0         3         4         3
## ENSG00000243485         0         0         0         0         0
## ENSG00000284332         0         0         0         0         0

Raw protein abundance

rawdata_protein is a data-frame of raw protein abundances, with features as rows and samples as columns

print(rawdata_protein[15:20,15:20])

##                BBN_7540_MTG_P1WG4_001 BBN_7545_MTG_P1WG7_001
## TUBAL3                      7158.6100              9201.4900
## FBLL1                         67.4334                60.1956
## CASTOR2                       17.3159                26.2499
## UNC119;UNC119B                20.3847                12.3418
## YJEFN3                             NA                23.4309
## ARHGAP32                      11.9989                17.0831
##                BBN_7617_MTG_P1WH4_001 BBN_9984_MTG_P3WD1_001
## TUBAL3                     7294.23000             8982.00000
## FBLL1                        85.75520               62.94880
## CASTOR2                      15.63710               19.52160
## UNC119;UNC119B               19.64770                8.60717
## YJEFN3                             NA               25.40000
## ARHGAP32                      5.57716                9.02670
##                BBN002.26094_MTG_P3WC8_001 BBN002.29089_MTG_P3WD7_001
## TUBAL3                         10871.4000                 8487.21000
## FBLL1                             46.9068                   57.40500
## CASTOR2                                NA                   22.37250
## UNC119;UNC119B                         NA                   17.10210
## YJEFN3                            19.7786                   16.63510
## ARHGAP32                               NA                    7.95878

Maps

Maps are data frame containing two columns: primary and colname. The primary column should be the individual ID the individual metadata, and the colname the matched sample names from the raw matrices columns. If sample and individual ids are the same, maps aren’t needed (primary and colnames are the same).

print(head(map_prot))

##         primary                  colname
## 1 BBN_10099-MTG BBN_10099_MTG_P3WH10_001
## 2 BBN_10109-MTG  BBN_10109_MTG_P4WA3_001
## 3 BBN_10611-MTG BBN_10611_MTG_P2WD10_001
## 4 BBN_14801-MTG  BBN_14801_MTG_P1WH9_001
## 5 BBN_16213-MTG  BBN_16213_MTG_P2WH3_001
## 6 BBN_18399-MTG  BBN_18399_MTG_P2WF5_001

print(head(map_rna))

##             primary   colname
## 12 BBN003.32235-MTG IGF117756
## 15     BBN_7617-MTG IGF117759
## 20    BBN_16213-MTG IGF117764
## 25     BBN_7519-MTG IGF117772
## 30     BBN_7540-MTG IGF117778
## 36 BBN003.32227-MTG IGF117784

Technical metadata

Technical metadata are data-frames contain the column colname that should match the sample names in the raw matrices, and any additional columns related to technical artefacts like batches

print(head(metadata_rna))

##                  sample_id sample_name RIN Amyloid_Beta       pTau
## IGF117756 BBN003.32235-MTG   IGF117756 2.7     2.027670 6.98931000
## IGF117759     BBN_7617-MTG   IGF117759 2.0     3.262270 2.55134000
## IGF117764    BBN_16213-MTG   IGF117764 4.6     0.147259 0.00247153
## IGF117772     BBN_7519-MTG   IGF117772 7.5           NA         NA
## IGF117778     BBN_7540-MTG   IGF117778 4.8     3.828680 2.77941000
## IGF117784 BBN003.32227-MTG   IGF117784 1.7     5.015050 2.94966000
##           pct_pTau_Positive        PHF1
## IGF117756        5.25975000 2.420650000
## IGF117759        0.37497600 1.564080000
## IGF117764        0.00876194 0.000519246
## IGF117772                NA          NA
## IGF117778        2.56623000          NA
## IGF117784        0.71418700 2.409610000

print(head(metadata_prot))

##                              sample_id              sample_name batch
## BBN_10099_MTG_P3WH10_001 BBN_10099-MTG BBN_10099_MTG_P3WH10_001    P3
## BBN_10109_MTG_P4WA3_001  BBN_10109-MTG  BBN_10109_MTG_P4WA3_001    P4
## BBN_10611_MTG_P2WD10_001 BBN_10611-MTG BBN_10611_MTG_P2WD10_001    P2
## BBN_14801_MTG_P1WH9_001  BBN_14801-MTG  BBN_14801_MTG_P1WH9_001    P1
## BBN_16213_MTG_P2WH3_001  BBN_16213-MTG  BBN_16213_MTG_P2WH3_001    P2
## BBN_18399_MTG_P2WF5_001  BBN_18399-MTG  BBN_18399_MTG_P2WF5_001    P2

Individual metadata

individual_metadata should contain individual level metadata, where one column matches the primary column in maps.

print(head(individual_metadata))

##       sample_id brain_region age trem2_all    BBN_ID sex diagnosis apoe_final
## 1 BBN_10099-MTG          MTG  91        CV BBN_10099   F   Control      E3/E3
## 2 BBN_10109-MTG          MTG  80        CV BBN_10109   F   Control      E2/E3
## 3 BBN_10611-MTG          MTG  92        CV BBN_10611   F        AD      E3/E3
## 4 BBN_14801-MTG          MTG  65        CV BBN_14801   F   Control      E3/E3
## 5 BBN_16213-MTG          MTG  73        CV BBN_16213   M   Control      E3/E3
## 6 BBN_18399-MTG          MTG  77        CV BBN_18399   M   Control      E3/E3
##   Braak PMD   amyloid       pTau        PHF1 AD
## 1     2  22 0.5852850 0.00649899 0.001686760  0
## 2     2  23 0.7826840 0.92852500 0.678769000  0
## 3     3  30 0.3078820 0.01312750 0.017976000  1
## 4     1  47        NA         NA          NA  0
## 5     0  23 0.1472590 0.00247153 0.000519246  0
## 6     0  11 0.0455756 0.00222531 0.010463800  0

Optional inputs

Optional inputs contain additional data frame used for QC visualisation purposes

print(head(rna_qc_data_matrix))

## NULL

Step one - Generate a MultiAssayExperiment object

To run Omix, we first need to generate a multi-omics object The package currently supports transcriptomics and proteomics bulk data only.

# Convert metadata type from character to factor
individual_metadata$diagnosis <- factor(individual_metadata$diagnosis,
                                        levels = c("Control", "AD"))
individual_metadata$sex <- factor(individual_metadata$sex)

#Generate multiassay object
multiomics_object_MTG=generate_multiassay(rawdata_rna =rawdata_rna,
                                   rawdata_protein = rawdata_protein,
                                   individual_to_sample=FALSE,
                                   map_rna = map_rna,
                                   map_protein = map_prot,
                                   metadata_rna = metadata_rna,
                                   metadata_protein = metadata_prot,
                                   individual_metadata = individual_metadata,
                                   map_by_column = 'sample_id',
                                   rna_qc_data=FALSE,
                                   rna_qc_data_matrix=NULL,
                                   organism='human')

## ✔ Ensembl ID conversion to gene symbol

## ✔ Retrieval of gene biotype

## ✔ RNA raw data loaded

class: SummarizedExperiment dim: 58884 29 metadata(1): metadata assays(1): rna_raw rownames(58884): ENSG00000223972 ENSG00000227232 … ENSG00000277475 ENSG00000268674 rowData names(3): ensembl_gene_id gene_name gene_biotype colnames(29): IGF117756 IGF117759 … IGF117836 IGF117837 colData names(7): sample_id sample_name … pct_pTau_Positive PHF1

## ✔ Protein raw data loaded

class: SummarizedExperiment dim: 3228 39 metadata(1): metadata assays(1): protein_raw rownames(3228): IGKV2-28;IGKV2-29;IGKV2-30;IGKV2-40;IGKV2D-26;IGKV2D-28;IGKV2D-29;IGKV2D-30;IGKV2D-40 IGKV3-11;IGKV3D-11 … FAM169A SEC23IP rowData names(1): gene_name colnames(39): BBN_10099_MTG_P3WH10_001 BBN_10109_MTG_P4WA3_001 … BBN003.35446_MTG_P2WD1_001 BBN003.35520_MTG_P2WD4_001 colData names(3): sample_id sample_name batch

## ✔ MultiAssayExperiment object generated!

The MultiAssayExperiment object was succesfully created. The following steps will process and perform QC on each omic layers of the multiassay object.

Step two - Process transcriptomics data

multiomics_object_MTG=process_rna(multiassay=multiomics_object_MTG,
            transformation='rlog',
            protein_coding=TRUE,
            min_count = 10,
            min_sample = 0.5,
            dependent =  "diagnosis",
            levels = c("Control","AD"),
            covariates=c('age','sex','PMD'),
            filter=TRUE,
            batch_correction=TRUE,
            batch=NULL,
            remove_sample_outliers= TRUE)

## ✔ NORMALISATION & TRANSFORMATION

## ✔ GENE FILTERING

## ✔ Keeping only protein coding genes

## ✔ 39155 / 58884 non protein coding genes were dropped

## ✔ 19729 protein coding genes kept for analysis

## ✔ QC parameters selected require genes to have at least 50 % of samples with counts of 10 or more

## ✔ 4884 / 19729 genes were dropped

## ✔ 14845 genes kept for analysis

## ✔ RLOG TRANSFORMATION

## ✔ BATCH CORRECTION

## ✔ Using Limma on  to remove technical artefacts, and age as biological confoundersUsing Limma on  to remove technical artefacts, and sex as biological confoundersUsing Limma on  to remove technical artefacts, and PMD as biological confounders

## ✔ SAMPLE OUTLIERS REMOVAL

## ✔ 0 sample outliers out of 29  samples detected and dropped

## ✔ Transcriptomics data processed!

## ✔ Processing parameters saved in metadata

Step three - Process proteomics data

multiomics_object_MTG=process_protein(
    multiassay=multiomics_object_MTG,
    filter=TRUE,
    min_sample = 0.5,
    dependent =  "diagnosis",
    levels = c("Control","AD"),
    imputation = 'minimum_value',
    remove_feature_outliers= TRUE,
    batch_correction= TRUE,
    batch="batch",
    correction_method="median_centering",
    remove_sample_outliers=TRUE,
    denoise=TRUE,
    covariates=c('PMD','sex','age'))

## ✔ SCALING NORMALIZATION

## ✔ FILTERING

## ✔ 322 / 3228 proteins filtered

## ✔ 2906 proteins kept for analysis

## ✔ IMPUTATION

## ✔ Imputation of remaining missing values based on
## 50% of minimum level of abundance for each protein

## ✔ BATCH CORRECTION

## ✔ DENOISING BIOLOGICAL COVARIATES

## ✔ Using Limma on PMD as biological confoundersUsing Limma on sex as biological confoundersUsing Limma on age as biological confounders

## ✔ REMOVING FEATURE OUTLIERS

##

## ── Remove proteins with average protein abundance
## across samples in bottom and u

## ✔ 266 feature outliers out of 2906 features detected and dropped

## ✔ REMOVING SAMPLE OUTLIERS

## ✔ 0 sample outliers out of 39  samples detected and dropped

The processing parameter are stored in the object

Step Four - Single omic analyses

Transcriptomics differential expression using DEseq2

Now we run a differential expression analysis between groups of interest, where the first level, here control is the reference. The function requires to run process_rna first, with covariates of interest. These are adjusted for in the DE.

multiomics_object_MTG=rna_DE_analysis(multiassay=multiomics_object_MTG,
                    dependent='diagnosis',
                    levels=c('Control','AD'),
                    filter_protein_coding = TRUE,
                    log2FoldChange = 0.2)

## Welcome to enrichR
## Checking connection ...

## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!

## Uploading data to Enrichr... Done.
##   Querying GO_Molecular_Function_2021... Done.
##   Querying GO_Cellular_Component_2021... Done.
##   Querying GO_Biological_Process_2021... Done.
##   Querying WikiPathways_2021_Human... Done.
##   Querying Reactome_2016... Done.
##   Querying KEGG_2021_Human... Done.
##   Querying MSigDB_Hallmark_2020... Done.
##   Querying BioCarta_2016... Done.
## Parsing results... Done.

## No significant impacted pathways found at FDR <= 0.05!
## Welcome to enrichR
## Checking connection ... 
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!

## Uploading data to Enrichr... Done.
##   Querying GO_Molecular_Function_2021... Done.
##   Querying GO_Cellular_Component_2021... Done.
##   Querying GO_Biological_Process_2021... Done.
##   Querying WikiPathways_2021_Human... Done.
##   Querying Reactome_2016... Done.
##   Querying KEGG_2021_Human... Done.
##   Querying MSigDB_Hallmark_2020... Done.
##   Querying BioCarta_2016... Done.
## Parsing results... Done.

## ℹ Output is returned as a list!

Differential expression results for transcripts are saved in the multiomics_object_MTG@metadata$DEG slot

DT::datatable(multiomics_object_MTG@metadata$DEG$ADvsControl,
              rownames = FALSE, 
              escape = FALSE,
              options = list(pageLength = 20, 
                             scrollX=T, 
                             autoWidth = TRUE,
                             dom = 'Blfrtip'))

Volcano plot of differentially expressed genes is saved in the multiomics_object_MTG@metadata$DEG$plot slot

volcano_interactive(multiomics_object_MTG@metadata[["DEG"]]$ADvsControl, padj=0.05)

Proteomics differential expression using limma

Now we run a differential expression analysis between groups of interest, where the first level, here control is the reference. If the data is already denoised for covariates of interest in process_protein, set covariates as `NULL``

multiomics_object_MTG=protein_DE_analysis(multiomics_object_MTG,
                         slot="protein_processed",
                         dependent='diagnosis',
                         covariates=NULL,
                         levels=c('Control','AD'),
                         log2FoldChange = 0.2)

## Uploading data to Enrichr... Done.
##   Querying GO_Molecular_Function_2021... Done.
##   Querying GO_Cellular_Component_2021... Done.
##   Querying GO_Biological_Process_2021... Done.
##   Querying WikiPathways_2021_Human... Done.
##   Querying Reactome_2016... Done.
##   Querying KEGG_2021_Human... Done.
##   Querying MSigDB_Hallmark_2020... Done.
##   Querying BioCarta_2016... Done.
## Parsing results... Done.
## Uploading data to Enrichr... Done.
##   Querying GO_Molecular_Function_2021... Done.
##   Querying GO_Cellular_Component_2021... Done.
##   Querying GO_Biological_Process_2021... Done.
##   Querying WikiPathways_2021_Human... Done.
##   Querying Reactome_2016... Done.
##   Querying KEGG_2021_Human... Done.
##   Querying MSigDB_Hallmark_2020... Done.
##   Querying BioCarta_2016... Done.
## Parsing results... Done.

Differential expression results for proteins are saved in the multiomics_object_MTG@metadata$DEP slot

DT::datatable(multiomics_object_MTG@metadata$DEP$ADvsControl,
              rownames = FALSE, 
              escape = FALSE,
              options = list(pageLength = 20, 
                             scrollX=T, 
                             autoWidth = TRUE,
                             dom = 'Blfrtip'))

Volcano plot of differentially abundant proteins is saved in the multiomics_object_MTG@metadata$DEP$plot slot

volcano_interactive(multiomics_object_MTG@metadata[["DEP"]]$ADvsControl, padj=0.05)

The differential analysis of transcriptomics data yielded 12 and 56 up and downregulated genes respectively (absolute Log2FoldChange >0.1 and adjusted p-value < 0.05) (Figure 3A).
For proteomics data, 62 in the positive direction against 23 in the negative direction (Figure 3B).

Comparing differentially expressed genes and protein

Since the magnitude of changes are not directly comparable between platforms, but their directionality are, we assessed whether significant differential features overlapped between the two platforms and whether their direction of effects were concordant or discordant.

multiomics_object_MTG=single_omic_comparisons(multiassay=multiomics_object_MTG,
                                       threshold = 0.05,
                                       pvalue = 'pvalue',
                                       filtering_options=NULL,
                                       additional_database_gmt=GO_fp)

Differential analyses reveal partial concordance between transcriptomics and proteomics

comparison=multiomics_object_MTG@metadata$single_omic_comparison$ADvsControl
volcano_interactive_comparison(comparison$dataframe)

corr= stats::cor(comparison$dataframe$log2FoldChange_transcriptomics,
          comparison$dataframe$log2FoldChange_proteomics)

We found a r=0.2318711 correlation between pairwise log2FoldChanges (p-value <0.05) indicating a moderate positive correlation between the mRNA and protein level changes, which is reinforced by the existence of discordant pairs.

DT::datatable(comparison$dataframe,
              rownames = FALSE, 
              escape = FALSE,
              options = list(pageLength = 20, 
                             scrollX=T, 
                             autoWidth = TRUE,
                             dom = 'Blfrtip'))

Overall, these single platform analyses highlighted obvious differences between transcriptomics and proteomics and should be further studied to differentiate technical versus biological effects. Overall, results highlight that:

different platforms do not cover the same exact set of features
of the features that intersect, most see a significant change in one platform only
of the features that significantly change in both platforms, some may see discordant directionality of effects

Protein accumulation

Protein with a positive Log2FC but negative mRNA regulation may point to those accumulating in the brain as AD progresses

comparison$dataframe$gene_name[which(comparison$dataframe$direction=='Discordant'& comparison$dataframe$log2FoldChange_proteomics>0.2)]

##   [1] "ACAN"     "ACAT2"    "ACOT8"    "ACSL6"    "ADAM10"   "AKR1A1"  
##   [7] "AKR7A2"   "ALDH9A1"  "ALMS1"    "ANK1"     "ANP32A"   "AP3M1"   
##  [13] "APOE"     "APRT"     "ARCN1"    "ARF1"     "ARF4"     "ARL1"    
##  [19] "ARL3"     "ARMC8"    "ASS1"     "ATP1B2"   "ATP2B3"   "ATP2B4"  
##  [25] "ATP8A1"   "ATP9A"    "BAG5"     "BANF1"    "BCL2L13"  "BLOC1S2" 
##  [31] "BLVRB"    "C1QA"     "C1QC"     "CACNA2D2" "CALR"     "CAMK2G"  
##  [37] "CAMSAP2"  "CAPNS1"   "CBX3"     "CCDC43"   "CCDC91"   "CCS"     
##  [43] "CHMP2B"   "CHMP7"    "CHORDC1"  "CLIC4"    "CNN3"     "COMMD10" 
##  [49] "COMMD2"   "COPB1"    "COPG1"    "COPS6"    "CPE"      "CPNE3"   
##  [55] "CPNE6"    "DDOST"    "DHX15"    "DHX9"     "DMD"      "DNPEP"   
##  [61] "DRG1"     "DYNC1I2"  "EIF1"     "EIF2B3"   "EIF2B4"   "EIF3D"   
##  [67] "EIF3E"    "EIF4A3"   "EMD"      "EPM2AIP1" "ESD"      "FBXL16"  
##  [73] "FDPS"     "FHL1"     "FKBP4"    "FN3KRP"   "FNTA"     "FSD1L"   
##  [79] "GAK"      "GBA"      "GCA"      "GLO1"     "GNAQ"     "GNAS"    
##  [85] "GSPT1"    "GSTM5"    "GYS1"     "HBS1L"    "HMGB1"    "HNRNPK"  
##  [91] "HOOK3"    "HSDL1"    "HSPB6"    "IFT27"    "INPP1"    "INPP5A"  
##  [97] "ISOC1"    "ITPR1"    "KCNA2"    "LAMTOR2"  "LCP1"     "LGALS3"  
## [103] "LIMS1"    "LIN7C"    "LMAN1"    "LSM3"     "LTA4H"    "LUC7L2"  
## [109] "LYSMD2"   "MAOB"     "MAPK1"    "MARCKS"   "MPC2"     "MTCH1"   
## [115] "MTCH2"    "NAMPT"    "NANS"     "NDRG4"    "NEFH"     "NF1"     
## [121] "NLGN2"    "NOP56"    "NPTX1"    "NQO1"     "NT5C1A"   "OTUD6B"  
## [127] "PAFAH1B1" "PCYOX1"   "PELO"     "PGD"      "PHPT1"    "PIK3R4"  
## [133] "PITRM1"   "PLAA"     "PLCD1"    "PLCXD3"   "PLD3"     "PPIL1"   
## [139] "PPT1"     "PRAF2"    "PREP"     "PREPL"    "PRPSAP1"  "PSMC4"   
## [145] "PSMD10"   "PSME2"    "RAB39B"   "RALYL"    "RANBP9"   "RANGAP1" 
## [151] "RBBP9"    "RBP1"     "RCN1"     "REPS1"    "RMDN3"    "RNPS1"   
## [157] "RPN1"     "RSU1"     "RYR2"     "S100A1"   "S100A16"  "SACM1L"  
## [163] "SAE1"     "SART3"    "SEC13"    "SEC22B"   "SEC24C"   "SERPINH1"
## [169] "SETD7"    "SIRT3"    "SKP1"     "SLC1A3"   "SLC2A3"   "SLC3A2"  
## [175] "SLC4A1"   "SLC4A4"   "SLC7A14"  "SLC8A1"   "SMS"      "SNRNP200"
## [181] "SNRNP40"  "SNRPA1"   "SNTB2"    "SON"      "SQSTM1"   "SRI"     
## [187] "SRP54"    "SRP72"    "SRSF1"    "SRSF2"    "SRSF7"    "SSB"     
## [193] "STAM2"    "SURF4"    "SYP"      "TBCB"     "TMA7"     "TMEM11"  
## [199] "TMEM30A"  "TMEM65"   "TMX3"     "TOP1"     "TPP1"     "TRA2A"   
## [205] "TRAPPC10" "TSN"      "TSNAX"    "TSR2"     "TXNRD1"   "UBA1"    
## [211] "UBAC1"    "UFC1"     "UFL1"     "VCL"      "VPS18"    "VPS25"   
## [217] "VPS29"    "VTN"      "WDR1"

Integrative pathway enrichment (Via Active Pathways)

background=get_background(multiomics_object_MTG)
scores_group=comparison$dataframe[,c('gene_name','pvalue_transcriptomics','pvalue_proteomics')]
scores_group=scores_group[!duplicated(scores_group$gene_name), ]
rownames(scores_group)=scores_group$gene_name
scores_group=scores_group[,2:3]
colnames(scores_group)=c('Transcriptomics','Proteomics')
scores_group=as.matrix(scores_group)
ActivePathways=ActivePathways::ActivePathways(scores_group,
                                              gmt=GO_fp,
                                              background = background,
                                              cutoff=0.2)

DT::datatable(ActivePathways,
              rownames = FALSE, 
              escape = FALSE,
              options = list(pageLength = 20, 
                             scrollX=T, 
                             autoWidth = TRUE,
                             dom = 'Blfrtip'))

Integrative pathway enrichment (Via multiGSEA)

multiGSEA works with nested lists where each sublist represents an omics layer. The function rankFeatures calculates the pre-ranked features, that are needed for the subsequent calculation of the enrichment score. rankFeatures calculates the a local statistic ls based on the direction of the fold change and the magnitude of its significance

DT::datatable(multiomics_object_MTG@metadata$single_omic_comparison$ADvsControl$integrated_enrichment,
              rownames = FALSE, 
              escape = FALSE,
              options = list(pageLength = 20, 
                             scrollX=T, 
                             autoWidth = TRUE,
                             dom = 'Blfrtip'))

Step Five - Multi-omics integration using DIABLO

By default, the vertical integration is performed on processed slots.

DIABLO is a supervised multi-omics integration model that requires the input of a dependent variable. The function will perform tuning on the defined range, and number of components chosen.

multiomics_object_MTG=vertical_integration(multiomics_object_MTG,
                                  slots = c(
                                   "rna_processed",
                                   "protein_processed"
                                 ),
                                integration='DIABLO',
                                intersect_genes=FALSE,
                                ID_type='gene_name',
                                dependent='diagnosis',
                                levels= c("Control", "AD"),
                                design="cor",
                                ncomp=2,
                                range=list(mRNA=seq(100,1000,by=100),
                                              proteins=seq(20,1000,by=100)))

##

## ── MODEL TUNING ──

##

## Design matrix has changed to include Y; each block will be
##             linked to Y.

## 
## You have provided a sequence of keepX of length: 10 for block mRNA and 10 for block proteins.
## This results in 100 models being fitted for each component and each nrepeat, this may take some time to run, be patient!

## 
## You can look into the 'BPPARAM' argument to speed up computation time.

## 
## tuning component 1
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |======================================================================| 100%
## tuning component 2
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |======================================================================| 100%

## Design matrix has changed to include Y; each block will be
##             linked to Y.

## ✔ VERTICAL INTEGRATION WITH DIABLO

Step Six - Multi-omics analyses (Supervised Example using DIABLO)

The user may want to perform the integrative downstream analyses step-by-step, or to run integrative_results_supervised, which will create a results object containing, multi-omics signatures, network visualisation, cell type and functional enrichment, as well as comparison of signatures against the OpenTargets knowledge base.

multiomics_object_MTG=integrative_results_supervised(multiassay=multiomics_object_MTG,
                                          integration_model = "DIABLO",
                                          dependent = "diagnosis",
                                          component=1,
                                          sense_check_variable = "PHF1",
                                          covariates = c("PHF1", "amyloid", "pTau","AD","Braak"),
                                          correlation_threshold = 0.4,
                                          weights_threshold = 0.5,
                                          community_detection = "leading_eigen",
                                          disease_id = "MONDO_0004975",
                                          TF_fp=TF_fp,
                                          ctd=ctd)

Step seven - Generate report

#multi_omic_report(multiassay=multiomics_object_MTG,
#                               report_folder_path = '~/omix_new/Reports',
#                               report_file = "multi_omic_report_Omix",
#                               integration_model='DIABLO',
#                               database='Reactome_2016')

Omix - Get started

This vignette was created by Eléonore Schneegans and Nurun Fancy

10 April, 2024