Omix_vignette_CV_MTG.Rmd
The goal of Omix is to provide tools in R to build a complete analysis workflow for integrative analysis of data generated from multi-omics platform.
Generate a multi-omics object using
MultiAssayExperiment
.
Quality control of single-omics data.
Formatting, normalisation, denoising of single-omics data.
Separate single-omics analyses.
Integration of multi-omics data for combined analysis.
Publication quality plots and interactive analysis reports based of shinyApp.
Currently, Omix supports the integration of bulk transcriptomics and bulk proteomics.
The Omix
pipeline requires the following input:
rawdata_rna
: A data-frame containing raw RNA count
with rownames as gene and colnames as samples.
rawdata_protein
: A data-frame containing protein
abundance with rownames as proteins and colnames as samples.
map_rna
: A data-frame of two columns named
primary
and colname
where primary should
contain unique sample name with a link to sample metadata and colname is
the column names of the rawdata_rna
data-frame.
map_protein
: A data-frame of two columns named
primary
and colname
where primary should
contain unique sample name with a link to sample metadata and colname is
the column names of the rawdata_protein
data-frame.
metadata_rna
: A data-frame containing
rna
assay specific metadata where rownames are same as the
colnames of rawdata_rna
data.frame.
metadata_protein
: A data-frame containing
protein
assay specific metadata where rownames are same as
the colnames of rawdata_protein
data.frame.
individual_metadata
: A data-frame containing
individual level metadata for both omics assays.
Call required packages for vignette:
library(Omix)
library(purrr)
library(ggplot2)
library(dplyr)
library(tibble)
library(stats)
library(synapser)
First we must load the data from Omix for the vignette:
outputDir <- tempdir()
ctd_fp <-file.path(outputDir, "ctd.rds")
ensembl_fp <- file.path(outputDir, "ensembl.rds")
GO_fp <- file.path(outputDir, "GO_Biological_Process_2021.txt")
TF_fp <- file.path(outputDir, "Enrichr_Queries.gmt.txt")
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/GO_Biological_Process_2021.txt",destfile=GO_fp)
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/ctd-2.rds",destfile=ctd_fp)
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/ensembl_mappings_human.tsv",destfile=ensembl_fp)
download.file(url = "https://raw.githubusercontent.com/eleonore-schneeg/OmixData/main/Enrichr_Queries.gmt.txt",destfile=TF_fp)
synapser::synLogin(email=Sys.getenv("SYNAPSE_ID"), authToken=Sys.getenv("AUTHTOKEN"))
Welcome, ems2817!NULL
rawdata_rna <- read.csv(synapser::synGet('syn51533734')$path, header=T, stringsAsFactors = F, row.names=1)
rawdata_protein <- read.csv(synapser::synGet('syn51533735')$path, header=T, stringsAsFactors = F, row.names=1)
map_rna <-read.csv(synapser::synGet('syn51533731')$path, header=T, stringsAsFactors = F, row.names=1)
map_prot <- read.csv(synapser::synGet('syn51533730')$path, header=T, stringsAsFactors = F, row.names=1)
metadata_rna<-read.csv(synapser::synGet('syn51533733')$path, header=T, stringsAsFactors = F, row.names=1)
metadata_prot <- read.csv(synapser::synGet('syn51533732')$path, header=T, stringsAsFactors = F, row.names=1)
individual_metadata <-read.csv(synapser::synGet('syn51533736')$path, header=T, stringsAsFactors = F, row.names=1)
ctd <- readRDS(paste0(outputDir,'/ctd.rds'))
ensembl <-read.delim(file=paste0(outputDir,'/ensembl.rds'), sep = '\t', header = TRUE)
rna_qc_data_matrix <- NULL
Sanity check
## [1] TRUE
## [1] TRUE
rawdata_rna
is a data-frame of raw counts, with features
as rows and samples as columns
print(rawdata_rna[1:5,1:5])
## IGF117756 IGF117759 IGF117764 IGF117772 IGF117778
## ENSG00000223972 1 1 1 0 0
## ENSG00000227232 40 30 18 15 35
## ENSG00000278267 1 0 3 4 3
## ENSG00000243485 0 0 0 0 0
## ENSG00000284332 0 0 0 0 0
rawdata_protein
is a data-frame of raw protein
abundances, with features as rows and samples as columns
print(rawdata_protein[15:20,15:20])
## BBN_7540_MTG_P1WG4_001 BBN_7545_MTG_P1WG7_001
## TUBAL3 7158.6100 9201.4900
## FBLL1 67.4334 60.1956
## CASTOR2 17.3159 26.2499
## UNC119;UNC119B 20.3847 12.3418
## YJEFN3 NA 23.4309
## ARHGAP32 11.9989 17.0831
## BBN_7617_MTG_P1WH4_001 BBN_9984_MTG_P3WD1_001
## TUBAL3 7294.23000 8982.00000
## FBLL1 85.75520 62.94880
## CASTOR2 15.63710 19.52160
## UNC119;UNC119B 19.64770 8.60717
## YJEFN3 NA 25.40000
## ARHGAP32 5.57716 9.02670
## BBN002.26094_MTG_P3WC8_001 BBN002.29089_MTG_P3WD7_001
## TUBAL3 10871.4000 8487.21000
## FBLL1 46.9068 57.40500
## CASTOR2 NA 22.37250
## UNC119;UNC119B NA 17.10210
## YJEFN3 19.7786 16.63510
## ARHGAP32 NA 7.95878
Maps are data frame containing two columns: primary
and
colname
. The primary
column should be the
individual ID the individual metadata, and the colname
the
matched sample names from the raw matrices columns. If sample and
individual ids are the same, maps aren’t needed (primary and colnames
are the same).
## primary colname
## 1 BBN_10099-MTG BBN_10099_MTG_P3WH10_001
## 2 BBN_10109-MTG BBN_10109_MTG_P4WA3_001
## 3 BBN_10611-MTG BBN_10611_MTG_P2WD10_001
## 4 BBN_14801-MTG BBN_14801_MTG_P1WH9_001
## 5 BBN_16213-MTG BBN_16213_MTG_P2WH3_001
## 6 BBN_18399-MTG BBN_18399_MTG_P2WF5_001
## primary colname
## 12 BBN003.32235-MTG IGF117756
## 15 BBN_7617-MTG IGF117759
## 20 BBN_16213-MTG IGF117764
## 25 BBN_7519-MTG IGF117772
## 30 BBN_7540-MTG IGF117778
## 36 BBN003.32227-MTG IGF117784
Technical metadata are data-frames contain the column
colname
that should match the sample names in the raw
matrices, and any additional columns related to technical artefacts like
batches
## sample_id sample_name RIN Amyloid_Beta pTau
## IGF117756 BBN003.32235-MTG IGF117756 2.7 2.027670 6.98931000
## IGF117759 BBN_7617-MTG IGF117759 2.0 3.262270 2.55134000
## IGF117764 BBN_16213-MTG IGF117764 4.6 0.147259 0.00247153
## IGF117772 BBN_7519-MTG IGF117772 7.5 NA NA
## IGF117778 BBN_7540-MTG IGF117778 4.8 3.828680 2.77941000
## IGF117784 BBN003.32227-MTG IGF117784 1.7 5.015050 2.94966000
## pct_pTau_Positive PHF1
## IGF117756 5.25975000 2.420650000
## IGF117759 0.37497600 1.564080000
## IGF117764 0.00876194 0.000519246
## IGF117772 NA NA
## IGF117778 2.56623000 NA
## IGF117784 0.71418700 2.409610000
## sample_id sample_name batch
## BBN_10099_MTG_P3WH10_001 BBN_10099-MTG BBN_10099_MTG_P3WH10_001 P3
## BBN_10109_MTG_P4WA3_001 BBN_10109-MTG BBN_10109_MTG_P4WA3_001 P4
## BBN_10611_MTG_P2WD10_001 BBN_10611-MTG BBN_10611_MTG_P2WD10_001 P2
## BBN_14801_MTG_P1WH9_001 BBN_14801-MTG BBN_14801_MTG_P1WH9_001 P1
## BBN_16213_MTG_P2WH3_001 BBN_16213-MTG BBN_16213_MTG_P2WH3_001 P2
## BBN_18399_MTG_P2WF5_001 BBN_18399-MTG BBN_18399_MTG_P2WF5_001 P2
individual_metadata
should contain individual level
metadata, where one column matches the primary
column in
maps.
## sample_id brain_region age trem2_all BBN_ID sex diagnosis apoe_final
## 1 BBN_10099-MTG MTG 91 CV BBN_10099 F Control E3/E3
## 2 BBN_10109-MTG MTG 80 CV BBN_10109 F Control E2/E3
## 3 BBN_10611-MTG MTG 92 CV BBN_10611 F AD E3/E3
## 4 BBN_14801-MTG MTG 65 CV BBN_14801 F Control E3/E3
## 5 BBN_16213-MTG MTG 73 CV BBN_16213 M Control E3/E3
## 6 BBN_18399-MTG MTG 77 CV BBN_18399 M Control E3/E3
## Braak PMD amyloid pTau PHF1 AD
## 1 2 22 0.5852850 0.00649899 0.001686760 0
## 2 2 23 0.7826840 0.92852500 0.678769000 0
## 3 3 30 0.3078820 0.01312750 0.017976000 1
## 4 1 47 NA NA NA 0
## 5 0 23 0.1472590 0.00247153 0.000519246 0
## 6 0 11 0.0455756 0.00222531 0.010463800 0
To run Omix, we first need to generate a multi-omics object The package currently supports transcriptomics and proteomics bulk data only.
# Convert metadata type from character to factor
individual_metadata$diagnosis <- factor(individual_metadata$diagnosis,
levels = c("Control", "AD"))
individual_metadata$sex <- factor(individual_metadata$sex)
#Generate multiassay object
multiomics_object_MTG=generate_multiassay(rawdata_rna =rawdata_rna,
rawdata_protein = rawdata_protein,
individual_to_sample=FALSE,
map_rna = map_rna,
map_protein = map_prot,
metadata_rna = metadata_rna,
metadata_protein = metadata_prot,
individual_metadata = individual_metadata,
map_by_column = 'sample_id',
rna_qc_data=FALSE,
rna_qc_data_matrix=NULL,
organism='human')
## ✔ Ensembl ID conversion to gene symbol
## ✔ Retrieval of gene biotype
## ✔ RNA raw data loaded
class: SummarizedExperiment dim: 58884 29 metadata(1): metadata assays(1): rna_raw rownames(58884): ENSG00000223972 ENSG00000227232 … ENSG00000277475 ENSG00000268674 rowData names(3): ensembl_gene_id gene_name gene_biotype colnames(29): IGF117756 IGF117759 … IGF117836 IGF117837 colData names(7): sample_id sample_name … pct_pTau_Positive PHF1
## ✔ Protein raw data loaded
class: SummarizedExperiment dim: 3228 39 metadata(1): metadata assays(1): protein_raw rownames(3228): IGKV2-28;IGKV2-29;IGKV2-30;IGKV2-40;IGKV2D-26;IGKV2D-28;IGKV2D-29;IGKV2D-30;IGKV2D-40 IGKV3-11;IGKV3D-11 … FAM169A SEC23IP rowData names(1): gene_name colnames(39): BBN_10099_MTG_P3WH10_001 BBN_10109_MTG_P4WA3_001 … BBN003.35446_MTG_P2WD1_001 BBN003.35520_MTG_P2WD4_001 colData names(3): sample_id sample_name batch
## ✔ MultiAssayExperiment object generated!
The MultiAssayExperiment object was succesfully created. The
following steps will process and perform QC on each omic layers of the
multiassay
object.
multiomics_object_MTG=process_rna(multiassay=multiomics_object_MTG,
transformation='rlog',
protein_coding=TRUE,
min_count = 10,
min_sample = 0.5,
dependent = "diagnosis",
levels = c("Control","AD"),
covariates=c('age','sex','PMD'),
filter=TRUE,
batch_correction=TRUE,
batch=NULL,
remove_sample_outliers= TRUE)
## ✔ NORMALISATION & TRANSFORMATION
## ✔ GENE FILTERING
## ✔ Keeping only protein coding genes
## ✔ 39155 / 58884 non protein coding genes were dropped
## ✔ 19729 protein coding genes kept for analysis
## ✔ QC parameters selected require genes to have at least 50 % of samples with counts of 10 or more
## ✔ 4884 / 19729 genes were dropped
## ✔ 14845 genes kept for analysis
## ✔ RLOG TRANSFORMATION
## ✔ BATCH CORRECTION
## ✔ Using Limma on to remove technical artefacts, and age as biological confoundersUsing Limma on to remove technical artefacts, and sex as biological confoundersUsing Limma on to remove technical artefacts, and PMD as biological confounders
## ✔ SAMPLE OUTLIERS REMOVAL
## ✔ 0 sample outliers out of 29 samples detected and dropped
## ✔ Transcriptomics data processed!
## ✔ Processing parameters saved in metadata
multiomics_object_MTG=process_protein(
multiassay=multiomics_object_MTG,
filter=TRUE,
min_sample = 0.5,
dependent = "diagnosis",
levels = c("Control","AD"),
imputation = 'minimum_value',
remove_feature_outliers= TRUE,
batch_correction= TRUE,
batch="batch",
correction_method="median_centering",
remove_sample_outliers=TRUE,
denoise=TRUE,
covariates=c('PMD','sex','age'))
## ✔ SCALING NORMALIZATION
## ✔ FILTERING
## ✔ 322 / 3228 proteins filtered
## ✔ 2906 proteins kept for analysis
## ✔ IMPUTATION
## ✔ Imputation of remaining missing values based on
## 50% of minimum level of abundance for each protein
## ✔ BATCH CORRECTION
## ✔ DENOISING BIOLOGICAL COVARIATES
## ✔ Using Limma on PMD as biological confoundersUsing Limma on sex as biological confoundersUsing Limma on age as biological confounders
## ✔ REMOVING FEATURE OUTLIERS
##
## ── Remove proteins with average protein abundance
## across samples in bottom and u
## ✔ 266 feature outliers out of 2906 features detected and dropped
## ✔ REMOVING SAMPLE OUTLIERS
## ✔ 0 sample outliers out of 39 samples detected and dropped
The processing parameter are stored in the object
Now we run a differential expression analysis between groups of
interest, where the first level, here control is the reference. The
function requires to run process_rna
first, with covariates
of interest. These are adjusted for in the DE.
multiomics_object_MTG=rna_DE_analysis(multiassay=multiomics_object_MTG,
dependent='diagnosis',
levels=c('Control','AD'),
filter_protein_coding = TRUE,
log2FoldChange = 0.2)
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## No significant impacted pathways found at FDR <= 0.05!
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is available!
## WormEnrichr ... Connection is available!
## YeastEnrichr ... Connection is available!
## FishEnrichr ... Connection is available!
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## ℹ Output is returned as a list!
Differential expression results for transcripts are saved in the
multiomics_object_MTG@metadata$DEG
slot
DT::datatable(multiomics_object_MTG@metadata$DEG$ADvsControl,
rownames = FALSE,
escape = FALSE,
options = list(pageLength = 20,
scrollX=T,
autoWidth = TRUE,
dom = 'Blfrtip'))
Volcano plot of differentially expressed genes is saved in the
multiomics_object_MTG@metadata$DEG$plot
slot
volcano_interactive(multiomics_object_MTG@metadata[["DEG"]]$ADvsControl, padj=0.05)
Now we run a differential expression analysis between groups of
interest, where the first level, here control is the reference. If the
data is already denoised for covariates of interest in
process_protein
, set covariates as `NULL``
multiomics_object_MTG=protein_DE_analysis(multiomics_object_MTG,
slot="protein_processed",
dependent='diagnosis',
covariates=NULL,
levels=c('Control','AD'),
log2FoldChange = 0.2)
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
## Uploading data to Enrichr... Done.
## Querying GO_Molecular_Function_2021... Done.
## Querying GO_Cellular_Component_2021... Done.
## Querying GO_Biological_Process_2021... Done.
## Querying WikiPathways_2021_Human... Done.
## Querying Reactome_2016... Done.
## Querying KEGG_2021_Human... Done.
## Querying MSigDB_Hallmark_2020... Done.
## Querying BioCarta_2016... Done.
## Parsing results... Done.
Differential expression results for proteins are saved in the
multiomics_object_MTG@metadata$DEP
slot
DT::datatable(multiomics_object_MTG@metadata$DEP$ADvsControl,
rownames = FALSE,
escape = FALSE,
options = list(pageLength = 20,
scrollX=T,
autoWidth = TRUE,
dom = 'Blfrtip'))
Volcano plot of differentially abundant proteins is saved in the
multiomics_object_MTG@metadata$DEP$plot
slot
volcano_interactive(multiomics_object_MTG@metadata[["DEP"]]$ADvsControl, padj=0.05)
Since the magnitude of changes are not directly comparable between platforms, but their directionality are, we assessed whether significant differential features overlapped between the two platforms and whether their direction of effects were concordant or discordant.
multiomics_object_MTG=single_omic_comparisons(multiassay=multiomics_object_MTG,
threshold = 0.05,
pvalue = 'pvalue',
filtering_options=NULL,
additional_database_gmt=GO_fp)
comparison=multiomics_object_MTG@metadata$single_omic_comparison$ADvsControl
volcano_interactive_comparison(comparison$dataframe)
corr= stats::cor(comparison$dataframe$log2FoldChange_transcriptomics,
comparison$dataframe$log2FoldChange_proteomics)
We found a r=0.2318711 correlation between pairwise log2FoldChanges (p-value <0.05) indicating a moderate positive correlation between the mRNA and protein level changes, which is reinforced by the existence of discordant pairs.
DT::datatable(comparison$dataframe,
rownames = FALSE,
escape = FALSE,
options = list(pageLength = 20,
scrollX=T,
autoWidth = TRUE,
dom = 'Blfrtip'))
Overall, these single platform analyses highlighted obvious differences between transcriptomics and proteomics and should be further studied to differentiate technical versus biological effects. Overall, results highlight that:
Protein with a positive Log2FC but negative mRNA regulation may point to those accumulating in the brain as AD progresses
comparison$dataframe$gene_name[which(comparison$dataframe$direction=='Discordant'& comparison$dataframe$log2FoldChange_proteomics>0.2)]
## [1] "ACAN" "ACAT2" "ACOT8" "ACSL6" "ADAM10" "AKR1A1"
## [7] "AKR7A2" "ALDH9A1" "ALMS1" "ANK1" "ANP32A" "AP3M1"
## [13] "APOE" "APRT" "ARCN1" "ARF1" "ARF4" "ARL1"
## [19] "ARL3" "ARMC8" "ASS1" "ATP1B2" "ATP2B3" "ATP2B4"
## [25] "ATP8A1" "ATP9A" "BAG5" "BANF1" "BCL2L13" "BLOC1S2"
## [31] "BLVRB" "C1QA" "C1QC" "CACNA2D2" "CALR" "CAMK2G"
## [37] "CAMSAP2" "CAPNS1" "CBX3" "CCDC43" "CCDC91" "CCS"
## [43] "CHMP2B" "CHMP7" "CHORDC1" "CLIC4" "CNN3" "COMMD10"
## [49] "COMMD2" "COPB1" "COPG1" "COPS6" "CPE" "CPNE3"
## [55] "CPNE6" "DDOST" "DHX15" "DHX9" "DMD" "DNPEP"
## [61] "DRG1" "DYNC1I2" "EIF1" "EIF2B3" "EIF2B4" "EIF3D"
## [67] "EIF3E" "EIF4A3" "EMD" "EPM2AIP1" "ESD" "FBXL16"
## [73] "FDPS" "FHL1" "FKBP4" "FN3KRP" "FNTA" "FSD1L"
## [79] "GAK" "GBA" "GCA" "GLO1" "GNAQ" "GNAS"
## [85] "GSPT1" "GSTM5" "GYS1" "HBS1L" "HMGB1" "HNRNPK"
## [91] "HOOK3" "HSDL1" "HSPB6" "IFT27" "INPP1" "INPP5A"
## [97] "ISOC1" "ITPR1" "KCNA2" "LAMTOR2" "LCP1" "LGALS3"
## [103] "LIMS1" "LIN7C" "LMAN1" "LSM3" "LTA4H" "LUC7L2"
## [109] "LYSMD2" "MAOB" "MAPK1" "MARCKS" "MPC2" "MTCH1"
## [115] "MTCH2" "NAMPT" "NANS" "NDRG4" "NEFH" "NF1"
## [121] "NLGN2" "NOP56" "NPTX1" "NQO1" "NT5C1A" "OTUD6B"
## [127] "PAFAH1B1" "PCYOX1" "PELO" "PGD" "PHPT1" "PIK3R4"
## [133] "PITRM1" "PLAA" "PLCD1" "PLCXD3" "PLD3" "PPIL1"
## [139] "PPT1" "PRAF2" "PREP" "PREPL" "PRPSAP1" "PSMC4"
## [145] "PSMD10" "PSME2" "RAB39B" "RALYL" "RANBP9" "RANGAP1"
## [151] "RBBP9" "RBP1" "RCN1" "REPS1" "RMDN3" "RNPS1"
## [157] "RPN1" "RSU1" "RYR2" "S100A1" "S100A16" "SACM1L"
## [163] "SAE1" "SART3" "SEC13" "SEC22B" "SEC24C" "SERPINH1"
## [169] "SETD7" "SIRT3" "SKP1" "SLC1A3" "SLC2A3" "SLC3A2"
## [175] "SLC4A1" "SLC4A4" "SLC7A14" "SLC8A1" "SMS" "SNRNP200"
## [181] "SNRNP40" "SNRPA1" "SNTB2" "SON" "SQSTM1" "SRI"
## [187] "SRP54" "SRP72" "SRSF1" "SRSF2" "SRSF7" "SSB"
## [193] "STAM2" "SURF4" "SYP" "TBCB" "TMA7" "TMEM11"
## [199] "TMEM30A" "TMEM65" "TMX3" "TOP1" "TPP1" "TRA2A"
## [205] "TRAPPC10" "TSN" "TSNAX" "TSR2" "TXNRD1" "UBA1"
## [211] "UBAC1" "UFC1" "UFL1" "VCL" "VPS18" "VPS25"
## [217] "VPS29" "VTN" "WDR1"
background=get_background(multiomics_object_MTG)
scores_group=comparison$dataframe[,c('gene_name','pvalue_transcriptomics','pvalue_proteomics')]
scores_group=scores_group[!duplicated(scores_group$gene_name), ]
rownames(scores_group)=scores_group$gene_name
scores_group=scores_group[,2:3]
colnames(scores_group)=c('Transcriptomics','Proteomics')
scores_group=as.matrix(scores_group)
ActivePathways=ActivePathways::ActivePathways(scores_group,
gmt=GO_fp,
background = background,
cutoff=0.2)
DT::datatable(ActivePathways,
rownames = FALSE,
escape = FALSE,
options = list(pageLength = 20,
scrollX=T,
autoWidth = TRUE,
dom = 'Blfrtip'))
multiGSEA works with nested lists where each sublist represents an omics layer. The function rankFeatures calculates the pre-ranked features, that are needed for the subsequent calculation of the enrichment score. rankFeatures calculates the a local statistic ls based on the direction of the fold change and the magnitude of its significance
By default, the vertical integration is performed on processed slots.
DIABLO
is a supervised multi-omics integration model
that requires the input of a dependent variable. The function will
perform tuning on the defined range, and number of components
chosen.
multiomics_object_MTG=vertical_integration(multiomics_object_MTG,
slots = c(
"rna_processed",
"protein_processed"
),
integration='DIABLO',
intersect_genes=FALSE,
ID_type='gene_name',
dependent='diagnosis',
levels= c("Control", "AD"),
design="cor",
ncomp=2,
range=list(mRNA=seq(100,1000,by=100),
proteins=seq(20,1000,by=100)))
##
## ── MODEL TUNING ──
##
## Design matrix has changed to include Y; each block will be
## linked to Y.
##
## You have provided a sequence of keepX of length: 10 for block mRNA and 10 for block proteins.
## This results in 100 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
##
## You can look into the 'BPPARAM' argument to speed up computation time.
##
## tuning component 1
##
|
| | 0%
|
|============== | 20%
|
|============================ | 40%
|
|========================================== | 60%
|
|======================================================== | 80%
|
|======================================================================| 100%
## tuning component 2
##
|
| | 0%
|
|============== | 20%
|
|============================ | 40%
|
|========================================== | 60%
|
|======================================================== | 80%
|
|======================================================================| 100%
## Design matrix has changed to include Y; each block will be
## linked to Y.
## ✔ VERTICAL INTEGRATION WITH DIABLO
The user may want to perform the integrative downstream analyses
step-by-step, or to run integrative_results_supervised
,
which will create a results object containing, multi-omics signatures,
network visualisation, cell type and functional enrichment, as well as
comparison of signatures against the OpenTargets knowledge base.
multiomics_object_MTG=integrative_results_supervised(multiassay=multiomics_object_MTG,
integration_model = "DIABLO",
dependent = "diagnosis",
component=1,
sense_check_variable = "PHF1",
covariates = c("PHF1", "amyloid", "pTau","AD","Braak"),
correlation_threshold = 0.4,
weights_threshold = 0.5,
community_detection = "leading_eigen",
disease_id = "MONDO_0004975",
TF_fp=TF_fp,
ctd=ctd)