Having identified differentially expressed genes, we now ask what they mean biologically. Pathway analysis uses over-representation testing to ask: are our DE genes enriched in particular biological processes or pathways compared to what we would expect by chance?
Requires: de_results_all.csv produced
by 03_differential_expression.Rmd
library(clusterProfiler)
library(org.Mm.eg.db) # Mouse annotation — use org.Hs.eg.db for human
library(enrichplot)
library(tidyverse)
library(patchwork)
library(here)
set.seed(42)
results_dir <- here("demo-analysis", "output", "differential-expression", "results")
figures_dir <- here("demo-analysis", "output", "differential-expression", "figures")
dir.create(results_dir, showWarnings = FALSE, recursive = TRUE)
dir.create(figures_dir, showWarnings = FALSE, recursive = TRUE)
all_results <- read.csv(here(results_dir, "de_results_all.csv"))
rownames(all_results) <- all_results$gene
sig_genes <- all_results %>% filter(adj.P.Val < 0.05 & abs(logFC) > 1)
up_genes <- sig_genes %>% filter(logFC > 1)
down_genes <- sig_genes %>% filter(logFC < 1)
cat("Total DE genes (FDR < 0.05) & FC > 2x:", nrow(sig_genes), "\n")
## Total DE genes (FDR < 0.05) & FC > 2x: 2214
cat("Upregulated in KD: ", nrow(up_genes), "\n")
## Upregulated in KD: 1435
cat("Downregulated in KD: ", nrow(down_genes), "\n")
## Downregulated in KD: 779
Most pathway databases use Entrez Gene IDs rather than gene symbols. We convert here and also define a background gene set — all genes that were actually tested. Using the correct background is important for accurate enrichment statistics: comparing to all genes in the genome rather than genes in your experiment inflates false positives.
A conversion rate of 85–95% is normal. Genes that fail to convert are typically novel genes, pseudogenes, or non-coding RNAs not yet in the annotation database.
# Convert DE gene symbols to Entrez IDs
gene_entrez <- bitr(rownames(sig_genes),
fromType = "SYMBOL",
toType = "ENTREZID",
OrgDb = org.Mm.eg.db)
# Background: all genes tested
background_entrez <- bitr(rownames(all_results),
fromType = "SYMBOL",
toType = "ENTREZID",
OrgDb = org.Mm.eg.db)
universe <- background_entrez$ENTREZID
cat("DE genes submitted: ", nrow(sig_genes), "\n")
## DE genes submitted: 2214
cat("Successfully converted: ", nrow(gene_entrez), "\n")
## Successfully converted: 1928
cat("Background genes: ", length(universe), "\n")
## Background genes: 12614
Analyzing genes by direction of change often reveals more specific biological themes than treating all DE genes together. For example, upregulated genes may reflect stress responses while downregulated genes reflect disrupted normal functions.
up_entrez <- gene_entrez %>%
filter(SYMBOL %in% rownames(up_genes)) %>%
pull(ENTREZID)
down_entrez <- gene_entrez %>%
filter(SYMBOL %in% rownames(down_genes)) %>%
pull(ENTREZID)
Gene Ontology (GO) Biological Process terms describe what genes do at the cellular and systems level (e.g., “DNA repair”, “cell cycle arrest”). Key output columns:
ego_up <- enrichGO(gene = up_entrez,
universe = universe,
OrgDb = org.Mm.eg.db,
ont = "BP",
pvalueCutoff = 0.05,
readable = TRUE)
ego_down <- enrichGO(gene = down_entrez,
universe = universe,
OrgDb = org.Mm.eg.db,
ont = "BP",
pvalueCutoff = 0.05,
readable = TRUE)
cat("GO terms — upregulated: ", nrow(ego_up), "\n")
## GO terms — upregulated: 184
cat("GO terms — downregulated:", nrow(ego_down), "\n")
## GO terms — downregulated: 182
The dot plot shows the top enriched terms. X-axis is the gene ratio, dot size is the number of genes, and dot color is the adjusted p-value.
p1 <- dotplot(ego_up, showCategory = 10, title = "Upregulated in KD") +
theme(axis.text.y = element_text(size = 8))
p2 <- dotplot(ego_down, showCategory = 10, title = "Downregulated in KD") +
theme(axis.text.y = element_text(size = 8))
p1 | p2
ggsave(here(figures_dir, "GO_up_vs_down.png"),
width = 12, height = 8, dpi = 300)
Shows which specific genes belong to which GO terms. Brown nodes are GO terms, colored nodes are genes, and hub genes appear in multiple terms.
Gene-concept networks show which genes belong to which terms. We can color the genes by their fold-change to see expression patterns.
# Extract fold changes for coloring
# Need a named vector: names = gene symbols, values = logFC
# For upregulated genes
up_gene_symbols <- rownames(up_genes)
up_foldchanges <- up_genes$logFC
names(up_foldchanges) <- up_gene_symbols
# For downregulated genes
down_gene_symbols <- rownames(down_genes)
down_foldchanges <- down_genes$logFC
names(down_foldchanges) <- down_gene_symbols
# Check the format
head(up_foldchanges)
## Klf6 Ccnd2 Gpr107 Trim25 Mx1 Sox9
## 1.275651 1.171705 1.088390 1.294747 2.777256 1.382973
head(down_foldchanges)
## Tspan32 C1d Hk2 Tcf7 Txnrd3 Icam4
## -1.597127 -1.546469 -1.012173 -1.349778 -1.481417 -1.271321
# Check ranges
cat("Upregulated logFC range:",
round(min(up_foldchanges), 2), "to",
round(max(up_foldchanges), 2), "\n")
## Upregulated logFC range: 1 to 6.42
cat("Downregulated logFC range:",
round(min(down_foldchanges), 2), "to",
round(max(down_foldchanges), 2), "\n")
## Downregulated logFC range: -4.18 to -1
p_up <- cnetplot(ego_up,
showCategory = 5,
foldChange = up_foldchanges,
node_label = "all") +
scale_color_gradient(low = "white",
high = "red",
name = "Log2FC\n(Upregulated)") +
labs(title = "Gene-Concept Network: Upregulated DEGS - Top 5 GO Terms") +
theme(legend.position = "right")
# Display the plot
print(p_up)
# Save the plot
ggsave(here(figures_dir, "GO_up_cnetplot.png"), plot = p_up,
width = 12, height = 10, dpi = 300)
p_down <- cnetplot(ego_down,
showCategory = 5,
foldChange = down_foldchanges,
node_label = "all") +
scale_color_gradient(low = "white",
high = "blue",
name = "Log2FC\n(Downregulated)") +
labs(title = "Gene-Concept Network: Downregulated DEGs - Top 5 GO Terms") +
theme(legend.position = "right")
# Display the plot
print(p_down)
# Save the plot
ggsave(here(figures_dir, "GO_down_cnetplot.png"), plot = p_down,
width = 12, height = 10, dpi = 300)
GO is hierarchical and many terms overlap. Simplification removes redundant terms while keeping the most significant representative from each cluster.
# Simplify GO results by removing redundant terms
ego_up_simple <- clusterProfiler::simplify(ego_up,
cutoff = 0.7, # Similarity threshold
by = "p.adjust", # Keep most significant
select_fun = min)
cat("Original GO terms:", nrow(ego_up), "\n")
## Original GO terms: 184
cat("After simplification:", nrow(ego_up_simple), "\n\n")
## After simplification: 81
# Compare
cat("Top 10 simplified terms:\n")
## Top 10 simplified terms:
head(ego_up_simple, 10)
## ID Description GeneRatio
## GO:0009617 GO:0009617 response to bacterium 90/1151
## GO:0051607 GO:0051607 defense response to virus 58/1151
## GO:0035458 GO:0035458 cellular response to interferon-beta 21/1151
## GO:0035456 GO:0035456 response to interferon-beta 22/1151
## GO:0045071 GO:0045071 negative regulation of viral genome replication 19/1151
## GO:0045088 GO:0045088 regulation of innate immune response 64/1151
## GO:0050777 GO:0050777 negative regulation of immune response 33/1151
## GO:0002831 GO:0002831 regulation of response to biotic stimulus 71/1151
## GO:0071222 GO:0071222 cellular response to lipopolysaccharide 35/1151
## GO:0001819 GO:0001819 positive regulation of cytokine production 57/1151
## BgRatio RichFactor FoldEnrichment zScore pvalue
## GO:0009617 398/11887 0.2261307 2.335374 8.872326 6.490657e-15
## GO:0051607 221/11887 0.2624434 2.710395 8.403611 7.510026e-13
## GO:0035458 47/11887 0.4468085 4.614433 8.129196 4.556341e-10
## GO:0035456 52/11887 0.4230769 4.369344 7.972549 6.368810e-10
## GO:0045071 42/11887 0.4523810 4.671983 7.805344 2.392140e-09
## GO:0045088 324/11887 0.1975309 2.040008 6.214530 1.938081e-08
## GO:0050777 124/11887 0.2661290 2.748459 6.408279 4.726571e-08
## GO:0002831 384/11887 0.1848958 1.909519 5.932070 5.483023e-08
## GO:0071222 142/11887 0.2464789 2.545521 6.066340 1.495704e-07
## GO:0001819 298/11887 0.1912752 1.975402 5.583457 3.609680e-07
## p.adjust qvalue
## GO:0009617 2.909112e-11 2.662536e-11
## GO:0051607 1.682997e-09 1.540346e-09
## GO:0035458 5.105380e-07 4.672647e-07
## GO:0035456 5.709002e-07 5.225106e-07
## GO:0045071 1.786929e-06 1.635468e-06
## GO:0045088 1.085810e-05 9.937767e-06
## GO:0050777 1.925863e-05 1.762627e-05
## GO:0002831 2.047909e-05 1.874328e-05
## GO:0071222 4.469164e-05 4.090357e-05
## GO:0001819 8.988103e-05 8.226271e-05
## geneID
## GO:0009617 Efnb2/Srr/Nr1h3/Aqp1/Shpk/Adamts13/H2-M3/Dhx58/Cd68/Pde4d/Ccdc80/Slc15a2/Mx2/Eif2ak2/C3/Tap2/Pdcd4/Casp1/Stat1/Tnnt2/Alpk1/Ifi44/Gbp3/Gbp2/Ephb2/Fosl2/Nos1/Oas1b/Irf5/Usp18/Trim30a/Star/Tent5a/Plscr2/Oas2/Resf1/Casp4/Ifit1/Nfkbiz/Ccl2/Thrsp/Isg15/Tmem255a/Fer1l6/Serpine1/Arid5a/Gch1/Akap12/Il18/Dusp10/Znfx1/Gbp7/Chd7/Il36g/Tmem229b/Ppm1e/Irgm1/Dmbt1/Gimap6/Nlrp10/Nlrc3/Gas5/Gsdmc4/Scimp/Trim12c/Trim30d/Fcgr4/Trim5/H2-K1/Lilrb4a/Trim12a/Oas1g/H2-T23/Irgm2/Sp110/Naip5/Slfn2/Ifi204/Nlrc5/Defb25/Ifit3/Cd80/Cyp3a41a/Wfdc3/Ighg2c/Igtp/Naip6/Naip2/Psmb9/Gbp6
## GO:0051607 Trim25/Il12rb1/Serinc3/Dhx58/Rsad2/Rnf135/Rab2b/Parp9/Mx2/Eif2ak2/Irf7/Casp1/Stat1/Ifih1/Zbp1/Adar/Gbp2/Oasl2/Oas1b/Irf5/Usp18/Cd37/Trim30a/Trim21/Oas2/Rtp4/Ifit1/Ccl5/Elmod2/Slfn8/Isg15/Atg14/Ddx60/Isg20/Znfx1/Ifi203/Stat2/Gbp7/Rigi/Apobec1/Oasl1/Irgm1/Dtx3l/Trim34a/Trim12c/Trim30d/Trim5/Ifit3b/Trim12a/Rnasel/Oas1g/Ifit1bl2/Irgm2/Ifi207/Nlrc5/Ifit3/Igtp/Mndal
## GO:0035458 Hcn1/Stat1/Ifi211/Gbp3/Gbp2/Oas1b/Ifit1/Ifi203/Gbp7/Irgm1/Oas1g/Irgm2/Ifi204/Ifi207/Iigp1c/Ifit3/Igtp/Ifi47/Tgtp2/Mndal/Gbp6
## GO:0035456 Hcn1/Stat1/Ifi211/Gbp3/Gbp2/Oas1b/Ifit1/Ifi203/Gbp7/Xaf1/Irgm1/Oas1g/Irgm2/Ifi204/Ifi207/Iigp1c/Ifit3/Igtp/Ifi47/Tgtp2/Mndal/Gbp6
## GO:0045071 Prox1/Rsad2/Mx2/Eif2ak2/Ifih1/Oasl2/Oas1b/Oas2/Ccl5/Isg15/Isg20/Znfx1/Ifi203/Oasl1/Zfp809/Rnasel/Oas1g/Ifi207/Mndal
## GO:0045088 Trim25/Nr1h3/Ifi35/H2-M3/Dhx58/Esr1/Rsad2/Rnf135/Serpinb9b/Slc15a2/Parp9/Eif2ak2/Tap2/Cd74/Irf7/Casp1/Ifi211/Ifih1/Zbp1/Adar/Alpk1/Gbp3/Gbp2/Oas1b/Usp18/Clec2d/Trim30a/Trim21/Casp4/Parp14/Nlrp4c/Nfkbiz/Isg15/Tap1/Ddx60/Dusp10/Znfx1/Ifi203/Stat2/Gbp7/Rigi/Oasl1/Serpinb1a/Nlrp4e/Irgm1/Sfn/Nlrp10/Nlrc3/Sarm1/Scimp/Trim12c/Trim30d/Trim5/Trim12a/Oas1g/H2-T23/Irgm2/Rbm47/Ifi204/Ifi207/Nlrc5/Igtp/Mndal/Hspa1b
## GO:0050777 H2-M3/Cd46/Dhx58/Enpp3/Serpinb9b/Nckap1l/Bcl6/Masp1/Tap2/Col3a1/Adar/Oas1b/Usp18/Clec2d/Trim21/Parp14/Nlrp4c/Isg15/Tap1/Dusp10/Ifi203/Stat2/Nlrp4e/Sfn/Nlrc3/Lgals3/Lilrb4a/Oas1g/H2-T23/Ifi207/Nlrc5/Cd80/Mndal
## GO:0002831 Trim25/Il12rb1/Nr1h3/Ifi35/H2-M3/Dhx58/Esr1/Rsad2/Rnf135/Serpinb9b/Slc15a2/Parp9/Eif2ak2/Tap2/Cd74/Irf7/Casp1/Stat1/Ifi211/Ifih1/Zbp1/Adar/Alpk1/Gbp3/Gbp2/Cxcl1/Oas1b/Usp18/Clec2d/Cd37/Trim30a/Trim21/Casp4/Parp14/Nlrp4c/Ccl5/Elmod2/Nfkbiz/Isg15/Tap1/Ddx60/Dusp10/Znfx1/Ifi203/Stat2/Gbp7/Rigi/Oasl1/Serpinb1a/Nlrp4e/Irgm1/Sfn/Dtx3l/Nlrp10/Nlrc3/Sarm1/Scimp/Trim12c/Trim30d/Trim5/Trim12a/Oas1g/H2-T23/Irgm2/Rbm47/Ifi204/Ifi207/Nlrc5/Igtp/Mndal/Hspa1b
## GO:0071222 Efnb2/Nr1h3/Shpk/Adamts13/Cd68/Pde4d/Pdcd4/Casp1/Stat1/Gbp3/Gbp2/Ephb2/Trim30a/Star/Plscr2/Nfkbiz/Ccl2/Serpine1/Arid5a/Gch1/Il18/Il36g/Ppm1e/Irgm1/Scimp/Trim12c/Trim30d/Fcgr4/Trim5/Lilrb4a/Trim12a/Irgm2/Cd80/Igtp/Gbp6
## GO:0001819 Il12rb1/Mapk13/Hspb1/Sulf2/Ager/H2-M3/Cd46/Cd34/Dhx58/Flt4/Rsad2/Rnf135/Pde4d/Rab2b/Zbtb20/Eif2ak2/C3/Cd74/Ddit3/Irf7/Casp1/Cd28/Stat1/Ifih1/Il6ra/Cd1d1/Ephb2/Oas1b/Irf5/Oas2/P2ry2/Casp4/Ccl5/Ccl2/Isg15/Serpine1/Arid5a/Egr1/Akap12/Il18/Ifi203/Rigi/Ankrd42/Il36g/Sorl1/Nlrp10/Pla2r1/Scimp/Sphk1/Oas1g/H2-T23/Rbm47/Naip5/Ifi207/Defb25/Mndal/Hspa1b
## Count
## GO:0009617 90
## GO:0051607 58
## GO:0035458 21
## GO:0035456 22
## GO:0045071 19
## GO:0045088 64
## GO:0050777 33
## GO:0002831 71
## GO:0071222 35
## GO:0001819 57
dotplot(ego_up_simple,
showCategory = 10,
title = "Simplified GO-BP Enrichment")
ggsave(here(figures_dir, "GO_BP_simplified_dotplot.png"),
width = 10, height = 6, dpi = 300)
KEGG provides curated pathway maps for metabolism, signaling, and
disease. It complements GO by providing a different, more focused view
of the same biology. Common organism codes: mmu = mouse,
hsa = human.
# KEGG uses organism codes: "mmu" = mouse, "hsa" = human
kegg_up <- enrichKEGG(gene = up_entrez,
universe = universe,
organism = "mmu", # Mouse
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
qvalueCutoff = 0.05)
kegg_down <- enrichKEGG(gene = down_entrez,
universe = universe,
organism = "mmu", # Mouse
pAdjustMethod = "BH",
pvalueCutoff = 0.05,
qvalueCutoff = 0.05)
# Add gene symbols (KEGG uses Entrez IDs)
kegg_up <- setReadable(kegg_up,
OrgDb = org.Mm.eg.db,
keyType = "ENTREZID")
kegg_down <- setReadable(kegg_down,
OrgDb = org.Mm.eg.db,
keyType = "ENTREZID")
# Clean pathway names - remove " - Mus musculus" suffix
kegg_up@result$Description <- gsub(" - .*$", "", kegg_up@result$Description)
kegg_down@result$Description <- gsub(" - .*$", "", kegg_down@result$Description)
# Summary
cat("Enriched KEGG pathways - Upregulated DEGs: ", nrow(kegg_up), "\n")
## Enriched KEGG pathways - Upregulated DEGs: 39
cat("Enriched KEGG pathways - Downregulated DEGs: ", nrow(kegg_down), "\n\n")
## Enriched KEGG pathways - Downregulated DEGs: 10
# Show results
head(kegg_up, 10)
## category
## mmu05330 Human Diseases
## mmu05332 Human Diseases
## mmu04940 Human Diseases
## mmu05320 Human Diseases
## mmu04514 Environmental Information Processing
## mmu05322 Human Diseases
## mmu04974 Organismal Systems
## mmu05168 Human Diseases
## mmu04612 Organismal Systems
## mmu04640 Organismal Systems
## subcategory ID
## mmu05330 Immune disease mmu05330
## mmu05332 Immune disease mmu05332
## mmu04940 Endocrine and metabolic disease mmu04940
## mmu05320 Immune disease mmu05320
## mmu04514 Signaling molecules and interaction mmu04514
## mmu05322 Immune disease mmu05322
## mmu04974 Digestive system mmu04974
## mmu05168 Infectious disease: viral mmu05168
## mmu04612 Immune system mmu04612
## mmu04640 Immune system mmu04640
## Description GeneRatio BgRatio RichFactor
## mmu05330 Allograft rejection 11/477 13/5350 0.8461538
## mmu05332 Graft-versus-host disease 11/477 13/5350 0.8461538
## mmu04940 Type I diabetes mellitus 12/477 16/5350 0.7500000
## mmu05320 Autoimmune thyroid disease 11/477 14/5350 0.7857143
## mmu04514 Cell adhesion molecule (CAM) interaction 21/477 65/5350 0.3230769
## mmu05322 Systemic lupus erythematosus 16/477 43/5350 0.3720930
## mmu04974 Protein digestion and absorption 16/477 46/5350 0.3478261
## mmu05168 Herpes simplex virus 1 infection 30/477 134/5350 0.2238806
## mmu04612 Antigen processing and presentation 15/477 42/5350 0.3571429
## mmu04640 Hematopoietic cell lineage 12/477 31/5350 0.3870968
## FoldEnrichment zScore pvalue p.adjust qvalue
## mmu05330 9.490405 9.588458 1.680774e-10 2.655623e-08 2.273468e-08
## mmu05332 9.490405 9.588458 1.680774e-10 2.655623e-08 2.273468e-08
## mmu04940 8.411950 9.288877 2.892574e-10 3.046845e-08 2.608392e-08
## mmu05320 8.812519 9.156815 7.216607e-10 5.701119e-08 4.880705e-08
## mmu04514 3.623609 6.657798 8.563850e-08 5.412353e-06 4.633494e-06
## mmu05322 4.173370 6.536236 3.409901e-07 1.795881e-05 1.537447e-05
## mmu04974 3.901194 6.182316 9.897262e-07 4.467907e-05 3.824957e-05
## mmu05168 2.511030 5.541839 1.427357e-06 5.175259e-05 4.430518e-05
## mmu04612 4.005690 6.117876 1.473966e-06 5.175259e-05 4.430518e-05
## mmu04640 4.341651 5.837465 6.413940e-06 1.989704e-04 1.703378e-04
## geneID
## mmu05330 H2-M3/Fas/Cd28/H2-Q4/H2-DMa/H2-K1/H2-T23/H2-D1/Cd80/H2-T10/H2-DMb1
## mmu05332 H2-M3/Fas/Cd28/H2-Q4/H2-DMa/H2-K1/H2-T23/H2-D1/Cd80/H2-T10/H2-DMb1
## mmu04940 H2-M3/Fas/Cd28/H2-Q4/H2-DMa/Cpe/H2-K1/H2-T23/H2-D1/Cd80/H2-T10/H2-DMb1
## mmu05320 H2-M3/Fas/Cd28/H2-Q4/H2-DMa/H2-K1/H2-T23/H2-D1/Cd80/H2-T10/H2-DMb1
## mmu04514 Cldn15/Esam/H2-M3/Cd34/Vcan/Itgb8/Cd28/Nfasc/Ptprd/Ptprm/H2-Q4/H2-DMa/Cntnap2/Jam2/Cntn1/H2-K1/H2-T23/H2-D1/Cd80/H2-T10/H2-DMb1
## mmu05322 H2bc4/Macroh2a2/C3/Cd28/Trim21/H2-DMa/H2bc8/Fcgr4/H4c8/H2ac19/C4b/Cd80/H2ac25/H2-DMb1/H3c4/H3c7
## mmu04974 Col18a1/Col20a1/Kcnk5/Slc3a1/Col11a2/Col3a1/Col11a1/Col15a1/Col9a2/Slc7a9/Col10a1/Atp1b2/Col27a1/Slc8a3/Kcnj13/Col4a3
## mmu05168 Irf9/H2-M3/Eif2ak2/C3/Tapbp/Tap2/Cd74/Fas/Irf7/Stat1/Sp100/Ifih1/Calr4/Pik3r3/Oas1b/Oas2/Ccl5/Ccl2/H2-Q4/Tap1/H2-DMa/Stat2/Rigi/H2-K1/Rnasel/Oas1g/H2-T23/H2-D1/H2-T10/H2-DMb1
## mmu04612 Rfx5/H2-M3/Tapbp/Tap2/Cd74/Calr4/H2-Q4/Tap1/H2-DMa/H2-K1/H2-T23/H2-D1/H2-T10/H2-DMb1/Hspa1b
## mmu04640 Itga2/Cd34/Kitl/Dntt/Il6ra/Cd1d1/Cd38/Cd37/Itga2b/H2-DMa/Csf2ra/H2-DMb1
## Count
## mmu05330 11
## mmu05332 11
## mmu04940 12
## mmu05320 11
## mmu04514 21
## mmu05322 16
## mmu04974 16
## mmu05168 30
## mmu04612 15
## mmu04640 12
# Dot plot
dotplot(kegg_up,
showCategory = 15,
title = "KEGG Pathway Enrichment")
ggsave(here("demo-analysis", "output", "differential-expression", "figures", "KEGG_dotplot.png"), width = 10, height = 6, dpi = 300)
# Prepare GO data (up and down)
ego_up_df <- as.data.frame(ego_up) %>%
mutate(Database = "GO-BP",
Direction = "Up",
Term = Description) %>%
dplyr::select(Database, Direction, Term, Count, FDR = p.adjust) %>%
head(10)
ego_down_df <- as.data.frame(ego_down) %>%
mutate(Database = "GO-BP",
Direction = "Down",
Term = Description) %>%
dplyr::select(Database, Direction, Term, Count, FDR = p.adjust) %>%
head(10)
# Prepare KEGG data (up and down)
kegg_up_df <- as.data.frame(kegg_up) %>%
mutate(Database = "KEGG",
Direction = "Up",
Term = Description) %>%
dplyr::select(Database, Direction, Term, Count, FDR = p.adjust) %>%
head(10)
kegg_down_df <- as.data.frame(kegg_down) %>%
mutate(Database = "KEGG",
Direction = "Down",
Term = Description) %>%
dplyr::select(Database, Direction, Term, Count, FDR = p.adjust) %>%
head(10)
# Combine all
comparison_df <- bind_rows(ego_up_df, ego_down_df,
kegg_up_df, kegg_down_df)
# Plot: facet by database, color by direction
ggplot(comparison_df, aes(x = Count, y = reorder(Term, Count), fill = Direction)) +
geom_col() +
scale_fill_manual(values = c("Up" = "red", "Down" = "blue")) +
labs(title = "Top Enriched Terms: GO vs KEGG (Up and Down)",
x = "Number of Genes",
y = "Term",
fill = "Direction") +
theme_minimal() +
theme(axis.text.y = element_text(size = 8)) +
facet_wrap(~Database, scales = "free_y", ncol = 1)
ggsave(here(figures_dir, "GO_vs_KEGG_comparison_direction.png"),
width = 12, height = 10, dpi = 300)
# ===== Export GO-BP Results =====
write.csv(as.data.frame(ego_up),
here(results_dir, "GO_BP_upregulated.csv"),
row.names = FALSE)
write.csv(as.data.frame(ego_down),
here(results_dir, "GO_BP_downregulated.csv"),
row.names = FALSE)
# ===== Export KEGG Results =====
write.csv(as.data.frame(kegg_up),
here(results_dir, "KEGG_upregulated.csv"),
row.names = FALSE)
write.csv(as.data.frame(kegg_down),
here(results_dir, "KEGG_downregulated.csv"),
row.names = FALSE)
# ===== Create Summary Table =====
pathway_summary <- data.frame(
Database = rep(c("GO-BP", "KEGG"), each = 2),
Direction = rep(c("Up", "Down"), 2),
N_Terms = c(nrow(ego_up), nrow(ego_down),
nrow(kegg_up), nrow(kegg_down)),
Top_Term = c(
ego_up$Description[1],
ego_down$Description[1],
kegg_up$Description[1],
kegg_down$Description[1]
),
Top_FDR = formatC(c(
ego_up$p.adjust[1],
ego_down$p.adjust[1],
kegg_up$p.adjust[1],
kegg_down$p.adjust[1]
), format = "e", digits = 2)
)
write.csv(pathway_summary,
here(results_dir, "pathway_summary.csv"),
row.names = FALSE)
# ===== Print Summary =====
cat("\n=== Pathway Enrichment Results Exported ===\n\n")
##
## === Pathway Enrichment Results Exported ===
print(pathway_summary)
## Database Direction N_Terms Top_Term Top_FDR
## 1 GO-BP Up 184 response to bacterium 2.91e-11
## 2 GO-BP Down 182 nuclear division 1.61e-24
## 3 KEGG Up 39 Allograft rejection 2.66e-08
## 4 KEGG Down 10 Cell cycle 2.55e-09
cat("\nFiles saved:\n")
##
## Files saved:
cat(" ✓ GO_BP_upregulated.csv (", nrow(ego_up), " terms)\n", sep = "")
## ✓ GO_BP_upregulated.csv (184 terms)
cat(" ✓ GO_BP_downregulated.csv (", nrow(ego_down), " terms)\n", sep = "")
## ✓ GO_BP_downregulated.csv (182 terms)
cat(" ✓ KEGG_upregulated.csv (", nrow(kegg_up), " terms)\n", sep = "")
## ✓ KEGG_upregulated.csv (39 terms)
cat(" ✓ KEGG_downregulated.csv (", nrow(kegg_down), " terms)\n", sep = "")
## ✓ KEGG_downregulated.csv (10 terms)
cat(" ✓ pathway_summary.csv\n")
## ✓ pathway_summary.csv
cat("\nLocation:", results_dir, "\n")
##
## Location: /blue/bioinf_workshop/hkates/rnaseq_workshop/demo-analysis/output/differential-expression/results
sessionInfo()
## R version 4.5.3 (2026-03-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] here_1.0.2 patchwork_1.3.2 lubridate_1.9.5
## [4] forcats_1.0.1 stringr_1.6.0 dplyr_1.2.0
## [7] purrr_1.2.1 readr_2.2.0 tidyr_1.3.2
## [10] tibble_3.3.1 ggplot2_4.0.2 tidyverse_2.0.0
## [13] enrichplot_1.30.5 org.Mm.eg.db_3.21.0 AnnotationDbi_1.72.0
## [16] IRanges_2.44.0 S4Vectors_0.48.0 Biobase_2.70.0
## [19] BiocGenerics_0.56.0 generics_0.1.4 clusterProfiler_4.18.4
##
## loaded via a namespace (and not attached):
## [1] DBI_1.3.0 gson_0.1.0 rlang_1.1.7
## [4] magrittr_2.0.4 DOSE_4.4.0 otel_0.2.0
## [7] compiler_4.5.3 RSQLite_2.4.6 png_0.1-9
## [10] systemfonts_1.3.2 vctrs_0.7.2 reshape2_1.4.5
## [13] pkgconfig_2.0.3 crayon_1.5.3 fastmap_1.2.0
## [16] XVector_0.50.0 labeling_0.4.3 rmarkdown_2.30
## [19] tzdb_0.5.0 ragg_1.5.2 bit_4.6.0
## [22] xfun_0.57 cachem_1.1.0 aplot_0.2.9
## [25] jsonlite_2.0.0 blob_1.3.0 tidydr_0.0.6
## [28] tweenr_2.0.3 BiocParallel_1.44.0 cluster_2.1.8.2
## [31] parallel_4.5.3 R6_2.6.1 bslib_0.10.0
## [34] stringi_1.8.7 RColorBrewer_1.1-3 jquerylib_0.1.4
## [37] GOSemSim_2.36.0 Rcpp_1.1.1 Seqinfo_0.99.2
## [40] knitr_1.51 ggtangle_0.1.1 R.utils_2.13.0
## [43] timechange_0.4.0 Matrix_1.7-5 splines_4.5.3
## [46] igraph_2.2.2 tidyselect_1.2.1 qvalue_2.42.0
## [49] rstudioapi_0.18.0 dichromat_2.0-0.1 yaml_2.3.12
## [52] codetools_0.2-20 lattice_0.22-9 plyr_1.8.9
## [55] withr_3.0.2 treeio_1.34.0 KEGGREST_1.50.0
## [58] S7_0.2.1 evaluate_1.0.5 gridGraphics_0.5-1
## [61] polyclip_1.10-7 scatterpie_0.2.6 Biostrings_2.78.0
## [64] pillar_1.11.1 ggtree_4.0.5 ggfun_0.2.0
## [67] rprojroot_2.1.1 hms_1.1.4 scales_1.4.0
## [70] tidytree_0.4.7 glue_1.8.0 gdtools_0.5.0
## [73] lazyeval_0.2.2 tools_4.5.3 ggnewscale_0.5.2
## [76] data.table_1.18.2.1 fgsea_1.36.2 ggiraph_0.9.6
## [79] fs_2.0.1 fastmatch_1.1-8 cowplot_1.2.0
## [82] grid_4.5.3 ape_5.8-1 nlme_3.1-168
## [85] ggforce_0.5.0 cli_3.6.5 rappdirs_0.3.4
## [88] textshaping_1.0.5 fontBitstreamVera_0.1.1 gtable_0.3.6
## [91] R.methodsS3_1.8.2 yulab.utils_0.2.4 sass_0.4.10
## [94] digest_0.6.39 fontquiver_0.2.1 ggrepel_0.9.8
## [97] ggplotify_0.1.3 htmlwidgets_1.6.4 farver_2.1.2
## [100] memoise_2.0.1 htmltools_0.5.9 R.oo_1.27.1
## [103] lifecycle_1.0.5 httr_1.4.8 GO.db_3.22.0
## [106] fontLiberation_0.1.0 bit64_4.6.0-1 MASS_7.3-65