Part 3: Pathway Analysis

Overview

Having identified differentially expressed genes, we now ask what they mean biologically. Pathway analysis uses over-representation testing to ask: are our DE genes enriched in particular biological processes or pathways compared to what we would expect by chance?

Requires: de_results_all.csv produced by 03_differential_expression.Rmd

Setup

library(clusterProfiler)
library(org.Mm.eg.db)   # Mouse annotation — use org.Hs.eg.db for human
library(enrichplot)
library(tidyverse)
library(patchwork)
library(here)

set.seed(42)

results_dir <- here("demo-analysis", "output", "differential-expression", "results")
figures_dir <- here("demo-analysis", "output", "differential-expression", "figures")

dir.create(results_dir, showWarnings = FALSE, recursive = TRUE)
dir.create(figures_dir, showWarnings = FALSE, recursive = TRUE)

Load DE Results

all_results <- read.csv(here(results_dir, "de_results_all.csv"))
rownames(all_results) <- all_results$gene

sig_genes  <- all_results %>% filter(adj.P.Val < 0.05 & abs(logFC) > 1)
up_genes   <- sig_genes  %>% filter(logFC > 1)
down_genes <- sig_genes  %>% filter(logFC < 1)

cat("Total DE genes (FDR < 0.05) & FC > 2x:", nrow(sig_genes), "\n")

## Total DE genes (FDR < 0.05) & FC > 2x: 2214

cat("Upregulated in KD:          ", nrow(up_genes), "\n")

## Upregulated in KD:           1435

cat("Downregulated in KD:        ", nrow(down_genes), "\n")

## Downregulated in KD:         779

Convert Gene IDs

Most pathway databases use Entrez Gene IDs rather than gene symbols. We convert here and also define a background gene set — all genes that were actually tested. Using the correct background is important for accurate enrichment statistics: comparing to all genes in the genome rather than genes in your experiment inflates false positives.

A conversion rate of 85–95% is normal. Genes that fail to convert are typically novel genes, pseudogenes, or non-coding RNAs not yet in the annotation database.

# Convert DE gene symbols to Entrez IDs
gene_entrez <- bitr(rownames(sig_genes),
                    fromType = "SYMBOL",
                    toType   = "ENTREZID",
                    OrgDb    = org.Mm.eg.db)

# Background: all genes tested
background_entrez <- bitr(rownames(all_results),
                           fromType = "SYMBOL",
                           toType   = "ENTREZID",
                           OrgDb    = org.Mm.eg.db)

universe <- background_entrez$ENTREZID

cat("DE genes submitted:     ", nrow(sig_genes), "\n")

## DE genes submitted:      2214

cat("Successfully converted: ", nrow(gene_entrez), "\n")

## Successfully converted:  1928

cat("Background genes:       ", length(universe), "\n")

## Background genes:        12614

Upregulated vs Downregulated Genes

Analyzing genes by direction of change often reveals more specific biological themes than treating all DE genes together. For example, upregulated genes may reflect stress responses while downregulated genes reflect disrupted normal functions.

up_entrez <- gene_entrez %>%
  filter(SYMBOL %in% rownames(up_genes)) %>%
  pull(ENTREZID)

down_entrez <- gene_entrez %>%
  filter(SYMBOL %in% rownames(down_genes)) %>%
  pull(ENTREZID)

GO Enrichment — Biological Process

Gene Ontology (GO) Biological Process terms describe what genes do at the cellular and systems level (e.g., “DNA repair”, “cell cycle arrest”). Key output columns:

GeneRatio: Fraction of your DE genes annotated to this term
BgRatio: Fraction of background genes annotated to this term
p.adjust: FDR-corrected p-value
geneID: Gene symbols driving the enrichment

ego_up <- enrichGO(gene         = up_entrez,
                   universe     = universe,
                   OrgDb        = org.Mm.eg.db,
                   ont          = "BP",
                   pvalueCutoff = 0.05,
                   readable     = TRUE)

ego_down <- enrichGO(gene         = down_entrez,
                     universe     = universe,
                     OrgDb        = org.Mm.eg.db,
                     ont          = "BP",
                     pvalueCutoff = 0.05,
                     readable     = TRUE)

cat("GO terms — upregulated:  ", nrow(ego_up), "\n")

## GO terms — upregulated:   184

cat("GO terms — downregulated:", nrow(ego_down), "\n")

## GO terms — downregulated: 182

Dot Plot

The dot plot shows the top enriched terms. X-axis is the gene ratio, dot size is the number of genes, and dot color is the adjusted p-value.

p1 <- dotplot(ego_up,   showCategory = 10, title = "Upregulated in KD") +
      theme(axis.text.y = element_text(size = 8))

p2 <- dotplot(ego_down, showCategory = 10, title = "Downregulated in KD") +
      theme(axis.text.y = element_text(size = 8))

p1 | p2

ggsave(here(figures_dir, "GO_up_vs_down.png"),
       width = 12, height = 8, dpi = 300)

Gene-Concept Network

Shows which specific genes belong to which GO terms. Brown nodes are GO terms, colored nodes are genes, and hub genes appear in multiple terms.

Gene-concept networks show which genes belong to which terms. We can color the genes by their fold-change to see expression patterns.

# Extract fold changes for coloring
# Need a named vector: names = gene symbols, values = logFC

# For upregulated genes
up_gene_symbols <- rownames(up_genes)
up_foldchanges <- up_genes$logFC
names(up_foldchanges) <- up_gene_symbols

# For downregulated genes
down_gene_symbols <- rownames(down_genes)
down_foldchanges <- down_genes$logFC
names(down_foldchanges) <- down_gene_symbols

# Check the format
head(up_foldchanges)

##     Klf6    Ccnd2   Gpr107   Trim25      Mx1     Sox9 
## 1.275651 1.171705 1.088390 1.294747 2.777256 1.382973

head(down_foldchanges)

##   Tspan32       C1d       Hk2      Tcf7    Txnrd3     Icam4 
## -1.597127 -1.546469 -1.012173 -1.349778 -1.481417 -1.271321

# Check ranges
cat("Upregulated logFC range:", 
    round(min(up_foldchanges), 2), "to", 
    round(max(up_foldchanges), 2), "\n")

## Upregulated logFC range: 1 to 6.42

cat("Downregulated logFC range:", 
    round(min(down_foldchanges), 2), "to", 
    round(max(down_foldchanges), 2), "\n")

## Downregulated logFC range: -4.18 to -1

p_up <- cnetplot(ego_up, 
         showCategory = 5, 
         foldChange = up_foldchanges, 
         node_label = "all") + 
  scale_color_gradient(low = "white",
                       high = "red",
                       name = "Log2FC\n(Upregulated)") +
  labs(title = "Gene-Concept Network: Upregulated DEGS - Top 5 GO Terms") + 
  theme(legend.position = "right")

# Display the plot
print(p_up)

# Save the plot
ggsave(here(figures_dir, "GO_up_cnetplot.png"), plot = p_up, 
       width = 12, height = 10, dpi = 300)

p_down <- cnetplot(ego_down, 
         showCategory = 5, 
         foldChange = down_foldchanges, 
         node_label = "all") + 
  scale_color_gradient(low = "white",
                       high = "blue",
                       name = "Log2FC\n(Downregulated)") +
  labs(title = "Gene-Concept Network: Downregulated DEGs - Top 5 GO Terms") + 
  theme(legend.position = "right")

# Display the plot
print(p_down)

# Save the plot
ggsave(here(figures_dir, "GO_down_cnetplot.png"), plot = p_down,
       width = 12, height = 10, dpi = 300)

Simplify Redundant Terms

GO is hierarchical and many terms overlap. Simplification removes redundant terms while keeping the most significant representative from each cluster.

# Simplify GO results by removing redundant terms
ego_up_simple <- clusterProfiler::simplify(ego_up,
                          cutoff = 0.7,      # Similarity threshold
                          by = "p.adjust",   # Keep most significant
                          select_fun = min)

cat("Original GO terms:", nrow(ego_up), "\n")

## Original GO terms: 184

cat("After simplification:", nrow(ego_up_simple), "\n\n")

## After simplification: 81

# Compare
cat("Top 10 simplified terms:\n")

## Top 10 simplified terms:

head(ego_up_simple, 10)

##                    ID                                     Description GeneRatio
## GO:0009617 GO:0009617                           response to bacterium   90/1151
## GO:0051607 GO:0051607                       defense response to virus   58/1151
## GO:0035458 GO:0035458            cellular response to interferon-beta   21/1151
## GO:0035456 GO:0035456                     response to interferon-beta   22/1151
## GO:0045071 GO:0045071 negative regulation of viral genome replication   19/1151
## GO:0045088 GO:0045088            regulation of innate immune response   64/1151
## GO:0050777 GO:0050777          negative regulation of immune response   33/1151
## GO:0002831 GO:0002831       regulation of response to biotic stimulus   71/1151
## GO:0071222 GO:0071222         cellular response to lipopolysaccharide   35/1151
## GO:0001819 GO:0001819      positive regulation of cytokine production   57/1151
##              BgRatio RichFactor FoldEnrichment   zScore       pvalue
## GO:0009617 398/11887  0.2261307       2.335374 8.872326 6.490657e-15
## GO:0051607 221/11887  0.2624434       2.710395 8.403611 7.510026e-13
## GO:0035458  47/11887  0.4468085       4.614433 8.129196 4.556341e-10
## GO:0035456  52/11887  0.4230769       4.369344 7.972549 6.368810e-10
## GO:0045071  42/11887  0.4523810       4.671983 7.805344 2.392140e-09
## GO:0045088 324/11887  0.1975309       2.040008 6.214530 1.938081e-08
## GO:0050777 124/11887  0.2661290       2.748459 6.408279 4.726571e-08
## GO:0002831 384/11887  0.1848958       1.909519 5.932070 5.483023e-08
## GO:0071222 142/11887  0.2464789       2.545521 6.066340 1.495704e-07
## GO:0001819 298/11887  0.1912752       1.975402 5.583457 3.609680e-07
##                p.adjust       qvalue
## GO:0009617 2.909112e-11 2.662536e-11
## GO:0051607 1.682997e-09 1.540346e-09
## GO:0035458 5.105380e-07 4.672647e-07
## GO:0035456 5.709002e-07 5.225106e-07
## GO:0045071 1.786929e-06 1.635468e-06
## GO:0045088 1.085810e-05 9.937767e-06
## GO:0050777 1.925863e-05 1.762627e-05
## GO:0002831 2.047909e-05 1.874328e-05
## GO:0071222 4.469164e-05 4.090357e-05
## GO:0001819 8.988103e-05 8.226271e-05
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   geneID
## GO:0009617 Efnb2/Srr/Nr1h3/Aqp1/Shpk/Adamts13/H2-M3/Dhx58/Cd68/Pde4d/Ccdc80/Slc15a2/Mx2/Eif2ak2/C3/Tap2/Pdcd4/Casp1/Stat1/Tnnt2/Alpk1/Ifi44/Gbp3/Gbp2/Ephb2/Fosl2/Nos1/Oas1b/Irf5/Usp18/Trim30a/Star/Tent5a/Plscr2/Oas2/Resf1/Casp4/Ifit1/Nfkbiz/Ccl2/Thrsp/Isg15/Tmem255a/Fer1l6/Serpine1/Arid5a/Gch1/Akap12/Il18/Dusp10/Znfx1/Gbp7/Chd7/Il36g/Tmem229b/Ppm1e/Irgm1/Dmbt1/Gimap6/Nlrp10/Nlrc3/Gas5/Gsdmc4/Scimp/Trim12c/Trim30d/Fcgr4/Trim5/H2-K1/Lilrb4a/Trim12a/Oas1g/H2-T23/Irgm2/Sp110/Naip5/Slfn2/Ifi204/Nlrc5/Defb25/Ifit3/Cd80/Cyp3a41a/Wfdc3/Ighg2c/Igtp/Naip6/Naip2/Psmb9/Gbp6
## GO:0051607                                                                                                                                                                                                    Trim25/Il12rb1/Serinc3/Dhx58/Rsad2/Rnf135/Rab2b/Parp9/Mx2/Eif2ak2/Irf7/Casp1/Stat1/Ifih1/Zbp1/Adar/Gbp2/Oasl2/Oas1b/Irf5/Usp18/Cd37/Trim30a/Trim21/Oas2/Rtp4/Ifit1/Ccl5/Elmod2/Slfn8/Isg15/Atg14/Ddx60/Isg20/Znfx1/Ifi203/Stat2/Gbp7/Rigi/Apobec1/Oasl1/Irgm1/Dtx3l/Trim34a/Trim12c/Trim30d/Trim5/Ifit3b/Trim12a/Rnasel/Oas1g/Ifit1bl2/Irgm2/Ifi207/Nlrc5/Ifit3/Igtp/Mndal
## GO:0035458                                                                                                                                                                                                                                                                                                                                                                                                                                                  Hcn1/Stat1/Ifi211/Gbp3/Gbp2/Oas1b/Ifit1/Ifi203/Gbp7/Irgm1/Oas1g/Irgm2/Ifi204/Ifi207/Iigp1c/Ifit3/Igtp/Ifi47/Tgtp2/Mndal/Gbp6
## GO:0035456                                                                                                                                                                                                                                                                                                                                                                                                                                             Hcn1/Stat1/Ifi211/Gbp3/Gbp2/Oas1b/Ifit1/Ifi203/Gbp7/Xaf1/Irgm1/Oas1g/Irgm2/Ifi204/Ifi207/Iigp1c/Ifit3/Igtp/Ifi47/Tgtp2/Mndal/Gbp6
## GO:0045071                                                                                                                                                                                                                                                                                                                                                                                                                                                           Prox1/Rsad2/Mx2/Eif2ak2/Ifih1/Oasl2/Oas1b/Oas2/Ccl5/Isg15/Isg20/Znfx1/Ifi203/Oasl1/Zfp809/Rnasel/Oas1g/Ifi207/Mndal
## GO:0045088                                                                                                                                                         Trim25/Nr1h3/Ifi35/H2-M3/Dhx58/Esr1/Rsad2/Rnf135/Serpinb9b/Slc15a2/Parp9/Eif2ak2/Tap2/Cd74/Irf7/Casp1/Ifi211/Ifih1/Zbp1/Adar/Alpk1/Gbp3/Gbp2/Oas1b/Usp18/Clec2d/Trim30a/Trim21/Casp4/Parp14/Nlrp4c/Nfkbiz/Isg15/Tap1/Ddx60/Dusp10/Znfx1/Ifi203/Stat2/Gbp7/Rigi/Oasl1/Serpinb1a/Nlrp4e/Irgm1/Sfn/Nlrp10/Nlrc3/Sarm1/Scimp/Trim12c/Trim30d/Trim5/Trim12a/Oas1g/H2-T23/Irgm2/Rbm47/Ifi204/Ifi207/Nlrc5/Igtp/Mndal/Hspa1b
## GO:0050777                                                                                                                                                                                                                                                                                                                                                              H2-M3/Cd46/Dhx58/Enpp3/Serpinb9b/Nckap1l/Bcl6/Masp1/Tap2/Col3a1/Adar/Oas1b/Usp18/Clec2d/Trim21/Parp14/Nlrp4c/Isg15/Tap1/Dusp10/Ifi203/Stat2/Nlrp4e/Sfn/Nlrc3/Lgals3/Lilrb4a/Oas1g/H2-T23/Ifi207/Nlrc5/Cd80/Mndal
## GO:0002831                                                                                                              Trim25/Il12rb1/Nr1h3/Ifi35/H2-M3/Dhx58/Esr1/Rsad2/Rnf135/Serpinb9b/Slc15a2/Parp9/Eif2ak2/Tap2/Cd74/Irf7/Casp1/Stat1/Ifi211/Ifih1/Zbp1/Adar/Alpk1/Gbp3/Gbp2/Cxcl1/Oas1b/Usp18/Clec2d/Cd37/Trim30a/Trim21/Casp4/Parp14/Nlrp4c/Ccl5/Elmod2/Nfkbiz/Isg15/Tap1/Ddx60/Dusp10/Znfx1/Ifi203/Stat2/Gbp7/Rigi/Oasl1/Serpinb1a/Nlrp4e/Irgm1/Sfn/Dtx3l/Nlrp10/Nlrc3/Sarm1/Scimp/Trim12c/Trim30d/Trim5/Trim12a/Oas1g/H2-T23/Irgm2/Rbm47/Ifi204/Ifi207/Nlrc5/Igtp/Mndal/Hspa1b
## GO:0071222                                                                                                                                                                                                                                                                                                                                                     Efnb2/Nr1h3/Shpk/Adamts13/Cd68/Pde4d/Pdcd4/Casp1/Stat1/Gbp3/Gbp2/Ephb2/Trim30a/Star/Plscr2/Nfkbiz/Ccl2/Serpine1/Arid5a/Gch1/Il18/Il36g/Ppm1e/Irgm1/Scimp/Trim12c/Trim30d/Fcgr4/Trim5/Lilrb4a/Trim12a/Irgm2/Cd80/Igtp/Gbp6
## GO:0001819                                                                                                                                                                                                                     Il12rb1/Mapk13/Hspb1/Sulf2/Ager/H2-M3/Cd46/Cd34/Dhx58/Flt4/Rsad2/Rnf135/Pde4d/Rab2b/Zbtb20/Eif2ak2/C3/Cd74/Ddit3/Irf7/Casp1/Cd28/Stat1/Ifih1/Il6ra/Cd1d1/Ephb2/Oas1b/Irf5/Oas2/P2ry2/Casp4/Ccl5/Ccl2/Isg15/Serpine1/Arid5a/Egr1/Akap12/Il18/Ifi203/Rigi/Ankrd42/Il36g/Sorl1/Nlrp10/Pla2r1/Scimp/Sphk1/Oas1g/H2-T23/Rbm47/Naip5/Ifi207/Defb25/Mndal/Hspa1b
##            Count
## GO:0009617    90
## GO:0051607    58
## GO:0035458    21
## GO:0035456    22
## GO:0045071    19
## GO:0045088    64
## GO:0050777    33
## GO:0002831    71
## GO:0071222    35
## GO:0001819    57

dotplot(ego_up_simple,
        showCategory = 10,
        title        = "Simplified GO-BP Enrichment")

ggsave(here(figures_dir, "GO_BP_simplified_dotplot.png"),
       width = 10, height = 6, dpi = 300)

KEGG Pathway Enrichment

KEGG provides curated pathway maps for metabolism, signaling, and disease. It complements GO by providing a different, more focused view of the same biology. Common organism codes: mmu = mouse, hsa = human.

# KEGG uses organism codes: "mmu" = mouse, "hsa" = human
kegg_up <- enrichKEGG(gene = up_entrez,
                      universe = universe, 
                      organism = "mmu",      # Mouse
                      pAdjustMethod = "BH",
                      pvalueCutoff = 0.05,
                      qvalueCutoff = 0.05)

kegg_down <- enrichKEGG(gene = down_entrez,
                        universe = universe, 
                        organism = "mmu",      # Mouse
                        pAdjustMethod = "BH",
                        pvalueCutoff = 0.05,
                        qvalueCutoff = 0.05)

# Add gene symbols (KEGG uses Entrez IDs)
kegg_up <- setReadable(kegg_up, 
                       OrgDb = org.Mm.eg.db, 
                       keyType = "ENTREZID")

kegg_down <- setReadable(kegg_down, 
                         OrgDb = org.Mm.eg.db, 
                         keyType = "ENTREZID")

# Clean pathway names - remove " - Mus musculus" suffix
kegg_up@result$Description <- gsub(" - .*$", "", kegg_up@result$Description)
kegg_down@result$Description <- gsub(" - .*$", "", kegg_down@result$Description)

# Summary
cat("Enriched KEGG pathways - Upregulated DEGs: ", nrow(kegg_up), "\n")

## Enriched KEGG pathways - Upregulated DEGs:  39

cat("Enriched KEGG pathways - Downregulated DEGs: ", nrow(kegg_down), "\n\n")

## Enriched KEGG pathways - Downregulated DEGs:  10

# Show results
head(kegg_up, 10)

##                                      category
## mmu05330                       Human Diseases
## mmu05332                       Human Diseases
## mmu04940                       Human Diseases
## mmu05320                       Human Diseases
## mmu04514 Environmental Information Processing
## mmu05322                       Human Diseases
## mmu04974                   Organismal Systems
## mmu05168                       Human Diseases
## mmu04612                   Organismal Systems
## mmu04640                   Organismal Systems
##                                  subcategory       ID
## mmu05330                      Immune disease mmu05330
## mmu05332                      Immune disease mmu05332
## mmu04940     Endocrine and metabolic disease mmu04940
## mmu05320                      Immune disease mmu05320
## mmu04514 Signaling molecules and interaction mmu04514
## mmu05322                      Immune disease mmu05322
## mmu04974                    Digestive system mmu04974
## mmu05168           Infectious disease: viral mmu05168
## mmu04612                       Immune system mmu04612
## mmu04640                       Immune system mmu04640
##                                       Description GeneRatio  BgRatio RichFactor
## mmu05330                      Allograft rejection    11/477  13/5350  0.8461538
## mmu05332                Graft-versus-host disease    11/477  13/5350  0.8461538
## mmu04940                 Type I diabetes mellitus    12/477  16/5350  0.7500000
## mmu05320               Autoimmune thyroid disease    11/477  14/5350  0.7857143
## mmu04514 Cell adhesion molecule (CAM) interaction    21/477  65/5350  0.3230769
## mmu05322             Systemic lupus erythematosus    16/477  43/5350  0.3720930
## mmu04974         Protein digestion and absorption    16/477  46/5350  0.3478261
## mmu05168         Herpes simplex virus 1 infection    30/477 134/5350  0.2238806
## mmu04612      Antigen processing and presentation    15/477  42/5350  0.3571429
## mmu04640               Hematopoietic cell lineage    12/477  31/5350  0.3870968
##          FoldEnrichment   zScore       pvalue     p.adjust       qvalue
## mmu05330       9.490405 9.588458 1.680774e-10 2.655623e-08 2.273468e-08
## mmu05332       9.490405 9.588458 1.680774e-10 2.655623e-08 2.273468e-08
## mmu04940       8.411950 9.288877 2.892574e-10 3.046845e-08 2.608392e-08
## mmu05320       8.812519 9.156815 7.216607e-10 5.701119e-08 4.880705e-08
## mmu04514       3.623609 6.657798 8.563850e-08 5.412353e-06 4.633494e-06
## mmu05322       4.173370 6.536236 3.409901e-07 1.795881e-05 1.537447e-05
## mmu04974       3.901194 6.182316 9.897262e-07 4.467907e-05 3.824957e-05
## mmu05168       2.511030 5.541839 1.427357e-06 5.175259e-05 4.430518e-05
## mmu04612       4.005690 6.117876 1.473966e-06 5.175259e-05 4.430518e-05
## mmu04640       4.341651 5.837465 6.413940e-06 1.989704e-04 1.703378e-04
##                                                                                                                                                                                  geneID
## mmu05330                                                                                                             H2-M3/Fas/Cd28/H2-Q4/H2-DMa/H2-K1/H2-T23/H2-D1/Cd80/H2-T10/H2-DMb1
## mmu05332                                                                                                             H2-M3/Fas/Cd28/H2-Q4/H2-DMa/H2-K1/H2-T23/H2-D1/Cd80/H2-T10/H2-DMb1
## mmu04940                                                                                                         H2-M3/Fas/Cd28/H2-Q4/H2-DMa/Cpe/H2-K1/H2-T23/H2-D1/Cd80/H2-T10/H2-DMb1
## mmu05320                                                                                                             H2-M3/Fas/Cd28/H2-Q4/H2-DMa/H2-K1/H2-T23/H2-D1/Cd80/H2-T10/H2-DMb1
## mmu04514                                                Cldn15/Esam/H2-M3/Cd34/Vcan/Itgb8/Cd28/Nfasc/Ptprd/Ptprm/H2-Q4/H2-DMa/Cntnap2/Jam2/Cntn1/H2-K1/H2-T23/H2-D1/Cd80/H2-T10/H2-DMb1
## mmu05322                                                                                H2bc4/Macroh2a2/C3/Cd28/Trim21/H2-DMa/H2bc8/Fcgr4/H4c8/H2ac19/C4b/Cd80/H2ac25/H2-DMb1/H3c4/H3c7
## mmu04974                                                          Col18a1/Col20a1/Kcnk5/Slc3a1/Col11a2/Col3a1/Col11a1/Col15a1/Col9a2/Slc7a9/Col10a1/Atp1b2/Col27a1/Slc8a3/Kcnj13/Col4a3
## mmu05168 Irf9/H2-M3/Eif2ak2/C3/Tapbp/Tap2/Cd74/Fas/Irf7/Stat1/Sp100/Ifih1/Calr4/Pik3r3/Oas1b/Oas2/Ccl5/Ccl2/H2-Q4/Tap1/H2-DMa/Stat2/Rigi/H2-K1/Rnasel/Oas1g/H2-T23/H2-D1/H2-T10/H2-DMb1
## mmu04612                                                                                    Rfx5/H2-M3/Tapbp/Tap2/Cd74/Calr4/H2-Q4/Tap1/H2-DMa/H2-K1/H2-T23/H2-D1/H2-T10/H2-DMb1/Hspa1b
## mmu04640                                                                                                        Itga2/Cd34/Kitl/Dntt/Il6ra/Cd1d1/Cd38/Cd37/Itga2b/H2-DMa/Csf2ra/H2-DMb1
##          Count
## mmu05330    11
## mmu05332    11
## mmu04940    12
## mmu05320    11
## mmu04514    21
## mmu05322    16
## mmu04974    16
## mmu05168    30
## mmu04612    15
## mmu04640    12

# Dot plot
dotplot(kegg_up, 
        showCategory = 15,
        title = "KEGG Pathway Enrichment")

ggsave(here("demo-analysis", "output", "differential-expression", "figures", "KEGG_dotplot.png"), width = 10, height = 6, dpi = 300)

GO vs KEGG Comparison (Up and Down Separately)

# Prepare GO data (up and down)
ego_up_df <- as.data.frame(ego_up) %>%
  mutate(Database = "GO-BP", 
         Direction = "Up",
         Term = Description) %>%
  dplyr::select(Database, Direction, Term, Count, FDR = p.adjust) %>%
  head(10)

ego_down_df <- as.data.frame(ego_down) %>%
  mutate(Database = "GO-BP", 
         Direction = "Down",
         Term = Description) %>%
  dplyr::select(Database, Direction, Term, Count, FDR = p.adjust) %>%
  head(10)

# Prepare KEGG data (up and down)
kegg_up_df <- as.data.frame(kegg_up) %>%
  mutate(Database = "KEGG", 
         Direction = "Up",
         Term = Description) %>%
  dplyr::select(Database, Direction, Term, Count, FDR = p.adjust) %>%
  head(10)

kegg_down_df <- as.data.frame(kegg_down) %>%
  mutate(Database = "KEGG", 
         Direction = "Down",
         Term = Description) %>%
  dplyr::select(Database, Direction, Term, Count, FDR = p.adjust) %>%
  head(10)

# Combine all
comparison_df <- bind_rows(ego_up_df, ego_down_df, 
                           kegg_up_df, kegg_down_df)

# Plot: facet by database, color by direction
ggplot(comparison_df, aes(x = Count, y = reorder(Term, Count), fill = Direction)) +
  geom_col() +
  scale_fill_manual(values = c("Up" = "red", "Down" = "blue")) +
  labs(title = "Top Enriched Terms: GO vs KEGG (Up and Down)",
       x = "Number of Genes", 
       y = "Term",
       fill = "Direction") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 8)) +
  facet_wrap(~Database, scales = "free_y", ncol = 1)

ggsave(here(figures_dir, "GO_vs_KEGG_comparison_direction.png"), 
       width = 12, height = 10, dpi = 300)

Export Results

# ===== Export GO-BP Results =====
write.csv(as.data.frame(ego_up),
          here(results_dir, "GO_BP_upregulated.csv"),
          row.names = FALSE)

write.csv(as.data.frame(ego_down),
          here(results_dir, "GO_BP_downregulated.csv"),
          row.names = FALSE)

# ===== Export KEGG Results =====
write.csv(as.data.frame(kegg_up),
          here(results_dir, "KEGG_upregulated.csv"),
          row.names = FALSE)

write.csv(as.data.frame(kegg_down),
          here(results_dir, "KEGG_downregulated.csv"),
          row.names = FALSE)

# ===== Create Summary Table =====
pathway_summary <- data.frame(
  Database = rep(c("GO-BP", "KEGG"), each = 2),
  Direction = rep(c("Up", "Down"), 2),
  N_Terms = c(nrow(ego_up), nrow(ego_down), 
              nrow(kegg_up), nrow(kegg_down)),
  Top_Term = c(
    ego_up$Description[1],
    ego_down$Description[1],
    kegg_up$Description[1],
    kegg_down$Description[1]
  ),
  Top_FDR = formatC(c(
    ego_up$p.adjust[1],
    ego_down$p.adjust[1],
    kegg_up$p.adjust[1],
    kegg_down$p.adjust[1]
  ), format = "e", digits = 2)
)

write.csv(pathway_summary,
          here(results_dir, "pathway_summary.csv"),
          row.names = FALSE)

# ===== Print Summary =====
cat("\n=== Pathway Enrichment Results Exported ===\n\n")

## 
## === Pathway Enrichment Results Exported ===

print(pathway_summary)

##   Database Direction N_Terms              Top_Term  Top_FDR
## 1    GO-BP        Up     184 response to bacterium 2.91e-11
## 2    GO-BP      Down     182      nuclear division 1.61e-24
## 3     KEGG        Up      39   Allograft rejection 2.66e-08
## 4     KEGG      Down      10            Cell cycle 2.55e-09

cat("\nFiles saved:\n")

## 
## Files saved:

cat("  ✓ GO_BP_upregulated.csv (", nrow(ego_up), " terms)\n", sep = "")

##   ✓ GO_BP_upregulated.csv (184 terms)

cat("  ✓ GO_BP_downregulated.csv (", nrow(ego_down), " terms)\n", sep = "")

##   ✓ GO_BP_downregulated.csv (182 terms)

cat("  ✓ KEGG_upregulated.csv (", nrow(kegg_up), " terms)\n", sep = "")

##   ✓ KEGG_upregulated.csv (39 terms)

cat("  ✓ KEGG_downregulated.csv (", nrow(kegg_down), " terms)\n", sep = "")

##   ✓ KEGG_downregulated.csv (10 terms)

cat("  ✓ pathway_summary.csv\n")

##   ✓ pathway_summary.csv

cat("\nLocation:", results_dir, "\n")

## 
## Location: /blue/bioinf_workshop/hkates/rnaseq_workshop/demo-analysis/output/differential-expression/results

Session Information

sessionInfo()

## R version 4.5.3 (2026-03-11)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] here_1.0.2             patchwork_1.3.2        lubridate_1.9.5       
##  [4] forcats_1.0.1          stringr_1.6.0          dplyr_1.2.0           
##  [7] purrr_1.2.1            readr_2.2.0            tidyr_1.3.2           
## [10] tibble_3.3.1           ggplot2_4.0.2          tidyverse_2.0.0       
## [13] enrichplot_1.30.5      org.Mm.eg.db_3.21.0    AnnotationDbi_1.72.0  
## [16] IRanges_2.44.0         S4Vectors_0.48.0       Biobase_2.70.0        
## [19] BiocGenerics_0.56.0    generics_0.1.4         clusterProfiler_4.18.4
## 
## loaded via a namespace (and not attached):
##   [1] DBI_1.3.0               gson_0.1.0              rlang_1.1.7            
##   [4] magrittr_2.0.4          DOSE_4.4.0              otel_0.2.0             
##   [7] compiler_4.5.3          RSQLite_2.4.6           png_0.1-9              
##  [10] systemfonts_1.3.2       vctrs_0.7.2             reshape2_1.4.5         
##  [13] pkgconfig_2.0.3         crayon_1.5.3            fastmap_1.2.0          
##  [16] XVector_0.50.0          labeling_0.4.3          rmarkdown_2.30         
##  [19] tzdb_0.5.0              ragg_1.5.2              bit_4.6.0              
##  [22] xfun_0.57               cachem_1.1.0            aplot_0.2.9            
##  [25] jsonlite_2.0.0          blob_1.3.0              tidydr_0.0.6           
##  [28] tweenr_2.0.3            BiocParallel_1.44.0     cluster_2.1.8.2        
##  [31] parallel_4.5.3          R6_2.6.1                bslib_0.10.0           
##  [34] stringi_1.8.7           RColorBrewer_1.1-3      jquerylib_0.1.4        
##  [37] GOSemSim_2.36.0         Rcpp_1.1.1              Seqinfo_0.99.2         
##  [40] knitr_1.51              ggtangle_0.1.1          R.utils_2.13.0         
##  [43] timechange_0.4.0        Matrix_1.7-5            splines_4.5.3          
##  [46] igraph_2.2.2            tidyselect_1.2.1        qvalue_2.42.0          
##  [49] rstudioapi_0.18.0       dichromat_2.0-0.1       yaml_2.3.12            
##  [52] codetools_0.2-20        lattice_0.22-9          plyr_1.8.9             
##  [55] withr_3.0.2             treeio_1.34.0           KEGGREST_1.50.0        
##  [58] S7_0.2.1                evaluate_1.0.5          gridGraphics_0.5-1     
##  [61] polyclip_1.10-7         scatterpie_0.2.6        Biostrings_2.78.0      
##  [64] pillar_1.11.1           ggtree_4.0.5            ggfun_0.2.0            
##  [67] rprojroot_2.1.1         hms_1.1.4               scales_1.4.0           
##  [70] tidytree_0.4.7          glue_1.8.0              gdtools_0.5.0          
##  [73] lazyeval_0.2.2          tools_4.5.3             ggnewscale_0.5.2       
##  [76] data.table_1.18.2.1     fgsea_1.36.2            ggiraph_0.9.6          
##  [79] fs_2.0.1                fastmatch_1.1-8         cowplot_1.2.0          
##  [82] grid_4.5.3              ape_5.8-1               nlme_3.1-168           
##  [85] ggforce_0.5.0           cli_3.6.5               rappdirs_0.3.4         
##  [88] textshaping_1.0.5       fontBitstreamVera_0.1.1 gtable_0.3.6           
##  [91] R.methodsS3_1.8.2       yulab.utils_0.2.4       sass_0.4.10            
##  [94] digest_0.6.39           fontquiver_0.2.1        ggrepel_0.9.8          
##  [97] ggplotify_0.1.3         htmlwidgets_1.6.4       farver_2.1.2           
## [100] memoise_2.0.1           htmltools_0.5.9         R.oo_1.27.1            
## [103] lifecycle_1.0.5         httr_1.4.8              GO.db_3.22.0           
## [106] fontLiberation_0.1.0    bit64_4.6.0-1           MASS_7.3-65