Pathway enrichment is powerful — and deceptively simple.
✓ A go-to method
For interpreting high-dimensional omics data
✓ Widely used in cancer research
To extract biological meaning from gene lists
❗ But beware:
- Easy to misuse and misreport with unclear inputs and online tools
- Overwhelming: many options for databases, methods, and cutoffs
1. What is pathway enrichment analysis (PEA)?
2. Overview of methods: ORA, GSEA, Topology
3. Running a PEA: Test, Input, Background, Databases, Online Tools
4. Common pitfalls & tips for better PEA
5. Visualization and reproducibility
6. Open Questions
Main Sources:
Statistical approach to identify biological pathways whose genes show non-random patterns in an omics dataset
Inputs: either a gene list or a full ranked list of features from an omics dataset
Outputs: enriched biological processes or pathways
Available in many web tools: g:Profiler, Enrichr, DAVID, Reactome, ExpressAnalyst, etc.
1. Over-representation analysis (ORA):
- Uses a gene list and a background.
- Compares observed vs. expected overlap.
2. Gene Set Enrichment Analysis (GSEA):
- Uses a full ranked list of genes.
- Captures subtle, coordinated effects.
3. Topology-based analysis:
- Uses gene–gene relationships in pathway networks.
100 random unrelated genes
ORA: examines whether any pathways appear in a gene list of interest more often than expected by chance, compared with a background set.
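The observed-versus-expected comparison can be sketched with a one-sided hypergeometric test, a common choice for over-representation analysis. This is a minimal illustration, not the implementation of any particular web tool: the function name `ora_p_value` and the gene identifiers `g0`..`g19` are hypothetical, and multiple-testing correction is left out.

```python
from math import comb

def ora_p_value(gene_list, pathway, background):
    """Return (observed overlap, expected overlap, P(X >= observed))."""
    background = set(background)
    # Restrict both sets to the background so all counts share one universe.
    genes = set(gene_list) & background
    members = set(pathway) & background
    N = len(background)        # size of the background set
    K = len(members)           # pathway genes present in the background
    n = len(genes)             # genes of interest
    k = len(genes & members)   # observed overlap
    expected = n * K / N       # overlap expected by chance
    # Upper tail of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return k, expected, tail / comb(N, n)

# Toy data (hypothetical identifiers): 20 background genes, a 5-gene pathway,
# and a 4-gene list of interest that shares 3 genes with the pathway.
background = [f"g{i}" for i in range(20)]
pathway = ["g0", "g1", "g2", "g3", "g4"]
gene_list = ["g0", "g1", "g2", "g10"]

observed, expected, p = ora_p_value(gene_list, pathway, background)
print(observed, expected, round(p, 4))
```

Changing the `background` argument changes both the expected overlap and the p-value, which is why the choice of background matters as much as the gene list itself.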
GSEA: first ranks all genes by the detected signal, such as change in gene expression, then tests whether genes annotated to the same pathway tend to cluster together at the top (or bottom) of the ranked list.
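The idea of genes clustering at the top or bottom of a ranked list can be sketched as a running-sum statistic. The version below is a simplified, unweighted sketch: the published GSEA method additionally weights each step by the ranking metric and assesses significance by permutation, both of which are omitted here. The function name `enrichment_score` and the single-letter gene names are hypothetical, and the ranked list is assumed to contain no duplicates.

```python
def enrichment_score(ranked_genes, pathway):
    """Unweighted running-sum score: walk the ranked list, step up on pathway
    genes, step down on all others, and return the largest deviation from 0."""
    members = set(pathway)
    n_hit = sum(1 for g in ranked_genes if g in members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("pathway must cover some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for g in ranked_genes:
        running += 1.0 / n_hit if g in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Toy ranked list (hypothetical names), most up-regulated first.
ranked = ["a", "b", "c", "d", "e", "f", "g", "h"]

top_es = enrichment_score(ranked, {"a", "b", "d"})     # members near the top
bottom_es = enrichment_score(ranked, {"f", "g", "h"})  # members near the bottom
print(round(top_es, 3), round(bottom_es, 3))
```

A positive score indicates that pathway members concentrate near the top of the ranking, and a negative score indicates the bottom. No cutoff is applied to the gene list, which is how this approach can pick up subtle, coordinated shifts.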
Topology-based analysis: accounts for additional information that affects pathway activity by integrating scores for gene positions within a pathway and gene–gene interactions into the enrichment tests.
Aims to increase the sensitivity of pathway enrichment analysis by considering genes’ “co-expression”.
Requires experimental evidence for pathway structures and gene–gene interactions.
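To make the role of gene position concrete, here is a deliberately simple toy: each gene's change is weighted by its number of interactions (degree) in the pathway network. This is an illustrative assumption only and does not reproduce any published topology-based method. The function name `degree_weighted_score`, the gene names, and the edges are all hypothetical.

```python
def degree_weighted_score(gene_scores, pathway_edges):
    """Average absolute gene-level change, weighted by each gene's degree
    (number of interactions) in the pathway network."""
    degree = {}
    for a, b in pathway_edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    total = sum(degree.values())
    if total == 0:
        raise ValueError("pathway has no edges")
    # Genes without a measured score contribute zero.
    weighted = sum(d * abs(gene_scores.get(g, 0.0)) for g, d in degree.items())
    return weighted / total

# Hypothetical star-shaped pathway: HUB interacts with three other genes.
edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C")]

hub_changed = degree_weighted_score({"HUB": 2.0}, edges)   # change at the hub
leaf_changed = degree_weighted_score({"C": 2.0}, edges)    # same change at a leaf
print(round(hub_changed, 3), round(leaf_changed, 3))
```

The same change scores three times higher when it occurs at the hub than at a peripheral gene. A plain gene-list overlap would treat the two cases identically, and the result is only as reliable as the pathway structure supplied as `pathway_edges`.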
Web Tool Summary Table
Tool | ORA | GSEA | Topology | GO | KEGG | Reactome | MSigDB | Other Databases |
---|---|---|---|---|---|---|---|---|
g:Profiler | ✅ | 🔶 | ✅ | ✅ | ✅ | ✅ | TRANSFAC, miRTarBase, WikiPathways | |
Enrichr | ✅ | ✅ | ✅ | ✅ | ✅ | ChEA, DrugMatrix, TF/miRNA | ||
DAVID | ✅ | ✅ | ✅ | ✅ | Panther, BioCarta | |||
WebGestalt | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | WikiPathways, user-defined sets |
Reactome | ✅ | ✅ | ✅ | |||||
PantherDB | ✅ | ✅ | ✅ | Panther Pathways | ||||
Metascape | ✅ | ✅ | ✅ | ✅ | ✅ | CORUM, WikiPathways | ||
ShinyGO | ✅ | ✅ | ✅ | 🔸 | Limited subset of MSigDB | |||
PathDIP | ✅ | ✅ | ✅ | ✅ | ✅ | PID, BioCarta, PPI-aware pathways | ||
GSEA-MSigDB | ✅ | ✅ | ✅ | ✅ | ✅ | Hallmark, C1–C7 collections | ||
ExpressAnalyst | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | BioCarta, WikiPathways | |
Cytoscape EnrichMap | ✅ | Any | Any | Any | Any | Visualization |
Key plot labels:
:::::::::::::