Best Practices for Interpreting Omics Data with Pathway Enrichment Analysis

Dr. Heather Kates and Kalyanee Shirlekar

Why This Topic Matters

Pathway enrichment is powerful — and deceptively simple.


✓ A go-to method
For interpreting high-dimensional omics data

✓ Widely used in cancer research
To extract biological meaning from gene lists


But beware:
- Easy to misuse and misreport, especially with unclear inputs and black-box online tools
- Overwhelming: many options for databases, methods, and cutoffs

Outline

1. What is pathway enrichment analysis (PEA)? 

2. Overview of methods: ORA, GSEA, Topology 

3. Running a PEA: Test, Input, Background, Databases, Online Tools

4. Common pitfalls & tips for better PEA 

5. Visualization and reproducibility 

6. Open Questions

Main Sources:

Interpreting omics data with pathway enrichment analysis

Nine quick tips for pathway enrichment analysis

What Is Pathway Enrichment Analysis?

  • Statistical approach to identify biological pathways whose genes show non-random patterns in an omics dataset

  • Inputs: either a gene list or a full ranked list of features from an omics dataset

  • Outputs: enriched biological processes or pathways

  • Common in web tools: g:Profiler, Enrichr, DAVID, Reactome, ExpressAnalyst, etc.

Methods of Pathway Enrichment Analysis

1. Over-representation analysis (ORA):
- Uses a gene list and a background.
- Compares observed vs. expected overlap.

2. Gene Set Enrichment Analysis (GSEA):
- Uses a full ranked list of genes.
- Captures subtle, coordinated effects.

3. Topology-based analysis:
- Uses gene–gene relationships in pathway networks.

The Web Tool Black Box Problem

Demo: paste 100 random, unrelated genes into Enrichr and see what it reports.

Over-representation analysis (ORA):

Tests whether genes from any pathway appear in a gene list of interest more often than expected by chance, relative to a background set.

  • Uses differentially expressed genes (DEGs) and a background
  • Compares observed vs. expected overlap
  • P-values must be adjusted for multiple testing!
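
The observed-vs-expected comparison can be sketched with a one-sided hypergeometric test, a common choice for ORA. This is a minimal illustration, not the implementation of any particular tool; the function names and all numbers are hypothetical.

```python
from math import comb


def expected_overlap(n_background, n_pathway, n_input):
    """Overlap expected by chance if the input genes were drawn at random."""
    return n_input * n_pathway / n_background


def ora_p_value(n_background, n_pathway, n_input, n_overlap):
    """Upper-tail hypergeometric probability P(overlap >= n_overlap)."""
    largest_possible = min(n_pathway, n_input)
    all_draws = comb(n_background, n_input)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_input - k)
        for k in range(n_overlap, largest_possible + 1)
    )
    return tail / all_draws


# Hypothetical numbers: 10,000 detected genes, a 100-gene pathway,
# 200 DEGs, 8 of which fall in the pathway.
expected = expected_overlap(10_000, 100, 200)
p_value = ora_p_value(10_000, 100, 200, 8)
print(f"expected overlap: {expected:.1f}, observed: 8, p = {p_value:.2e}")
```

This p-value is for a single pathway; testing many pathways still requires multiple-testing adjustment.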

Gene Set Enrichment Analysis (GSEA):

First ranks all genes by a detected signal, such as change in gene expression, then tests whether genes annotated to the same pathway tend to cluster together at the top (or bottom) of the ranked list.

  • Uses a full ranked list of genes
  • Captures subtle, coordinated effects
  • P-values must be adjusted for multiple testing!
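
The idea of "clustering at the top or bottom" can be sketched as a running-sum statistic. The code below uses the simple unweighted (Kolmogorov-Smirnov-like) form; the GSEA software usually weights each step by the ranking metric and assesses significance by permutation, neither of which is shown here. The function name and toy gene IDs are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score for one gene set.

    Walk down the ranked list, stepping up at pathway genes and down
    at all other genes; return the largest deviation from zero.
    """
    gene_set = set(gene_set)
    n_hits = sum(1 for gene in ranked_genes if gene in gene_set)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("need both pathway and non-pathway genes in the list")

    hit_step = 1.0 / n_hits
    miss_step = 1.0 / n_misses
    running = 0.0
    score = 0.0
    for gene in ranked_genes:
        running += hit_step if gene in gene_set else -miss_step
        if abs(running) > abs(score):
            score = running
    return score


# Toy ranked list: g1 is the most up-regulated, g10 the most down-regulated.
ranked = [f"g{i}" for i in range(1, 11)]
es_top = enrichment_score(ranked, {"g1", "g2", "g3"})      # clustered at the top
es_bottom = enrichment_score(ranked, {"g8", "g9", "g10"})  # clustered at the bottom
print(es_top, es_bottom)
```

A positive score indicates enrichment toward the top of the ranking and a negative score toward the bottom.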

Topology-based analysis:

Accounts for additional information that impacts pathway activity by integrating scores measuring gene positions within a pathway and gene–gene interactions into the enrichment tests.

  • Aims to increase the sensitivity of pathway enrichment analysis by considering genes’ “co-expression”

  • Requires experimental evidence for pathway structures and gene–gene interactions

General Workflow

  • Start with omics data (e.g., RNA-seq, ATAC-seq, etc.)
  • Select statistical method (ORA, GSEA, topology-based)
  • Choose your input (e.g., gene list or full ranked list)
  • Choose annotation database (e.g., GO, KEGG, Reactome)
  • Perform PEA and visualize
  • Document all assumptions and parameters

The Importance of Input and Background Sets

  • Input set: filtered list of genes, proteins, or metabolites
  • Background set: all features detected in the experiment
  • ❗ Using all genes in the genome as background = common mistake

Example: Why Background Matters

  • RNA-seq study using genome-wide vs. expressed-gene background
  • Only 44% overlap in enriched pathways
  • Use actual detected features as background!
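
To see why the background changes the answer, the sketch below runs the same hypergeometric test on one gene list against two backgrounds. All gene IDs and set sizes are made up for illustration, and the helper names are hypothetical; they do not reproduce the study cited above.

```python
from math import comb


def hypergeom_upper_tail(n_background, n_pathway, n_input, n_overlap):
    """P(overlap >= n_overlap) under random sampling from the background."""
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_input - k)
        for k in range(n_overlap, min(n_pathway, n_input) + 1)
    )
    return tail / comb(n_background, n_input)


def ora_with_background(input_genes, pathway_genes, background_genes):
    """Restrict the input list and pathway to the background, then test."""
    background = set(background_genes)
    hits = set(input_genes) & background
    pathway = set(pathway_genes) & background
    overlap = hits & pathway
    return hypergeom_upper_tail(
        len(background), len(pathway), len(hits), len(overlap)
    )


# Made-up data: 20,000 genes in the genome, 12,000 detected in the experiment.
genome = [f"gene{i}" for i in range(20_000)]
detected = genome[:12_000]
# A 150-gene pathway, of which only 120 genes were detected.
pathway = genome[:120] + genome[12_000:12_030]
# 300 DEGs (all detected), 9 of them in the pathway.
degs = genome[:9] + genome[200:491]

p_genome = ora_with_background(degs, pathway, genome)
p_detected = ora_with_background(degs, pathway, detected)
print(f"genome-wide background: p = {p_genome:.2e}")
print(f"detected-gene background: p = {p_detected:.2e}")
```

In this toy case the genome-wide background yields the smaller p-value, because genes that could never have been detected inflate the background and lower the expected overlap.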

Reference Annotation Databases

  • KEGG, Reactome, GO, MetaCyc, EcoCyc, etc.
  • Pathway size and definitions vary between databases
  • Use the most up-to-date versions
  • Report: database, version, date

Web Tool Summary Table

Tool ORA GSEA Topology GO KEGG Reactome MSigDB Other Databases
g:Profiler 🔶 TRANSFAC, miRTarBase, WikiPathways
Enrichr ChEA, DrugMatrix, TF/miRNA
DAVID Panther, BioCarta
WebGestalt WikiPathways, user-defined sets
Reactome
PantherDB Panther Pathways
Metascape CORUM, WikiPathways
ShinyGO 🔸 Limited subset of MSigDB
PathDIP PID, BioCarta, PPI-aware pathways
GSEA-MSigDB Hallmark, C1–C7 collections
ExpressAnalyst BioCarta, WikiPathways
Cytoscape EnrichMap Any Any Any Any Visualization

9 Quick Tips to Avoid PEA Pitfalls

  1. Know what analysis type fits your data
  2. Use only a clean, validated input gene list
  3. Use >1 PEA tool, compare results
  4. Document all tool versions & parameters
  5. Always use adjusted p-values
  6. Choose statistical tests & visualizations wisely
  7. Consider analyzing gene subgroups/networks
  8. Validate with recent literature
  9. Review with a wet-lab biologist or clinician

Common Errors in Published PEA Studies

  • 90% of surveyed studies used incorrect background sets
  • 16 of 25 popular tools used outdated databases
  • Up to 40% more false positives without multiple testing correction
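
Multiple-testing correction is simple to apply. As one widely used option, the Benjamini-Hochberg procedure for false discovery rate control can be sketched as follows; the function name and example p-values are hypothetical, and a given tool may use a different correction.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value can be
    # capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four pathways.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

Pathways are then called enriched by thresholding the adjusted values, not the raw ones.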

Visualizing Your Results

Key plot labels:

  • ORA (dotplot): Gene ratio (x-axis), count (point size), adjusted p-value (color)
  • GSEA (ridgeplot): logFC (x-axis), term (y-axis), adjusted p-value (color)

Reporting is Key

  • The result of pathway enrichment depends on the data, assumptions, and tools you use.
  • Therefore, these must be reported in your presentations and publications:
    • Gene list origin and filtering criteria
    • Background set definition
    • Tool name, version, database, and date
    • Statistical method, FDR threshold
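
One way to make these items hard to forget is to save them as a structured record alongside the results. The sketch below is only a suggestion: the class name, field names, and every example value are hypothetical placeholders, not a standard reporting format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class EnrichmentReport:
    """Parameters worth recording for one enrichment run (illustrative)."""
    gene_list_origin: str
    filtering_criteria: str
    background_definition: str
    tool_name: str
    tool_version: str
    database: str
    database_version: str
    analysis_date: str
    statistical_method: str
    fdr_threshold: float


# Placeholder values; replace each with what was actually used.
report = EnrichmentReport(
    gene_list_origin="RNA-seq, treated vs. control",
    filtering_criteria="adjusted p < 0.05 and |log2FC| > 1",
    background_definition="all genes detected in the experiment",
    tool_name="<tool name>",
    tool_version="<tool version>",
    database="<database name>",
    database_version="<database release>",
    analysis_date="<YYYY-MM-DD>",
    statistical_method="ORA, hypergeometric test",
    fdr_threshold=0.05,
)

report_json = json.dumps(asdict(report), indent=2)
print(report_json)
```

The JSON text can be written next to the result tables so the parameters travel with the output.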

Integrated Omics

  • Combines transcriptomics, proteomics, GWAS, etc.
  • Tool: ActivePathways (combines p-values from each omics layer)
  • Pros: improves power
  • Cons: complex, needs careful integration
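
The idea of combining per-layer p-values can be illustrated with Fisher's method, the simplest such technique. This is an assumption made for illustration: Fisher's method treats the p-values as independent, and the sketch does not reproduce ActivePathways' own merging procedure. The function name and the example p-values are hypothetical.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Combine independent p-values for one gene with Fisher's method.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution
    with 2k degrees of freedom, whose upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    half_statistic = -sum(log(p) for p in p_values)  # this is X / 2
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_statistic / i
        series += term
    return exp(-half_statistic) * series


# Hypothetical evidence for one gene from two omics layers.
combined = fisher_combined_p([0.05, 0.05])
print(f"combined p = {combined:.4f}")
```

Two moderately significant layers combine into stronger evidence than either alone, which is the sense in which integration can improve power.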

Final Thoughts

  • Pathway enrichment is powerful if done right
  • Many omics types require special considerations
  • Don’t rely on defaults or black-box tools
  • Collaboration between biologists and bioinformaticians is key

Open Questions

  • What are the best ways to integrate multiple data layers?
  • What makes a good PEA tool for non-programmers?
