1 Project Summary

PI: Dr. Blanka Sharma

Institution: University of Florida

Department: Biomedical Engineering

Study Contact: Suzanne Lightsey

Project Title: PEG-Hydrogel vs In vitro / In vivo Samples

Study Summary:To compare the gene expression profile of our in vitro PEG-based tumor model to traditional tumor models of increasing complexity (2D and subcutaneous)

Sample type(s): Human lung cancer cells (A549 Luciferase expressing cells), 2D cell culture, 3D PEG-based tumor, subcutaneous xenograft, orthotopic xenograft.

Organism: Human (the immortalized cell line is derived from a 58-year-old white male; purchased from ATCC)

Analysis goal(s): This analysis is a follow-up to a previous analysis. For this analysis, “We want to benchmark our PEG-based hydrogel tumor model against traditional in vitro and in vivo models. Therefore, we want to compare the gene expression profile of the PEG based tumor model against in vivo subcutaneous tumor (control) and 2D cell culture, and additional compare 2D cell culture against in vivo subcutaneous tumor (control).

Report-prepared-by: - Dr. Heather Kates, Bioinformatics Analyst

Report-reviewed-by: - Dr. Jason Brant, Assistant Professor

2 Data Downloads

2.1 Download Raw Sequencing Data

Below is a link to download the raw sequencing files. These files are very large (>150GB); download only when needed. Note that you must be logged into your UF dropbox account for this link to work.

Download Raw Sequence Files

2.2 Download Sequencing Data Quality Control Summary

FastQC provides quality metrics for individual sequencing samples, including read quality, GC content, and adapter contamination. The MultiQC report summarizes these results across all samples. The multiQC report is of raw sequence data. These data were quality filtered and trimmed prior to analysis. For guidance on interpreting the results, see FastQC documentation.

Download MultiQC Report

2.3 Download Raw and Filtered Counts Data

Description of Data Sheets:

  1. Raw: Raw gene count matrix before filtering or normalization.
  2. CPM: Counts per million (CPM) normalized expression values, with gene symbols.
  3. LCPM: Log-transformed CPM values, useful for visualization, with gene symbols.
  4. Filtered: Gene expression counts filtered to remove lowly-expressed and non-variable genes using the NOISeq method. This dataset was used as input for DE analysis
  5. Sample_Metadata: Sample information including treatment, group, and library sizes.

The Excel file containing these datasets can be downloaded using the link provided below.

3 Data Visualizations

3.1 PCA (multivariate analaysis)

Principal component analysis of gene expression data were calculated in R v4.3.3. Two-dimensional PCA score plots reveal possible separation in gene expression profiles and can help to identify quality control issues and batch effects. Ellipses are calculated using the R package car (Fox J. and Weisberg S. 2019) and ~1 Std dev.

3.2 Global comparison of transcriptional profile similarity between groups

3.2.1 Spearman correlation

Pairwise sample-to-sample correlation coefficients of log CPM filtered counts were calculated using Spearman’s rank correlation in R v4.3.3 and visualized using pheatmap 1.0 (Kolde 2019).

3.2.2 Heatmap of top 1000 CV genes

Scaled gene expression values of the top 1000 variable genes (sd/mean) were visualized using pheatmap 1.0 (Kolde 2019) with sample and gene clustered by complete hierarchical clustering.

3.3 Mean-Variance Relationship: Before and After Variance Moderation

These mean-variance trend plots are provided to illustrate how the DE analysis ensure differential expression (DE) results are not biased by expression-dependent variance. The top plot shows how variance changes with expression level in the raw data. Typically, variance decreases as expression increases. This confirms that voom() correctly models heteroscedasticity (unequal variance) in RNA-seq data. The Bottom plot shows variance after empirical Bayes shrinkage using eBayes(). The trend should be flat, indicating variance is properly stabilized.

4 Differential Expression Analysis

Differential expression analysis was performed using limma v3.5 (Ritchie et al. 2015). Raw RNA-seq counts were filtered using NOISeq v2.4 (Tarazona et al., 2011) to remove features that have an average expression per condition less than 1 cpm a coefficient of variation > 100% in all the conditions. To account for differences in sequencing depth and composition bias, TMM normalization as implemented in edgeR v3.4 (Robinson et al. 2010) was used to calculate normalization factors that were used during voom transformation. A linear model was fitted to the transformed data, and pairwise comparisons between conditions were assessed by computing moderated t-statistics, moderated F-statistic, and log-odds of differential expression with empirical Bayes moderation of the standard errors. Statistical significance was assessed using adjusted p-values to control the false discovery rate (FDR).

Three contrasts were tested:

  1. 3D PEG hydrogel model vs. in vivo subcutaneous tumor control

  2. 2D cell culture vs. in vivo subcutaneous tumor control

  3. 3D PEG hydrogel model vs. 2D cell culture

Down-regulated: adj. p value < 0.05 *AND logFC > 0.58

Up-regulated: adj. p value < 0.05 *AND logFC < -0.58

Not Significant: adj. p value > 0.05 *AND/OR abs(logFC) < 0.58

4.1 Summary of DE results per contrast

4.2 Interactive Downloadable Differential Expression Results Tables

  • Summary per contrast showing the number of differentially expressed genes and key statistics.
  • Tables can be filtered, sorted, and downloaded

4.2.1 Contrast 1: 3D PEG model vs. in vivo subcutaneous tumor (control)

4.2.2 Contrast 2: 2D cell culture vs. in vivo subcutaneous tumor (control)

4.2.3 Contrast 3: 3D PEG model vs. 2D cell culture

4.3 Differential Expression Results Visualizations

4.3.1 Volcano Plots

Visualization of log-fold changes vs. statistical significance for each contrast (plotly v4.1 (Sievert C 2020)) shows patterns of significantly up/downregulated genes (“significance” determined by adj. p value < 0.05 *AND abs(logFC) > 0.58).

4.3.2 Heatmaps

Heatmaps of the top 50 DE genes for each contrast show the scaled counts values for all samples in that contrast using heatmaply v1.5 (Galili et al. 2017). Hierarchical clustering of samples is based on expression patterns.

5 Pathway Enrichment Analysis

Pathway enrichment analysis was performed to identify functional enrichment of gene lists and to compare these significant results across contrasts.

5.1 Gene Ontology (GO) Enrichment Analysis

Gene Ontology enrichment was performed using the enrichGO function in clusterProfiler v4.8 (Yu et al. 2012) in each of three GO categories (BP, MF, CC). Interactive GO enrichment plots are based on the top 10 significantly enriched GO terms per GO category per gene list. Hover over the plot to view p-value, gene ratio, and up to the top 20 DE genes (sorted by DE adj.p.value) in that term.

To assess similiarities between gene lists’ enrichment results, if a gene list(s) had significant results for a different gene lists’ top-10 term, that result is displayed as well regardless of whether or not the result was in top 10. Downloadable results excel file include all significant results for all gene lists.

5.1.1 DE genes

Gene lists are significantly DE genes per contrast

📥 Download GO_BP_enrich_plot
📥 Download GO_MF_enrich_plot
📥 Download GO_CC_enrich_plot

5.1.2 Non-DE genes

Gene lists are non-significantly DE genes per contrast

📥 Download GO_BP_nonDE_enrich_plot
📥 Download GO_MF_nonDE_enrich_plot
📥 Download GO_CC_nonDE_enrich_plot

5.2 KEGG Pathway Enrichment Analysis

KEGG enrichment was performed using the enrichKEGG function in clusterProfiler v4.8 (Yu et al. 2012). Interactive KEGG enrichment plots are based on the top 10 significantly enriched KEGG pathways per gene list. Hover over the plot to view p-value, gene ratio, and up to the top 20 DE genes (sorted by DE adj.p.value) in that pathway.

To assess similiarities between gene lists’ enrichment results, if a gene list(s) had significant results for a different gene lists’ top-10 pathway, that result is displayed as well regardless of whether or not the result was in top 10. Downloadable results excel file include all significant results for all gene lists.

5.2.1 DE genes

Gene lists are significantly DE genes per contrast

📥 Download kegg_enrichment_plot

5.2.2 Non-DE genes

Gene lists are non-significantly DE genes per contrast

📥 Download kegg_nonDE_enrichment_plot

6 References

  • Robinson MD, McCarthy DJ and Smyth GK (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140
  • Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A (2011). Differential expression in RNA-seq: a matter of depth. Genome Research, 21(12), 2213-2223.
  • Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015). “limma powers differential expression analyses for RNA-sequencing and microarray studies.” Nucleic Acids Research, 43(7), e47. doi:10.1093/nar/gkv007.
  • Yu G, Wang L, Han Y, He Q (2012). “clusterProfiler: an R package for comparing biological themes among gene clusters.” OMICS: A Journal of Integrative Biology, 16(5), 284-287. doi:10.1089/omi.2011.0118.
  • R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Kolde R (2019). pheatmap: Pretty Heatmaps. R package version 1.0.12, https://CRAN.R-project.org/package=pheatmap.
  • Galili, Tal, O’Callaghan, Alan, Sidi, Jonathan, Sievert, Carson (2017). “heatmaply: an R package for creating interactive cluster heatmaps for online publishing.” Bioinformatics.
  • Sievert C (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. ISBN 9781138331457, https://plotly-r.com.