Final Project Assignment: single-cell RNA-seq Data Analysis

Key dates

May 5, 2026 - The complete submission is due.
May 8, 2026 - The peer-review assessment is due.
May 12, 2026 - Final grades are due by noon (VCU requirement).

Overview

The purpose of the final project is to gain hands-on experience with the full spectrum of single-cell RNA-seq analysis methods applied to real-world data. This project is designed to help you strengthen both your statistical and practical understanding of scRNA-seq data analysis.

Your goal is to perform a complete and reproducible analysis and interpretation of an scRNA-seq dataset. The project should include all key analytical steps — raw data retrieval, processing, quality control, normalization, dimensionality reduction, clustering, differential expression, and functional enrichment analysis — supported by appropriate visualizations.

Dataset Selection

You must select an scRNA-seq dataset for analysis. Dataset Requirements:

At least two experimental conditions (e.g., cancer vs. normal, treated vs. untreated, or disease vs. healthy). If a dataset has more than two conditions, use a subset.
Preferably human data; however, model organism datasets are acceptable.
10x Genomics scRNA-seq data with raw FASTQ files available.

Search PubMed for your condition of interest (Example search). See Single-Cell RNA-seq Resources for potential data sources. It is almost always advisable to consult the original paper for Gene Expression Omnibus (GEO) accession numbers. Read the publication associated with your chosen dataset to understand its biological and experimental context.

Source: https://www.linkedin.com/posts/ehsan-saghapour_rnaseq-scrnaseq-bioinformatics-activity-7449683022028697600-dPlj

Project Organization

On your local computer, create a dedicated project folder to store all scripts, data, and results.
Add a README.md file describing each script, its input, and its output.
Create a manuscript.Rmd file containing your project report written in R Markdown format.
The report should be knitted to both Word and HTML documents.
Follow the IMRaD structure (Introduction, Methods, Results, and Discussion).
Include BibTeX references where relevant.
Provide supplementary materials containing:
- Differential expression results
- Functional enrichment results
The main text (excluding references, tables, and figure legends) should not exceed 3,000 words.
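As a rough check against the 3,000-word limit, the sketch below counts words in an R Markdown source after dropping the YAML header, fenced code chunks, and inline code. The function name `count_main_text_words` and the sample text are hypothetical, and the count is only an approximation: it does not remove references, tables, or figure legends, so treat the result as an upper bound rather than an official tally.

```python
import re

def count_main_text_words(rmd_text):
    """Approximate word count of an R Markdown document's prose."""
    # Drop the YAML front matter at the very top of the file.
    text = re.sub(r"\A---\n.*?\n---\n", "", rmd_text, flags=re.DOTALL)
    # Drop fenced code chunks such as ```{r} ... ```.
    text = re.sub(r"^```.*?^```[ \t]*$", "", text,
                  flags=re.DOTALL | re.MULTILINE)
    # Drop inline code such as `r nrow(x)`.
    text = re.sub(r"`[^`\n]*`", "", text)
    # Count tokens that start with a letter or digit; hyphenated
    # terms like "single-cell" count as one word.
    return len(re.findall(r"[A-Za-z0-9][A-Za-z0-9'’-]*", text))

# Hypothetical miniature manuscript used only to exercise the function.
sample = (
    "---\n"
    "title: \"Demo\"\n"
    "output: html_document\n"
    "---\n"
    "\n"
    "# Introduction\n"
    "\n"
    "Single-cell data are sparse.\n"
    "\n"
    "```{r}\n"
    "x <- 1 + 1\n"
    "```\n"
    "\n"
    "We found `x` clusters.\n"
)

word_count = count_main_text_words(sample)
print(word_count)
```

To apply it to a real report, read the file first, for example `count_main_text_words(open("manuscript.Rmd", encoding="utf-8").read())`.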

Reporting

1. Introduction / Background

Summarize the research question and biological context of the study.

2. Methods

Retrieve raw FASTQ files from GEO/SRA.
Process reads using either Cell Ranger or nf-core/scrnaseq on a high-performance computing (HPC) cluster.
Import processed data into R using Seurat and include metadata (sample, condition, replicate).
Filter low-quality cells using total UMIs (nCount), number of detected genes (nFeature), and % mitochondrial reads.
Normalize counts (e.g., log-normalization or SCTransform).
Identify highly variable genes to capture biological signal while reducing noise.
Perform PCA for initial reduction.
Use nonlinear embedding (UMAP/t-SNE) for visualization.
Construct a nearest-neighbor graph and perform graph-based clustering (e.g., Louvain/Leiden). Explore resolution parameters to control cluster granularity.
Identify cluster-specific marker genes using differential expression (e.g., Wilcoxon rank-sum test).
Annotate clusters based on canonical markers and/or reference datasets. Merge or refine clusters where biologically appropriate.
Within each annotated cell type, perform differential expression analysis.
Perform functional enrichment analysis and compare the pathways affected by differentially expressed genes in each cell type.
Compare your results with the original publication.
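The quality-control and log-normalization steps listed above can be illustrated on toy data. The sketch below is plain Python, not the Seurat workflow you will actually run in R; the gene list, counts, thresholds, and function names are all made up for illustration. It computes the three QC metrics per cell (total UMIs, detected genes, percent mitochondrial reads), filters on them, and then applies the same arithmetic as Seurat's LogNormalize method: each count is divided by the cell's total, multiplied by a scale factor (10,000 assumed here), and passed through log1p. The "MT-" prefix assumes human gene symbols.

```python
import math

# Toy raw UMI counts: one list per cell, ordered like `genes`.
# All names and numbers are hypothetical.
genes = ["MT-CO1", "CD3E", "MS4A1", "LYZ", "GNLY"]
cells = {
    "cell_a": [1, 5, 0, 3, 1],  # reasonable-looking cell
    "cell_b": [8, 1, 0, 1, 0],  # dominated by mitochondrial reads
    "cell_c": [0, 0, 0, 2, 0],  # only one detected gene
}

def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Per-cell QC metrics: total UMIs, detected genes, % mitochondrial."""
    n_count = sum(counts)
    n_feature = sum(1 for c in counts if c > 0)
    mito = sum(c for g, c in zip(gene_names, counts)
               if g.startswith(mito_prefix))
    percent_mt = 100.0 * mito / n_count if n_count else 0.0
    return {"nCount": n_count, "nFeature": n_feature,
            "percent_mt": percent_mt}

def passes_qc(m, min_features, max_features, max_percent_mt):
    """Keep a cell only if it falls inside all thresholds."""
    return (min_features <= m["nFeature"] <= max_features
            and m["percent_mt"] <= max_percent_mt)

def log_normalize(counts, scale_factor=10_000):
    """log1p(count / cell total * scale factor); total must be > 0."""
    total = sum(counts)
    return [math.log1p(c / total * scale_factor) for c in counts]

metrics = {name: qc_metrics(c, genes) for name, c in cells.items()}

# Thresholds are placeholders sized for this toy example, not
# recommendations; choose yours from the distributions in your data.
kept = [name for name, m in metrics.items()
        if passes_qc(m, min_features=2, max_features=5, max_percent_mt=20.0)]

normalized = {name: log_normalize(cells[name]) for name in kept}

for name in kept:
    print(name, metrics[name], [round(v, 3) for v in normalized[name]])
```

In a real analysis the thresholds should be justified in the Methods section, typically by plotting the distribution of each metric per sample before filtering.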

3. Results

Describe your findings, supported by figures and tables.

4. Discussion / Conclusion

Discuss how your analysis and findings differ from the original publication.
Address potential limitations.

5. References

Use BibTeX-style references consistent with the R Markdown bibliography format.

6. Computational Component

Create a GitHub repository and include HPC scripts and R code necessary to reproduce results.
Code should be well-commented and formatted for readability (use consistent indentation and spacing).

Submission Instructions

Add the GitHub link to your manuscript file and push all scripts and data to your GitHub repository.
Submit the knitted HTML manuscript to Canvas.

Peer Review Process

After submission, you will be assigned to review one peer’s project.
- The goal is to learn from others’ analyses.
Instructions for peer review:
- The peer-to-peer assignment will be distributed via Canvas.
- Clone your peer’s repository and knit their final project document.
- Evaluate each section (Introduction, Methods, Results, Discussion, etc.) and rate as:
  - Pass, Fail, or Marginal, with brief justification.
Submit your assessment via Canvas on or before May 8, 2026.

Grading and Deadlines

The instructor will formally grade all projects, considering peer assessments.
Final course grades will be entered in the system on or before May 12, 2026.

Summary of Deliverables

GitHub repository containing:
- All analysis scripts, including data download scripts.
- README.md and manuscript.Rmd
HTML report (manuscript.html)
Peer review submission