Key dates
- May 5, 2026 - The complete submission is due.
- May 8, 2026 - The peer-review assessment is due.
- May 12, 2026 - Final grades are due by noon (VCU requirement).
Overview
The purpose of the final project is to gain hands-on experience with
the full spectrum of single-cell RNA-seq analysis
methods applied to real-world data. This project is designed to
help you strengthen both your statistical and
practical understanding of scRNA-seq data analysis.
Your goal is to perform a complete and reproducible
analysis and interpretation of an scRNA-seq
dataset. The project should include all key analytical steps — raw data
retrieval, processing, quality control, normalization, dimensionality
reduction, clustering, differential expression, and functional
enrichment analysis — supported by appropriate visualizations.
Dataset Selection
You must select an scRNA-seq dataset for analysis.
Dataset Requirements:
- At least two experimental conditions (e.g., cancer
vs. normal, treated vs. untreated, or disease vs. healthy). If a dataset
has more than two conditions, use a subset.
- Preferably human data; however, model
organism datasets are acceptable.
- 10x Genomics scRNA-seq data with FASTQ files available.
Search PubMed for your condition of interest (example search). See
Single-Cell RNA-seq Resources for potential data sources. It is almost
always advisable to check the original paper for Gene Expression
Omnibus (GEO) accession numbers. Read the publication associated with
your chosen dataset to understand its biological and experimental
context.

Source: https://www.linkedin.com/posts/ehsan-saghapour_rnaseq-scrnaseq-bioinformatics-activity-7449683022028697600-dPlj
Project Organization
- On your local computer, create a dedicated project
folder to store all scripts, data, and results.
- Add a README.md file describing each script, its input, and its
output.
- Create a manuscript.Rmd file containing your project report written
in R Markdown format. The report should be knitted as both a Word and
an HTML document.
- Follow the IMRaD
structure (Introduction, Methods, Results, and Discussion).
- Include BibTeX references where relevant.
- Provide supplementary materials containing:
  - Differential expression results
  - Functional enrichment results
- The main text (excluding references, tables, and figure legends)
should not exceed 3,000 words.
Reporting
1. Introduction / Background
- Summarize the research question and
biological context of the study.
2. Methods
- Retrieve raw FASTQ files from GEO/SRA.
- Process reads using either Cell Ranger or nf-core/scrnaseq on a
high-performance computing (HPC) cluster.
- Import processed data into R using Seurat and include metadata
(sample, condition, replicate).
- Filter low-quality cells using total UMIs (nCount), number of
detected genes (nFeature), and % mitochondrial reads.
- Normalize counts (e.g., log-normalization or SCTransform).
- Identify highly variable genes to capture biological signal while
reducing noise.
- Perform PCA for initial reduction.
- Use nonlinear embedding (UMAP/t-SNE) for visualization.
- Construct a nearest-neighbor graph and perform graph-based
clustering (e.g., Louvain/Leiden). Explore resolution parameters to
control cluster granularity.
- Identify cluster-specific marker genes using differential expression
(e.g., Wilcoxon rank-sum test).
- Annotate clusters based on canonical markers and/or reference
datasets. Merge or refine clusters where biologically appropriate.
- Within each annotated cell type, perform differential expression
analysis between the experimental conditions.
- Perform functional enrichment analysis and compare pathways affected
by differentially expressed genes in each cell type.
- Compare your results with the original publication.
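The quality-control and log-normalization steps listed above can be illustrated with a minimal sketch. The project itself is expected to use Seurat in R; the Python below only shows the underlying arithmetic on toy values. The cell records, field names (`n_count`, `n_feature`, `mito_count`), helper functions, and all thresholds are hypothetical and chosen for illustration; appropriate cutoffs depend on your dataset and should be justified from its QC plots. The scale factor of 10,000 followed by a natural-log `log1p` transform mirrors the common log-normalization convention.

```python
import math

# Hypothetical per-cell QC metrics (toy values, not from a real dataset).
# n_count: total UMIs, n_feature: detected genes,
# mito_count: UMIs assigned to mitochondrial genes.
cells = [
    {"barcode": "cell_A", "n_count": 5200, "n_feature": 1800, "mito_count": 210},
    {"barcode": "cell_B", "n_count": 300, "n_feature": 120, "mito_count": 15},
    {"barcode": "cell_C", "n_count": 4000, "n_feature": 1500, "mito_count": 1000},
    {"barcode": "cell_D", "n_count": 60000, "n_feature": 9000, "mito_count": 1200},
]


def percent_mito(cell):
    """Percentage of a cell's UMIs that come from mitochondrial genes."""
    if cell["n_count"] == 0:
        return 0.0
    return 100.0 * cell["mito_count"] / cell["n_count"]


def passes_qc(cell, min_features=200, max_features=6000, max_percent_mito=10.0):
    """Keep cells with a plausible gene count and a low mitochondrial fraction.

    The default thresholds are illustrative assumptions only.
    """
    return (
        min_features <= cell["n_feature"] <= max_features
        and percent_mito(cell) <= max_percent_mito
    )


def log_normalize(counts, scale_factor=10_000):
    """Scale one cell's counts to a common total, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


kept = [c["barcode"] for c in cells if passes_qc(c)]
print(kept)  # ['cell_A']
print(log_normalize([0, 1, 3]))
```

In this toy example, cell_B is dropped for too few detected genes, cell_C for a high mitochondrial percentage, and cell_D for an unusually high gene count that may indicate a doublet.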
3. Results
- Describe your findings, supported by figures and tables.
4. Discussion / Conclusion
- Discuss how your analysis and findings differ from the original
publication.
- Address potential limitations.
5. References
- Use BibTeX-style references consistent with the R Markdown
bibliography format.
6. Computational Component
- Create a GitHub repository and include HPC scripts and R code
necessary to reproduce results.
- Code should be well-commented and formatted for
readability (use consistent indentation and spacing).
Submission Instructions
- Add the GitHub link to your manuscript file and
push all scripts and data to your GitHub
repository.
- Submit the knitted HTML manuscript to
Canvas.
Peer Review Process
- After submission, you will be assigned to review one peer’s
project.
- The goal is to learn from others’ analyses.
- Instructions for peer review:
  - The peer-to-peer assignment will be distributed via Canvas.
  - Clone your peer’s repository and knit their final project document.
  - Evaluate each section (Introduction, Methods, Results, Discussion,
  etc.) and rate it as Pass, Marginal, or Fail, with a brief
  justification.
  - Submit your assessment via Canvas on or before May 8, 2026.
Grading and Deadlines
- The instructor will formally grade all projects, considering peer
assessments.
- Final course grades will be entered in the system
on or before May 12, 2026.
Summary of Deliverables
- GitHub repository containing:
  - All analysis scripts, including data download scripts
  - README.md and manuscript.Rmd
- HTML report (manuscript.html)
- Peer review submission