Seurat - Guided Clustering Tutorial

Overview

This tutorial walks through a standard Seurat scRNA-seq workflow using the PBMC 3k dataset from 10x Genomics. We cover:

Quality control and filtering
Normalization (LogNormalize and SCTransform)
Feature selection
Dimensionality reduction
Graph-based clustering
Marker identification and annotation

Reference: https://satijalab.org/seurat/articles/pbmc3k_tutorial

Load Libraries

library(dplyr)      # Data manipulation
library(Seurat)     # Single-cell analysis toolkit
library(patchwork)  # Combine plots

1. Data Import and Seurat Object Creation

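# Set the working directory to the folder that contains the 10x data
# (adjust this path for your own machine)
setwd("/Users/mdozmorov/Documents/Work/Teaching/BIOS668.2026/static/slides/10_single_cell/")

# Read 10x Genomics data
# Read10X() expects a folder containing: barcodes.tsv, genes.tsv, matrix.mtx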
pbmc.data <- Read10X(data.dir = "filtered_gene_bc_matrices/hg19/")

# Examine structure of raw data
# Sparse matrix format: '.' represents zeros
pbmc.data[c("CD3D", "TCL1A", "MS4A1"), 1:30]

## 3 x 30 sparse Matrix of class "dgCMatrix"

##                                                                    
## CD3D  4 . 10 . . 1 2 3 1 . . 2 7 1 . . 1 3 . 2  3 . . . . . 3 4 1 5
## TCL1A . .  . . . . . . 1 . . . . . . . . . . .  . 1 . . . . . . . .
## MS4A1 . 6  . . . . . . 1 1 1 . . . . . . . . . 36 1 2 . . 2 . . . .

# Create Seurat object with QC filters
# min.cells = 3: keep genes detected in ≥3 cells
# min.features = 200: keep cells with ≥200 detected genes
pbmc <- CreateSeuratObject(
  counts = pbmc.data, 
  project = "pbmc3k", 
  min.cells = 3, 
  min.features = 200
)

# Inspect object structure
pbmc

## An object of class Seurat 
## 13714 features across 2700 samples within 1 assay 
## Active assay: RNA (13714 features, 0 variable features)
##  1 layer present: counts

# Output: 13,714 features (genes) x 2,700 samples (cells)

We retain genes expressed in ≥3 cells and cells with ≥200 detected genes to remove extremely sparse entries.
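The two creation-time filters can be illustrated on a toy matrix in base R. This is a minimal sketch rather than Seurat's internal code: the matrix values, the small thresholds (2 instead of 3 and 200), and the order of operations (cells first, then genes) are assumptions chosen for illustration.

```r
# Toy count matrix: 4 genes x 4 cells (hypothetical values)
counts <- matrix(
  c(1, 0, 0, 2,
    0, 0, 0, 3,
    4, 5, 0, 6,
    0, 1, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:4))
)

min_features <- 2  # keep cells with >= 2 detected genes
min_cells    <- 2  # keep genes detected in >= 2 cells

# Cell filter: number of genes with non-zero counts in each cell
genes_per_cell <- colSums(counts > 0)
counts_kept <- counts[, genes_per_cell >= min_features, drop = FALSE]

# Gene filter: number of remaining cells with non-zero counts for each gene
cells_per_gene <- rowSums(counts_kept > 0)
filtered <- counts_kept[cells_per_gene >= min_cells, , drop = FALSE]

dim(filtered)  # genes x cells that survive both filters
```

Both filters count detections (entries greater than zero), not total counts, which is why a gene with one very high count in a single cell is still dropped.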

2. Quality Control (QC) and Filtering

# Calculate mitochondrial gene percentage
# Pattern "^MT-" identifies human mitochondrial genes (use "^mt-" for mouse)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

# View QC metrics stored in metadata
head(pbmc@meta.data, 5)

##                  orig.ident nCount_RNA nFeature_RNA percent.mt
## AAACATACAACCAC-1     pbmc3k       2419          779  3.0177759
## AAACATTGAGCTAC-1     pbmc3k       4903         1352  3.7935958
## AAACATTGATCAGC-1     pbmc3k       3147         1129  0.8897363
## AAACCGTGCTTCCG-1     pbmc3k       2639          960  1.7430845
## AAACCGTGTATGCG-1     pbmc3k        980          521  1.2244898

# Columns: orig.ident, nCount_RNA, nFeature_RNA, percent.mt

# Visualize QC metrics with violin plots
# nFeature_RNA: number of genes detected per cell
# nCount_RNA: total UMI count per cell
# percent.mt: percentage of mitochondrial reads
VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)

# Visualize feature-feature relationships
# Helps identify doublets (high gene count + high UMI count)
# and dying cells (high %MT, often with low gene and UMI counts)
plot1 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "percent.mt")
plot2 <- FeatureScatter(pbmc, feature1 = "nCount_RNA", feature2 = "nFeature_RNA")
plot1 + plot2

# Filter cells based on QC metrics
# Keep cells with: 200 < genes < 2500, and %MT < 5%
pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5)

High mitochondrial % → dying cells
High gene/UMI counts → potential doublets
Thresholds are dataset-dependent
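As a rough illustration of what the mitochondrial metric measures, the percentage can be computed by hand on a toy matrix. This base R sketch assumes made-up gene names and counts; only the "^MT-" pattern and the 5% cutoff come from the tutorial.

```r
# Toy count matrix: 2 mitochondrial genes + 2 nuclear genes, 3 cells (hypothetical)
counts <- matrix(
  c( 5,  0, 50,
     5,  2, 30,
    40, 60, 10,
    50, 38, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("MT-CO1", "MT-ND1", "ACTB", "CD3D"),
                  c("cellA", "cellB", "cellC"))
)

# Same pattern as in PercentageFeatureSet(pbmc, pattern = "^MT-")
is_mt <- grepl("^MT-", rownames(counts))

# Percentage of each cell's counts that come from mitochondrial genes
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)

# Apply the mitochondrial part of the QC filter (percent.mt < 5)
keep <- percent_mt < 5
names(which(keep))
```

The gene-count bounds (nFeature_RNA > 200 and < 2500) are applied the same way, as logical masks combined with `&`.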

3. Normalization

Option A: LogNormalize (classical workflow)

pbmc <- NormalizeData(
  pbmc, 
  normalization.method = "LogNormalize", 
  scale.factor = 10000
)

Divides each cell's counts by that cell's total counts, multiplies by the scale factor (10,000), and applies a natural-log transform, log(1 + x).
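The LogNormalize arithmetic can be reproduced in a few lines of base R. This is a sketch on a hypothetical 3 x 2 count matrix, not a replacement for NormalizeData().

```r
# Toy counts: 3 genes x 2 cells (hypothetical); matrix() fills column by column
counts <- matrix(c(1, 3, 6,
                   0, 5, 5),
                 nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("cell1", "cell2")))

scale_factor <- 10000

# Divide each column (cell) by its total, then rescale to counts per 10,000
totals <- colSums(counts)
cp10k  <- sweep(counts, 2, totals, "/") * scale_factor

# Natural log of (1 + x); zeros stay zero, so sparsity is preserved
lognorm <- log1p(cp10k)
lognorm
```

Because log1p(0) is 0, the normalized matrix is as sparse as the raw counts, which is why the normalized data printed later in this tutorial still shows '.' entries.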

Option B: SCTransform (recommended modern workflow)

pbmc <- SCTransform(
  pbmc,
  vars.to.regress = "percent.mt",
  verbose = FALSE
)

Uses regularized negative binomial regression
Removes sequencing depth effects
Stabilizes variance
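Replaces NormalizeData + ScaleData + FindVariableFeatures
Stores its results in a new "SCT" assay and makes it the default assay, so run it instead of Option A, not after it. The outputs shown in the following sections come from Option A (RNA assay).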

4. Highly Variable Feature Selection

# Only needed if NOT using SCTransform
# Identify genes with high cell-to-cell variation
# These drive biological heterogeneity (cell types, states)
# VST method: variance-stabilizing transformation
pbmc <- FindVariableFeatures(
  pbmc, 
  selection.method = "vst", 
  nfeatures = 2000
)

# Identify top 10 most variable genes
top10 <- head(VariableFeatures(pbmc), 10)
top10

##  [1] "PPBP"   "LYZ"    "S100A9" "IGLL5"  "GNLY"   "FTL"    "PF4"    "FTH1"  
##  [9] "GNG11"  "S100A8"

# Visualize variable feature selection
# X-axis: average expression, Y-axis: standardized variance
plot1 <- VariableFeaturePlot(pbmc)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
plot1 + plot2

5. Scaling

# Scale all genes (for visualization purposes)
# ScaleData: centers to mean=0, scales to variance=1
all.genes <- rownames(pbmc)
pbmc <- ScaleData(pbmc, features = all.genes)
# Scaled data stored in pbmc[["RNA"]]$scale.data

# Optional: regress out unwanted variation
# Example: remove mitochondrial percentage effects
# pbmc <- ScaleData(pbmc, vars.to.regress = "percent.mt")

Centers and scales gene expression (mean = 0, variance = 1).
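Per-gene scaling is a row-wise z-score. The base R sketch below uses a hypothetical 2 x 4 matrix; the clipping at 10 mirrors the default scale.max argument of ScaleData() and has no effect on values this small.

```r
# Toy normalized expression: 2 genes x 4 cells (hypothetical)
x <- matrix(c( 1,  2,  3,  4,
              10, 10, 20, 20),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4)))

# scale() standardizes columns, so transpose to standardize each gene (row)
scaled <- t(scale(t(x)))

# Clip extreme values, as ScaleData() does with scale.max = 10
scaled <- pmin(scaled, 10)

rowMeans(scaled)       # each gene now has mean 0
apply(scaled, 1, sd)   # and standard deviation 1
```

After scaling, highly expressed genes no longer dominate distance calculations simply because their raw values are larger.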

6. Principal Component Analysis (PCA)

# Run PCA on variable features
# Reduces 2,000 genes → 50 PCs by default (npcs = 50; metafeatures)
pbmc <- RunPCA(pbmc, features = VariableFeatures(object = pbmc))

# Examine PCA results
# Positive loadings: genes driving PC in positive direction
# Negative loadings: genes driving PC in negative direction
print(pbmc[["pca"]], dims = 1:5, nfeatures = 5)

## PC_ 1 
## Positive:  CST3, TYROBP, LST1, AIF1, FTL 
## Negative:  MALAT1, LTB, IL32, IL7R, CD2 
## PC_ 2 
## Positive:  CD79A, MS4A1, TCL1A, HLA-DQA1, HLA-DQB1 
## Negative:  NKG7, PRF1, CST7, GZMB, GZMA 
## PC_ 3 
## Positive:  HLA-DQA1, CD79A, CD79B, HLA-DQB1, HLA-DPB1 
## Negative:  PPBP, PF4, SDPR, SPARC, GNG11 
## PC_ 4 
## Positive:  HLA-DQA1, CD79B, CD79A, MS4A1, HLA-DQB1 
## Negative:  VIM, IL7R, S100A6, IL32, S100A8 
## PC_ 5 
## Positive:  GZMB, S100A8, NKG7, FGFBP2, GNLY 
## Negative:  LTB, IL7R, CKB, VIM, MS4A7

# Visualize PCA loadings
# Shows which genes contribute most to each PC
VizDimLoadings(pbmc, dims = 1:2, reduction = "pca")

# Plot cells in PC space
# Each point = one cell, colored by the active identity (cluster labels once clustering has been run)
DimPlot(pbmc, reduction = "pca")

# Heatmap of PC gene loadings
# Visualize top genes and cells for each PC
DimHeatmap(pbmc, dims = 1, cells = 500, balanced = TRUE)

DimHeatmap(pbmc, dims = 1:15, cells = 500, balanced = TRUE)
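The relationship between scaled data, loadings, and cell embeddings can be sketched with base R's prcomp(). This is an illustration on simulated data, assuming a full PCA on 5 hypothetical genes; RunPCA() instead computes a truncated decomposition on the variable features, so the numbers are not comparable.

```r
set.seed(1)

# Simulated scaled expression: 5 genes x 20 cells (hypothetical)
n_genes <- 5
n_cells <- 20
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
               dimnames = list(paste0("gene", 1:n_genes),
                               paste0("cell", 1:n_cells)))
scaled <- t(scale(t(expr)))  # per-gene z-scores, as after ScaleData()

# PCA with cells as observations (rows) and genes as variables (columns);
# the data are already centered and scaled, so prcomp() leaves them as-is
pca <- prcomp(t(scaled), center = FALSE, scale. = FALSE)

embeddings <- pca$x         # cells x PCs: coordinates of each cell
loadings   <- pca$rotation  # genes x PCs: contribution of each gene

# Genes with the largest positive and negative loadings on PC1
pc1 <- sort(loadings[, "PC1"], decreasing = TRUE)
head(names(pc1), 2)
tail(names(pc1), 2)
```

The embeddings are the scaled data projected onto the loadings, which is why genes with large positive or negative loadings are the ones that separate cells along a PC.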

7. Determine Dimensionality

# Elbow plot: standard deviation of each PC, ranked from largest to smallest
# Look for "elbow" where curve flattens (~PC 9-10 here)
ElbowPlot(pbmc)

# Decision: Use first 10 PCs for downstream analysis
# Conservative approach: including extra PCs rarely hurts
# Too few PCs: lose biological signal

Select PCs at the “elbow” (typically ~10 for PBMC 3k).
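A numeric companion to the visual elbow is the cumulative share of variance across the computed PCs. The sketch below uses a hypothetical vector of PC standard deviations and an arbitrary 90% cutoff; both are assumptions for illustration, and the share is relative to the PCs that were computed, not to the total variance in the data.

```r
# Hypothetical standard deviations for four PCs (an elbow plot shows these values)
sdev <- c(4, 2, 1, 1)

# Variance is the squared standard deviation
var_explained <- sdev^2 / sum(sdev^2)
cum_var <- cumsum(var_explained)

# First PC at which the cumulative share reaches an (arbitrary) 90% cutoff
n_pcs <- which(cum_var >= 0.9)[1]
n_pcs
```

A cutoff like this is only a heuristic; inspecting the elbow plot and the PC heatmaps remains the primary guide.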

8. Graph-Based Clustering

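# Construct a K-nearest neighbor (KNN) graph, then refine it into a shared nearest neighbor (SNN) graph
# KNN uses Euclidean distance in PCA space (first 10 PCs);
# SNN edge weights reflect the overlap between two cells' neighborhoods (Jaccard similarity)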
pbmc <- FindNeighbors(pbmc, dims = 1:10)

# Cluster cells using Louvain algorithm
# Resolution controls granularity: 0.4-1.2 typical for ~3K cells
# Higher resolution → more clusters
pbmc <- FindClusters(pbmc, resolution = 0.5)

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 2638
## Number of edges: 95840
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8726
## Number of communities: 9
## Elapsed time: 0 seconds

# View cluster assignments
# Stored in Idents(pbmc) and pbmc@meta.data$seurat_clusters
head(Idents(pbmc), 5)

## AAACATACAACCAC-1 AAACATTGAGCTAC-1 AAACATTGATCAGC-1 AAACCGTGCTTCCG-1 
##                2                3                2                1 
## AAACCGTGTATGCG-1 
##                6 
## Levels: 0 1 2 3 4 5 6 7 8

KNN graph captures local structure
Louvain partitions graph into communities
Resolution controls cluster granularity
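The graph construction can be sketched in base R on six hypothetical 2D points standing in for cells in PCA space. This is a simplified illustration, assuming k = 3 (with each point counted as its own neighbor) and plain Jaccard weights; FindNeighbors() uses a larger default k and prunes weak edges, and the community detection step is not reproduced here.

```r
# Six hypothetical cells forming two well-separated groups
pts <- rbind(A1 = c(0, 0),   A2 = c(0, 1),   A3 = c(1, 0),
             B1 = c(10, 10), B2 = c(10, 11), B3 = c(11, 10))

k <- 3  # neighbors per cell, including the cell itself

# Euclidean distances between all pairs of cells
d <- as.matrix(dist(pts))

# KNN: indices of the k closest cells for each cell (one row per cell)
knn <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))

# SNN: Jaccard similarity between the neighbor sets of every pair of cells
n <- nrow(pts)
snn <- matrix(0, n, n, dimnames = list(rownames(pts), rownames(pts)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    shared <- length(intersect(knn[i, ], knn[j, ]))
    total  <- length(union(knn[i, ], knn[j, ]))
    snn[i, j] <- shared / total
  }
}

round(snn, 2)
```

Cells within a group share all of their neighbors (weight 1) and cells in different groups share none (weight 0), so a community detection algorithm run on this graph would separate the two groups.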

9. UMAP Visualization

# Run UMAP for visualization
# Projects high-dimensional PCA space → 2D
# Preserves local and some global structure
pbmc <- RunUMAP(pbmc, dims = 1:10)

# Visualize clusters on UMAP
# Each color = one cluster from graph-based clustering
DimPlot(pbmc, reduction = "umap", label = TRUE)

# Alternative: t-SNE (less commonly used now)
pbmc <- RunTSNE(pbmc, dims = 1:10)
DimPlot(pbmc, reduction = "tsne")

10. Save Intermediate Object

saveRDS(pbmc, file = "pbmc_tutorial.rds")
# pbmc <- readRDS("pbmc_tutorial.rds")

11. Differential Expression (Marker Genes)

# Find markers for cluster 2 vs. all other cells
cluster2.markers <- FindMarkers(pbmc, ident.1 = 2, min.pct = 0.25)
head(cluster2.markers, n = 5)

##             p_val avg_log2FC pct.1 pct.2    p_val_adj
## IL32 7.705178e-91   1.322332 0.947 0.466 1.056688e-86
## LTB  2.582056e-85   1.326273 0.981 0.643 3.541032e-81
## CD3D 2.722666e-70   1.055704 0.922 0.432 3.733865e-66
## IL7R 9.563577e-68   1.434513 0.751 0.327 1.311549e-63
## LDHB 2.746137e-66   1.014725 0.953 0.614 3.766053e-62
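The columns of this table can be approximated by hand. The base R sketch below assumes hypothetical expression values for one gene and uses the Wilcoxon rank-sum test, which is the default test in FindMarkers(); the Bonferroni step reuses the IL32 p-value and the 13,714 features reported for this object.

```r
# Hypothetical normalized expression of one gene
in_cluster <- c(2.1, 1.8, 2.5, 0, 2.2, 1.9)     # cells in the cluster of interest
rest       <- c(0, 0, 0.4, 0, 0, 0.3, 0, 0)     # all other cells

# pct.1 / pct.2: fraction of cells in each group with non-zero expression
pct_1 <- mean(in_cluster > 0)
pct_2 <- mean(rest > 0)

# Wilcoxon rank-sum test (normal approximation, because the data contain ties)
p_val <- wilcox.test(in_cluster, rest, exact = FALSE)$p.value

# Bonferroni correction over all features in the dataset:
# reproduces p_val_adj for IL32 from its reported p_val
p_il32 <- 7.705178e-91
p_il32_adj <- p.adjust(p_il32, method = "bonferroni", n = 13714)

c(pct.1 = pct_1, pct.2 = pct_2, p_val = p_val)
p_il32_adj
```

Because the correction multiplies by the number of features rather than the number of genes tested, adjusted p-values are conservative when min.pct or logfc.threshold pre-filter the gene list.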

# Find markers distinguishing cluster 5 from clusters 0 and 3
cluster5.markers <- FindMarkers(pbmc, ident.1 = 5, ident.2 = c(0, 3))
head(cluster5.markers, n = 5)

##                       p_val avg_log2FC pct.1 pct.2     p_val_adj
## FCGR3A        6.074132e-208   6.813028 0.970 0.039 8.330065e-204
## IFITM3        1.269986e-199   6.191496 0.976 0.048 1.741659e-195
## CFD           1.779978e-198   6.051379 0.939 0.037 2.441062e-194
## CD68          5.000885e-195   5.495804 0.927 0.035 6.858213e-191
## RP11-290F20.3 5.290916e-188   6.314962 0.829 0.016 7.255962e-184

# Find markers for ALL clusters
# only.pos = TRUE: only report upregulated genes
pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)

# View top markers for each cluster
pbmc.markers %>%
  group_by(cluster) %>%
  slice_max(n = 2, order_by = avg_log2FC)

## # A tibble: 18 × 7
## # Groups:   cluster [9]
##        p_val avg_log2FC pct.1 pct.2 p_val_adj cluster gene         
##        <dbl>      <dbl> <dbl> <dbl>     <dbl> <fct>   <chr>        
##  1 2.87e- 84       2.39 0.436 0.108 3.94e- 80 0       CCR7         
##  2 5.95e- 49       2.10 0.333 0.104 8.17e- 45 0       LEF1         
##  3 5.89e-140       7.29 0.301 0.004 8.08e-136 1       FOLR3        
##  4 3.73e-122       6.75 0.278 0.006 5.11e-118 1       S100A12      
##  5 4.69e- 59       2.10 0.423 0.111 6.44e- 55 2       AQP3         
##  6 2.69e- 35       1.90 0.266 0.069 3.68e- 31 2       CD40LG       
##  7 2.40e-272       7.38 0.564 0.009 3.29e-268 3       LINC00926    
##  8 2.75e-237       7.14 0.488 0.007 3.76e-233 3       VPREB3       
##  9 1.39e-156       4.21 0.585 0.059 1.91e-152 4       GZMK         
## 10 6.25e- 94       3.63 0.444 0.061 8.56e- 90 4       GZMH         
## 11 1.34e-165       5.86 0.366 0.005 1.84e-161 5       CKB          
## 12 4.15e-215       5.45 0.506 0.009 5.70e-211 5       CDKN1C       
## 13 8.19e-175       6.16 0.468 0.013 1.12e-170 6       AKR1C3       
## 14 8.58e-113       6.08 0.292 0.007 1.18e-108 6       SH2D1B       
## 15 1.46e-207       8.03 0.5   0.002 2.00e-203 7       SERPINF1     
## 16 1.48e-220       7.63 0.812 0.011 2.03e-216 7       FCER1A       
## 17 0              14.3  0.571 0     0         8       LY6G6F       
## 18 4.36e-206      13.8  0.357 0     5.98e-202 8       RP11-879F14.2

# Alternative test: ROC analysis
# Returns "classification power" (0=random, 1=perfect)
cluster0.markers <- FindMarkers(
  pbmc, 
  ident.1 = 0, 
  logfc.threshold = 0.25, 
  test.use = "roc", 
  only.pos = TRUE
)
head(cluster0.markers)

##       myAUC  avg_diff power avg_log2FC pct.1 pct.2
## RPS12 0.824 0.5084209 0.648  0.7423959 1.000 0.991
## RPS6  0.819 0.4677549 0.638  0.6811329 1.000 0.995
## RPS27 0.819 0.4996905 0.638  0.7299648 0.999 0.992
## RPL32 0.814 0.4238824 0.628  0.6184594 0.999 0.995
## RPS14 0.807 0.4295153 0.614  0.6280371 1.000 0.994
## RPS25 0.803 0.5246505 0.606  0.7765449 0.997 0.975
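The relationship between the myAUC and power columns can be checked directly: in the table above, power equals twice the absolute distance of the AUC from 0.5. The sketch below verifies this on the reported values and computes an AUC from hypothetical expression values as the probability that a cell in the cluster outranks a cell outside it (ties count as one half).

```r
# AUC values reported for RPS12, RPS6, RPS14, RPS25 in the table above
my_auc <- c(RPS12 = 0.824, RPS6 = 0.819, RPS14 = 0.807, RPS25 = 0.803)

# Classification power: 0 = random, 1 = perfect separation
power <- 2 * abs(my_auc - 0.5)
power

# AUC computed by hand for a hypothetical gene
in_cluster <- c(2.1, 1.8, 2.5, 0, 2.2, 1.9)
rest       <- c(0, 0, 0.4, 0, 0, 0.3, 0, 0)

# Compare every in-cluster cell with every other cell
wins <- outer(in_cluster, rest, ">")
ties <- outer(in_cluster, rest, "==")
auc  <- mean(wins + 0.5 * ties)
auc
```

An AUC near 0.8 with pct.1 and pct.2 both close to 1, as for the ribosomal genes above, indicates a shift in expression level rather than on/off expression.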

12. Visualization of Marker Genes

# Violin plots: expression distribution across clusters
VlnPlot(pbmc, features = c("MS4A1", "CD79A"))

# Can also plot raw counts (instead of normalized)
# Seurat v5 names this argument `layer`; older versions use `slot = "counts"`
VlnPlot(pbmc, features = c("NKG7", "PF4"), layer = "counts", log = TRUE)

# Feature plots: overlay expression on UMAP
# Color intensity = expression level
FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", 
                                "FCER1A", "FCGR3A", "LYZ", "PPBP", "CD8A"))

# Heatmap of top markers
# Keep markers with avg_log2FC > 1, then take the first 10 rows per cluster
top10_markers <- pbmc.markers %>%
  group_by(cluster) %>%
  dplyr::filter(avg_log2FC > 1) %>%
  slice_head(n = 10) %>%
  ungroup()

DoHeatmap(pbmc, features = top10_markers$gene) + NoLegend()

# Dot plot: compact view of markers across clusters
# Dot size = % cells expressing, color = average expression
markers_to_plot <- c("IL7R", "CCR7", "CD14", "LYZ", "MS4A1", "CD8A", 
                     "FCGR3A", "MS4A7", "GNLY", "NKG7", "FCER1A", "CST3", "PPBP")
DotPlot(pbmc, features = markers_to_plot) + RotatedAxis()
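The two quantities encoded in a dot plot can be computed per cluster with tapply(). This base R sketch uses hypothetical expression values for a single gene; DotPlot() applies additional transformation and scaling to the average before coloring, which is not reproduced here.

```r
# Hypothetical normalized expression of one gene in 8 cells from 2 clusters
expr    <- c(0, 1.2, 2.0, 0, 0, 0.5, 3.1, 2.9)
cluster <- factor(c("0", "0", "0", "0", "1", "1", "1", "1"))

# Dot size: percentage of cells in each cluster with non-zero expression
pct_exp <- 100 * tapply(expr > 0, cluster, mean)

# Dot color: average expression in each cluster
avg_exp <- tapply(expr, cluster, mean)

data.frame(cluster = names(pct_exp),
           pct_exp = as.vector(pct_exp),
           avg_exp = as.vector(avg_exp))
```

Reading both values together helps separate markers that are expressed in most cells of a cluster from those driven by a few high-expressing cells.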

13. Cell Type Annotation

# Assign cell type identities based on canonical markers
# Cluster 0: IL7R+, CCR7+ → Naive CD4+ T cells
# Cluster 1: CD14+, LYZ+ → CD14+ Monocytes
# Cluster 2: IL7R+, S100A4+ → Memory CD4+ T cells
# Cluster 3: MS4A1+ → B cells
# Cluster 4: CD8A+ → CD8+ T cells
# Cluster 5: FCGR3A+, MS4A7+ → FCGR3A+ Monocytes
# Cluster 6: GNLY+, NKG7+ → NK cells
# Cluster 7: FCER1A+, CST3+ → Dendritic cells
# Cluster 8: PPBP+ → Platelets

new.cluster.ids <- c("Naive CD4 T", "CD14+ Mono", "Memory CD4 T", "B", 
                     "CD8 T", "FCGR3A+ Mono", "NK", "DC", "Platelet")
names(new.cluster.ids) <- levels(pbmc)
pbmc <- RenameIdents(pbmc, new.cluster.ids)

# Plot annotated UMAP
DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
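The renaming step is a lookup from cluster number to label. The base R sketch below applies the same named vector to the cluster assignments of the first five cells shown earlier (2, 3, 2, 1, 6); the variable names are illustrative and no Seurat object is involved.

```r
# Cluster assignments of the first five cells, with all nine levels
cluster_ids <- factor(c(2, 3, 2, 1, 6), levels = 0:8)

# Labels in cluster order (0 to 8), named by cluster level
new_cluster_ids <- c("Naive CD4 T", "CD14+ Mono", "Memory CD4 T", "B",
                     "CD8 T", "FCGR3A+ Mono", "NK", "DC", "Platelet")
names(new_cluster_ids) <- levels(cluster_ids)

# Look up by level name (character), not by the factor's integer codes
cell_types <- unname(new_cluster_ids[as.character(cluster_ids)])
cell_types
```

The labels depend on the order of the levels, so the vector must be rebuilt if clustering is rerun with a different resolution or seed and the cluster numbering changes.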

14. Save Final Object

saveRDS(pbmc, file = "pbmc3k_final.rds")

15. Additional Analyses (Optional)

# Ridge plots: show expression distribution as ridgeline plots
RidgePlot(pbmc, features = c("MS4A1", "CD79A", "CD3E"))

# Cell scatter: compare the expression of all features between two individual cells
CellScatter(pbmc, cell1 = "AAACATACAACCAC-1", cell2 = "AAACATTGAGCTAC-1")

# Compare expression between specific groups
VlnPlot(pbmc, features = "CD3E", split.by = "seurat_clusters")

# Explore metadata
head(pbmc@meta.data)

##                  orig.ident nCount_RNA nFeature_RNA percent.mt RNA_snn_res.0.5
## AAACATACAACCAC-1     pbmc3k       2419          779  3.0177759               2
## AAACATTGAGCTAC-1     pbmc3k       4903         1352  3.7935958               3
## AAACATTGATCAGC-1     pbmc3k       3147         1129  0.8897363               2
## AAACCGTGCTTCCG-1     pbmc3k       2639          960  1.7430845               1
## AAACCGTGTATGCG-1     pbmc3k        980          521  1.2244898               6
## AAACGCACTGGTAC-1     pbmc3k       2163          781  1.6643551               2
##                  seurat_clusters
## AAACATACAACCAC-1               2
## AAACATTGAGCTAC-1               3
## AAACATTGATCAGC-1               2
## AAACCGTGCTTCCG-1               1
## AAACCGTGTATGCG-1               6
## AAACGCACTGGTAC-1               2

# Access specific data layers
pbmc[["RNA"]]$counts[1:10, 1:10]  # Raw counts

## 10 x 10 sparse Matrix of class "dgCMatrix"

##                                  
## AL627309.1    . . . . . . . . . .
## AP006222.2    . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . .
## LINC00115     . . . . . . . . . .
## NOC2L         . . . . . . . . . .
## KLHL17        . . . . . . . . . .
## PLEKHN1       . . . . . . . . . .
## RP11-54O7.17  . . . . . . . . . .
## HES4          . . . . . . . . . .

pbmc[["RNA"]]$data[1:10, 1:10]    # Normalized data

## 10 x 10 sparse Matrix of class "dgCMatrix"

##                                  
## AL627309.1    . . . . . . . . . .
## AP006222.2    . . . . . . . . . .
## RP11-206L10.2 . . . . . . . . . .
## RP11-206L10.9 . . . . . . . . . .
## LINC00115     . . . . . . . . . .
## NOC2L         . . . . . . . . . .
## KLHL17        . . . . . . . . . .
## PLEKHN1       . . . . . . . . . .
## RP11-54O7.17  . . . . . . . . . .
## HES4          . . . . . . . . . .

pbmc[["RNA"]]$scale.data[1:10, 1:10]  # Scaled data

##               AAACATACAACCAC-1 AAACATTGAGCTAC-1 AAACATTGATCAGC-1
## AL627309.1         -0.05812316      -0.05812316      -0.05812316
## AP006222.2         -0.03357571      -0.03357571      -0.03357571
## RP11-206L10.2      -0.04166819      -0.04166819      -0.04166819
## RP11-206L10.9      -0.03364562      -0.03364562      -0.03364562
## LINC00115          -0.08223981      -0.08223981      -0.08223981
## NOC2L              -0.31717081      -0.31717081      -0.31717081
## KLHL17             -0.05344722      -0.05344722      -0.05344722
## PLEKHN1            -0.05082183      -0.05082183      -0.05082183
## RP11-54O7.17       -0.03308805      -0.03308805      -0.03308805
## HES4               -0.23376818      -0.23376818      -0.23376818
##               AAACCGTGCTTCCG-1 AAACCGTGTATGCG-1 AAACGCACTGGTAC-1
## AL627309.1         -0.05812316      -0.05812316      -0.05812316
## AP006222.2         -0.03357571      -0.03357571      -0.03357571
## RP11-206L10.2      -0.04166819      -0.04166819      -0.04166819
## RP11-206L10.9      -0.03364562      -0.03364562      -0.03364562
## LINC00115          -0.08223981      -0.08223981      -0.08223981
## NOC2L              -0.31717081      -0.31717081      -0.31717081
## KLHL17             -0.05344722      -0.05344722      -0.05344722
## PLEKHN1            -0.05082183      -0.05082183      -0.05082183
## RP11-54O7.17       -0.03308805      -0.03308805      -0.03308805
## HES4               -0.23376818      -0.23376818      -0.23376818
##               AAACGCTGACCAGT-1 AAACGCTGGTTCTT-1 AAACGCTGTAGCCA-1
## AL627309.1         -0.05812316      -0.05812316      -0.05812316
## AP006222.2         -0.03357571      -0.03357571      -0.03357571
## RP11-206L10.2      -0.04166819      -0.04166819      -0.04166819
## RP11-206L10.9      -0.03364562      -0.03364562      -0.03364562
## LINC00115          -0.08223981      -0.08223981      -0.08223981
## NOC2L              -0.31717081      -0.31717081      -0.31717081
## KLHL17             -0.05344722      -0.05344722      -0.05344722
## PLEKHN1            -0.05082183      -0.05082183      -0.05082183
## RP11-54O7.17       -0.03308805      -0.03308805      -0.03308805
## HES4               -0.23376818      -0.23376818      -0.23376818
##               AAACGCTGTTTCTG-1
## AL627309.1         -0.05812316
## AP006222.2         -0.03357571
## RP11-206L10.2      -0.04166819
## RP11-206L10.9      -0.03364562
## LINC00115          -0.08223981
## NOC2L              -0.31717081
## KLHL17             -0.05344722
## PLEKHN1            -0.05082183
## RP11-54O7.17       -0.03308805
## HES4               -0.23376818

# Access dimensionality reductions
head(Embeddings(pbmc, reduction = "pca"))

##                        PC_1       PC_2       PC_3       PC_4        PC_5
## AAACATACAACCAC-1 -4.7292751 -0.5178973 -0.7785098 -2.3107648 -0.06914524
## AAACATTGAGCTAC-1 -0.5175606  4.5917945  5.9512550  6.8750637 -1.96480346
## AAACATTGATCAGC-1 -3.1887310 -3.4690403 -0.8464381 -1.9995317 -5.10404521
## AAACCGTGCTTCCG-1 12.7949159  0.1006277  0.6348048 -0.3668132  0.20824228
## AAACCGTGTATGCG-1 -3.1283532 -6.3474633  1.2630349  3.0187971  7.84386469
## AAACGCACTGGTAC-1 -3.1085159  0.9267873 -0.6638383 -2.3229775 -2.00034225
##                        PC_6       PC_7        PC_8        PC_9      PC_10
## AAACATACAACCAC-1  0.1201577  1.6578523 -1.10143404 -1.04257400 -1.8994335
## AAACATTGAGCTAC-1  2.8163862  1.4962317 -0.47988623  0.72518681 -0.6001089
## AAACATTGATCAGC-1  2.1543490  0.2814753  3.82721364  0.87309562 -1.1003500
## AAACCGTGCTTCCG-1 -2.8056616  0.8056122  0.02276886  0.64885096  3.4300943
## AAACCGTGTATGCG-1 -1.3272985 -2.3943618 -0.42574325  2.67000258  1.1848560
## AAACGCACTGGTAC-1  1.4781076  0.2747978 -0.39372154  0.06622178  1.5405334
##                       PC_11      PC_12      PC_13      PC_14      PC_15
## AAACATACAACCAC-1 -0.9478273  3.7134214 -1.0549289  0.6985891 -0.5557425
## AAACATTGAGCTAC-1  0.6459758  0.6866331  2.1856405 -2.9629433  0.1621167
## AAACATTGATCAGC-1  2.0412212  2.3826720 -0.5554507 -0.5423248 -1.0747630
## AAACCGTGCTTCCG-1  1.8851622  0.7825154 -0.7767382  2.1018061 -0.4360477
## AAACCGTGTATGCG-1  3.7965075 -0.4896353  0.8075237  0.8709608 -0.8097934
## AAACGCACTGGTAC-1  1.4572781 -1.3760260  0.7721660 -1.1407501 -0.4197158
##                       PC_16     PC_17      PC_18      PC_19      PC_20
## AAACATACAACCAC-1 -0.1816524  1.431729  1.2419834  0.3362684  3.0503279
## AAACATTGAGCTAC-1 -2.9176907 -1.166124  2.6123437  0.8810625  1.2452044
## AAACATTGATCAGC-1 -0.8491064 -1.316209 -0.4339659  1.9312247 -0.5679549
## AAACCGTGCTTCCG-1 -1.2278936  0.503283  2.5501990 -0.2466729 -0.1008676
## AAACCGTGTATGCG-1  2.0645166 -2.555129 -2.4765650  0.0738191  0.4230214
## AAACGCACTGGTAC-1  1.5293710 -3.045022 -1.4569620 -2.1249640  0.4996294
##                       PC_21      PC_22      PC_23      PC_24      PC_25
## AAACATACAACCAC-1  1.3001371  1.5219328 -0.6673179  1.2414452 -1.1152505
## AAACATTGAGCTAC-1  2.5402383 -3.3142224 -2.1234350  1.1791151 -0.1436921
## AAACATTGATCAGC-1 -0.7432764  3.7408251  1.6303017 -0.4468198 -2.3059559
## AAACCGTGCTTCCG-1  1.0068713 -0.3011553  0.8514839  2.3934413 -0.9711805
## AAACCGTGTATGCG-1 -1.1595904 -2.5004551 -0.9122929  0.6751512  1.8147134
## AAACGCACTGGTAC-1  1.9011451  1.2618518 -1.2805712  0.2515020  1.1657696
##                        PC_26       PC_27      PC_28     PC_29      PC_30
## AAACATACAACCAC-1  0.01159596  0.71473442  2.0069512  2.821098  1.8653275
## AAACATTGAGCTAC-1 -1.03295415 -1.95284714 -0.8230079  1.454720  2.1878012
## AAACATTGATCAGC-1 -0.56673160  2.66075153 -2.0663635  2.271431  0.9253747
## AAACCGTGCTTCCG-1  0.57775959  0.06344573 -1.3742174  1.571912  1.8186335
## AAACCGTGTATGCG-1  0.53415132 -0.56414805 -1.0489222 -0.128065  2.2425595
## AAACGCACTGGTAC-1  0.05035484 -0.58979768  0.2392321  1.580084 -1.1000889
##                      PC_31      PC_32       PC_33      PC_34      PC_35
## AAACATACAACCAC-1  2.103852  0.5921621 -0.56735661 -1.4232206 -0.7688225
## AAACATTGAGCTAC-1  0.301515  1.1175949  0.31653953 -1.2547154  0.5219057
## AAACATTGATCAGC-1  1.772025 -2.3879097  0.54758216  1.7556361  1.1260050
## AAACCGTGCTTCCG-1 -1.293335 -0.3114744 -0.54308638  1.0712670  1.9400979
## AAACCGTGTATGCG-1  2.844762 -1.0260633 -0.08202077 -1.8154050 -2.0572400
## AAACGCACTGGTAC-1  2.588619 -0.4734310 -1.21386055 -0.3154441 -2.8567061
##                       PC_36      PC_37      PC_38       PC_39      PC_40
## AAACATACAACCAC-1  1.4079622 -0.5083296 -2.4297983 -0.39602102  0.7723986
## AAACATTGAGCTAC-1  0.1720647  0.8167335  0.7821375 -0.18874507 -0.7427080
## AAACATTGATCAGC-1 -1.1937672  3.1163071  0.4971936 -0.07921955  5.1936317
## AAACCGTGCTTCCG-1  0.3598688  0.4349366  1.0864827  2.23648300  2.3113799
## AAACCGTGTATGCG-1 -0.7978706 -1.4029082  0.8102174  1.65298682  1.6451570
## AAACGCACTGGTAC-1  2.4241035 -0.6502820  1.0735433  2.53436904  3.8504129
##                       PC_41      PC_42       PC_43      PC_44      PC_45
## AAACATACAACCAC-1  1.9029185  0.1753952  1.20568883 -1.0774241 -1.5410618
## AAACATTGAGCTAC-1  0.9572113  0.1098679 -1.90014350 -2.1808424  1.0157868
## AAACATTGATCAGC-1  0.8772216 -0.5956472 -0.07963329 -0.6909843  1.1173646
## AAACCGTGCTTCCG-1  1.8464428 -0.9865839 -2.12441804 -0.2889806 -1.2889341
## AAACCGTGTATGCG-1 -0.9581393  1.5041188  0.99032977  0.2751317  0.3613616
## AAACGCACTGGTAC-1 -1.5826321  1.5078397 -1.04260415  0.9572032  1.8076640
##                      PC_46      PC_47      PC_48      PC_49     PC_50
## AAACATACAACCAC-1 -1.227286 -1.5806749  0.1723875  1.4244067 -1.913385
## AAACATTGAGCTAC-1 -0.366832 -0.3292392 -1.2941075 -0.8282082 -0.944586
## AAACATTGATCAGC-1 -2.783081 -1.1171637 -2.0106009 -1.6216072 -1.783388
## AAACCGTGCTTCCG-1  2.261530 -0.2425067  1.1551777  0.1554852 -1.456947
## AAACCGTGTATGCG-1  1.281437  0.6586142 -2.4837756  0.7770817 -1.476247
## AAACGCACTGGTAC-1 -2.206634 -1.8691839  1.6827016  4.2470014  2.088301

head(Embeddings(pbmc, reduction = "umap"))

##                     umap_1      umap_2
## AAACATACAACCAC-1 -3.465622   4.2166708
## AAACATTGAGCTAC-1 -5.062591 -10.1007833
## AAACATTGATCAGC-1 -3.723432   7.4668622
## AAACCGTGCTTCCG-1  8.140582  -5.2874198
## AAACCGTGTATGCG-1 -7.626617  -0.2415394
## AAACGCACTGGTAC-1 -2.079930   5.7106805

Key Parameters to Tune

QC thresholds (nFeature_RNA, percent.mt)
Number of PCs
Clustering resolution
Marker detection thresholds