class: center, middle, inverse, title-slide

.title[
# GenomicRanges
]
.author[
### Mikhail Dozmorov
]
.institute[
### Virginia Commonwealth University
]
.date[
### 2026-03-16
]

---

<!-- HTML style block -->
<style>
.large { font-size: 130%; }
.small { font-size: 70%; }
.tiny { font-size: 40%; }
</style>

## Ranges overview

<img src="img/RangeOperations.png" alt="" width="800px" style="display: block; margin: auto;" />

---

## Ranges in Bioconductor: IRanges and GRanges

Bioconductor utilizes a hierarchical range infrastructure to manage genomic intervals efficiently, moving from simple integer math to complex genomic coordinates.

`IRanges` (Integer Ranges)

The fundamental building block for interval algebra in R. It focuses strictly on the sequence of integers without biological context.

* **Core Components:** Defined by any two of `start`, `end`, and `width`.
* **Properties:**
  * **List-like Behavior:** Supports standard R operations like `length()`, subsetting `[]`, and `c()`.
  * **Metadata (`mcols`):** Allows you to attach any amount of data (e.g., p-values, scores) to each specific range.
* **Primary Use:** Non-genomic intervals or internal calculations where strand and chromosome are irrelevant.

---

## Ranges in Bioconductor: IRanges and GRanges

Bioconductor utilizes a hierarchical range infrastructure to manage genomic intervals efficiently, moving from simple integer math to complex genomic coordinates.

`GRanges` (Genomic Ranges)

Extends `IRanges` by placing intervals onto a biological map. This is the "workhorse" class for most Bioconductor analyses.

* **The "Genomic" Context:**
  * `seqnames`: The chromosome or scaffold name (e.g., "chr1").
  * `start`, `end`: Genomic coordinates.
  * `strand`: Defines the directionality (`+`, `-`, or `*` for unstranded).

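
---

## Ranges in Bioconductor: IRanges and GRanges

The components listed on the previous slides can be put together in a few lines. This is a minimal sketch, assuming the `GenomicRanges` package is installed; the coordinates and the `score` metadata column are made-up values used only for illustration.

```r
# Assumes Bioconductor's GenomicRanges is installed
# (it loads IRanges as a dependency).
library(GenomicRanges)

# IRanges: any two of start, end, width define the ranges
ir <- IRanges(start = c(101, 201, 301), width = 50)

# GRanges: add chromosome, strand, and per-range metadata.
# `score` is a hypothetical metadata column, not a required field.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = ir,
  strand   = c("+", "-", "*"),
  score    = c(0.01, 0.20, 0.05)
)

end(gr)                     # 150 250 350 (end = start + width - 1)
as.character(seqnames(gr))  # "chr1" "chr1" "chr2"
mcols(gr)$score             # 0.01 0.20 0.05

# List-like subsetting keeps coordinates and metadata together
plus_only <- gr[strand(gr) == "+"]
length(plus_only)           # 1
```
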
.small[ **Key Methods:** `start(gr)`, `end(gr)`, `width(gr)`, `seqnames(gr)`, `strand(gr)`, and `mcols(gr)` ]

---

## Ranges in Bioconductor: IRanges and GRanges

<img src="img/GRanges.png" alt="" width="900px" style="display: block; margin: auto;" />

---

## Ranges in Bioconductor: IRanges and GRanges

**`Seqinfo`:** A critical metadata component that tracks:

* **Genome Build:** (e.g., "hg38").
* **Sequence Lengths:** Ensures ranges do not accidentally exceed chromosome boundaries.
* **Circular Flags:** Identifies circular DNA (like mitochondria).

---

## Range methods

* **Intra-range**: `shift()`, `narrow()`, `flank()`, `promoters()`, `resize()`
* **Inter-range**: `range()`, `reduce()`, `gaps()`, `disjoin()`, `coverage()`
* **Between-range**: `findOverlaps()`, `countOverlaps()`, `summarizeOverlaps()`, `%over%`, `union()`, `intersect()`

---

## Lists of Genomic Ranges

* Useful for elements of the same type (e.g., exons within transcripts).
* Common trick: apply vectorized functions to unlisted representation, then re-list.

<img src="img/GRangesList.png" alt="" width="800px" style="display: block; margin: auto;" />
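
---

## Lists of Genomic Ranges

The unlist / re-list trick can be sketched as follows. This is an illustrative example, assuming the `GenomicRanges` package is installed; the transcript names (`tx1`, `tx2`) and exon coordinates are hypothetical, and `resize()` stands in for any vectorized function.

```r
# Assumes Bioconductor's GenomicRanges is installed.
library(GenomicRanges)

# Two hypothetical transcripts with 2 and 1 exons (made-up coordinates)
grl <- GRangesList(
  tx1 = GRanges("chr1", IRanges(start = c(100, 300), width = 50), strand = "+"),
  tx2 = GRanges("chr2", IRanges(start = 500, width = 80), strand = "-")
)

# 1. Flatten to a single GRanges holding all exons
flat <- unlist(grl, use.names = FALSE)

# 2. One vectorized call over every exon at once
flat <- resize(flat, width = 10, fix = "start")

# 3. Restore the original grouping, using grl as the skeleton
grl10 <- relist(flat, grl)

unname(elementNROWS(grl10))  # 2 1: grouping is unchanged
unlist(width(grl10))         # every exon now has width 10
```

Working on the flat object avoids looping over list elements, which is typically much faster than `lapply()` when the list holds many transcripts.
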