Alignment

Materials from class on Wednesday, February 11, 2026

Code

References

  • Nagarajan, Niranjan, and Mihai Pop. “Sequence Assembly Demystified.” Nature Reviews. Genetics 14, no. 3 (March 2013): 157–67. doi:10.1038/nrg3367. - Gentle introduction into genome assembly. Technologies. Box2: Greedy, overlap-layout-consensus, De Bruijn. Problems

  • Pevzner, P. A., H. Tang, and M. S. Waterman. “An Eulerian Path Approach to DNA Fragment Assembly.” Proceedings of the National Academy of Sciences 98, no. 17 (August 14, 2001): 9748–53. https://doi.org/10.1073/pnas.171285098. - First de Bruijn graph for genome assembly paper. Idea of breaking reads into fragments. Typical approach reads are vertices connected by edges if they overlap. Hamiltonian path problem - visit each vertex exactly once, NP-complete. de Bruijn graph - overlapping fragments are edges, and the problem is Eulerian path - visit each edge once. Error-correction algorithm.

  • Primer: Compeau, Phillip E C, Pavel A Pevzner, and Glenn Tesler. “How to Apply de Bruijn Graphs to Genome Assembly.” Nature Biotechnology 29, no. 11 (November 8, 2011): 987–91. doi:10.1038/nbt.2023.

  • Chaisson, Mark J. P., Richard K. Wilson, and Evan E. Eichler. “Genetic Variation and the de Novo Assembly of Human Genomes.” Nature Reviews Genetics 16, no. 11 (October 7, 2015): 627–40. doi:10.1038/nrg3933. - Genome assembling strategies, problems. OLC, De Bruijn, string graphs. Types of gaps.

  • Miller, Jason R., Sergey Koren, and Granger Sutton. “Assembly Algorithms for Next-Generation Sequencing Data.” Genomics 95, no. 6 (June 2010): 315–27. doi:10.1016/j.ygeno.2010.03.001. - Assembly tools for overlap/layout/consensus and the de Bruijn graph approaches. de Bruin graph Issues with genome assembly, potential solutions.

  • String Graph Assembler. Simpson, J. T., and R. Durbin. “Efficient de Novo Assembly of Large Genomes Using Compressed Data Structures.” Genome Research 22, no. 3 (March 1, 2012): 549–56. doi:10.1101/gr.126953.111. - SGA - String Graph Assembler. From an FM-index. Velvet, ABySS, SOAPdenovo de Bruijn graph assemblers. BWA and FM explanation

  • Koren, Sergey, and Adam M. Phillippy. “One Chromosome, One Contig: Complete Microbial Genomes from Long-Read Sequencing and Assembly.” Current Opinion in Microbiology 23 (February 2015): 110–20. https://doi.org/10.1016/j.mib.2014.11.014. - Genome assembly overview focusing on long reads. Repeats (global and local) are problematic. Details on technologies: PacBio RS, Illumina’s Moleculo, ONT MinION. Assembling approaches: OLC, hierarchical hybrid (long reads correction using another technology) and non-hybrid (self long reads alignment-correction). Assembly augmentation: gap filling, scaffolding, read threading. Table 1 - long read assembly tools and descriptions.

Alignment algorithms and tools

  • Needleman, S. B., and C. D. Wunsch. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins.” Journal of Molecular Biology 48, no. 3 (March 1970): 443–53.

  • Smith, T. F., and M. S. Waterman. “Identification of Common Molecular Subsequences.” Journal of Molecular Biology 147, no. 1 (March 25, 1981): 195–97.

  • Burrows, Michael, and David J Wheeler. “A Block-Sorting Lossless Data Compression Algorithm,” 1994. - BWT paper

  • Ferragina, Paolo, and Giovanni Manzini. “Opportunistic Data Structures with Applications.” In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium On, 390–98. IEEE, 2000. https://dl.acm.org/citation.cfm?id=796543. - FM index paper

  • Li, Heng, and Richard Durbin. “Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform.” Bioinformatics (Oxford, England) 25, no. 14 (July 15, 2009): 1754–60. https://doi.org/10.1093/bioinformatics/btp324. - BWA first paper

  • Li, Heng, and Richard Durbin. “Fast and Accurate Long-Read Alignment with Burrows-Wheeler Transform.” Bioinformatics (Oxford, England) 26, no. 5 (March 1, 2010): 589–95. https://doi.org/10.1093/bioinformatics/btp698. - BWA second paper

  • Li, Heng. “Aligning Sequence Reads, Clone Sequences and Assembly Contigs with BWA-MEM.” ArXiv Preprint ArXiv:1303.3997, 2013. - BWA-MEM (maximal exact matches). Automatically select end-to-end or gapped alignment.

  • Langmead, Ben, and Steven L Salzberg. “Fast Gapped-Read Alignment with Bowtie 2.” Nature Methods 9, no. 4 (March 4, 2012): 357–59. https://doi.org/10.1038/nmeth.1923. - Bowtie2 gapped alignment

  • Dobin, Alexander, Carrie A. Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, and Thomas R. Gingeras. “STAR: Ultrafast Universal RNA-Seq Aligner.” Bioinformatics (Oxford, England) 29, no. 1 (January 1, 2013): 15–21. https://doi.org/10.1093/bioinformatics/bts635. - STAR gapped aligner. Algorithm, testing on simulated dataset.

  • Kim, Daehwan, Ben Langmead, and Steven L. Salzberg. “HISAT: A Fast Spliced Aligner with Low Memory Requirements.” Nature Methods 12, no. 4 (April 2015): 357–60. https://doi.org/10.1038/nmeth.3317. - HISAT aligner. Two indexes - one large with many small

  • The Freiburg RNA tools teaching material on RNA- and DNA-sequencing alignment (Needleman-Wunsch, Smith-Waterman, many more). http://rna.informatik.uni-freiburg.de/Teaching/

  • Fonseca, Nuno A., Johan Rung, Alvis Brazma, and John C. Marioni. “Tools for Mapping High-Throughput Sequencing Data.” Bioinformatics (Oxford, England) 28, no. 24 (December 15, 2012): 3169–77. https://doi.org/10.1093/bioinformatics/bts605. - Sequencing mappers, lists and characteristics of DNA, RNA, bisulfite aligners. https://www.ebi.ac.uk/~nf/hts_mappers/

Videos