The genome assembly problem changed dramatically with the advent of next-generation sequencing technologies, and it has continued to evolve with the arrival of so-called “third-generation” sequencing: PacBio’s single-molecule, real-time sequencing. Illumina and PacBio have become the dominant technologies for sequencing and assembling novel genomes because of their complementary strengths and weaknesses. In this bootcamp, we will cover the principles of older overlap-layout-consensus (OLC) assemblers and of the newer de Bruijn graph and string graph assemblers, and try out examples of each using the Galaxy platform. In addition, we will cover techniques that take advantage of long PacBio reads, including scaffolding, hybrid or self-correction pipelines, and hybrid Illumina/PacBio or purely PacBio assembly. Finally, we will discuss the changing field of genome assembly on a broader scale, including the recent Assemblathon competitions and the remaining challenges in assessing the accuracy of genome assembly tools and pipelines.
Students should have some familiarity with both the Galaxy platform and next-generation sequencing technologies; attendance at our “Introduction to Next Generation Sequence Analysis with Galaxy” or equivalent experience is recommended.