Next-generation sequencing datasets continue to grow in size, and even small genomic projects now generate terabytes of data. The sheer scale of these datasets poses a major computational challenge: how can we improve software pipelines to analyse data more efficiently?
Designed jointly by Intel and the Francis Crick Institute, this workshop will tackle some of these challenges and train participants in the principles and practicalities of optimizing NGS analysis pipelines.
Nicholas Luscombe (UCL and Cancer Research UK London Research Institute)
Robert Maskell (Intel)
Gabriella Rustici (EMBL-European Bioinformatics Institute)
Participation fee (refundable deposit): £100
The registration fee will be invoiced to your sponsor in advance of the course and is payable before the course start date. We intend to refund the registration fee after the course to those who attend and complete it.
Number of participants: 30
Participants will be selected based on applicants’ experience and relevance of their current work to the objectives of the course.
Application deadline: 15 July 2013 (preference will be given to applicants from Francis Crick Institute partners who apply before 1 July 2013).
Please apply through the registration page.
Is this course right for me?
The aim of this course is to familiarize participants with high-performance computing (HPC) methodologies and to provide hands-on training on how to optimize a next-generation sequencing (NGS) analysis pipeline. The workshop is aimed at bioinformaticians who are actively involved in NGS data analysis projects and want to learn how to use HPC solutions to run their analytical pipelines in an efficient and reproducible manner. DNA and RNA sequencing analysis workflows will be used to explore bottlenecks and demonstrate solutions.
What will I learn?
Lectures will outline the computational challenges and bottlenecks associated with the analysis of NGS data and present HPC optimization approaches to overcome such challenges. Practicals will consist of computer exercises that will enable the participants to compare optimized vs. non-optimized software code for the analysis of NGS data, under the guidance of the lecturers and teaching assistants.
Prerequisites: A high degree of familiarity with the Linux/UNIX operating system and knowledge of the R programming language. Applicants will also need to demonstrate their current involvement in high-throughput sequencing data analysis projects.
What will it cover?
Topics will include:
- How to optimize NGS analysis workflows through HPC best practices
- Optimal use of software tools for short read alignment, with emphasis on Bowtie2 and TopHat2
- HPC concepts including parallelization, single/multi-process, shared/distributed memory, and CPU, memory and I/O constraints
- Diagnostic tools for debugging and monitoring of parallel programs
- Benchmarking of various technology and system architecture approaches
- Cloud-based analytics
- Scaling up a workflow to deal with a production-scale environment and increasingly large datasets
Kristina Kermanshahche (Intel)
Clay Beshears (Intel)
Ketan Paranjape (Intel)
Vincent Plagnol (UCL)
Robert Sugar (Cancer Research UK London Research Institute)
Ernest Turro (Department of Haematology, University of Cambridge (tbc))
Kathi Zarnack (Cancer Research UK London Research Institute)