W2: Intelligent Methods Optimization of Algorithms for NGS
Tuesday, April 5, 2016 | 8:00 – 11:30 am
The results of genetic, genomic, and proteomic studies are strongly influenced by how data are represented and analyzed, so the choice of method at each step matters.
8:00 am Welcome and Introductions
8:15 Building Algorithms for Emerging Technologies
Michele Busby, Ph.D., Computational Biologist, Broad Technology Labs, Broad Institute
When breakthrough molecular biology techniques are introduced, the computational tools required to analyze the resulting data can lag behind. Initial implementations are often optimized to demonstrate the utility of the technique rather than to robustly account for the inherent limits of the data. As an internal innovation hub, the Broad Technology Labs has developed, and provides, a wide array of cutting-edge capabilities, including ChIP-Seq, genome assembly, oligonucleotide synthesis, and single-cell RNA sequencing. To ensure the high quality of our products, we have developed an approach for identifying biases through systematic experimentation. This talk describes the techniques we use, with an emphasis on the importance of leveraging strong connections between wet and dry labs during algorithm development.
8:55 Cloud-Based Platform for Algorithm Optimization of Next-Generation Sequencing (NGS)
Lu Zhang, Ph.D., Principal Bioinformatics Scientist, Seven Bridges Genomics
Data mining algorithms are at the heart of genomic research. How to develop and use these algorithms effectively to unlock important biological insights, however, is a growing challenge for the NGS field, because of the large data volumes, expansive solution spaces, and multiple optimization objectives inherent in NGS analyses. This presentation begins with a discussion of the key considerations in optimizing an NGS workflow, followed by an introduction to the cloud-based platform of Seven Bridges Genomics (SBG). SBG’s platform will then be applied to an example genome analysis to demonstrate how developers can scale their benchmarking experiments more efficiently. Attendees will then gain firsthand experience with SBG’s cloud platform and explore ways to: 1) make algorithms aware of input characteristics; 2) make dynamic changes based on resource allocation constraints; and 3) understand sources of variability by capturing experimental metadata and performance.
9:35 Coffee Break
9:50 Practical Approaches to Analyze 1000s of Genomes on AWS Cloud
Dinanath Sulakhe, Engagement Manager and Solutions Architect, Computation Institute, University of Chicago and Argonne National Lab
In this talk, the Globus Genomics team presents practical methodologies that enable reliable, large-scale execution of analyses on the AWS cloud. We present our profiling service, which generates computational profiles for all the analysis tools, and a provisioning service, which creates and manages execution plans based on those computational profiles.
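The profile-then-provision idea in this abstract can be illustrated with a small sketch. It is not the Globus Genomics implementation: the profile fields, the instance types and their prices, the memory headroom factor, and the `plan_execution` helper are all hypothetical, and a real provisioner would weigh more factors than memory, cores, and price.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical computational profile of one analysis tool."""
    name: str
    peak_mem_gb: float  # peak memory observed while profiling the tool
    cores: int          # cores the tool can use effectively


@dataclass(frozen=True)
class InstanceType:
    """Hypothetical compute instance; names and prices are illustrative only."""
    name: str
    mem_gb: float
    cores: int
    hourly_cost: float


def plan_execution(profiles: List[ToolProfile],
                   instances: List[InstanceType],
                   mem_headroom: float = 1.2) -> Dict[str, str]:
    """Assign each tool the cheapest instance type that satisfies its profile.

    mem_headroom pads the profiled peak memory to leave a safety margin.
    """
    plan: Dict[str, str] = {}
    for tool in profiles:
        # Keep only instance types with enough memory (plus headroom) and cores.
        fits = [
            inst for inst in instances
            if inst.mem_gb >= tool.peak_mem_gb * mem_headroom
            and inst.cores >= tool.cores
        ]
        if not fits:
            raise ValueError(f"no instance type can run {tool.name}")
        plan[tool.name] = min(fits, key=lambda inst: inst.hourly_cost).name
    return plan
```

For example, given hypothetical "small" (8 GB, 2 cores), "medium" (32 GB, 8 cores), and "large" (128 GB, 32 cores) types priced in increasing order, a profiled aligner needing 14 GB and 8 cores is planned onto "medium", while a lightweight QC step lands on "small".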
10:30 Inferring Genic Intolerance to Mutation Using Large-Scale Human Variation Data
Kaitlin Samocha, Research Scientist, Mark Daly Laboratory, Analytic and Translational Genetics Unit, Broad Institute
A primary challenge of human genetics is to distinguish disease-causing rare variation from the multitude of more benign, low-frequency variants found in any genome. As a complement to methods that predict the deleteriousness of individual variants, we empirically identified genes that were significantly depleted of the expected amount of loss-of-function and/or missense variation in the 60,706 reference individuals included as part of the Exome Aggregation Consortium. These constrained genes are enriched for established human disease genes, particularly those for which disease alleles work in a dominant or haploinsufficient manner.
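The notion of a gene being "depleted of the expected amount" of variation can be made concrete with a small calculation. The sketch below assumes an expected variant count per gene is already available (for example, from a model of per-gene mutation rates) and scores depletion as the signed square root of the chi-squared deviation of the observed count from the expected one. It is a simplified illustration, not the ExAC constraint pipeline, and the function name `constraint_z` is hypothetical.

```python
import math


def constraint_z(observed: int, expected: float) -> float:
    """Signed square root of the chi-squared deviation of observed from expected.

    Equivalent to (expected - observed) / sqrt(expected): positive values mean
    the gene carries fewer variants than expected (depletion, i.e. constraint),
    negative values mean an excess.
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    chi_sq = (observed - expected) ** 2 / expected
    sign = 1.0 if observed < expected else -1.0
    return sign * math.sqrt(chi_sq)
```

For instance, a gene expected to carry 100 rare missense variants but observed with only 80 scores `constraint_z(80, 100.0) == 2.0`, while a gene observed at or above its expectation scores zero or below. Where to draw the line for "significantly depleted" is a separate modeling decision not shown here.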
11:10 Interactive Q&A with Instructors and Participants
11:30 Close of Workshop
Instructors
Michele Busby, Ph.D., Computational Biologist, Broad Technology Labs, Broad Institute
Michele Busby is a computational biologist at Broad Technology Labs, the Broad Institute’s internal innovation hub for developing breakthrough technologies and providing them as products to the greater scientific community. Her work focuses on computational methods for assessing and ensuring the quality of the data produced at the BTL. She received her Ph.D. in Biology/Bioinformatics from Boston College, where she developed Scotty, a web-based application that uses a statistical model to help users design adequately powered RNA-Seq experiments.
Kaitlin Samocha, Research Scientist, Mark Daly Laboratory, Analytic and Translational Genetics Unit, Broad Institute
Kaitlin Samocha is a Ph.D. candidate in the Biological and Biomedical Sciences (BBS) graduate program at Harvard University. Under the mentorship of Dr. Mark Daly, she developed a model of the rate of spontaneous mutation in the human exome (the protein-coding region of the genome). The model became the basis of a statistical framework to rigorously evaluate the burden of de novo variants, and has been applied in studies of disorders ranging from congenital heart disease to autism spectrum disorders. More recently, the model has been used to predict the expected number of rare variants in large reference populations. These predictions were then used to empirically define a set of genes that are under significant selective constraint, and are therefore more likely to contribute to disease.
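Evaluating the burden of de novo variants against a mutation-rate model can be framed as comparing an observed count with its expectation. The following is a minimal sketch using a one-sided Poisson test for an excess, assuming per-gene mutation probabilities are already available; the helper names, the doubling for two chromosome copies per proband, and the example inputs are assumptions for illustration, not code from the published framework.

```python
import math
from typing import Sequence, Tuple


def _log_pmf(i: int, lam: float) -> float:
    """log P(X = i) for X ~ Poisson(lam), with lam > 0."""
    return -lam + i * math.log(lam) - math.lgamma(i + 1)


def poisson_upper_tail(k: int, lam: float) -> float:
    """One-sided p-value P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    if k > lam:
        # Beyond the mean the terms shrink, so sum the tail directly; this
        # keeps precision for the very small p-values that matter here.
        term = math.exp(_log_pmf(k, lam))
        total, i = 0.0, k
        while term > total * 1e-17:
            total += term
            i += 1
            term *= lam / i
        return total
    # At or below the mean the tail is large, so take the complement.
    lower = sum(math.exp(_log_pmf(i, lam)) for i in range(k))
    return max(0.0, 1.0 - lower)


def de_novo_burden(observed: int,
                   mutation_probs: Sequence[float],
                   n_trios: int) -> Tuple[float, float]:
    """Return (expected de novo count, one-sided Poisson p-value for an excess).

    ASSUMPTION: mutation_probs are per-gene probabilities of a de novo event
    per haploid genome, doubled because each proband carries two copies.
    """
    expected = 2 * n_trios * sum(mutation_probs)
    return expected, poisson_upper_tail(observed, expected)
```

With hypothetical inputs, two genes with mutation probabilities of 1e-5 and 2e-5 across 1,000 trios give an expected count of 0.06, so observing 3 de novo variants yields a p-value of about 3.4e-5.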
Dinanath Sulakhe, Engagement Manager and Solutions Architect, Computation Institute, University of Chicago and Argonne National Lab
Dinanath is an engagement manager and solutions architect at the Computation Institute. He received his master’s degree in Computer Science from Illinois Institute of Technology in 2003 and has more than a decade of experience in biomedical informatics and systems biology, gained at Argonne National Laboratory and the University of Chicago.
Lu Zhang, Ph.D., Principal Bioinformatics Scientist, Seven Bridges Genomics
Lu Zhang is a Principal Bioinformatics Scientist at Seven Bridges, which helps large organizations conduct population-scale biomedical data analysis via its software platform. Since joining Seven Bridges in June 2012, Lu has worked closely with interdisciplinary teams and led several research projects. In her current role, Lu is responsible for identifying, examining, and deploying promising scalable analytical technologies to tackle the challenges in the field of precision medicine, with a focus on cancer genomics. Her special areas of expertise are modeling, algorithms, genomics, lipidomics, and molecular dynamics. Lu is the author or co-author of six scientific research papers and one book chapter. Lu holds a BS in Bioinformatics from Zhejiang University of China and a Ph.D. in Computational Biology from Boston College. While at Boston College, she served as Vice President of the Boston College Chinese Students and Scholars Association (BC-CSSA) in 2011. Lu was also a co-organizer of the MIT-China Innovation and Entrepreneurship Forum (MIT-CHIEF) in 2011 and a mentor for Research Science Institute (RSI) scholars in 2010.