Track 1 - April 5 – 7, 2016
Data & Storage Management
Infrastructure and Storage Capabilities and Solutions in the R&D Ecosystem
Storage is becoming a major cost element in the genomic IT world and, as a result, several new methods and technologies are emerging. Track 1 assembles thought leaders who will present concrete case studies and best-practice solutions for big data storage. Themes covered include but are not limited to backup and migration strategies; object storage; transitioning on-premise storage to the cloud; compression of genome-based files; storing files across tier 1 and tier 3; regulatory issues involving user governance, IP, tracking storage access, encryption, and compliance; standards; business models for storing and sharing research data long term; architectures to match advances in sequencing with data storage requirements; and collaboration and multi-site implications for data sets. There will be a joint session between this track and one of the other tracks to discuss common issues, including converged infrastructure and networking.
Tuesday, April 5
7:00 am Workshop Registration and Morning Coffee
8:00 – 11:30 Recommended Morning Pre-Conference Workshops*
Creating a Best of Breed Informatics Environment for Your Organization
12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*
DNA for Data Storage
* Separate registration required
2:00 – 6:00 Main Conference Registration
4:00 PLENARY KEYNOTE SESSION
5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing
Wednesday, April 6
7:00 am Registration Open and Morning Coffee
8:00 PLENARY KEYNOTE SESSION
9:00 Benjamin Franklin Awards and Laureate Presentation
9:30 Best Practices Awards Program
9:45 Coffee Break in the Exhibit Hall with Poster Viewing
TECHNOLOGIES, TOOLS, AND PLATFORMS TO BRING ADVANCES IN SCIENCE FROM BENCH TO BEDSIDE
10:50 Chairperson’s Opening Remarks
Anil Srivastava, President, Open Health Systems Laboratory
11:00 Panel Discussion: IUCKA: Indo-US Cancer Knowledge Alliance
Moderator: Anil Srivastava, President, Open Health Systems Laboratory
Kenneth Buetow, Ph.D., Director, Computational Sciences and Informatics,
Complex Adaptive Systems Initiative (CASI), Arizona State University
Rajendra Joshi, Ph.D., Associate Director and Head,
Bioinformatics Group, Centre for Development of Advanced Computing, Pune
University Campus
IUCKA: Indo-US Cancer Knowledge Alliance is being designed as
an integrated biomedical informatics cyberinfrastructure for cancer treatment
and research in India. It will be a true translational research platform from
bench to bedside connecting cancer treatment and research centers across the
country with access and connection to global centers of research, especially in
the United States. The promoters of the IUCKA are Arizona State University,
Open Health Systems Laboratory and Varian Medical Systems. IUCKA is being
implemented as a PPP (public-private partnership) and is bringing together
technology products, service providers and cancer treatment and research
centers in an ecosystem to directly benefit cancer patients in India and
contribute to global research collaboration, especially between cancer centers
in India.
12:00 pm Managing Data Across the Research Life-Cycle for Life Sciences
George Vacek, Global Director, Life Sciences, DDN
Dr. Vacek will deliver several in-depth case studies of
leading life sciences organizations leveraging high performance & high
scale data solutions for genomics, imaging & simulation workflows. Cases
will focus on implemented solutions: capturing & effectively exploiting
large scale data at speed, regulated & non-regulated stewardship
considerations, transitioning from non-scaling architectures & bringing the
benefits of high-end HPC technologies & techniques into smaller deployments
& collaborative scenarios.
12:15 Data Management in Large Scale Sequencing and Analysis
Kirill Malkin, Director, Storage Engineering, SGI
Next Generation Sequencing and its accompanying analyses are driving exponential growth in sequence data that needs to be stored, analyzed, and made accessible for future interrogations. This session presents a converged storage-and-analytics infrastructure framework based on SGI’s experience in enabling data-intensive supercomputing solutions – along with genomics customer case examples and best practices for simplifying the management of data sets that can contain billions of files/objects.
12:30 Session Break
12:40 Luncheon Presentation I: Accelerating the Analysis of High-Throughput Sequencing
Ketan Paranjape, General Manager, Life Sciences, Health and
Life Sciences, Intel
Panelists: Paolo Narvaez, Ph.D., Principal Engineer & Director, Personalized Care Platform, Intel Corporation
Adam Kiezun, Ph.D., Senior Group Leader, Computational Methods Development, Broad Institute of MIT and Harvard
Jeff Gentry, Principal Software Engineer, Broad Institute
Accelerating the analysis of high-throughput sequencing data enables all of us to push the boundaries of precision medicine. The Broad Institute’s Genome Analysis Toolkit (GATK) is the industry-standard software package for variant discovery and genotyping. In this luncheon, experts from the Broad Institute and Intel will discuss the new capabilities that are coming to GATK and the impact they could have on the industry.
1:10 Luncheon Presentation II: Cloud Bursting HPC Workloads: Challenges and Opportunities
Dan Chow, COO/CTO, Silicon Mechanics
Feeling constrained by your HPC cluster? Are there times when you need more capacity or want to offload some storage? Bursting to the public cloud offers an alternative way to grow with added flexibility. Dan will share the benefits our customers have experienced and cover some of the pitfalls to be wary of when evaluating how to implement cloud bursting.
1:40 Session Break
1:50 Chairperson’s Remarks
Sanjay Joshi, CTO, EMC
1:55 High-Performance Computing Clusters and Storage Enabling Big Data Genomic Analyses Outcomes across Research and Clinical Domains: Implementations, Operations, and Lessons Learned
Jason Hughes, MBA, MS, Director, Enterprise Research
Applications & High-Performance Computing, Penn Medicine Academic Computing
Services, Perelman School of Medicine at the University of Pennsylvania
Storage and analysis of large datasets is a growing need for academic researchers, as is the analysis of genomic data in the pursuit of personalized medicine. In 2012, the Perelman School of Medicine at the University of Pennsylvania made a capital investment in a centrally supported HPC environment. Housing 2PB of disk storage, 1.8PB of archive storage, and over 4,500 computing cores, this HPC environment is available to faculty, staff, students, and clinicians. This presentation will review the three-year history of the HPC environment; the technical, administrative, and financial constructs within which these services are provided; lessons learned in the areas of data storage and management; and how HPC storage and compute capabilities are enabling the three-part organizational mission of education, research, and clinical care.
2:25 Low Cost Data Management System for Large Scale Ion Sequencing Systems
Mohamed Abouelhoda, Ph.D., Head, Bioinformatics, Saudi Human
Genome Project, Genetics Department, King Faisal Specialist Hospital and
Research Center
Storage infrastructure is a major problem when running NGS-based
sequencing projects. Reducing storage requirements reduces cost and effort.
Our presentation will provide a solution to an urgent demand in
genome sequencing projects, and it will appeal to both project owners
and IT specialists. We will also discuss best practices established in handling thousands
of sequencing runs in the Saudi Genome Project.
2:55 How Web-Scale Storage is Enabling Faster, Efficient Medical Research Collaboration for More Effective Patient Treatments
Piers Nash, Ph.D., Director, Business Development and Outreach, University of Chicago
Learn how the University of Chicago’s Center for Data
Intensive Science (CDIS) accelerates medical discoveries by democratizing
access to data for scientific research. Utilizing an object storage solution,
CDIS centrally stores and manages vast amounts of genomic and clinical data at
web-scale, allowing researchers to collaborate via shared access to harmonized
data sets, speeding discovery and enabling precision medicine.
3:10 Life Science - Fast & Slow
Patrick Combes, Principal Solution Architect, Life Science
& HPC, EMC
The handling of data from Life Science workflows can be characterized as fast and slow. While no single storage or computing technology can address the entire continuum of fast-n-slow requirements, EMC has introduced several new products and made significant enhancements to existing lines in the past few years to cover as much as possible. We will highlight these new developments for HPC, converged infrastructure and research data management and illustrate how they can be applied to Life Science applications and workflows.
3:25 Refreshment Break in the Exhibit Hall with Poster Viewing
4:00 Panel Discussion: The IUPAC InChI Standard for Large Molecules
Steve Heller, Ph.D., Project Director, InChI Trust; Scientific
Information Consultant (Chairperson/Moderator)
Evan Bolton, Ph.D., Lead Scientist, National Center for
Biotechnology Information (NCBI), National Library of Medicine (NLM), and
National Institutes of Health (NIH)
Keith T. Taylor, BSc, Ph.D., MRSC, Principal, Ladera
Consultancy
Tyler Peryea, Informatics
Scientist, National Center for Advancing Translational Sciences (NCATS)
Lawrence Callahan, Ph.D., Chemist, Substance Registration
System, Office of Critical Path Programs, Food and Drug Administration (FDA)
This session will present
the ongoing work of IUPAC and the US Government agencies involved in the
development of a standard method for describing large molecules, often called
biologics, to allow for easier linking of diverse sources of data and
information about these molecules. The term biologics in regard to this work
means chemically modified amino acid sequences, nucleic acid sequences,
carbohydrates, lipids or any combination of these. These large molecules are a
rapidly growing portion of the chemical substance descriptions and bioactivity
data being made available online by many diverse and valuable resources.
5:00 Scale, Speed, Smart — IBM Genomics Reference Architecture
Frank Lee, Ph.D., Global Healthcare Life Sciences Industry Leader, IBM
Dr. Frank Lee will share
the IBM Genomics Reference Architecture as an open and innovative platform to
address the explosive growth of data from genomics research, drug discovery and
clinical application. With seamless integration of workload orchestration and
data management, the architecture is designed to handle large amounts of data
and job throughput, yet flexible enough to be deployed on-premise, in the cloud
or as a hybrid. A demo will be shown for a genomics pipeline in operation from
a hybrid cloud.
5:30 – 6:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing
Thursday, April 7
7:00 am Breakfast Presentation: Enabling Technology. Leveraging Data. Transforming Precision Medicine.
Panelists: Sanjay Joshi, CTO, Life Sciences, EMC
Angel Pizarro, Director, Scientific Computing, Amazon Web Services
Ari Berman, Ph.D., General Manager, Government Services, Principal Investigator, BioTeam, Inc.
Eric A. Stahlberg, Ph.D., Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research (FNLCR)
Through collaborations, research and innovation, Intel is supporting the advancement of processing, storage, networking, data security, sequencing efficiency, accelerated bioinformatics and advanced analytics—to push the boundaries of this new “precision medicine” and bring us closer than ever to truly making care personal. Listen to this panel discuss how technology and bioscience are coming together to accelerate precision medicine through on-premise and cloud based solutions.
8:00 PLENARY KEYNOTE SESSION PANEL
10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced
10:30 Chairperson’s Opening Remarks
Sanjay Joshi, CTO, EMC
10:40 FEATURED PRESENTATION: HPC Trends in the Trenches 2016
Chris Dagdigian, Founding Partner & Director, Technology,
BioTeam, Inc.
In one of the most
popular presentations of the Expo, Chris delivers a candid assessment of the
best, the worthwhile, and the most overhyped information technologies (IT) for
life sciences. He’ll cover what has changed (or not) in the past year around
infrastructure, storage, computing, and networks. This presentation will help
you understand the IT needed to build and support data-intensive science.
11:40 Realize a Fiftyfold Increase in Sequencing by Combining Performance Scale-Out Storage with the Latest Next-Gen Sequencers
David Sallak, Vice President, Products & Solutions, Panasas
In this talk, you will learn how to easily harness and manage data by deploying scale-out storage that accelerates workflows and brings plug-and-play simplicity to data management. Panasas customer Garvan Institute of Medical Research accomplished a 50X increase in their sequencing capabilities after combining the Illumina HiSeq X Ten sequencer with Panasas ActiveStor performance scale-out NAS.
11:55 How Next Generation Scale-Out Storage Is Enabling the Next Frontier of Life Sciences Breakthroughs
Joel Groen, Product Manager, Qumulo
With major technology advances in genomic IT, data is being
created at a faster rate than ever before – creating massive storage and data
management challenges for Life Sciences and bioinformatics organizations that
are tasked with managing hundreds of millions to trillions of files. Enter
next-generation scale-out storage – which provides real-time answers about data
footprints at incredible scale, abstracts away the underlying infrastructure,
and achieves breakthrough performance using intelligent software and commodity
hardware – all while balancing performance, capacity and cost.
12:10 pm Session Break
12:20 Luncheon Presentation I: Object Storage: Enabling Genomic Sequencing at Petabyte Scale
Joe Arnold, President and Chief Product Officer, Leadership,
SwiftStack
The audience will learn the following from our presentation:
1) How incorporating multi-petabyte storage-as-a-service into research
environments can be cost-efficient, scalable and manageable; 2) How to
implement an open source object storage system that keeps up with data volume
while improving data management and organization by using arbitrary tags and
metadata; and 3) How chargebacks can determine storage user behavior.
12:50 Luncheon Presentation II: Searching through Petabytes of Data to Find What You Actually Want
Kiran Bhageshpur, CEO, Igneous Systems
Just a decade ago, large
data sets were still measured in terabytes, and petabyte data sets were rare. In
today’s world, even a modest laboratory can generate petabytes of data that
need to be ingested, processed, curated and stored for decades. Yet our ways
of interacting with these large data sets remain mired in tools and techniques
built for megabyte data sets. Surely there has to be a better way?
1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing
1:55 Chairperson’s Remarks
Eric A. Stahlberg, Ph.D., [Contractor], High-Performance
Computing Strategy, Data Science and Information Technology Program, Leidos
Biomedical Research, Inc., Frederick National Laboratory for Cancer Research
(FNLCR)
2:00 Panel Discussion: Actionable Big Data Analytics
Moderator: Eric A. Stahlberg, Ph.D., Leidos Biomedical
Research, Inc., Frederick National Laboratory for Cancer Research (FNLCR)
Kiran Bhageshpur, CEO, Igneous Systems, Inc.
David King, CEO, Exaptive
Timothy Danford, Ph.D., Field Engineer, Tamr, Inc.
Leading life sciences experts will discuss trends and best-practice
case studies of turning big data into smart data that can provide real-time
assistance in organizational decision making, disease prevention, prognosis,
diagnostics, and therapeutics. Learn how and where these organizations have
assembled and analyzed information from different data ‘silos’ and deployed
solutions to make decisions. We’ll discuss technology tools used to move data
and information from retrospective reporting to real-time predictive analytics.
Walk away with practical steps, solutions, and capabilities that you can
implement within your own organization.
3:30 Future Convergence
Eric A. Stahlberg, Ph.D., Leidos Biomedical Research, Inc.,
Frederick National Laboratory for Cancer Research (FNLCR)
4:00 Conference Adjourns