Track 1 - Data & Storage Management 2016 Bio-It World Expo

2016 Archived Content

Track 1 - April 5 – 7, 2016

Data & Storage Management

Infrastructure and Storage Capabilities and Solutions in the R&D Ecosystem

Storage is becoming a major cost element in genomic IT, and as a result several new methods and technologies are emerging. Track 1 assembles thought leaders who will present concrete case studies and best-practice solutions for big data storage. Themes covered include, but are not limited to: backup and migration strategies; object storage; the transition from on-premise to cloud-based storage; compression of genome-based files; storing files in tier 1 and tier 3; regulatory issues involving user governance, IP, tracking storage access, encryption, and compliance; standards; business models for storing and sharing research data long term; architectures to match advances in sequencing with data storage requirements; and the implications of collaboration and multi-site work for data sets. A joint session with one of the other tracks will discuss common issues, including converged infrastructure and networking.

Tuesday, April 5

7:00 am Workshop Registration and Morning Coffee

8:00 – 11:30 Recommended Morning Pre-Conference Workshops* Creating a Best of Breed Informatics Environment for Your Organization

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops* DNA for Data Storage

* Separate registration required

2:00 – 6:00 Main Conference Registration

4:00 PLENARY KEYNOTE SESSION

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

Wednesday, April 6

7:00 am Registration Open and Morning Coffee

8:00 PLENARY KEYNOTE SESSION

9:00 Benjamin Franklin Awards and Laureate Presentation

9:30 Best Practices Awards Program

9:45 Coffee Break in the Exhibit Hall with Poster Viewing

TECHNOLOGIES, TOOLS, AND PLATFORMS TO BRING ADVANCES IN SCIENCE FROM BENCH TO BEDSIDE

10:50 Chairperson’s Opening Remarks

Anil Srivastava, President, Open Health Systems Laboratory

11:00 Panel Discussion: IUCKA: Indo-US Cancer Knowledge Alliance

Moderator: Anil Srivastava, President, Open Health Systems Laboratory

Kenneth Buetow, Ph.D., Director, Computational Sciences and Informatics, Complex Adaptive Systems Initiative (CASI), Arizona State University

Rajendra Joshi, Ph.D., Associate Director and Head, Bioinformatics Group, Centre for Development of Advanced Computing, Pune University Campus

IUCKA: Indo-US Cancer Knowledge Alliance is being designed as an integrated biomedical informatics cyberinfrastructure for cancer treatment and research in India. It will be a true translational research platform from bench to bedside connecting cancer treatment and research centers across the country with access and connection to global centers of research, especially in the United States. The promoters of the IUCKA are Arizona State University, Open Health Systems Laboratory and Varian Medical Systems. IUCKA is being implemented as a PPP (public-private partnership) and is bringing together technology products, service providers and cancer treatment and research centers in an ecosystem to directly benefit cancer patients in India and contribute to global research collaboration, especially between cancer centers in India.

12:00 pm Managing Data Across the Research Life-Cycle for Life Sciences

George Vacek, Global Director, Life Sciences, DDN

Dr. Vacek will deliver several in-depth case studies of leading life sciences organizations leveraging high performance & high scale data solutions for genomics, imaging & simulation workflows. Cases will focus on implemented solutions: capturing & effectively exploiting large scale data at speed, regulated & non-regulated stewardship considerations, transitioning from non-scaling architectures & bringing the benefits of high-end HPC technologies & techniques into smaller deployments & collaborative scenarios.

12:15 Data Management in Large Scale Sequencing and Analysis

Kirill Malkin, Director, Storage Engineering, SGI

Next Generation Sequencing and its accompanying analyses are driving exponential growth in sequence data that needs to be stored, analyzed, and made accessible for future interrogations. This session presents a converged storage-and-analytics infrastructure framework based on SGI’s experience in enabling data-intensive supercomputing solutions – along with genomics customer case examples and best practices for simplifying the management of data sets that can contain billions of files/objects.

12:30 Session Break

12:40 Luncheon Presentation I: Accelerating the Analysis of High-Throughput Sequencing

Ketan Paranjape, General Manager, Life Sciences, Health and Life Sciences, Intel

Panelists: Paolo Narvaez, Ph.D., Principal Engineer & Director, Personalized Care Platform, Intel Corporation

Adam Kiezun, Ph.D., Senior Group Leader, Computational Methods Development, Broad Institute of MIT and Harvard

Jeff Gentry, Principal Software Engineer, Broad Institute

Accelerating the analysis of high-throughput sequencing data enables all of us to push the boundaries of precision medicine. The Broad Institute’s Genome Analysis Toolkit (GATK) is the industry standard software package for variant discovery and genotyping. In this luncheon, experts from the Broad Institute and Intel will discuss the exciting new capabilities that are coming to GATK, and the impact that this could have on the industry.

1:10 Luncheon Presentation II: Cloud Bursting HPC Workloads: Challenges and Opportunities

Dan Chow, COO/CTO, Silicon Mechanics

Feeling constrained by your HPC cluster? Are there times when you need more capacity or want to offload some storage? Bursting to the public cloud offers you an alternative way to grow with added flexibility. Dan will share the benefits our customers have experienced and cover some of the pitfalls to be wary of when evaluating how to implement cloud bursting.

1:40 Session Break

1:50 Chairperson’s Remarks

Sanjay Joshi, CTO, EMC

1:55 High-Performance Computing Clusters and Storage Enabling Big Data Genomic Analyses Outcomes across Research and Clinical Domains: Implementations, Operations, and Lessons Learned

Jason Hughes, MBA, MS, Director, Enterprise Research Applications & High-Performance Computing, Penn Medicine Academic Computing Services, Perelman School of Medicine at the University of Pennsylvania

Storage and analysis of large datasets is a growing need for academic researchers, as is the analysis of genomic data in the pursuit of personalized medicine. In 2012, The Perelman School of Medicine at the University of Pennsylvania made a capital investment in a centrally supported HPC environment. Housing 2PB of disk storage, 1.8PB of archive storage, and over 4,500 computing cores, this HPC is available to faculty, staff, students, and clinicians. This presentation will review the three-year history of the HPC environment, the technical, administrative, and financial constructs within which these services are provided, lessons learned in the areas of data storage and management, and how HPC storage and compute capabilities are enabling the tri-part organizational mission of education, research, and clinical care.

2:25 Low Cost Data Management System for Large Scale Ion Sequencing Systems

Mohamed Abouelhoda, Ph.D., Head, Bioinformatics, Saudi Human Genome Project, Genetics Department, King Faisal Specialist Hospital and Research Center

Storage infrastructure is a major problem when running NGS based sequencing projects. Reducing storage requirements means cost and effort reduction. Our presentation will provide a solution to an urgent demand for genome sequencing projects, and it will be appealing for both project owners and IT specialists. We will also discuss best practices achieved in handling thousands of sequencing runs in the Saudi Genome Project.

2:55 How Web-Scale Storage is Enabling Faster, Efficient Medical Research Collaboration for More Effective Patient Treatments

Piers Nash, Ph.D., Director, Business Development and Outreach, University of Chicago

Learn how the University of Chicago’s Center for Data Intensive Science (CDIS) accelerates medical discoveries by democratizing access to data for scientific research. Utilizing an object storage solution, CDIS centrally stores and manages vast amounts of genomic and clinical data at web-scale, allowing researchers to collaborate via shared access to harmonized data sets, speeding discovery and enabling precision medicine.

3:10 Life Science - Fast & Slow

Patrick Combes, Principal Solution Architect, Life Science & HPC, EMC

The handling of data from Life Science workflows can be characterized as fast and slow. While no single storage or computing technology can address the entire continuum of fast-n-slow requirements, EMC has introduced several new products and made significant enhancements to existing lines in the past few years to cover as much as possible. We will highlight these new developments for HPC, converged infrastructure and research data management and illustrate how they can be applied to Life Science applications and workflows.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing

4:00 Panel Discussion: The IUPAC InChI Standard for Large Molecules

Steve Heller, Ph.D., Project Director, InChI Trust; Scientific Information Consultant (Chairperson/Moderator)

Evan Bolton, Ph.D., Lead Scientist, National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), and National Institutes of Health (NIH)

Keith T. Taylor, BSc, Ph.D., MRSC, Principal, Ladera Consultancy

Tyler Peryea, Informatics Scientist, National Center for Advancing Translational Sciences (NCATS)

Lawrence Callahan, Ph.D., Chemist, Substance Registration System, Office of Critical Path Programs, Food and Drug Administration (FDA)

This session will present the ongoing work of IUPAC and the US Government agencies involved in the development of a standard method for describing large molecules, often called biologics, to allow for easier linking of diverse sources of data and information about these molecules. The term biologics in regard to this work means chemically modified amino acid sequences, nucleic acid sequences, carbohydrates, lipids or any combination of these. These large molecules are a rapidly growing portion of the chemical substance descriptions and bioactivity data being made available online by many diverse and valuable resources.

5:00 Scale, Speed, Smart — IBM Genomics Reference Architecture

Frank Lee, Ph.D., Global Healthcare Life Sciences Industry Leader, IBM

Dr. Frank Lee will share the IBM Genomics Reference Architecture as an open and innovative platform to address the explosive growth of data from genomics research, drug discovery and clinical application. With seamless integration of workload orchestration and data management, the architecture is designed to handle large amounts of data and job throughput, yet flexible enough to be deployed on-premise, in the cloud or as a hybrid. A demo will be shown for a genomics pipeline in operation from a hybrid cloud.

5:30 – 6:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

Thursday, April 7

7:00 am Breakfast Presentation: Enabling Technology. Leveraging Data. Transforming Precision Medicine.

Panelists: Sanjay Joshi, CTO, Life Sciences, EMC

Angel Pizarro, Director, Scientific Computing, Amazon Web Services

Ari Berman, Ph.D., General Manager, Government Services, Principal Investigator, BioTeam, Inc.

Eric A. Stahlberg, Ph.D., Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research (FNLCR)

Through collaborations, research and innovation, Intel is supporting the advancement of processing, storage, networking, data security, sequencing efficiency, accelerated bioinformatics and advanced analytics—to push the boundaries of this new “precision medicine” and bring us closer than ever to truly making care personal. Listen to this panel discuss how technology and bioscience are coming together to accelerate precision medicine through on-premise and cloud based solutions.

8:00 PLENARY KEYNOTE SESSION PANEL

10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced

10:30 Chairperson’s Opening Remarks

Sanjay Joshi, CTO, EMC

10:40 FEATURED PRESENTATION: HPC Trends in the Trenches 2016

Chris Dagdigian, Founding Partner & Director, Technology, BioTeam, Inc.

In one of the most popular presentations of the Expo, Chris delivers a candid assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. He’ll cover what has changed (or not) in the past year around infrastructure, storage, computing, and networks. This presentation will help you understand IT to build and support data intensive science.

11:40 Realize a Fiftyfold Increase in Sequencing by Combining Performance Scale-Out Storage with the Latest Next-Gen Sequencers

David Sallak, Vice President, Products & Solutions, Panasas

In this talk, you will learn how to easily harness and manage data by deploying scale-out storage that accelerates workflows and brings plug-and-play simplicity to data management. Panasas customer Garvan Institute of Medical Research accomplished a 50X increase in their sequencing capabilities after combining the Illumina HiSeq X Ten sequencer with Panasas ActiveStor performance scale-out NAS.

11:55 How Next Generation Scale-Out Storage Is Enabling the Next Frontier of Life Sciences Breakthroughs

Joel Groen, Product Manager, Qumulo

With major technology advances in genomic IT, data is being created at a faster rate than ever before – creating massive storage and data management challenges for Life Sciences and bioinformatics organizations that are tasked with managing hundreds of millions to trillions of files. Enter next-generation scale-out storage – which provides real-time answers about data footprints at incredible scale, abstracts away the underlying infrastructure, and achieves breakthrough performance using intelligent software and commodity hardware – all while balancing performance, capacity and cost.

12:10 pm Session Break

12:20 Luncheon Presentation I: Object Storage: Enabling Genomic Sequencing at Petabyte Scale

Joe Arnold, President and Chief Product Officer, Leadership, SwiftStack

The audience will learn the following from our presentation: 1) How incorporating multi-petabyte storage-as-a-service into research environments can be cost-efficient, scalable and manageable; 2) How to implement an open source object storage system that keeps up with data volume while improving data management and organization by using arbitrary tags and metadata; and 3) How chargebacks can determine storage user behavior.

12:50 Luncheon Presentation II: Searching through Petabytes of Data to Find What You Actually Want

Kiran Bhageshpur, CEO, Igneous Systems

Just a decade ago, large data sets were still measured in TBs, and PB data sets were rare. In today’s world, even a modest laboratory can generate petabytes of data that needs to be ingested, processed, curated and stored for decades. Yet our ways of interacting with these large data sets remain mired in tools and techniques built for MB data sets. Surely there has to be a better way?

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

1:55 Chairperson’s Remarks

Eric A. Stahlberg, Ph.D., [Contractor], High-Performance Computing Strategy, Data Science and Information Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research (FNLCR)

2:00 Panel Discussion: Actionable Big Data Analytics

Moderator: Eric A. Stahlberg, Ph.D., Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research (FNLCR)

Kiran Bhageshpur, CEO, Igneous Systems, Inc.

David King, CEO, Exaptive

Timothy Danford, Ph.D., Field Engineer, Tamr, Inc.

Leading life sciences experts will discuss trends and best practice case studies of turning big data into smart data that can lead to real time assistance in organization decision making, disease prevention, prognosis, diagnostics, and therapeutics. Learn how and where these organizations have assembled and analyzed information from different data ‘silos’ and deployed solutions to make decisions. We’ll discuss technology tools used to move data and information from retrospective reporting to real-time predictive analytics. Walk away hearing practical steps, solutions, and capabilities that you can implement within your own organization.

3:30 Future Convergence

Eric A. Stahlberg, Ph.D., Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research (FNLCR)

4:00 Conference Adjourns