Track 2 - Data Computing 2016 Bio-It World Expo

2016 Archived Content

Track 2 - April 5–7, 2016

Data Computing

Advances in Computing Application for Big Data

Tackling the big data problems that researchers and scientists in genomics and the life sciences face requires more networking and computing power. Track 2 explores techniques and new methods for data storage, transfer and workflows. Themes covered include, but aren’t limited to, application portability, reproducibility, local vs. cloud computing, extreme computing, moving computing vs. moving data, meta computing, and high-performance computing. This track will hold a joint session with one of the other tracks to discuss common issues, including converged infrastructure and networking.

Tuesday, April 5

7:00 am Workshop Registration and Morning Coffee

8:00 – 11:30 Recommended Morning Pre-Conference Workshops* Creating a Best of Breed Informatics Environment for Your Organization

12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops* Data Science Driving Better Informed Decisions

* Separate registration required

2:00 – 6:00 Main Conference Registration

4:00 PLENARY KEYNOTE SESSION

5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing

Wednesday, April 6

7:00 am Registration Open and Morning Coffee

8:00 PLENARY KEYNOTE SESSION

9:00 Benjamin Franklin Awards and Laureate Presentation

9:30 Best Practices Awards Program

9:45 Coffee Break in the Exhibit Hall with Poster Viewing

10:50 Chairperson’s Opening Remarks

Claire Giordano, Senior Director, Emerging Storage Markets, Quantum

11:00 Advancing Translational R&D - Clinical Image Management

David Witt, Imaging Biomarkers Informatics Lead, Bristol-Myers Squibb

Medical imaging plays an ever-increasing role in drug development. As innovative imaging technology continues to evolve, online access to medical images facilitates the rapid delivery of quantitative information to foster decision making at all stages of translational drug development. Incorporating a clinical trial Medical Image Management System (MIMS) into the drug development platform requires re-examining existing workflows to maximize the qualitative and quantitative benefits realized with MIMS. Consideration must be given to all aspects of the imaging strategy in order to create a definitive paradigm shift. This talk will present improved workflows, along with the underlying technology challenges and opportunities in advancing translational R&D using high-quality clinical image management. A summary of lessons learned in areas such as image transfer, metadata, and collaboration management using clinical and pre-clinical information and systems will also be presented.

11:30 Intercorporate Delivery of NGS Analysis Pipelines in Software Containers

Satu Nahkuri, Ph.D., Data Scientist, Pharma Research and Early Development Informatics, Roche Innovation Center Basel

Software containers such as Docker and CoreOS Rocket have in the past few years gained popularity among cloud providers for setting up PaaS / IaaS (Platform as a Service, Infrastructure as a Service). We have adopted containers for a different purpose, i.e., delivering our preferred next-generation sequencing (NGS) analysis tools to our collaborators' computing environments. Our solution allows us to secure consistent computing workflows and reproducible results with minimal reconfiguration burden. We anticipate that in the future, container and unikernel technologies will facilitate a novel computing paradigm, where NGS analysis and visualization pipelines are mobile, while NGS data remains stationary.

12:00 pm Pushing the Limits of Discovery with Internet2 - Cloud to Supercomputing in Life Sciences

Dan Taylor, Director, Business Development, Network Services, Internet2

Advances in life sciences rely on both world-class collaboration and an ecosystem of secure cloud services and supercomputing, seamlessly connected by a high-performance network. Learn how organizations are leveraging commercial clouds such as AWS, private big data scientific research clouds, supercomputing resources such as NCSA and San Diego Supercomputing, and dynamic combinations of these tactics to advance life science research with Internet2.

12:15 Managing NGS Data: Smaller is Better!

Rafael Feitelberg, CEO, Geneformics

The tremendous growth of NGS data is a blessing and a curse, leading to increasing pain in management and requiring escalating investments in infrastructure. We will review how organizations are reducing data volumes by up to 10X - on-premises and in the cloud - without any change to their workflow.

12:30 Session Break

12:40 Luncheon Presentation I: Genome Analysis Pipelines, Big Data Style

Allen Day, Principal Data Scientist, MapR Technologies

Bioinformatics workflow requirements are well-matched to BigData tools' capabilities. However, Spark, for example, is not commonly used because many bioinformatics tool authors assume a legacy computing environment will be used. Barriers are quickly coming down. We'll examine a few conventional bioinformatics analyses and show how they can be modernized to save time and money and to make new types of analysis possible.

1:10 Luncheon Presentation II: Cover Your Bases: 7 Ways Genomics Workflows Can Benefit From Multi-Tier Storage

Claire Giordano, Senior Director, Emerging Storage Markets, Quantum

Dramatic declines in the cost and run times for genome sequencing are enabling bioinformaticians to do more, faster. But these advances come with a challenge—how to manage all of this valuable data? Quantum’s Claire Giordano explores how multi-tier storage (including object storage) can help genomics researchers accelerate time to discovery, improve access for distributed teams, and cost-effectively keep sequenced genome data for decades.

1:40 Session Break

1:50 Chairperson’s Remarks

Chris Dwan, Acting Director, IT, Broad Institute

1:55 To the Cloud(s): Broad Institute’s Journey Outside of Our Walls

Chris Dwan, Acting Director, IT, Broad Institute

2:25 Handling Cloud Project

Gurpreet Kanwar, Senior Project Manager, Information Management, NAV Canada

2:55 How Bluebee & Others Solve the File Exchange Problem for Bioinformatics

Michelle Munson, President, CEO & Co-Founder, Aspera, an IBM Company

Hans Cobben, CEO, Bluebee

As new research techniques create terabytes of NGS data, the need to quickly, easily, and securely ingest and exchange large genome data files with the cloud’s scale-up capacity becomes critical. Learn how Bluebee and other bioinformatics companies overcome this challenge by integrating or using Aspera FASP technologies and solutions to securely move large files at high speed to and from multiple cloud and on-premises storage systems, regardless of where the data is located.

3:25 Refreshment Break in the Exhibit Hall with Poster Viewing

4:00 The Matchmaker Exchange: A Platform for Rare Disease Gene Discovery

Anthony Philippakis, M.D., Ph.D., Chief Data Officer, Broad Institute

4:30 Using Cloud Platforms for Consumer-Driven Integration of Research and Operations

Jonas Almeida, Ph.D., Professor & CTO, Department of Biomedical Informatics, Stony Brook University (SUNY)

5:00 Managing the Mayhem: Overcoming the Challenges of Long-Term Data Retention

David Hiatt, Marketing, Vertical Marketing, Health and Life Sciences, HGST

Data volumes continue to grow and retention periods lengthen—whether by need or by mandate—so researchers and IT leaders face increasingly difficult decisions about how to meet long-term retention requirements yet keep the storage budget in check. Learn practical methods for managing the mayhem and making sure that more research dollars go to research rather than to infrastructure.

5:30 – 6:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing

Thursday, April 7

7:00 am Registration and Morning Coffee

8:00 PLENARY KEYNOTE SESSION PANEL

10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced

10:30 Chairperson’s Opening Remarks

Sanjay Joshi, CTO, EMC

10:40 FEATURED PRESENTATION: HPC Trends in the Trenches 2016

Chris Dagdigian, Founding Partner & Director, Technology, BioTeam, Inc.

In one of the most popular presentations of the Expo, Chris delivers a candid assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. He’ll cover what has changed (or not) in the past year around infrastructure, storage, computing, and networks. This presentation will help you understand the IT needed to build and support data-intensive science.

11:40 Realize a Fiftyfold Increase in Sequencing by Combining Performance Scale-Out Storage with the Latest Next-Gen Sequencers

David Sallak, Vice President, Products & Solutions, Panasas

In this talk, you will learn how to easily harness and manage data by deploying scale-out storage that accelerates workflows and brings plug-and-play simplicity to data management. After combining the Illumina HiSeq X Ten sequencer with Panasas ActiveStor high-performance storage, Panasas customer Garvan Institute of Medical Research was able to increase its sequencing capacity to 50 genomes per day on average, a fiftyfold improvement, without adding staff. With Panasas, Garvan was able to streamline its workflow by keeping the sequencing data in the central repository throughout the analysis, resulting in faster delivery of results to researchers around the world.

11:55 How Next Generation Scale-Out Storage Is Enabling the Next Frontier of Life Sciences Breakthroughs

Joel Groen, Product Manager, Qumulo

With major technology advances in genomic IT, data is being created at a faster rate than ever before – creating massive storage and data management challenges for Life Sciences and bioinformatics organizations that are tasked with managing hundreds of millions to trillions of files. Enter next-generation scale-out storage – which provides real-time answers about data footprints at incredible scale, abstracts away the underlying infrastructure, and achieves breakthrough performance using intelligent software and commodity hardware – all while balancing performance, capacity and cost.

12:10 pm Session Break

12:20 Luncheon Presentation I: Object Storage: Enabling Genomic Sequencing at Petabyte Scale

Joe Arnold, President and Chief Product Officer, Leadership, SwiftStack

The audience will learn the following from our presentation: 1) How incorporating multi-petabyte storage-as-a-service into research environments can be cost-efficient, scalable and manageable; 2) How to implement an open source object storage system that keeps up with data volume while improving data management and organization by using arbitrary tags and metadata; and 3) How chargebacks can determine storage user behavior.

12:50 Luncheon Presentation II: Searching through Petabytes of Data to Find What You Actually Want

Kiran Bhageshpur, CEO, Igneous Systems

Just a decade ago, large data sets were still measured in TBs, and PB data sets were rare. In today’s world, even a modest laboratory can generate petabytes of data that need to be ingested, processed, curated and stored for decades. Yet our ways of interacting with these large data sets remain mired in tools and techniques built for MB data sets. Surely there has to be a better way?

1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing

1:55 Chairperson’s Remarks

Eric A. Stahlberg, Ph.D., [Contractor], High-Performance Computing Strategy, Data Science and Information Technology Program, Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research (FNLCR)

2:00 Panel Discussion: Actionable Big Data Analytics

Moderator: Eric A. Stahlberg, Ph.D., Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research (FNLCR)

Kiran Bhageshpur, CEO, Igneous Systems, Inc.

David King, CEO, Exaptive

Timothy Danford, Ph.D., Field Engineer, Tamr, Inc.

Leading life sciences experts will discuss trends and best-practice case studies of turning big data into smart data that can lead to real-time assistance in organizational decision making, disease prevention, prognosis, diagnostics, and therapeutics. Learn how and where these organizations have assembled and analyzed information from different data ‘silos’ and deployed solutions to make decisions. We’ll discuss technology tools used to move data and information from retrospective reporting to real-time predictive analytics. Walk away with practical steps, solutions, and capabilities that you can implement within your own organization.

3:30 Future Convergence

Eric A. Stahlberg, Ph.D., Leidos Biomedical Research, Inc., Frederick National Laboratory for Cancer Research (FNLCR)

4:00 Conference Adjourns