Data and Metadata Management

With the increased demand for computing power from life science researchers and scientists tackling big data issues, storage and infrastructure must be able to scale to handle billions of data points and files efficiently. The challenge is administering data so that information can be integrated, accessed, shared, linked, analyzed, and maintained to best effect across the organization. The Data and Metadata Management track will explore how to manage workflows with data and metadata without rerunning everything, while still handling data updates and new versions of the software. We will also explore how to associate the processed data and features with the raw data for analysis purposes.
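As a rough illustration of the workflow question above, one common approach (not necessarily the one any speaker will present) is to key each processing step on a hash of the raw data plus the software version and parameters, so a step reruns only when one of those changes. The sketch below is a minimal Python example under that assumption; `cache_key`, `StepCache`, and the provenance fields are hypothetical names invented for illustration.

```python
import hashlib
import json


def cache_key(raw_data: bytes, tool_name: str, tool_version: str, params: dict) -> str:
    """Build a key that changes whenever the data, tool version, or parameters change."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(raw_data).digest())
    settings = {"tool": tool_name, "version": tool_version, "params": params}
    h.update(json.dumps(settings, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


class StepCache:
    """In-memory cache of step results (hypothetical; a real system would persist this)."""

    def __init__(self):
        self._results = {}  # key -> (result, provenance)
        self.runs = 0       # number of times a step was actually executed

    def run(self, step, raw_data: bytes, tool_name: str, tool_version: str, params: dict):
        key = cache_key(raw_data, tool_name, tool_version, params)
        if key not in self._results:
            self.runs += 1
            result = step(raw_data, **params)
            # Provenance record linking the processed result back to its raw input.
            provenance = {
                "raw_sha256": hashlib.sha256(raw_data).hexdigest(),
                "tool": tool_name,
                "version": tool_version,
                "params": dict(params),
            }
            self._results[key] = (result, provenance)
        return self._results[key]
```

Calling `run` twice with identical inputs executes the step once; changing the raw bytes or the version string triggers a rerun, and the stored provenance record ties each result to the hash of the raw data it came from.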

Tuesday, October 6

PLENARY KEYNOTE PROGRAM

10:00 am

Welcome Remarks

Cindy Crowninshield, Executive Event Director, Cambridge Healthtech Institute
Scott Parker, Director of Product Marketing, Marketing, Sinequa
10:15 am

NIH’s Strategic Vision for Data Science

Susan K. Gregurick, PhD, Associate Director, Data Science (ADDS) and Director, Office of Data Science Strategy (ODSS), National Institutes of Health
Rebecca Baker, PhD, Director, HEAL (Helping to End Addiction Long-term) Initiative, Office of the Director, National Institutes of Health
11:05 am

LIVE Q&A: Session Wrap-Up Panel Discussion

Panel Moderator:
Ari E Berman, PhD, CEO, BioTeam Inc
11:25 am Lunch Break - View Our Virtual Exhibit Hall
11:55 am Recommended Pre-Conference Workshops*
W1: Data Management for Biologics: Registration and Beyond
W2: A Crash Course in AI: 0-60 in Three
W3: Data Science Driving Better Informed Decisions

*Separate registration required. See workshop page for details.

1:55 pm Refresh Break - View Our Virtual Exhibit Hall
2:15 pm Recommended Pre-Conference Workshops*
W4: Digital Biomarkers and Wearables in Pharma R&D and Clinical Trials
W5: AI-Celerating R&D: Foundational Approaches to How Emerging Technologies Can Create Value
W6: Dealing with Instrument Data at Scale: Challenges and Solutions

*Separate registration required. See workshop page for details.

4:15 pm Close of Day

Wednesday, October 7

DATA ECOSYSTEM TO ACCELERATE DATA DISCOVERY AND SHARING FOR BIOMEDICAL RESEARCH

9:00 am

The Chicagoland COVID-19 Commons: A Regional Data Commons Powering Research to Support Public Health Efforts

Matthew Trunnell, Data Commoner-at-Large; Executive Director, Pandemic Response Commons; Former Vice President and Chief Data Officer, Fred Hutchinson Cancer Research Center

TOOLS, MODELS AND PROCESSES FOR STRUCTURING AND MANAGING DATA

9:20 am

A New Compound Platform for Enhanced Access to Chemical Space for Screening

Michael Lange, ML/AI Lead, R&D Informatics, Small Molecule Discovery Informatics, Roche

Over the last few years, the commercially available chemical space (with pharmaceutical relevance) has rapidly increased. Several providers today offer catalogs consisting of several hundred million screening compounds. We built a new compound platform to enable browsing, searching, selection, and ordering of compound sets from these libraries. The platform offers these capabilities by standardizing and preprocessing all molecules, calculating relevant properties, and enabling access to these libraries by combining fast structure-based search with property and metadata filters. This presentation will describe the overall architecture and highlight some of the challenges encountered during the implementation.

9:40 am

SeedMeLab - Data Management System: Search, Manage, Share, and Visualize Data

Amit Chourasia, Visualization Group Leader, University of California San Diego

SeedMeLab, a web-based data management system, comprises a modular set of building blocks that can be configured and customized in an extensible manner. It enables hosting data on the web or an intranet with access controls and pluggable identity management. The expressive and extensible file system allows data, its description, and its discussion to be colocated, which catalyzes discovery. It enables rich presentation and visualization that aid in making data more insightful. The built-in web services, coupled with an extension API, make it a powerful platform for realizing FAIR data compliance. Research teams or science applications can spin up SeedMeLab with their own branding and domain, with a customized file system and presentation layout.

10:00 am Coffee Break - View Our Virtual Exhibit Hall

REAL-WORLD EVIDENCE (RWE) DATA MANAGEMENT STRATEGIES

10:20 am PANEL DISCUSSION:

Real-World Evidence (RWE): Data Provenance, Format, Ingest, Quality (Bias), Integration, Visualization, Transformation, Verification & Validation, and Implementation

Panel Moderator:
Sanjay Joshi, Industry CTO, Healthcare, Dell Technologies

The future of the intersection of healthcare and the life sciences will be data- and process-focused, not application- or software-focused. “Bringing the analytics to Data” is the challenge from an infrastructure and methods perspective. According to the FDA, Real-World Evidence (RWE) is defined as “the clinical evidence regarding the usage and potential benefits or risks of a medical product derived from analysis of Real-World Data (RWD): e.g., effectiveness or safety outcomes from an RWD source in randomized clinical trials or in observational studies.” Our topical, honest, and “real-world” panel will discuss the sources of RWD (EHR, Claims & Billing, Registries, Patient Reported Data, etc.) and their process implications for RWE and the future of clinical trials themselves.

Panelists:
Victoria A Gamerman, PhD, Head, US Health Informatics & Analytics, Boehringer Ingelheim Pharmaceuticals Inc
Kenna R Mills Shaw, PhD, Exec Dir, Institute for Personalized Cancer Therapy, MD Anderson Cancer Ctr
Kelly H Zou, PhD, VP & Head of Medical Analytics & Insights Research, R&D & Medical, Pfizer Inc
Kevin Dialdestoro, MPhil, Head of Data Science Consulting, Consulting Services, Genestack

As we embrace the multi-omics and single-cell era, organisations are striving to integrate and evolve existing systems to harness richer and bigger data/metadata. Drawing upon our experience with top pharma, agriscience and FMCG companies, we share lessons learned and technologies for transitioning from fragmented systems towards a FAIR data ecosystem.

Can "John" Akgun, PhD, VP of Business Development, Flywheel

Medical imaging is an important component of Life Science R&D; however, the incorporation of these data types and processing workflows into a digital transformation ecosystem is not trivial. At Flywheel we have developed a cloud-scale platform that allows for the capture, curation, processing, and secure collaboration for large volumes of medical imaging data, associated metadata, and complementary data types. Drawing upon our experience with top pharma, we share lessons learned when streamlining the ingestion of millions of data sets from siloed servers and external partners such as CROs and academic institutions. Additionally, we illustrate automation, real-time data search & access, and machine learning workflows.

11:30 am LIVE Q&A:

Session Wrap-Up Panel Discussion

Panel Moderator:
Sanjay Joshi, Industry CTO, Healthcare, Dell Technologies
Panelists:
Michael Lange, ML/AI Lead, R&D Informatics, Small Molecule Discovery Informatics, Roche
Kevin Dialdestoro, MPhil, Head of Data Science Consulting, Consulting Services, Genestack
Esteban Rubens, Healthcare AI Principal, NetApp
Matthew Trunnell, Data Commoner-at-Large; Executive Director, Pandemic Response Commons; Former Vice President and Chief Data Officer, Fred Hutchinson Cancer Research Center
Can "John" Akgun, PhD, VP of Business Development, Flywheel
Amit Chourasia, Visualization Group Leader, University of California San Diego
Kenna R Mills Shaw, PhD, Exec Dir, Institute for Personalized Cancer Therapy, MD Anderson Cancer Ctr
Victoria A Gamerman, PhD, Head, US Health Informatics & Analytics, Boehringer Ingelheim Pharmaceuticals Inc
Kelly H Zou, PhD, VP & Head of Medical Analytics & Insights Research, R&D & Medical, Pfizer Inc
11:50 am Lunch Break - View Our Virtual Exhibit Hall
11:55 am Interactive Breakout Discussions

Consider joining a breakout discussion group. These are informal, moderated discussions with brainstorming and interactive problem solving, allowing participants from diverse backgrounds to exchange ideas and experiences and develop future collaborations around a focused topic.

Michael Riener, President, RCH Solutions

Join us for a lively discussion among prominent pharma leaders, and learn:

Why, when & how to implement a public Cloud for your computing needs

Challenges and opportunities when setting and managing stakeholder expectations

Critical keys to success to realize the best outcomes

To learn more about RCH Solutions, visit our Virtual Booth

Hosted by Joe Donahue, Managing Director, Life Sciences, Accenture

Participants include:

Andreas Matern, Head of Digital Translational Medicine, Sanofi

John Quackenbush, Professor of Computational Biology and Bioinformatics; Harvard T.H. Chan School of Public Health

Seungtaek Lee, VP, Strategic Partnerships and AI RWE Head of CoE; ConcertAI

Preston Keller, PhD, MBA, President & CCO, PercayAI

Philip Payne, PhD, Becker Professor and Chief Data Scientist, Washington University in St. Louis

Jeff Evernham, VP of Customer Solutions, Consulting, Sinequa

Most large-scale analyses of clinical trial data leverage only part of the picture, ignoring unstructured data and limiting findability across all the information collected throughout multiple disparate data sources. This roundtable will discuss leveraging a cognitive platform to combine all data from multiple sources into one unified view using a single entry point to the data.

Sasha Paegle, Life Science Business Development, Dell Technologies

Evaluating, optimizing, and benchmarking next-generation sequencing (NGS) methods is essential for clinical, commercial, and academic NGS pipelines. Optimizations for speed and accuracy often require making trade-offs relative to other constraints. Join this roundtable to discuss benchmarking strategies, trade-offs, and the value of benchmarking genomics tools and applications.

PLENARY KEYNOTE PROGRAM

Michael Schwartz, Head, Product Marketing, Marketing, Benchling

The life science industry has forged ahead with a new generation of therapeutics. A new R&D paradigm is required to develop scientific platforms, manage data complexity, and orchestrate progress across specialized teams. Digital solutions and data ecosystems are at the heart of this, but require both structure and adaptability to thrive in the modern life science R&D environment.

12:30 pm KEYNOTE PRESENTATION & PANEL DISCUSSION:

Game On: How AI, Citizen Science, and Human Computation Are Facilitating the Next Leap Forward

Allison Proffitt, Editorial Director, Bio-IT World

While the precision medicine movement augurs well for better outcomes through targeted prevention and intervention, those ambitions entail a bold new set of data challenges. Various panomic and traditional data streams must be integrated if we are to develop a comprehensive basis for individualized care. However, deriving actionable information requires complex predictive models that depend on the acquisition and integration of patient data on a massive scale. This picture is further complicated by new data streams emerging from quantified self-tracking and health social networks, both of which are driven by experimentation-feedback loops. Tackling these issues may seem insurmountable, but recent advancements in human/AI partnerships and crowdsourcing science add a new set of capabilities to our analytic toolkit. This session describes recent work in online collective systems that combine human and machine-based information processing to solve biomedical data problems that have been otherwise intractable, and an information processing ecosystem emerging from this work that could transform the landscape of precision medicine for all stakeholders. Pietro Michelucci will open with a framing talk, followed by short presentations from each panelist, ending with a Q&A discussion with speakers and attendees moderated by Allison Proffitt.

Panelists:
Seth Cooper, PhD, Assistant Professor, Khoury College of Computer Sciences, Northeastern University
Lee Lancashire, PhD, CIO, Cohen Veterans Bioscience
Pietro Michelucci, PhD, Director, Human Computation Institute
Jérôme Waldispühl, PhD, Associate Professor, School of Computer Science, McGill University
1:55 pm Refresh Break - View Our Virtual Exhibit Hall

SECURING YOUR DATA AND IMPLEMENTING STANDARDS AND CONTROLS TO MANAGE RISK

2:10 pm

Defending against the Persistence of Inevitability

Brian D. Bissett, IT Specialist, Hardware Engineering, IEEE USA

Most data breaches represent a systemic breakdown along multiple lines of both technical and human factors. While many factors can contribute to an unauthorized release, the effort necessary to protect against these factors is not equal. This discussion will take a holistic view of many security breaches, the breakdowns in fundamental security concepts that led to those breaches, and the factors of paramount consideration in protecting an enterprise.

2:30 pm

Data Security and Governance for Biopharma

Jyotin Gambhir, Founder & Managing Director, SecureFLO LLC

Governance provides a playbook for a biopharma company to manage security and privacy compliance. Good governance leads to better-managed goals and a focused IT environment. Cyber hygiene is critical today for any company developing a drug or researching cures that must protect intellectual property as well as subjects’ personal information. Regulations under the FDA and FTC, as well as the EU GDPR, can be complicated.

2:50 pm Refresh Break - View Our Virtual Exhibit Hall
3:10 pm

Dynamic Encryption and Watermarking of Genomic Sequencing Data to Facilitate Privacy-Preserving, Ownership-Based Data Governance

Xiaowu Gai, PhD, Director, Bioinformatics, Center for Personalized Medicine, Children's Hospital Los Angeles

To facilitate privacy-preserving, ownership-based data governance, we developed two novel algorithms which can be used to implement flexible fine-grained protection of genomic data: a) dynamic privacy-preserving encryption of user-specified genomic regions; and b) ownership and utility-preserving watermarking of the sequencing data. This empowers individuals to control when, for how long, and for what purpose any portion of their genomic data is shared, all in an auditable manner.

Dustin Harris, Principal Solutions Architect, Engineering, Igneous

File data is growing at 30% annually. From risk of data loss, to the certainty that data will grow faster than IT budgets, you can’t afford to be left behind. Transform your data management strategy with Igneous and stop risking your organization’s most valuable assets.

Curtis O'Dell, Sales Manager BI Solutions, Tricentis

The focus on preventative care and providing the “right” services & prescriptions is being heavily scrutinized and monitored. In response, the health industry is making data-driven decisions to become more strategic in its care, increasing cost-efficiency while improving outcomes. Sound data integrity testing provides a powerful way to eliminate data issues.

4:00 pm LIVE Q&A:

Session Wrap-Up Panel Discussion

Panel Moderator:
Brian D. Bissett, IT Specialist, Hardware Engineering, IEEE USA
Panelists:
Xiaowu Gai, PhD, Director, Bioinformatics, Center for Personalized Medicine, Children's Hospital Los Angeles
Dustin Harris, Principal Solutions Architect, Engineering, Igneous
Jyotin Gambhir, Founder & Managing Director, SecureFLO LLC
Curtis O'Dell, Sales Manager BI Solutions, Tricentis
4:20 pm Bio-IT Connects - View Our Virtual Exhibit Hall
5:00 pm Close of Day

Thursday, October 8

TRENDS FROM THE TRENCHES - 10TH ANNIVERSARY!

9:00 am KEYNOTE PRESENTATION & PANEL DISCUSSION:

Trends from the Trenches

Kevin Davies, PhD, Executive Editor, The CRISPR Journal; Founding Editor, Bio-IT World

“Trends from the Trenches” will celebrate its 10th Anniversary at Bio-IT! Since 2010, the “Trends from the Trenches” presentation, given by Chris Dagdigian, has been one of the most popular annual traditions on the Bio-IT Program. The intent of the talk is to deliver a candid (and occasionally blunt) assessment of the best, the worthwhile, and the most overhyped information technologies (IT) for life sciences. The presentation has helped scientists, leadership, and IT professionals understand the basic topics related to computing, storage, data transfer, networks, and cloud that are involved in supporting data-intensive science. In 2020, Chris will give the “Trends from the Trenches” presentation in its original “state-of-the-state address” format, followed by guest speakers giving podium talks on relevant topics. An interactive, moderated Q&A discussion with the audience follows. Come prepared with your questions and commentary for this informative and lively session. To stay connected with Trends from the Trenches updates after today and all year, sign up for BioTeam's newsletter here: https://bit.ly/33uO0OY

Panelists:
Vivien R. Bonazzi, PhD, Managing Director & Chief Biomedical Data Scientist, Deloitte Consulting LLP
Tim Cutts, PhD, Head of Scientific Computing, Wellcome Sanger Institute
Chris Dagdigian, Senior Director, BioTeam Inc.
Kjiersten Fagnan, PhD, CIO, Data Science & Informatics, Lawrence Berkeley National Laboratory
Matthew Trunnell, Data Commoner-at-Large; Executive Director, Pandemic Response Commons; Former Vice President and Chief Data Officer, Fred Hutchinson Cancer Research Center
Paul Speciale, Chief Product Officer, Product Management, Scality

Biotechnology companies are facing new challenges in the amount of data that needs processing for genomics analysis. What used to be terabytes of data is now petabytes of data and beyond. This data needs to be collected, analyzed, processed, and then ultimately retained for compliance and research purposes, resulting in massive data storage and management challenges that legacy technology solutions cannot solve. Our session will explain how to leverage new all-flash storage and hybrid-cloud solutions to make genomics analysis run quantum leaps faster than before.

10:55 am Session Break
11:30 am Lunch Break - View Our Virtual Exhibit Hall
11:35 am Interactive Breakout Discussions

Consider joining a breakout discussion group. These are informal, moderated discussions with brainstorming and interactive problem solving, allowing participants from diverse backgrounds to exchange ideas and experiences and develop future collaborations around a focused topic.

Timothy Gardner, CEO, Riffyn, Inc.

How do you use data / digitization today to drive scientific discovery / product development?

What are your greatest scientific pain points / gaps that are not being met by digitization?

What kinds of outcomes do you believe digital tools could help you achieve?

Scott Jeschonek, Principal Program Manager, Microsoft Azure

Welcome to this discussion group on the growth of demand for HPC in scientific research. We are looking forward to a lively forum. We'll start by looking at three related topics:

- What events trigger demand in your organization? How has the current pandemic impacted resources?

- What could make scale and collaboration more accessible to more researchers?

- Share a recent experience of shifting workloads to manage HPC capacity.

Greg DiFraia, General Manager, Americas, Executive Team, Scality
Shailesh Manjrekar, Head of AI and Strategic Alliances, Executive Team, WekaIO

In this session we’ll discuss how to provide researchers with performance and scale in genomics & research analytics, to drive results at a price point that’s economically viable on public & private cloud.

11:35 am

Breakout: NGS Pipeline Optimizations

Tristan J Lubinski, PhD, Sr Scientist, Next Generation Sequencing Informatics, AstraZeneca Pharmaceuticals; Co-organizer, Boston Computational Biology and Bioinformatics (BCBB)
Howard Marks, Technologist Extraordinary and Plenipotentiary, VAST Data

The storage solutions we’ve been using force bioinformaticians to make trade-offs between the capacity and low cost of disk and the performance of flash. This results in complex tiering configurations that only deliver performance for a small slice of the data. In this session, we will review how advancements in technology enable VAST Data to revolutionize the cost of all-flash and allow bioinformaticians faster analysis across larger datasets for deeper insights.

PLENARY KEYNOTE PROGRAM

12:00 pm

Welcome Remarks

Cindy Crowninshield, Executive Event Director, Cambridge Healthtech Institute
Juergen A. Klenk, PhD, Principal, Deloitte Consulting LLP
12:15 pm

Toward Preventive Genomics: Lessons from MedSeq and BabySeq

Robert C. Green, Professor & Director, G2P Research, Genetics & Medicine, Brigham and Women's Hospital
12:40 pm

AI in Pharma: Where We Are Today and How We Will Succeed in the Future

Natalija Z. Jovanovic, PhD, Chief Digital Officer, Sanofi
1:05 pm

LIVE Q&A: Session Wrap-Up Panel Discussion

Panel Moderator:
Vivien R. Bonazzi, PhD, Managing Director & Chief Biomedical Data Scientist, Deloitte Consulting LLP
Juergen A. Klenk, PhD, Principal, Deloitte Consulting LLP
1:25 pm Happy Hour - View Our Virtual Exhibit Hall
2:00 pm Close of Conference




