W12: Data Science Driving Better Informed Decisions
Tuesday, April 5, 2016 | 12:30 – 4:00 pm
This workshop will highlight how data science is successfully helping pharma organizations make data-driven decisions, gain efficiencies and grow their research programs effectively. It will include case studies from several organizations showing how data analysis tools are developed and used to better support the data scientist in the decision process. You will learn how to bridge the worlds of data scientists and bench researchers and see how to prepare integrated systems designed for increased externalization and incorporation of new science and technologies.
12:30 Chairperson
James Cai, Head Data Science, Pharmaceutical Research and Early Development Informatics, Roche Innovation Center New York
12:40 Large-Scale Analysis of RNA-Seq Data to Find Actionable Gene Fusions
Francesca Milletti, Ph.D., Principal Scientist, Data Science, Roche Innovation Center New York
Gene fusions have been successfully targeted to treat a number of malignancies, but they have not been fully characterized across human cancers. Here we present results from our analysis of gene fusion prevalence across 27 cancer types in The Cancer Genome Atlas (TCGA), using RNA-seq data from over 10,000 patients. We found that gene fusions previously described in the literature occur in 14% of the overall TCGA population, and that more than half of these fusions recur in at least two tumor types. A number of gene fusions are associated with a distinct clinical phenotype, such as younger age and shorter overall survival, and with a distinct gene expression signature. These results expand our understanding of gene fusion events and have implications for the discovery of new drug targets and biomarkers.
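As a rough illustration of the two headline statistics in this abstract (the share of patients carrying at least one literature-described fusion, and the share of those fusions seen in two or more tumor types), the Python sketch below computes both from patient-level fusion calls. It is a minimal sketch, not the presenters' pipeline: the function name `fusion_summary`, the input layout and the toy cohort are all hypothetical, and fusion detection from RNA-seq reads is assumed to have already happened upstream.

```python
from collections import defaultdict


def fusion_summary(patients, calls, known_fusions):
    """Hypothetical helper: summarize known-fusion prevalence and recurrence.

    patients: dict mapping patient ID -> tumor type (every patient in the cohort,
              including those with no fusion calls)
    calls: iterable of (patient ID, fusion name) pairs from an upstream fusion caller
    known_fusions: set of fusion names previously described in the literature
    """
    affected = set()
    tumor_types_by_fusion = defaultdict(set)
    for patient_id, fusion in calls:
        if fusion not in known_fusions:
            continue  # keep only literature-described fusions
        affected.add(patient_id)
        tumor_types_by_fusion[fusion].add(patients[patient_id])

    # Fraction of all patients with at least one known fusion
    prevalence = len(affected) / len(patients)
    # Known fusions observed in at least two distinct tumor types
    recurrent = {f for f, types in tumor_types_by_fusion.items() if len(types) >= 2}
    # Fraction of observed known fusions that are recurrent across tumor types
    recurrent_fraction = (
        len(recurrent) / len(tumor_types_by_fusion) if tumor_types_by_fusion else 0.0
    )
    return prevalence, recurrent, recurrent_fraction


# Toy, made-up cohort for illustration only (not TCGA results)
patients = {"P1": "BRCA", "P2": "BRCA", "P3": "LUAD", "P4": "LUAD", "P5": "THCA"}
calls = [
    ("P1", "FGFR3-TACC3"),
    ("P3", "FGFR3-TACC3"),
    ("P3", "EML4-ALK"),
    ("P5", "CCDC6-RET"),
    ("P4", "NOVEL-X"),  # hypothetical fusion not in the known list
]
known = {"FGFR3-TACC3", "EML4-ALK", "CCDC6-RET"}

prevalence, recurrent, recurrent_fraction = fusion_summary(patients, calls, known)
print(f"prevalence={prevalence:.0%}, recurrent={sorted(recurrent)}, "
      f"recurrent_fraction={recurrent_fraction:.2f}")
```

Here "recurrence" is counted per fusion (in how many tumor types it appears), which is one reading of the abstract's phrasing; a per-patient definition would be an equally plausible alternative.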
1:10 Harnessing Big Genomic Data Using Apache Spark and Impala
Wei-Yi Cheng, Ph.D., Scientist, Roche Innovation Center New York, Roche TCRC, Inc.
The Hadoop ecosystem has proven very powerful for managing and analyzing web-scale data. In the biopharmaceutical industry, however, its potential has not yet been widely recognized or fully utilized. In this workshop, we will demonstrate several use cases of Spark and Impala for managing big genomic data. The audience will get first-hand experience with using Spark and Impala on example data sets.
1:40 Data Science Enables Better Decisions Driven by Clinical Visualizations
Philip C. Ross, Ph.D., Director, Data Sciences TR&D, BMS
Clinical trials generate large amounts of data that require rapid review to identify emerging signals and trends. These signals and trends are most effectively identified using visualizations. Using the information revealed in the visualizations, clinicians and scientists can make faster, better informed decisions regarding the conduct of each clinical trial. Based on these insights, key hypotheses can be identified for additional studies. Integrating this data and knowledge across a clinical program drives the development of new treatments that benefit patients.
2:10 Networking Coffee Break
2:30 Custom Scientific Visualizations in TIBCO Spotfire for Better Informed Decisions
Christian Blumenroehr, Ph.D., Information Scientist, Roche Innovation Center Basel, F. Hoffmann-La Roche
TIBCO Spotfire is a very powerful data analysis and visualization tool. Although it comes with a rich set of visualization charts, these may not always meet specific needs. We will show ways in which this set can be extended, with examples, to better support the data scientist in the decision process.
3:00 Bridging the Gap: Bringing Powerful Scientific Calculation Engines into the Hands of Research Scientists
Nils Weskamp, Principal Scientist, Computational Chemistry, Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co KG
At BI, we recently implemented the "Computational Chemistry Framework (CCFW)", an SOA infrastructure for the flexible, robust and scalable deployment of complex scientific calculation engines (e.g., molecular property calculators, predictive QSAR models, basic tools for structure-based design) from our high-performance computing environment to the desktops of our scientists, with tools such as Marvin Sketch, MOE and D360 serving as front ends. CCFW ensures consistent and convenient access to key decision parameters across sites and functions, as well as compliance with internal data access rules and complex software license restrictions. It therefore serves as a bridge between the worlds of data scientists and bench researchers.
3:30 ARIAD Research and Discovery Informatics Roadmap: Towards a Fully Integrated Laboratory Environment
Anna Kohlmann, Ph.D., Associate Director, Discovery Informatics and Computational Chemistry, Research Technologies, ARIAD Pharmaceuticals, Inc.
John F. Conway, Global Director R&D Strategy and Solutions, LabAnswer
ARIAD Pharmaceuticals has embarked on a journey to rejuvenate its laboratories, informatics systems and IT backbone. Creating a fully integrated physical and electronic laboratory environment will promote collaboration and further enable workplace mobility, both fundamentals of a modern workplace culture. To achieve this transformation, the company invested in scientific and business blueprinting and process-mapping due diligence, data governance and stewardship initiatives, and implementation of a true electronic laboratory environment. A series of layered and integrated systems is designed to prepare ARIAD for future business process change, increase externalization and incorporate new science and technologies, as well as to gain efficiencies that will let ARIAD grow its research programs effectively. Together with LabAnswer, we will describe the evolution of the electronic laboratory environment and the data-supported decisions and processes that enable a successful oncology and biotechnology company to transform and thrive while protecting its greatest assets: its employees and its data.
4:00 Close of Workshop
Instructors
James Cai, Head Data Science, Pharmaceutical Research and Early Development Informatics, Roche Innovation Center New York
Dr. James Cai is the Head of Data Science at the Roche Innovation Center in New York. Dr. Cai received his Ph.D. in Molecular Biology from Cornell University and a Master’s degree in Biomedical Informatics from Columbia University. He also worked as a National Library of Medicine postdoctoral fellow in Biomedical Informatics at Columbia University, where he developed new methods in both bioinformatics and clinical informatics research. He started his pharmaceutical career at Roche as a bioinformatics scientist and has since held many scientific and management roles responsible for various aspects of informatics support in drug discovery and development, e.g., genomic data analysis, algorithm development, enterprise system development, next-generation sequencing, text analytics and data mining. He was responsible for the development of a number of scientific applications widely used at Roche. Since 2013, Dr. Cai has played a leading role in establishing Data Science capabilities at the new Roche Innovation Center in New York. His group currently provides broad Data Science services, including data management, analytics and biological data interpretation, that enable clinical researchers and translational scientists to make data-driven decisions.
Francesca Milletti, Ph.D., Principal Scientist, Data Science, Roche Innovation Center New York
Wei-Yi Cheng, Ph.D., Scientist, Roche Innovation Center New York, Roche TCRC, Inc.
Wei-Yi Cheng is a data scientist at Roche TCRC, New York. He specializes in genomic data analysis with a concentration on disease risk modeling. He also helps develop data analysis and visualization software packages that address the needs of drug projects. Wei-Yi received his Ph.D. in Electrical Engineering from Columbia University in the City of New York, where his research focused on the development of genome-scale data mining algorithms for biological discovery and predictive modeling.
Christian Blumenroehr, Ph.D., Information Scientist, Roche Innovation Center Basel, F. Hoffmann-La Roche
After his Ph.D. in Computer Science, Christian Blumenroehr worked for many years as a web application expert and IT architect before joining F. Hoffmann-La Roche five years ago, where he now works as a Software Engineer, Architect and Technical Project Manager at the Roche Innovation Center Basel (Pharma Research and Early Development).
Philip C. Ross, Ph.D., Director, Data Sciences TR&D, BMS
Phil Ross works at Bristol-Myers Squibb as Director of Data Sciences TR&D. During his 5 years at BMS, his team has delivered dynamic clinical visualizations incorporating clinical and biomarker data for clinical teams and Translational Research & Development (TR&D) teams. Phil developed Clinical Review at BMS, a Spotfire application supporting near-real-time visualizations of clinical data for all ongoing in-house clinical trials at BMS. He previously worked at Pfizer as a Director of Informatics and was a postdoctoral researcher at the University of Virginia. Phil earned a Ph.D. in Pharmacology and an M.S. in Organic Chemistry at The Ohio State University.
Nils Weskamp, Principal Scientist, Computational Chemistry, Lead Discovery, Boehringer Ingelheim Pharma GmbH & Co KG
Nils Weskamp is a Principal Scientist, Computational Chemistry at Boehringer Ingelheim. In that role, he supports drug discovery research projects at all stages. He uses methods from cheminformatics, data mining, predictive analytics and related disciplines to extract knowledge from the wealth of experimental data available internally and externally. Recently, he led a global IT project aimed at bringing complex scientific calculation engines from the high-performance computing environment into the hands of BI’s scientists. Nils contributes to the alignment of BI's Research and IT functions in a number of roles. He holds a master’s degree and a PhD in Computer Science (with a focus on Bioinformatics and Data Mining) from the University of Marburg, Germany.
Anna Kohlmann, Ph.D., Associate Director, Discovery Informatics and Computational Chemistry, Research Technologies, ARIAD Pharmaceuticals, Inc.
Anna Kohlmann received her Ph.D. in chemistry from the University of North Carolina at Chapel Hill. She spent six years as a computational chemist at Sanofi before joining ARIAD in 2008, where she has recently become involved in the Discovery Informatics initiative.
John F. Conway, Global Director R&D Strategy and Solutions, LabAnswer
John comes to LabAnswer with over 20 years of R&D experience. Most recently, he was Vice President of Professional Services and Solutions at Schrödinger, LLC. Previously, John was Global Sr. Director of Scientific Informatics and Solutions at Accelrys, Inc. Before he joined Accelrys, he was the Philadelphia Site Head and Global Chair of the Structural Biology Domain for the Discovery Informatics Department at GlaxoSmithKline, Inc. Prior to GSK, John spent many years at Merck and Co., Inc. in various roles in biological and chemical informatics as well as computational science methods and modeling. John’s early career includes serving as an analytical biochemist at Tektagen, Inc. (now Charles River Laboratories), a Senior Forensic Scientist at the Pennsylvania State Police, and Co-Founder of Avecon Inc., a diverse diagnostics company.