Track 11 - April 5 – 7, 2016
Open Source Innovations
Integrated Informatics Solutions to Optimize Collaborative Biomedical Research
Track 11 presents case studies on collaborative and productivity software, platforms, tools, and models used to aggregate, harmonize, and interpret data from heterogeneous sources to accelerate basic, translational and clinical research. Speakers will show how crowdsourcing answers from networks is helping to drive transformative change by delivering life-saving medicine and information as quickly as possible.
Tuesday, April 5
7:00 am Workshop Registration and Morning Coffee
8:00 – 11:30 Recommended Morning Pre-Conference Workshops*
Security Considerations for Virtual Research
12:30 – 4:00 pm Recommended Afternoon Pre-Conference Workshops*
iConquerMS™: A Patient-Centered Research Model
* Separate registration required
2:00 – 6:00 Main Conference Registration
4:00 PLENARY KEYNOTE SESSION
5:00 – 7:00 Welcome Reception in the Exhibit Hall with Poster Viewing
Wednesday, April 6
7:00 am Registration Open and Morning Coffee
8:00 PLENARY KEYNOTE SESSION
9:00 Benjamin Franklin Awards and Laureate Presentation
9:30 Best Practices Awards Program
9:45 Coffee Break in the Exhibit Hall with Poster Viewing
10:50 Chairperson’s Opening Remarks
Anil Srivastava, President, Open Health Systems Laboratory
11:00 Panel Discussion: IUCKA: Indo-US Cancer Knowledge Alliance
Moderator: Anil Srivastava, President, Open Health Systems Laboratory
Kenneth Buetow, Ph.D., Director of Computational Sciences and Informatics, Complex Adaptive Systems Initiative (CASI), Arizona State University
Rajendra Joshi, Ph.D., Associate Director and Head, Bioinformatics Group, Centre for Development of Advanced Computing, Pune University Campus
IUCKA: Indo-US Cancer Knowledge Alliance is being designed as an integrated biomedical informatics cyberinfrastructure for cancer treatment and research in India. It will be a true bench-to-bedside translational research platform, connecting cancer treatment and research centers across the country and giving them access to global research centers, especially in the United States. IUCKA's promoters are Arizona State University, Open Health Systems Laboratory and Varian Medical Systems. IUCKA is being implemented as a public-private partnership (PPP) that brings together technology product and service providers with cancer treatment and research centers in an ecosystem designed to directly benefit cancer patients in India and to contribute to global research collaboration, especially between cancer centers in India.
12:00 pm Managing Data Across the Research Life-Cycle for Life Sciences
George Vacek, Global Director, Life Sciences, DDN
Dr. Vacek will present several in-depth case studies of leading life sciences organizations that use high-performance, high-scale data solutions for genomics, imaging and simulation workflows. The cases will focus on implemented solutions: capturing and effectively exploiting large-scale data at speed; regulated and non-regulated stewardship considerations; transitioning from non-scaling architectures; and bringing the benefits of high-end HPC technologies and techniques into smaller deployments and collaborative scenarios.
12:15 Data Management in Large Scale Sequencing and Analysis
Kirill Malkin, Director, Storage Engineering, SGI
Next-generation sequencing (NGS) and its accompanying analyses are driving exponential growth in sequence data that must be stored, analyzed, and made accessible for future interrogation. This session presents a converged storage-and-analytics infrastructure framework based on SGI’s experience in enabling data-intensive supercomputing solutions, along with genomics customer case examples and best practices for simplifying the management of data sets that can contain billions of files or objects.
12:30 Session Break
12:40 Luncheon Presentation I: Accelerating the Analysis of High-Throughput Sequencing
Ketan Paranjape, General Manager, Life Sciences, Health and Life Sciences, Intel
Panelists: Paolo Narvaez, Ph.D., Principal Engineer & Director, Personalized Care Platform, Intel Corporation
Adam Kiezun, Ph.D., Senior Group Leader, Computational Methods Development, Broad Institute of MIT and Harvard
Jeff Gentry, Principal Software Engineer, Broad Institute
Accelerating the analysis of high-throughput sequencing data enables all of us to push the boundaries of precision medicine. The Broad Institute’s Genome Analysis Toolkit (GATK) is the industry-standard software package for variant discovery and genotyping. In this luncheon, experts from the Broad Institute and Intel will discuss the exciting new capabilities coming to GATK and the impact they could have on the industry.
1:10 Luncheon Presentation II: Cloud Bursting HPC Workloads: Challenges and Opportunities
Dan Chow, COO/CTO, Silicon Mechanics
Feeling constrained by your HPC cluster? Are there times when you need more capacity or want to offload some storage? Bursting to the public cloud offers an alternative that lets you grow with added flexibility. Dan will describe the benefits our customers have experienced and cover some of the pitfalls to watch for when evaluating how to implement cloud bursting.
1:40 Session Break
1:50 Chairperson’s Remarks
Christopher Southan, Ph.D., Database Curator, IUPHAR/BPS Guide to PHARMACOLOGY, University of Edinburgh
1:55 MSSNG – An Open Science Approach to Facilitate Discovery in Autism
Mathew Pletcher, Vice President & Head, Genomic Discovery, Autism Speaks
Autism Speaks has undertaken an effort, entitled MSSNG, in collaboration with Google and The Hospital for Sick Children to generate whole-genome sequences from at least 10,000 individuals from families with autism. This genomic data has been made available, along with associated clinical and phenotypic data, through multiple interfaces under the principles of open science. MSSNG operates under the principle that the best way to ensure the delivery of new discoveries and tools to the autism community is to share this valuable resource as broadly as possible and with as few restrictions as possible.
2:25 An Open Embedded Live Image-Analysis Prototyping Platform
Patrick Oberthuer, Research Associate, Chair, Bioprocess Engineering, Technische Universität Dresden
This talk will discuss combining open, low-cost embedded hardware platforms such as the Raspberry Pi with the widely used open image-analysis platform ImageJ, complemented by live imaging devices. Together, these make it possible to easily prototype all-in-one image-analysis systems.
2:55 The Case for Adaptive, Hierarchical Metadata
Stephen Worth, Director, Engineering, EMC
Groups maintaining petabyte-scale data repositories are discovering that cataloguing the associated metadata is necessary to properly access and analyze their data. To succeed, they depend on researchers and data curators to provide user-defined metadata. EMC recently contributed Metalnx to help researchers manage metadata under iRODS. We will demonstrate the principles of operation of Metalnx and discuss how adaptive, hierarchical metadata can be applied to research curation.
3:25 Refreshment Break in the Exhibit Hall with Poster Viewing
4:00 No ELN is an Island
Paul Whitehead, pRED Informatics Center Head, Roche
Research at Roche has an extended bench concept that uses external scientists to contribute to internal projects. To justify its continued use, externalization requires, inter alia, reduced costs and shortened project life cycles. Externalized projects should be monitored, directed and recorded using suitable planning, electronic laboratory notebook (ELN) and collaboration tools which, together with automated data exchange, allow the work to be done quickly and with high quality. The evaluation, selection, implementation and integration of the cloud-based Dotmatics ELN for Roche Research will be presented.
4:30 Between Open and Closed Antimalarial Drug Discovery: Comparing Data Connectivity Gaps and Disclosure Speed
Christopher Southan, Ph.D., Database Curator, IUPHAR/BPS Guide to PHARMACOLOGY, University of Edinburgh
Antimalarial research is the poster child for Open Source Drug Discovery (OSDD). However, many lead compounds still have their origins in Traditional Closed Drug Discovery (TCDD), and uncertainty remains as to how the two differ. To provide an assessment, this work examined 32 recent antimalarial structures in terms of their PubChem connectivity. Of these, 21 had patent matches, only 23 linked to publications and only 21 had BioAssay records. Major data connectivity problems included 1) leads not findable by code name, 2) patents not cited in publications, 3) leads not reciprocally linked to Plasmodium protein targets and pathways, and 4) name-to-structure relationships only being declared years after patent disclosure. These issues will be contrasted with the Sydney University Open Source Malaria approach, where open lab books are used to surface structures (e.g., as Google-findable InChIKeys) and crowdsourced collaboration data in close to real time, thereby shaving years off the discovery phase.
5:00 Selected Poster Presentation: Embracing Ambiguity: Representation of Macromolecules Using the Enhanced Standard HELM 2.0
Markus Weisser, Ph.D., Managing Director, quattro research GmbH
Introduction
HELM, the open standard, enables the representation of many types of complex macromolecules, including nucleotides, proteins, antibodies and antibody-drug conjugates, even those containing non-natural elements. HELM was created by Pfizer scientists; the Pistoia Alliance formalized the notation as an open standard in early 2013 and publicly released software tools to the open-source community. Since its release, HELM has attained widespread adoption and benefited from a growing range of global contributors. While HELM 1.1 solves the problem of representing unnatural complex biomolecules, it still assumes that the scientist knows everything about the structure. In practice, however, there are many cases in which structural features of a biomolecule are not known. This confronts scientists with a difficult choice: either pretend they have all the information and guess at a structure, or register a textual description with no structural information in their database. HELM 2.0 extends the HELM notation so that users can capture the available structural information while also identifying what is not known. The Pistoia Alliance partnered with quattro research to develop the toolsets that support this extension to the notation.
Results
The team implemented three major enhancements to the HELM definition and open-source codebase:
1. The HELM notation and the HELM toolkit now support the representation of ambiguous macromolecules.
2. A new API allows the HELM toolkit to access different chemical libraries of the user's choice. Two libraries, ChemAxon's Marvin Beans and the Chemistry Development Kit (CDK), are currently available. The chemistry plugin can easily be changed or extended to support additional chemical libraries.
3. Web services for the toolkit abstract the toolkit functionality from the code implementation. As a result, the toolkit can also serve as a client, allowing the user to integrate monomer databases.
Discussion
With the addition of ambiguity support, HELM 2.0 now gives researchers the unique capability of representing complex biological entities that have not yet been fully characterized at the structural level, making it an even more practical technology for the electronic representation of a wide array of biomolecules. By also enabling the use of different chemical libraries and providing web services, HELM is now a more open and practical technology than ever before. The HELM code is available on GitHub under the permissive MIT open-source license, which gives anyone the right to freely download and customize it. Please visit OpenHELM.org for additional information about the project.
5:30 – 6:30 Best of Show Awards Reception in the Exhibit Hall with Poster Viewing
Thursday, April 7
7:00 am Registration and Morning Coffee
8:00 PLENARY KEYNOTE SESSION
10:00 Coffee Break in the Exhibit Hall and Poster Competition Winners Announced
10:30 Chairperson’s Opening Remarks
Narges Bani Asadi, Founder and CEO, Bina Technologies Inc., Roche Sequencing
10:40 Selected Poster Presentation: Nephele: A Cloud-Based Scientific Computing Platform for Improved Efficiency, Standardization, and Collaboration in Microbiome Data Analysis
Ian Misner, Ph.D., Computational Genomics Specialist and Contractor, Bioinformatics and Computational Biosciences Branch (BCBB), NIH/NIAID/OD/OSMO/OCICB
Nephele is a cloud computing platform for microbiome research, aimed at providing scientists with a consistent, centralized, and collaborative environment for high-throughput metagenomics data analysis. Growing evidence supports a critical role for microbiota in human health and disease; 16S and whole-genome sequencing (WGS) analyses are critical to understanding this relationship, but analysis of large and complex datasets requires advanced computing infrastructure and sophisticated software that are inaccessible to many researchers. Nephele bridges this gap as an all-in-one portal to essential microbiome data and tools. The user-friendly, web-based interface directly links commonly used metagenomics applications (QIIME, mothur, BioBakery) to the Amazon Web Services (AWS) cloud. Preconfigured (but also customizable) pipelines lower the knowledge barrier for users who may be less familiar with command-line applications. Researchers can also seamlessly integrate public datasets into their analyses, including Human Microbiome Project (HMP) data, to compare against their own experimental sequence data. The on-demand, pay-per-use nature of cloud computing saves individual investigators and institutions valuable funding and administrative resources by significantly reducing the cost of infrastructure procurement and maintenance. Open-source access to these tools and datasets encourages standardization of tools and methods, reproducibility of results, and extension of capabilities by the research community. By supporting greater adoption and improved efficiency in microbiome research, resources like Nephele can facilitate new discoveries that have the potential to transform medicine.
11:10 Tackling Life Sciences R&D Informatics Challenges through Cross-Industry Pre-Competitive Collaboration Projects at The Pistoia Alliance
Carmen Nitsche, Executive Director, Business Development North America, Pistoia Alliance
Market pressures are driving the Life Sciences industry to embrace pre-competitive collaboration in some aspects of its R&D processes. We will examine several areas that lend themselves to such efforts and review ongoing projects that address common challenges.
11:40 Selected Poster Presentation: Projections Meta Filesystem - Novel Approach for Distributed Data Access and Annotation
Anton Bragin, Ph.D., Systems Architect, Bioinformatics Institute
Bioinformatics data today may exist in many forms, such as text and binary files, SQL and NoSQL database records, and data objects behind common or vendor-specific application programming interfaces (APIs). To make one data source talk to another, or to let a software tool consume the data, researchers must translate data requests, either by converting data directly (e.g., by dumping database records to flat files) or by implementing data integration logic through scripting, which is slow, error-prone and often requires extra local storage. To address these problems, we developed the Projections meta filesystem, which provides uniform file-based access to heterogeneous resources and decouples the logical representation of a resource from its physical data storage. The Projections system uses a universal text format to describe logical data structure, together with a set of resource-specific drivers that project actual data objects from local or remote resources onto a local FUSE-mounted filesystem. This gives a uniform view of the data and provides transfer-on-request capability. An important feature of the Projections system is that metadata is a first-class citizen, enabling versatile metadata descriptions that go beyond traditional tags and key-value properties and providing flexible search capabilities. Typical Projections usage scenarios include: file access to non-file objects; using metadata for search and annotation; on-request data analysis, in which Projections provides a logical representation of a resource, including its metadata, that can be searched and analyzed, while data transfer is typically deferred until the data is actually needed; and exchange of data resource representations and selected data objects by means of prototype files, which are small text files that can be easily edited and transferred. Projections is open-source software based on Filesystem in Userspace (FUSE) and can be used on any modern Linux machine.
Currently, the system is equipped with drivers for making data projections from NCBI SRA, GenBank, Amazon S3, the local filesystem, Thermo Fisher Torrent Suite, and Illumina MiSeq/HiSeq Control Software, and can readily be expanded. We hope that the Projections meta filesystem will promote data consolidation and make data access, exchange and usage patterns more uniform, metadata-driven and reliable.
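The deferred-transfer idea in this abstract can be illustrated with a short Python sketch. It is not the Projections API and involves no FUSE mounting: `ProjectedObject`, `search` and the `fetch` callback are hypothetical names, with `fetch` standing in for a resource-specific driver. Metadata is held locally and can be searched at once, while an object's content is pulled from its source only on the first read and then cached.

```python
from typing import Callable, Dict, List, Optional


class ProjectedObject:
    """Hypothetical projected data object (not the Projections API).

    Metadata is stored locally and is searchable immediately; the content
    is transferred from the backing resource only on first read, then cached.
    """

    def __init__(self, name: str, metadata: Dict[str, str],
                 fetch: Callable[[], bytes]) -> None:
        self.name = name
        self.metadata = metadata
        self._fetch = fetch  # stands in for a resource-specific driver
        self._content: Optional[bytes] = None

    @property
    def is_fetched(self) -> bool:
        return self._content is not None

    def read(self) -> bytes:
        # Defer the possibly remote, possibly large transfer until needed.
        if self._content is None:
            self._content = self._fetch()
        return self._content


def search(objects: List[ProjectedObject], key: str,
           value: str) -> List[ProjectedObject]:
    """Select objects by a metadata key/value without fetching any content."""
    return [obj for obj in objects if obj.metadata.get(key) == value]
```

In this sketch, searching a list of projected runs by a metadata field such as a sequencing platform touches no data; only calling `read()` on a selected object triggers the (simulated) transfer, and repeated reads reuse the cached bytes.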
12:10 pm Session Break
12:20 Luncheon Presentation I: Innovation through Collaboration: Cultural and Technological Advancements Empowered in the Pediatric Research Arena
Adam Resnick, Director, Children's Brain Tumor Tissue Consortium; Division of Neurosurgery, Children's Hospital of Philadelphia
The Children's Hospital of Philadelphia has partnered with academic institutions, clinical trial consortia and industry partners to build a new pediatric biospecimen and informatics platform that defines an open-access data discovery ecosystem. These new open-source tools and workflows support “big-data” innovation and define an alternative, sustainable model for collaborative data-driven discovery, in which researchers “compete” to share, connect, and integrate data on behalf of patients.
12:50 Luncheon Presentation II (Sponsorship Opportunity Available) or Lunch on Your Own
1:20 Dessert Refreshment Break in the Exhibit Hall with Poster Viewing
1:55 Chairperson’s Remarks
Samantha A. Schrier Vergano, M.D., FAAP, FACMG, Division Director, Medical Genetics and Metabolism, Children’s Hospital of the King’s Daughters
2:00 What If Your Biology Holds the Key that Protects Others from Disease? Changing the Discourse around Sharing Health Data
Jason Bobe, Associate Professor, Director, Sharing Lab, Icahn Institute for Genomics and Multiscale Biology, Mount Sinai School of Medicine; Executive Director, PersonalGenomes.org
The protection of personal health and medical data has been recognized as an important goal for decades. The societal value of sharing data is immense, but to date it has received much less attention. Designing a biomedical research enterprise that provides individuals access to their own data and improved options for sharing is paramount for addressing critical social concerns such as better health, new therapies and disease prevention strategies.
2:30 Community-Driven Approaches to Support Variant Interpretation
Steven Harrison, Ph.D., Variant Scientist, Laboratory for Molecular Medicine, Partners HealthCare Personalized Medicine; Harvard Medical School
Improving our knowledge of genomic variation requires a massive data-sharing effort. Community-driven groups are working to incorporate shared data into variant assessment processes by guiding gene- and disease-specific specifications of the ACMG guidelines for interpreting sequence variants, developing variant curation applications, aggregating shared data to inform the community of discrepancies and concordance in variant interpretations, and developing resources to facilitate data sharing.
3:00 Military Health Care Dilemmas and Genetic Discrimination: A Cautionary Tale of One Family’s Experience with Whole-Exome Sequencing
Samantha A. Schrier Vergano, M.D., FAAP, FACMG, Division Director, Medical Genetics and Metabolism, Children’s Hospital of the King’s Daughters
Whole-exome sequencing (WES) has increased our ability to analyze large parts of the human genome, bringing with it complicated ethical considerations. Secondary findings, results that convey genetic risk in asymptomatic individuals outside the initial indication for testing, can have significant social or legal implications. We discuss these issues through the experience of a family with careers in the U.S. military, for whom such findings potentially jeopardized their employment and privacy.
3:30 Development and Validation of an SNP Panel for Sample Identity Quality Control for Use in a High-Throughput Clinical Genetics Laboratory
Thomas B. Freeman, Senior Data Scientist, Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai
In clinical genetic testing, it is absolutely imperative that each patient receives the proper test results. We describe the development, implementation and validation of a sample identity SNP panel run in parallel with the DNA-Seq pipeline for sample identity verification. This workflow is integrated with the LIMS and the data analysis pipeline to provide automated sample identity quality control.
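The abstract does not say how panel genotypes are compared with the sequencing calls. As a hedged illustration only, a sample identity check can be framed as genotype concordance over the SNPs called by both assays. The Python sketch below assumes genotypes arrive as unphased strings keyed by SNP ID, and its function names and its `threshold` and `min_sites` defaults are hypothetical placeholders, not validated cut-offs from this work.

```python
from typing import Dict, Optional, Tuple

# Assumed input format: SNP ID -> unphased genotype such as "A/G"
# ("|" separators are also accepted); "./." marks a no-call.
Genotypes = Dict[str, str]


def normalize(gt: str) -> Optional[str]:
    """Return an allele-order-independent genotype, or None for a no-call."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    return "/".join(sorted(alleles))


def compare(panel: Genotypes, seq: Genotypes) -> Tuple[int, int]:
    """Count (matching, compared) SNPs that have calls in both sources."""
    matches = compared = 0
    for snp, panel_gt in panel.items():
        if snp not in seq:
            continue
        a, b = normalize(panel_gt), normalize(seq[snp])
        if a is None or b is None:
            continue  # skip no-calls in either assay
        compared += 1
        matches += int(a == b)
    return matches, compared


def identity_confirmed(panel: Genotypes, seq: Genotypes,
                       threshold: float = 0.95, min_sites: int = 20) -> bool:
    """Hypothetical pass/fail rule: enough sites compared and high concordance."""
    matches, compared = compare(panel, seq)
    if compared < min_sites:
        return False  # too few informative SNPs to confirm identity
    return matches / compared >= threshold
```

Requiring a minimum number of compared sites keeps a sample from passing on too few informative SNPs, for example when many panel positions are no-calls in the sequencing data.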
4:00 Conference Adjourns