Funded by the NIH Big Data to Knowledge (BD2K) Program, we propose to create a massively scalable toolkit to enable large, multi-center Patient-centered Information Commons (PIC) at local, regional, and national scale, where the focus is the alignment of all available biomedical data per individual. Such a Commons is a prerequisite for conducting the large-N, Big Data, longitudinal studies essential for understanding causation in the Precision Medicine framework while simultaneously addressing key complexities of the Patient-Centric Outcome Research studies required under ACA (Affordable Care Act). Our proposal is solidly grounded in our experience over the last 25 years of harnessing clinical care data to the research enterprise.

In creating the PIC, we propose to focus on:

  1. Enabling the identification and retrieval of all data that pertain to individual health by creating a data sharing architecture that is capacious enough for all relevant data types and that enables patient and institutional autonomy to be respected.
  2. Testing fully-scaled implementations of the proposed architecture early in the development process, with the active involvement of a committed user community, which will allow us to refine our designs to facilitate subsequent robust dissemination and adoption.
  3. Providing commodity workflows that can be used to “clean” and complete the often noisy and sparse data gathered in the course of observational studies.
  4. Embracing decentralization while enabling the construction of a nationally or regionally-scaled patient-centered information commons.
  5. Encouraging the selection of standards through the tools that enable the construction of patient-centered information commons.
  6. Employing diagnostic classification and prognostication as figures of merit to measure how well a patient-centered information commons adds to the understanding of patient populations.

In addition to the research and development agenda, we have also taken on the development of educational opportunities for the end-user community to become more familiar with the methods and challenges of data science.

PUBLIC HEALTH RELEVANCE: Large populations of individuals characterized by many different and complementary types of data—for example, genetic, environmental, imaging, behavioral, and clinical findings—will allow significant progress in our ability to accurately classify individuals as to their disease or disease risk and provide more precise predictions of their disease course. The proposed toolkit enables such characterization at the local and national scale.

As part of our membership in the BD2K Consortium, we have engaged in several collaborations that collectively focus on enabling universal data sharing across disparate sources, including the other BD2K Centers.  Specifically, this supplemental work includes:

1.  Count Everything: Integrating Clinical, Genomics and mHealth APIs across the BD2K Program

In many specific domains, progress is being made on creating standard application programming interfaces (APIs) that allow different silos to provide a common view of their data, so that research questions can be answered by integrating data across multiple such silos.  However, without coordination these individual standardization approaches will not be woven into a comprehensive representation of all biomedical data.  This collaboration across the Center for Integrating Data for Analysis, Anonymization and SHaring (iDash), the Biomedical and Healthcare Data Discovery Index Ecosystem (bioCADDIE), and three BD2K Centers of Excellence, the Center for Big Data in Translational Genomics (CBDTG), the Mobile-sensor Data-to-Knowledge (MD2K) Center and PIC-SURE, brings together a wide representation of resources and skillsets to address this problem.  Work is underway to (1) Establish a covering set of interoperable APIs sufficient to answer simple quantitative questions integrating over clinical, omic and mobile sensor data; (2) Create a software toolkit that implements these APIs in one easily deployable package; and (3) Deploy the toolkit across a number of medical institutions to create a network of data providers serving a common view of a large swath of biomedical data.  Project Lead: Paul Avillach, MD, PhD
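The kind of "simple quantitative question" described above can be illustrated with a minimal Python sketch. It is not the project's actual API: the `PatientRecord`, `Site`, and `federated_count` names, the record fields, and the toy data are all hypothetical, and the sites are simulated in memory rather than reached over a network. The point it shows is that each site answers a count query locally and only the counts are combined.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class PatientRecord:
    """Hypothetical per-patient record spanning the three data domains."""
    diagnoses: Set[str] = field(default_factory=set)  # clinical codes (illustrative)
    variants: Set[str] = field(default_factory=set)   # genomic variant IDs (illustrative)
    mean_daily_steps: float = 0.0                     # mobile-sensor summary (illustrative)


@dataclass
class Site:
    """Stand-in for one institution; only counts leave the site."""
    name: str
    records: List[PatientRecord]

    def count(self, predicate: Callable[[PatientRecord], bool]) -> int:
        # Evaluate the question locally and return just the number of matches.
        return sum(1 for record in self.records if predicate(record))


def federated_count(
    sites: List[Site], predicate: Callable[[PatientRecord], bool]
) -> Tuple[Dict[str, int], int]:
    """Ask every site the same question; return per-site counts and the total."""
    per_site = {site.name: site.count(predicate) for site in sites}
    return per_site, sum(per_site.values())


# Toy data for two simulated sites (entirely made up).
site_a = Site("site_a", [
    PatientRecord({"E11"}, {"var1"}, 3200.0),
    PatientRecord({"E11"}, set(), 9100.0),
])
site_b = Site("site_b", [
    PatientRecord({"E11"}, {"var1"}, 2500.0),
    PatientRecord({"I10"}, {"var1"}, 1800.0),
])


def diagnosed_carrier_with_low_activity(record: PatientRecord) -> bool:
    # One question that touches clinical, genomic, and mobile-sensor data at once.
    return (
        "E11" in record.diagnoses
        and "var1" in record.variants
        and record.mean_daily_steps < 5000
    )


per_site, total = federated_count([site_a, site_b], diagnosed_carrier_with_low_activity)
print(per_site, total)
```

In a real deployment each `Site.count` call would presumably be a request to that institution's API endpoint, which is an assumption beyond what the text above specifies.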

2.  Global Rare Diseases Registry (GRDR): https://grdr.hms.harvard.edu

Consistent with the overall aims of our PIC-SURE project, and in collaboration with the NIH National Center for Advancing Translational Sciences (NCATS), the PIC-SURE team is using our i2b2/tranSMART Amazon Web Services (AWS) Cloud Environment to create a unified, queryable Global Rare Diseases Registry.  This application provides web browser access over an encrypted connection, enabling authorized users to see and work with the data resident in the several individual databases that are currently configured but not otherwise readily accessible to the scientific community.  To date, data from 10 rare disease registries, totaling 5,303 individuals, have been integrated in the platform.  Project Lead: Paul Avillach, MD, PhD

3.  Integrating Data and Toolkits Across Institutional Boundaries

Together with the University of Pittsburgh BD2K Center for Causal Modeling and Discovery of Biomedical Knowledge from Big Data (CCD), the PIC-SURE Team is developing a system that will enable sharing of data and computational resources in a cloud environment (or other externally hosted system) by creating authenticated access to secure data and to analysis and visualization modules across institutional boundaries, a prerequisite to productive collaborations among institutions that host large biomedical data repositories and computational infrastructures.  This proof-of-principle federated data ecosystem will be designed to serve as a model for the ultimate connectivity desired by researchers working with large and disparate data sets.  Project Lead: Paul Avillach, MD, PhD

4.  Sync for Science (S4S) http://syncfor.science/

Consistent with the overall goals of PIC-SURE to encapsulate all data sources relevant to individual health in a "patient-centered information commons instance," extending this effort to support the new Precision Medicine Initiative goal of centrally collecting and sharing all of the health information streams from a 1M cohort offered a well-aligned proof-of-principle opportunity.  We will be working with the NIH and ONC PM Teams and the major US EHR vendors to implement a consistent, standards-based workflow, building on open specifications including Health Level 7's Fast Healthcare Interoperability Resources (FHIR) and OAuth.  Once developed and implemented, this functionality will allow individuals to connect a research app to their electronic health data, facilitating individual data donation for research and leveraging patients' access rights under the Health Insurance Portability and Accountability Act (HIPAA). The pilots will also collect information on individual participant preferences on alternative approaches for data donation. Project Lead: Josh Mandel, MD
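The workflow described above combines two open specifications: the individual authorizes a research app through OAuth, and the app then reads data through a FHIR REST interface. The Python sketch below only illustrates the general shape of those two steps, assuming a standard OAuth 2.0 authorization-code request and a FHIR read of the form `[base]/[type]/[id]`. Every URL, the client ID, the scope string, and the token are hypothetical placeholders, not S4S values, and no network call is made.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request


def build_authorization_url(authorize_endpoint: str, client_id: str,
                            redirect_uri: str, scope: str, state: str) -> str:
    """Build the URL the participant is sent to in order to approve the research app.

    These are the standard OAuth 2.0 authorization-code parameters; a specific
    deployment may require additional ones.
    """
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,  # opaque value echoed back, used to guard against CSRF
    }
    return f"{authorize_endpoint}?{urlencode(params)}"


def build_fhir_read_request(fhir_base: str, resource_type: str,
                            resource_id: str, access_token: str) -> Request:
    """Prepare (but do not send) a FHIR read request carrying the OAuth bearer token."""
    url = f"{fhir_base.rstrip('/')}/{resource_type}/{resource_id}"
    headers = {
        "Authorization": f"Bearer {access_token}",
        # JSON media type used by recent FHIR releases; older servers may differ.
        "Accept": "application/fhir+json",
    }
    return Request(url, headers=headers, method="GET")


# Hypothetical values for illustration only.
auth_url = build_authorization_url(
    authorize_endpoint="https://ehr.example.org/oauth/authorize",
    client_id="research-app",
    redirect_uri="https://app.example.org/callback",
    scope="patient/*.read",
    state="abc123",
)
req = build_fhir_read_request(
    fhir_base="https://ehr.example.org/fhir/",
    resource_type="Patient",
    resource_id="123",
    access_token="PLACEHOLDER_TOKEN",
)
print(auth_url)
print(req.get_method(), req.full_url)
```

The step between the two functions, exchanging the returned authorization code for an access token at the server's token endpoint, is omitted here for brevity.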