Originally published in Faster Cures' "bloggersation" regarding: What is the most important thing that could happen in 2012 to ensure better utilization of big data—housed in EMRs or other platforms—for drug development?
“Big data” in healthcare is typically managed as large pooled data sets that combine data from many settings of care. Pooled data has terrific applications, including registries and large research databases that have been used successfully, but it also raises critical issues of policy and strategy.
From a policy perspective, pooled data approaches are problematic. Large pools of protected health information (PHI) are attractive targets for bad actors. In addition, many PHI holders have their own consent agreements with their patients, and these differing agreements are difficult to manage once the PHI is pooled in one place. HIPAA also requires covered entities to control the flow of PHI, either directly or through agreements. When data is pooled, the party doing the pooling must have a business associate agreement, or a data use agreement in the case of research databases, with each covered entity that contributes data, on the same (or similar) terms. This can be impracticable for the third party and undesirable for covered entities, which often must accept non-negotiable terms in order to contribute their data.
From a strategic standpoint, pooled data is inflexible, stale and inaccurate: once data is copied into a central pool, it no longer reflects changes made at the source. Pooled data approaches are also generally not sustainable, because their benefits are too indirect to justify the operational cost and complexity. Furthermore, health care organizations are unwilling to give up control of their information, not only for policy reasons but also for competitive ones.
Yet, despite these substantial drawbacks, pooled data approaches have flourished because no standards-based alternative has existed.
2012 is the defining moment for new standards that will enable big data analytics in a distributed environment. Query Health, an open government initiative sponsored by the Office of the National Coordinator for Health Information Technology (ONC), is defining the standards and specifications for distributed population queries. Researchers will be able to leverage these standards to “send questions to the data.” Questions can be sent to data sources including EHRs, HIEs, PHRs, payers’ clinical records or any other source of clinical records. Because only aggregate responses are returned, patient-level information stays secure behind each data source’s firewall. Aggregate responses can answer questions about disease outbreaks, quality, comparative effectiveness research (CER), post-market surveillance, performance, utilization, public health, prevention, resource optimization and many other topics.
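To make the “send questions to the data” idea concrete, here is a minimal Python sketch of the pattern. It is not the Query Health specification: the names PatientRecord, PopulationQuery, DataSource and run_distributed_query are hypothetical, the diagnosis codes are illustrative ICD-10-style values, and everything runs in one process, whereas a real deployment would express queries in a standard format and exchange them across organizational boundaries. What it shows is the key property: each source evaluates the question locally and returns only a count, never patient-level records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical patient-level record; in practice this stays inside an EHR, HIE, PHR, etc.
@dataclass(frozen=True)
class PatientRecord:
    patient_id: str
    age: int
    diagnoses: frozenset


# Hypothetical query shape: a name plus a predicate over patient records.
@dataclass(frozen=True)
class PopulationQuery:
    name: str
    criterion: Callable[[PatientRecord], bool]


class DataSource:
    """A data holder that answers questions locally and returns only aggregates."""

    def __init__(self, name: str, records: List[PatientRecord]):
        self.name = name
        self._records = records  # patient-level data is never returned to the caller

    def answer(self, query: PopulationQuery) -> int:
        # Evaluate the question "behind the firewall" and return only a count.
        return sum(1 for r in self._records if query.criterion(r))


def run_distributed_query(query: PopulationQuery, sources: List[DataSource]) -> Dict[str, int]:
    """Send the same question to every source and combine the aggregate answers."""
    results = {src.name: src.answer(query) for src in sources}
    results["TOTAL"] = sum(results.values())
    return results


# Usage: two sources, one question, counts only.
hospital = DataSource("hospital_ehr", [
    PatientRecord("a1", 67, frozenset({"E11"})),
    PatientRecord("a2", 45, frozenset({"I10"})),
])
clinic = DataSource("clinic_ehr", [
    PatientRecord("b1", 71, frozenset({"E11", "I10"})),
])
query = PopulationQuery("diabetes_65_plus", lambda r: "E11" in r.diagnoses and r.age >= 65)
print(run_distributed_query(query, [hospital, clinic]))
# {'hospital_ehr': 1, 'clinic_ehr': 1, 'TOTAL': 2}
```

The coordinator in this sketch only ever sees per-source counts and their sum, which is what lets each organization keep control of its own records while still contributing to a population-level answer.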
These new standards will dramatically cut the cycle time for deploying new questions, from years to days, making it possible to support a learning health system.
In 2012, the focus should be on laying the foundation for success: defining the standards and services for distributed population health queries. This is one highly impactful way to realize the potential of big data for research. For more information, visit QueryHealth.org.