Big Data in the Era of Health Information Exchanges: Challenges and Opportunities for Public Health
Public health surveillance of communicable diseases depends on timely, complete, accurate, and useful data that are collected across a number of healthcare and public health systems. Health Information Exchanges (HIEs), which support electronic sharing of data and information between health care organizations, are recognized as a source of 'big data' in healthcare and have the potential to provide public health with a single stream of data collated across disparate systems and sources. However,
given that these data are not collected specifically to meet public health objectives, it is unknown whether a public health agency's (PHA's) secondary use of the data supports, or presents additional barriers to, meeting disease reporting and surveillance needs. To explore this issue, we conducted an assessment of the big data available to a PHA through its participation in a HIE: laboratory test results and clinician-generated notifiable condition report data. Despite these and other issues, such as measurement error and confounding, that are well-known challenges to both big and small data, strategies traditionally employed by public health epidemiologists and other public health professionals can uncover limitations and contribute to the design of solutions in the collection, integration, warehousing, and analysis of big data so that its value and utility to public health can be optimized.
In recognition of the 10-year anniversary of the incorporation of the Internet search firm Google, the journal Nature issued a special supplement on 'big data' and what the availability of large data sets meant and will mean for scientists and researchers. In particular, the supplement focused on the opportunities that will become possible once issues such as interoperable data infrastructures, security, data standardization, storage and transfer requirements, and data governance are resolved. Now, nearly 10 years later, users of big data, which is characterized by the 5 Vs (huge volume, high velocity, high variety, low veracity, and high value), still encounter the issues presented in the Nature special supplement. The primary challenges to utilizing big data center on the diversity of data types (variety), the resources required to handle data collection, storage, and processing (velocity), and the uncertainties inherent in mixing and cleaning data from varied data streams, which generate unpredictability in the data (veracity).
Despite these challenges, big data promises great opportunities within the health care sector to improve the quality of health care delivery, population management, early detection of disease, decision-making, and cost reduction. Major contributors to the explosion of big data are investments in information technology (IT), such as increased adoption of electronic medical record systems, and the creation of health information exchanges (HIEs), which facilitate sharing of electronic data and information between health care organizations. While the focus of HIEs has been on sharing patient information between clinics, hospitals, pharmacies, laboratories, and payers, public health agencies (PHAs) are increasingly included in HIEs. PHA participation in a HIE provides public health with a single stream of data collated across disparate systems and sources.
Public health is a data-intensive and data-driven field. Data are a highly valued currency for assessing the health of the community; providing guidance to stakeholders for handling a foodborne illness outbreak; forecasting the burden of seasonal influenza to allow sufficient time to vaccinate vulnerable populations; and innumerable other efforts that aim to prevent disease, prolong life, promote human health, and mitigate unnecessary suffering. Within the context of big data, public health efforts include linking information technology systems to conduct population-based cancer research and surveillance, to identify more effectively the behaviors that can build healthier communities, and to improve targeted and timely epidemiologic surveillance of communicable and infectious disease. For public health surveillance of communicable diseases specifically, effective surveillance relies on time-sensitive, complete, accurate, and useful data that are collected across a number of healthcare and public health systems.
It could be assumed that PHA participation in a HIE would support and potentially improve surveillance efforts, because data collected within the clinical encounter could be shared with public health more rapidly and integrated into PHA decision support systems to meet public health practice needs. However, given that these data are not collected specifically to meet public health objectives, it is unknown whether a PHA's secondary use of the data supports, or presents additional barriers to, meeting disease reporting and surveillance needs. To explore this issue, we conducted an assessment of the big data available to a PHA through its participation in a HIE: laboratory test results and clinician-generated notifiable condition report data. We also discuss the extent to which the value of these data affects the rationale for investing in the infrastructure, including workforce training, that is required to collect and interpret the data and ultimately to inform measurable improvements in the health of the community stakeholders that public health serves.