Big Data—Conceptual Modeling to the Rescue

David W. Embley, Stephen W. Liddle
2013 Lecture Notes in Computer Science  
Every day humans generate several petabytes of data [ZEd+11] from a variety of sources such as orbital weather satellites, ground-based sensor networks, mobile computing devices, digital cameras, and retail point-of-sale registers. Companies, governments, and individuals store this data in a wide variety of structured, semistructured, and unstructured formats. However, most of this data either languishes in underutilized storage repositories or is never stored in the first place. Ironically, in an era of unprecedented access to a veritable gold mine of information, it is increasingly difficult to unlock the value stored within our data. The essential problem of "Big Data" is that we are accumulating data faster than we can process it, and this trend is accelerating. The so-called "four V's" characterize Big Data:

- Volume: applications sometimes exceeding petabytes³
- Variety: widely varying heterogeneous information sources and hugely diverse application needs
- Velocity: phenomenal rate of data acquisition, real-time streaming data, and variable time-value of data
- Veracity: trustworthiness and uncertainty, beyond the limits of humans to check

We should expect conceptual modeling to provide some answers since its historical perspective has always been about structuring information: making its volume searchable, harnessing its variety uniformly, mitigating its velocity with automation, and checking its veracity with application constraints. We do not envision any silver bullets that will slay the "werewolf" of Big Data, but conceptual modeling can help, as we illustrate with an example from our project that seeks to superimpose a web of knowledge over a rapidly growing heterogeneous collection of historical documents whose storage requirements are likely to eventually exceed many exabytes.

³ Having successfully communicated the terms "mega-," "giga-," and "tera-byte," in the Big Data era we now need to teach users about "peta-," "exa-," "zetta-," and even "yotta-bytes." The NSA data center being built in Utah within 35 km of our university purportedly is designed to store at least zettabytes (10^21 bytes) and perhaps yottabytes (10^24 bytes) of data.
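As a rough illustration of the prefix arithmetic in footnote 3, the minimal sketch below converts a hypothetical generation rate of 5 PB per day (a stand-in for the abstract's "several petabytes"; not a figure taken from the chapter) into the time needed to accumulate one petabyte, exabyte, zettabyte, or yottabyte.

```python
# Illustrative only: back-of-the-envelope arithmetic for the SI byte prefixes
# mentioned in footnote 3. The 5 PB/day rate is an assumed, hypothetical value.

SI_BYTES = {
    "peta": 10**15,
    "exa": 10**18,
    "zetta": 10**21,
    "yotta": 10**24,
}

daily_rate_bytes = 5 * SI_BYTES["peta"]  # assumed ~5 PB generated per day

for prefix, size in SI_BYTES.items():
    days = size / daily_rate_bytes
    print(f"1 {prefix}byte: {days:,.0f} days (~{days / 365:,.1f} years) at 5 PB/day")
```

At that assumed rate a zettabyte takes on the order of 200,000 days (roughly 550 years) to accumulate, which is why the jump from "peta-" to "zetta-" and "yotta-" scale storage is so striking.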
doi:10.1007/978-3-642-41924-9_1