Transparency and Reliability in the Data Supply Chain

P. Groth
2013 IEEE Internet Computing  
Linked data is enabling the creation of large-scale distributed data supply chains. However, the origins of the information these supply chains produce are often unclear. Unlike coffee, data has no fair trade certificate. Here, I outline how standards and research in data provenance are leading toward "fair trade" data.

On 26 November 2012, several major technology blogs, including Techcrunch (http://techcrunch.com) and Gizmodo (http://gizmodo.com), reported that Google was taking over ICOA, a public Wi-Fi hotspot provider, for US$400 million, causing ICOA's stock price to surge.¹ However, these reports were based on a false press release that somehow made its way to PRWeb, a press release distributor. The mistake was later caught, and the blogs updated their posts. This example illustrates how errors within the supply chain of information can easily propagate and have a dramatic impact in the real world. What these blogs failed to do was adequately check where the information had come from in the first place.

Indeed, we often judge quality by knowing something's origins and how it was produced, which is termed provenance. For example, we can judge coffee's quality by knowing where it came from, the roasting process it underwent, and how it was brewed. Knowing provenance also lets us give credit to the actors in a system, whether it's the barista, the roaster, or the bean farmer. The same approach applies to information.

On the Web, we publish lightweight forms of provenance intended for humans all the time. On Twitter, for example, the practice of marking provenance arose organically. Authors started using shorthand such as "RT" to denote a retweet or "MT" to denote that a tweet was modified, as well as the "@" symbol to refer to the original author (see Figure 1). On blogs, blockquotes are commonly used to mark that certain information comes from another source, and hyperlinks are used to attribute that information.
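As a rough illustration of how machine-readable these informal Twitter markers already are, the sketch below pulls them out of a tweet's text and follows nested markers back toward the original author. It assumes the common "RT @handle: ..." and "MT @handle: ..." prefix forms; the regular expression, the `TweetProvenance` record, and the function names are hypothetical, and real tweets apply these conventions far less consistently.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern for the informal convention: an "RT" or "MT" marker,
# whitespace, "@handle", an optional colon, then the (possibly quoted) body.
_MARKER = re.compile(r"^\s*(RT|MT)\s+@(\w+):?\s*(.*)$", re.DOTALL)


@dataclass
class TweetProvenance:
    kind: str              # "original", "retweet" (RT) or "modified" (MT)
    source: Optional[str]  # handle credited via "@", if a marker was found
    text: str              # tweet body with the outermost marker stripped


def parse_provenance(tweet: str) -> TweetProvenance:
    """Extract the outermost lightweight provenance marker from a tweet."""
    match = _MARKER.match(tweet)
    if match is None:
        return TweetProvenance("original", None, tweet.strip())
    marker, handle, body = match.groups()
    kind = "retweet" if marker == "RT" else "modified"
    return TweetProvenance(kind, handle, body.strip())


def provenance_chain(tweet: str) -> list[tuple[str, str]]:
    """Follow nested markers (e.g. "RT @bob: MT @alice: ..."), outermost first."""
    chain = []
    current = parse_provenance(tweet)
    while current.source is not None:
        chain.append((current.kind, current.source))
        # Each step strips one marker, so the text shrinks and the loop ends.
        current = parse_provenance(current.text)
    return chain
```

For example, `provenance_chain("RT @bob: MT @alice: coffee prices up")` returns `[("retweet", "bob"), ("modified", "alice")]`: bob retweeted a modified version of something alice wrote. Because the markers are free text rather than structured metadata, this kind of parsing is best-effort, which is part of why more formal provenance representations are worth having.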
doi:10.1109/mic.2013.41