Bias and beyond in digital trace data
Large-scale digital trace data from sources such as social media platforms, emails, purchase records, browsing behavior, and sensors in mobile phones are increasingly used for business decision-making, scientific research, and even public policy. However, these data do not give an unbiased picture of underlying phenomena. In this thesis, I demonstrate some of the ways in which large-scale digital trace data, despite its richness, has biases in who is represented, what sorts of actions are
... d, and what sorts of behaviors are captured. I present three critiques, demonstrating respectively that geotagged tweets exhibit heavy geographic and demographic biases, that social media platforms' attempts to guide user behavior are successful and have implications for the behavior we think we observe, and that sensors built into mobile phones, such as Bluetooth and WiFi, measure proximity and co-location but not necessarily interaction as has been claimed. In response to these biases, I suggest shifting the scope of research done with digital trace data away from attempts at large-sample statistical generalizability and towards studies that situate knowledge in the contexts in which the data were collected. Specifically, I present two studies demonstrating alternatives to complement each of the critiques. In the first, I work with public health researchers to use Twitter as a means of public outreach and intervention. In the second, I design a study using mobile phone sensors in which I use sensor data and survey data to measure proximity and sociometric choice, respectively, and model the relationship between the two.