For quite a while, I have been thinking and writing about the challenges in using digital trace data sensibly in social science research (see for example my book on the use of Twitter data in political communication research, my article on how to interpret Twitter data as a mediated reflection of reality, and this piece on the fallacy of electoral predictions based on Twitter data). While there is no question that Computational Social Science, or Big Data for the more publicity-minded, has developed into a vibrant and fascinating subfield, there probably is a spirited debate to be had about how much social science can currently be found in research grouped under the label.
While social and political phenomena are ostensibly the focus of Computational Social Science, a closer look at prominent studies quickly shows that authors usually spend very little effort connecting their work meaningfully with the relevant concepts, theoretical debates, and empirical findings from the social sciences. Instead, more often than not, authors anchor their papers superficially in a seemingly dominant paper (one cannot help but think that Google Scholar rankings play a role in defining what is seen as dominant) without adequately reflecting on or engaging with underlying concepts, interpretations, or controversies.
Work with digital trace data is thus often highly sophisticated with regard to data collection approaches and the use of advanced quantitative and computational methods, yet it often appears underdeveloped with regard to its connection to the social sciences. Correspondingly and unsurprisingly, the field has had a much stronger uptake among computer scientists than among social scientists.
This state of affairs leads to an unfortunate neglect of the very real potential of digital trace data and computational methods in the social science mainstream and to an equally unfortunate, simplistic approach to the analysis of social and political phenomena among computer scientists. It was thus with great pleasure that I accepted the invitation by Talia Stroud and Shannon McGregor to elaborate on these observations in a chapter for an upcoming edited volume, Digital Discussions: How Big Data Informs Political Communication (forthcoming with Routledge). If you are interested in this topic, have a look at the preprint of the resulting piece, Normalizing Digital Trace Data. To help you decide if it’s worth your time, here is the abstract:
Abstract: Over the last ten years, social scientists have found themselves confronting a massive increase in available data sources. In the debates on how to use these new data, the research potential of “digital trace data” has featured prominently. While various commentators expect digital trace data to create a “measurement revolution”, empirical work has fallen somewhat short of these grand expectations. In fact, empirical research based on digital trace data is largely limited by the prevalence of two central fallacies: First, the n=all fallacy; second, the mirror fallacy. As I will argue, these fallacies can be addressed by developing a measurement theory for the use of digital trace data. For this, researchers will have to test the consequences of variations in research designs, account for sample problems arising from digital trace data, and explicitly link signals identified in digital trace data to sophisticated conceptualizations of social phenomena. Below, I will outline the two fallacies in greater detail. Then, I will discuss their consequences with regard to three general areas in the work with digital trace data in the social sciences: digital ethnography, proxies, and hybrids. In these sections, I will present selected prominent studies predominantly from political communication research. I will close by a short assessment of the road ahead and how these fallacies might be constructively addressed by the systematic development of a measurement theory for the work with digital trace data in the social sciences.
Andreas Jungherr (2017). Normalizing Digital Trace Data. In Digital Discussions: How Big Data Informs Political Communication, eds. Natalie Jomini Stroud and Shannon McGregor. New York, NY: Routledge. (Forthcoming). [Preprint]