Researchers increasingly test whether data documenting users’ behaviour on social media channels can predict social phenomena. In the upcoming edition of Internet Research, Pascal Jürgens and I have an article that discusses general questions regarding the analysis of social phenomena with social media data. It also offers a specific approach to analysing social media data streams: detecting anomalies in user behaviour that might offer clues about relevant offline events.
I can only link to a gated version of the article. Still, if you do not have access to the publication but want to read the article, please drop me an e-mail.
Andreas Jungherr and Pascal Jürgens. 2013. “Forecasting the pulse: how deviations from regular patterns in online data can identify offline phenomena.” Internet Research 23(5).
Abstract:
Purpose – The steady increase of data on human behavior collected online holds significant research potential for social scientists. We add to this research by a systematic discussion of different online services, their data generating processes, the offline phenomena connected to these data, and by demonstrating, in a proof of concept, a new approach for the detection of extraordinary offline phenomena by the analysis of online data.
Design/methodology/approach – To detect traces of extraordinary offline phenomena in online data, we determine the normal state of the respective communication environment by measuring the regular dynamics of specific variables in data documenting user behavior online. In our proof of concept, we do so by concentrating on the diversity of hashtags used on Twitter during a given time span. We then use the seasonal trend decomposition procedure based on loess (STL) to determine large deviations between the state of the system as forecasted by our model and the empirical data. We take these deviations as indicators for extraordinary events, which led users to deviate from their regular usage patterns.
Findings – We show in our proof of concept that this method is able to detect deviations in the data and that these deviations are clearly linked to changes in user behavior triggered by offline events.
Originality/value – This paper adds to the literature on the link between online data and offline phenomena. It proposes a new theoretical approach to the empirical analysis of online data as indicators of offline phenomena. The paper will be of interest to social scientists and computer scientists working in the field.
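To make the idea in the abstract a little more concrete, here is a minimal sketch in Python. It is not the implementation used in the article. Two simplifications are assumptions made purely for illustration: hashtag diversity is measured as Shannon entropy, and a plain seasonal median baseline stands in for the STL decomposition so that the example runs with the standard library alone. The function names `hashtag_entropy` and `flag_deviations` and the toy numbers are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def hashtag_entropy(hashtags):
    """Shannon entropy (in bits) of the hashtags used in one time bin.

    Assumption: entropy is only one possible diversity measure.
    """
    counts = Counter(hashtags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def flag_deviations(series, period, threshold=3.0):
    """Return indices where the series departs strongly from its regular cycle.

    Stand-in for STL (assumption): the expected value of each point is the
    median of all points at the same position in the cycle. A point is
    flagged when its residual exceeds `threshold` times the median
    absolute residual.
    """
    baseline = [median(series[i % period::period]) for i in range(len(series))]
    residuals = [x - b for x, b in zip(series, baseline)]
    scale = median([abs(r) for r in residuals]) or 1e-9  # avoid a zero scale
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * scale]


# Toy data: a repeating cycle of diversity values with a small regular wiggle.
pattern = [2.0, 3.0, 4.0, 3.0]
series = [pattern[i % 4] + 0.1 * (i % 3 - 1) for i in range(24)]
series[13] -= 1.5  # simulated event: users converge on very few hashtags

flagged = flag_deviations(series, period=4)
print(flagged)
```

In a real analysis, each value of `series` would come from calling `hashtag_entropy` on the hashtags observed in one time bin, and the seasonal baseline would be replaced by a proper STL decomposition, for example the `stl` function in R or `STL` in the Python package statsmodels.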