In this session we focus on querying the data stored in our database to extract the information needed for a series of typical analyses often performed with Twitter data. We cover three analytical approaches: counts, time series, and network analysis. The tutorial describes these approaches in detail and lists exemplary studies illustrating them (see Jürgens & Jungherr, pp. 42-79).
We will query the database from Python using a series of predefined commands. As before, we use peewee to communicate with our SQLite database. Make sure to examine the workings of these commands in detail. You will find them listed in our script “database.py”.
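To give a feel for the kind of aggregation such commands perform, here is a minimal sketch of a count query and a time series query. It is not taken from “database.py”: the table name, columns, and toy rows are assumptions made for illustration, and it uses Python's built-in sqlite3 module in place of peewee so that it runs without extra dependencies.

```python
import sqlite3

# Hypothetical schema and toy rows; the course's "database.py" defines its own models.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweet (id INTEGER PRIMARY KEY, user TEXT, text TEXT, created_at TEXT)"
)
rows = [
    (1, "alice", "Watching the #debate", "2015-10-28 20:05:00"),
    (2, "bob", "RT @alice: Watching the #debate", "2015-10-28 20:17:00"),
    (3, "alice", "Closing statements now", "2015-10-28 21:02:00"),
]
conn.executemany("INSERT INTO tweet VALUES (?, ?, ?, ?)", rows)

# Counts: number of tweets per user, most active users first.
per_user = conn.execute(
    "SELECT user, COUNT(*) AS n FROM tweet GROUP BY user ORDER BY n DESC, user"
).fetchall()

# Time series: number of tweets per hour, in chronological order.
per_hour = conn.execute(
    "SELECT strftime('%Y-%m-%d %H:00', created_at) AS hour, COUNT(*) "
    "FROM tweet GROUP BY hour ORDER BY hour"
).fetchall()

print(per_user)
print(per_hour)
```

The same two patterns, grouping by an attribute and grouping by a truncated timestamp, can be expressed through peewee's query interface against the course database.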
After exporting the summary statistics, we load them into R to perform a series of typical analyses. You will find introductory readings on using R, exploratory data analysis in R, plotting data in R, time series analysis, and network analysis in the background readings.
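The export step can be sketched as follows for the network case. This is an illustrative assumption rather than the course's own export routine: the toy tweets, the function names `mention_edges` and `write_edge_csv`, and the column names are hypothetical. The sketch extracts @-mentions, aggregates them into a weighted edge list, and writes that list as CSV, a format R can read with `read.csv`.

```python
import csv
import io
import re
from collections import Counter

# Toy (author, text) pairs; in practice these would come from the database.
tweets = [
    ("alice", "Watching the #debate with @bob"),
    ("bob", "RT @alice: Watching the #debate with @bob"),
    ("carol", "@alice @bob what did you think?"),
    ("alice", "@bob agreed"),
]

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Count how often each author mentions each other user."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            target = target.lower()
            if target != author:  # skip self-mentions
                edges[(author, target)] += 1
    return edges


def write_edge_csv(edges, handle):
    """Write the weighted edge list as CSV with a header row."""
    writer = csv.writer(handle)
    writer.writerow(["source", "target", "weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])


edges = mention_edges(tweets)
buffer = io.StringIO()  # stands in for a file opened with open(..., "w", newline="")
write_edge_csv(edges, buffer)
print(buffer.getvalue())
```

Writing one row per directed edge keeps the file easy to inspect and lets network packages in R build a graph directly from the resulting data frame.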
The example scripts we provide are written to work with an example dataset that we collected during the Republican primary debates in the autumn of 2015. You can download a replication dataset through Twitter’s “hydrate” function following the instructions in Jürgens & Jungherr, p. 42. Of course, you can adapt the commands provided in the files “example.py” and “database.py” to your own interests. For now, though, they are optimized to work with our sample dataset.
Mandatory Readings:
- Pascal Jürgens and Andreas Jungherr (2016) A Tutorial for Using Twitter Data in the Social Sciences: Data Collection, Preparation, and Analysis. Social Science Research Network (SSRN). doi: 10.2139/ssrn.2710146, pp. 42-79.
Background Readings: R:
- Winston Chang (2012) R Graphics Cookbook. O’Reilly Media, Inc.
- Garrett Grolemund and Hadley Wickham (2016) R for Data Science. O’Reilly Media, Inc.
- Robert Kabacoff (2015) R in Action. 2nd ed. Manning Publications.
- Hadley Wickham (2016) ggplot2: Elegant Graphics for Data Analysis. 2nd ed. Springer.
Background Readings: Time Series Analysis:
- Janet M. Box-Steffensmeier et al. (2014) Time Series Analysis for the Social Sciences. New York, NY: Cambridge University Press.
Background Readings: Network Analysis:
- David Easley and Jon Kleinberg (2010) Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge, UK: Cambridge University Press.
- Eric D. Kolaczyk and Gábor Csárdi (2014) Statistical Analysis of Network Data with R. Springer.