This semester, I will be taking my course on “Using Digital Trace Data in the Social Sciences” for another spin at the University of Konstanz. The course builds on a tutorial on working with Twitter data that Pascal Jürgens and I published in 2016. So if you are participating or following along remotely, make sure to have a look at the tutorial.
Over the course, you will learn fundamental techniques of data collection, preparation, and analysis with digital trace data in the social sciences. We will focus on working with the microblogging service Twitter. Along the way, students are expected to become proficient in the use of two programming languages, Python and R.
The course is designed for students without prior training in programming or exploratory data analysis. Still, by the end of the course, students are expected to independently perform theory-driven data collections on Twitter and use these data in a series of specified prototypical analyses. So make sure to take the time over the semester to get acquainted with Python, R, and potentially SQL.
We will start the course by focusing on conceptual issues associated with working with digital trace data. You will then learn fundamental practices in using the programming language Python. Following this, we will collect data from Twitter’s APIs through a set of example scripts written in Python. After downloading the data, we will load them into a SQLite database for ease of access and flexibility in data processing tasks. Finally, we will discuss a series of typical analytical procedures with Twitter data. Here, we will focus on counting entities and establishing their relative prominence, time series analysis, and basic approaches to network analysis. For these analyses, we will predominantly rely on R.
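To give a rough feel for the load-then-analyze steps described above, here is a minimal sketch in Python. It is not one of the tutorial's example scripts: the sample records, the table layout, and all function names are hypothetical and chosen only for illustration. It also assumes that the tweets have already been downloaded and that their timestamps have been converted to ISO format. In the course itself, the analyses are done mainly in R.

```python
import sqlite3

# Hypothetical stand-ins for tweets that were already downloaded.
# Field names and values are invented for this illustration.
SAMPLE_TWEETS = [
    {"id": 1, "author": "alice", "created_at": "2018-04-16 09:15:00",
     "text": "Reading up on #datascience before class",
     "hashtags": ["datascience"], "mentions": []},
    {"id": 2, "author": "bob", "created_at": "2018-04-16 11:30:00",
     "text": "@alice see you in Konstanz #datascience #python",
     "hashtags": ["datascience", "python"], "mentions": ["alice"]},
    {"id": 3, "author": "carol", "created_at": "2018-04-17 08:05:00",
     "text": "@alice @bob first steps with #python",
     "hashtags": ["python"], "mentions": ["alice", "bob"]},
]


def load_tweets(conn, tweets):
    """Store tweets, hashtags, and mentions in three simple tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS tweets "
                 "(id INTEGER PRIMARY KEY, author TEXT, created_at TEXT, text TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS hashtags (tweet_id INTEGER, tag TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS mentions "
                 "(tweet_id INTEGER, source TEXT, target TEXT)")
    for t in tweets:
        conn.execute("INSERT INTO tweets VALUES (?, ?, ?, ?)",
                     (t["id"], t["author"], t["created_at"], t["text"]))
        conn.executemany("INSERT INTO hashtags VALUES (?, ?)",
                         [(t["id"], tag.lower()) for tag in t["hashtags"]])
        conn.executemany("INSERT INTO mentions VALUES (?, ?, ?)",
                         [(t["id"], t["author"], m) for m in t["mentions"]])
    conn.commit()


def hashtag_counts(conn):
    """Counting entities: how often does each hashtag appear?"""
    return conn.execute(
        "SELECT tag, COUNT(*) AS n FROM hashtags "
        "GROUP BY tag ORDER BY n DESC, tag").fetchall()


def tweets_per_day(conn):
    """Time series: number of tweets per calendar day."""
    return conn.execute(
        "SELECT date(created_at) AS day, COUNT(*) FROM tweets "
        "GROUP BY day ORDER BY day").fetchall()


def mention_edges(conn):
    """Network analysis input: weighted edge list of who mentions whom."""
    return conn.execute(
        "SELECT source, target, COUNT(*) FROM mentions "
        "GROUP BY source, target ORDER BY source, target").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    load_tweets(conn, SAMPLE_TWEETS)
    print(hashtag_counts(conn))
    print(tweets_per_day(conn))
    print(mention_edges(conn))
```

Keeping hashtags and mentions in their own tables means each of the three analysis types reduces to a single `GROUP BY` query, and the resulting tables can be read into R just as easily.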
The course itself closely follows the outline provided in A Tutorial for Using Twitter Data in the Social Sciences: Data Collection, Preparation, and Analysis. You will be introduced to working with Twitter data through a set of example scripts developed in the context of the tutorial and available on GitHub.
Here is the course’s syllabus.
You can find detailed information on the content of the sessions, background readings, slides, and example code in the dedicated post for each session:
Session 1: Introduction and Conceptual Issues in the Use of Digital Trace Data in Social Science, Computational Social Science, Digital Methods, and Big Data [Slides]
Session 2: Set Up and Introduction to Collecting Data on Twitter [Slides/Code]
Session 3: Introduction to Python [Code]
Session 4: Collecting Data Through Twitter’s API [Code]
Session 5: How to Find A Research Question?
Session 6: Loading Twitter Data Into a Database [Code]
Session 7: Sample Analyses: Counts and Time Series [Code]
Session 8: Sample Analyses: Networks [Code]
Session 9: Data Lab
Session 10: Presentation and Discussion of Students’ Research Projects Pt. 1
Session 11: Presentation and Discussion of Students’ Research Projects Pt. 2
Session 12: Presentation and Discussion of Students’ Research Projects Pt. 3
Session 13: Where to take it from here? Discussion of Open Questions and Paper
In addition to the tutorial, the following texts are recommended as background reading:
Using Digital Trace Data in the Social Sciences:
- David Donoho. 50 Years of Data Science. Paper presented at the Tukey Centennial workshop, Princeton, NJ. Sept. 18 (2015).
- Bradley Efron, and Trevor Hastie. Computer Age Statistical Inference: Algorithms, Evidence and Data Science. Cambridge: Cambridge University Press. (2016).
- James Howison, Andrea Wiggins, and Kevin Crowston. “Validity issues in the use of social network analysis with digital trace data”. In: Journal of the Association for Information Systems 12.12 (2011), pp. 767–797.
- Andreas Jungherr. Analyzing Political Communication with Digital Trace Data: The Role of Twitter Messages in Social Science Research. Cham, CH: Springer, 2015.
- Andreas Jungherr (2017). Normalizing Digital Trace Data. In Digital Discussions: How Big Data Informs Political Communication, eds. Natalie Jomini Stroud and Shannon McGregor. New York, NY: Routledge. (Forthcoming). [Preprint]
- Andreas Jungherr, Harald Schoen, and Pascal Jürgens. “The mediation of politics through Twitter: An analysis of messages posted during the campaign for the German federal election 2013”. In: Journal of Computer-Mediated Communication 21.1 (2016), pp. 50–68. doi: 10.1111/jcc4.12143.
- Andreas Jungherr, Harald Schoen, Oliver Posegga, and Pascal Jürgens. “Digital Trace Data in the Study of Public Opinion: An Indicator of Attention Toward Politics Rather Than Political Support”. In: Social Science Computer Review 35.3 (2017), pp. 336–356. doi: 10.1177/0894439316631043.
- Matthew Salganik. Bit by Bit: Social Research in the Digital Age. (2017).
Python:
- Naomi Ceder (2018) The Quick Python Book. 3rd ed. Manning Publications.
- Nick Eubank (2015) Data Analysis in Python.
- Wes McKinney (2017) Python for Data Analysis. 2nd ed. O’Reilly Media, Inc.
R:
- Winston Chang (2012) R Graphics Cookbook. O’Reilly Media, Inc.
- Richard Cotton (2013) Learning R. O’Reilly Media, Inc.
- Robert Kabacoff (2015) R in Action. 2nd ed. Manning Publications.
Data Collection Online:
- Matthew A. Russell (2013) Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More, 2nd Edition. O’Reilly Media, Inc.