This semester, I am teaching a course on the ins and outs of the collection and analysis of digital trace data: Computational Social Science, Digital Methods and Big Data in Political Science. The course is something of an experiment, as it attempts to teach some fundamentals of computational social science at a political science department to students without prior training in programming or exploratory data analysis.
For ease of access, I decided to focus the course on the collection and analysis of Twitter data. As the backbone of the course, I decided to follow the structure of Twitter Data Analytics [Preprint] by Shamanth Kumar, Fred Morstatter, and Huan Liu. The course will start by teaching students to collect data through Twitter’s API. The next session will focus on saving and indexing the collected data with MongoDB. We will then turn to transforming the Twitter data stored in MongoDB so that it can serve as the basis for subsequent analyses. Finally, we will discuss several approaches to analysing Twitter data with R, using network analysis and time series methods. The students will finish the seminar by completing a research project of their choice, consisting of the collection, preparation, and analysis of Twitter data.
In contrast to Kumar, Morstatter, and Liu, I decided to rely on Python rather than Java and JavaScript for data collection and preparation. As the course cannot provide a general introduction to Python, I encouraged the students to use the following resources, depending on their familiarity with Python and other programming languages:
Mark Lutz. (2013). Learning Python. 5th Edition. O’Reilly.
Wes McKinney. (2012). Python for Data Analysis. O’Reilly.
Zed A. Shaw. (2014). Learn Python the Hard Way. 3rd Edition.
Code examples are available in a git repository for the course.
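To give a rough idea of what the preparation step can look like, here is a minimal sketch. It is not taken from the course repository. It assumes tweets are available as Python dictionaries using a few field names from Twitter’s v1.1 tweet JSON (`created_at`, `user.screen_name`, `entities.user_mentions`). The sample records and the function names `mention_edges` and `tweets_per_day` are made up for illustration. In the course the records would come from MongoDB; here a plain list stands in for the database.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records using a small subset of the fields found in
# Twitter's v1.1 tweet JSON. All values are invented for illustration.
SAMPLE_TWEETS = [
    {
        "created_at": "Mon Oct 13 09:15:00 +0000 2014",
        "user": {"screen_name": "alice"},
        "entities": {"user_mentions": [{"screen_name": "bob"}]},
    },
    {
        "created_at": "Mon Oct 13 17:40:12 +0000 2014",
        "user": {"screen_name": "alice"},
        "entities": {
            "user_mentions": [{"screen_name": "bob"}, {"screen_name": "carol"}]
        },
    },
    {
        "created_at": "Tue Oct 14 08:05:30 +0000 2014",
        "user": {"screen_name": "bob"},
        "entities": {"user_mentions": []},
    },
]

# Timestamp layout of the "created_at" field in v1.1 tweet JSON.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def mention_edges(tweets):
    """Count (author, mentioned user) pairs as weighted, directed edges."""
    edges = Counter()
    for tweet in tweets:
        author = tweet["user"]["screen_name"]
        for mention in tweet.get("entities", {}).get("user_mentions", []):
            edges[(author, mention["screen_name"])] += 1
    return edges


def tweets_per_day(tweets):
    """Count tweets per calendar day (ISO date string) for a simple time series."""
    counts = Counter()
    for tweet in tweets:
        created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
        counts[created.date().isoformat()] += 1
    return counts


if __name__ == "__main__":
    for (source, target), weight in sorted(mention_edges(SAMPLE_TWEETS).items()):
        print(source, target, weight)
    for day, count in sorted(tweets_per_day(SAMPLE_TWEETS).items()):
        print(day, count)
```

The edge counts could be written to a CSV file and read into R for network analysis, and the daily counts could feed a time series analysis in the same way. This is only one of several possible ways to structure the hand-over between Python and R.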
Detailed information on the individual sessions is available here:
Session 2
Session 3
Session 4
Session 5
Session 6
Session 7
Session 8
Session 9
Session 10
Session 11
Session 12
Session 13