Developing a System for the Automated Coding of Protest Event Data
Date: Sept 25, 2014
Time: 12:30pm – 1:30pm
Abstract: Scholars and policy makers recognize the need for better and timelier data about contentious collective action, both the peaceful protests that are understood as part of democracy and the violent events that are threats to it. News media provide the only consistent source of information available outside government intelligence agencies and are thus the focus of all scholarly efforts to improve collective action data. Human coding of news sources is time-consuming and thus can never be timely; it is also necessarily limited to a small number of sources, a small time interval, or a limited set of protest “issues” as captured by particular keywords. There have been a number of attempts to address this need through machine coding of electronic versions of news media, but approaches so far remain less than optimal. The goal of this paper is to outline the steps needed to build, test, and validate an open-source system for coding protest events from any electronically available news source using advances from natural language processing and machine learning. Such a system should increase the speed and reduce the labor costs of identifying and coding collective actions in news sources, thus increasing the timeliness of protest data and reducing biases due to excessive reliance on too few news sources. The system will also be open, available for replication, and extendable by future social movement researchers and by social and computational scientists.
Bio: Alex Hanna is a PhD candidate in sociology at the University of Wisconsin-Madison. Substantively, he is interested in social movements, media, and the Middle East. Methodologically, he is interested in computational social science, textual analysis, and social network analysis. Alex’s dissertation uses principles from machine learning and information extraction to build an automated system for the creation of new protest event data. His master’s thesis examines the genesis of the 6th of April Youth Movement using computer-aided content analysis methods.