Daniel Butler Receives NSF Grant for State Politics Project


Daniel Butler, Professor of Political Science and Director of Undergraduate Studies for Political Science, has received a grant from the National Science Foundation (NSF) for a state politics project. The project, titled "The State of the State: Archival, Unstructured Data and Machine Learning," aims to create a first-of-its-kind digital database of U.S. governors' State of the State speeches. According to Butler:

Our team has been awarded a grant from the National Science Foundation to build the first-ever comprehensive digital database of U.S. governors’ State of the State addresses. This project will open up new ways to study how state politics are becoming nationalized and create resources for researchers, educators, and the public. We’re thrilled to bring students into this work and to share our findings widely.

The project will be a collaboration among WashU graduate students and Emory University students and faculty. The NSF abstract is reproduced below:

This project uses machine learning to create a database of State of the State (SOTS) addresses from 1800 to 2016 and state-level agendas. The data collection involves collecting and cleaning the full set of speeches from governors over time. SOTS data are stored at publicly available data repositories and a website developed by the PIs. Methodologically, the project advances the study of unstructured data and the use of artificial intelligence and machine learning. The data support knowledge and scholarship related to public decision and provide a web resource for educators and journalists.

This project extends the SOTS dataset that covers state-of-the-state addresses from 1800 to 2016. The PIs collect, process, and analyze SOTS speeches from years prior to 1960, using techniques developed to overcome poor quality documents implemented through software created by one of the PIs. The software applies machine learning to isolate, enhance, and extract text from hard-to-read documents, correcting document layout problems with a novel statistical approach before it runs optical character recognition (OCR). This results in a significantly higher level of accuracy than other current approaches.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Initial funding for this research was provided through a Weidenbaum Center small grant award.