Lab Seminar - Biases of using aggregate data to infer individual voting preferences


Corey Lang, Associate Professor of Department of Environmental and Natural Resources Economics, University of Rhode Island, presented his research on voting for conservation. Voting is one of the main determinants of public good provision (Fischel 2001) and an important source of data for examining revealed preferences for environmental goods. Dating back to at least Deacon and Shapiro (1975), economists have often used aggregated voting data matched with aggregate census socioeconomic data to estimate demand for various environmental goods (e.g., Kahn and Matsusaka 1997, Kotchen and Powers 2006, Banzhaf et al. 2010, Holian and Kahn 2015, Burkhardt and Chan 2017).

The purpose of this paper is to assess the extent and causes of bias when using aggregate data to understand individual determinants of voting decisions.  Corey and his coauthor proceed on two fronts to understand the bias of aggregate voting data. First, they develop a Monte Carlo simulation analysis that matches the salient structure of individual voting decisions and aggregated precinct records and compare regression results using individual and aggregate data. Second, they collect two datasets related to voting on a statewide environmental referendum in Rhode Island called the Green Economy Bonds (GEB).

From basic summary statistics, they find support for conditions leading to the two sources of bias.  The paper estimates identical models of voting preferences for GEB using the aggregate data and the individual exit poll data, regressing GEB vote on presidential vote and a large set of socioeconomic characteristics (age, income, race, etc.). The paper finds strong differences between the two models. In additional specifications, Corey and the coauthor are able to demonstrate the presence of spatial omitted variables and the biasing effect of a mismeasured voting population. However, no adjustment can mitigate the bias of the aggregate model. The conclusion of this paper is that aggregate data cannot infer unbiased individual voting preferences, and should be used only with this caveat.