Predicting US Presidential Election Outcome with Linear Mixed Model
Introduction
Predicting the outcome of the U.S. presidential election has been a topic of interest for political analysts, economists, and data scientists. The election results not only determine national policies but also have global economic and geopolitical implications. Traditional election forecasting models rely on polling data and fundamental economic indicators, but these approaches often lack robustness due to biases in polling methods and limited sample sizes.
In this project, we employ a Linear Mixed Model (LMM) to predict the outcome of the 2024 U.S. Presidential Election. The LMM approach allows us to incorporate both fixed effects (e.g., economic indicators and demographic data) and random effects (e.g., regional and election-year-specific variations) to make more accurate predictions.
Understanding the Election System
The U.S. presidential election is decided by the Electoral College, which consists of 538 electoral votes allocated across the 50 states plus Washington, D.C. A candidate needs at least 270 electoral votes to win the presidency.
Each state awards all of its electoral votes to the candidate who wins the popular vote in that state. The exceptions are Maine and Nebraska, which award two electoral votes to the statewide winner and one to the winner of each congressional district.
Given these rules, our goal is to predict the Democratic Party's vote share in each state and determine whether the party can reach the 270 electoral votes needed to win the presidency.
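As a rough illustration of how state-level predictions turn into an electoral vote count, the sketch below tallies votes under a pure winner-take-all rule. The state labels, vote shares, and the helper `dem_electoral_votes()` are hypothetical and introduced only for this example, and the Maine and Nebraska exceptions are ignored for simplicity.

```r
# Hypothetical predicted Democratic vote shares and electoral votes for a
# handful of made-up states (illustrative values, not model output).
states <- data.frame(
  state     = c("A", "B", "C", "D"),
  dem_share = c(0.53, 0.48, 0.505, 0.61),
  ev        = c(19, 16, 11, 54)
)

# Winner-take-all: a state's electoral votes count for the Democratic
# candidate when the predicted share exceeds 50%.
dem_electoral_votes <- function(dem_share, ev) {
  sum(ev[dem_share > 0.5])
}

dem_ev <- dem_electoral_votes(states$dem_share, states$ev)
dem_ev          # electoral votes won among these states
dem_ev >= 270   # TRUE only if the 270-vote threshold is reached
```

With all 50 states and Washington, D.C. in the table, the final comparison against 270 would indicate the predicted winner.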
Why Use a Linear Mixed Model?
Unlike simple linear regression models, which assume that all variables have a uniform impact across all states and election years, Linear Mixed Models allow us to account for:
- Fixed Effects: National economic indicators (GDP growth, unemployment rate), demographic factors (white population proportion), and political alignment (Partisan Voting Index, PVI).
- Random Effects: Variations at the regional and election year levels, capturing unpredictable shifts in voter behavior.
The LMM is particularly useful for election forecasting because:
- It captures hierarchical structures in the data (e.g., states nested within regions).
- It accommodates variations over multiple election years.
- It provides more robust and generalizable predictions compared to simple regression models.
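One way to write such a model down, assuming random intercepts for region and election year as in the R formula shown later in this write-up, is given below. The symbols are notation introduced here for illustration: $y_{st}$ is the Democratic vote share in state $s$ and election year $t$, and $r(s)$ is the region containing state $s$.

$$
y_{st} = \beta_0 + \beta_1\,\mathrm{PVI}_{st} + \beta_2\,\mathrm{Unemp}_{st} + \beta_3\,\mathrm{White}_{st} + u_{r(s)} + v_t + \varepsilon_{st}
$$

$$
u_r \sim \mathcal{N}(0, \sigma_u^2), \qquad v_t \sim \mathcal{N}(0, \sigma_v^2), \qquad \varepsilon_{st} \sim \mathcal{N}(0, \sigma^2)
$$

The $\beta$ coefficients are the fixed effects shared by all observations, while $u_{r(s)}$ and $v_t$ shift the intercept for every state in a given region and every state in a given election year, respectively.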
Data Collection & Processing
The dataset used in this project integrates multiple data sources:
Data Source | Description |
---|---|
MIT Election Lab | Historical election results (1980–2020) |
Bureau of Labor Statistics | State-level unemployment rates |
Bureau of Economic Analysis | National GDP growth |
IPUMS CPS Survey Data | Demographic distributions |
Cook Political Report | Partisan Voting Index (PVI) |
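The sources above come at different levels of granularity: some vary by state and year, others only by year. A minimal sketch of how such tables might be combined is shown below. The data frames, column names, and values are all hypothetical stand-ins, not the project's actual files.

```r
# Toy stand-ins for the real sources; all names and values are hypothetical.
results <- data.frame(
  state     = c("S1", "S1", "S2", "S2"),
  year      = c(2016, 2020, 2016, 2020),
  dem_share = c(0.52, 0.54, 0.45, 0.47)
)
unemployment <- data.frame(
  state             = c("S1", "S1", "S2", "S2"),
  year              = c(2016, 2020, 2016, 2020),
  unemployment_rate = c(4.9, 7.8, 5.4, 6.5)
)
gdp <- data.frame(
  year       = c(2016, 2020),
  gdp_growth = c(1.7, -2.2)
)

# State-level sources join on state and year; national series join on year only.
election_data <- merge(results, unemployment, by = c("state", "year"))
election_data <- merge(election_data, gdp, by = "year")

# Every state-year should survive the joins exactly once.
stopifnot(nrow(election_data) == nrow(results))
election_data
```

Checking the row count after each join is a cheap guard against silently dropped or duplicated state-years.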
Key Variables:
- Dependent Variable: Democratic vote share in each state.
- Independent Variables:
- PVI (Partisan Voting Index) – Measures how a state leans politically compared to the national average.
- White Population Proportion – A demographic factor influencing voting trends.
- State Unemployment Rate – An economic indicator affecting public perception of governance.
- National GDP Growth – A broader economic indicator used in fundamental election models.
Random Effects:
- Region (South, Northeast, West, Midwest) – States within the same region tend to have correlated voting behaviors.
- Election Year (1980-2024) – Accounts for year-specific fluctuations such as economic crises or major political events.
Model Implementation in R
We implemented the Linear Mixed Model using the lmer() function from the lme4 package in R.
    library(lme4)
    model <- lmer(
      Vote_Share ~ PVI + Unemployment_Rate + White_Population +
        (1 | Region) + (1 | Election_Year),
      data = election_data
    )
    summary(model)
The random effects help us account for differences across regions and election years, improving the predictive power of the model.
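To see what the fitted random effects look like, the sketch below fits the same formula to simulated data and inspects the estimates. The simulated panel, the effect sizes, and the object names `sim` and `fit` are assumptions made for illustration and do not reproduce the project's dataset or results.

```r
library(lme4)

set.seed(1)
years   <- seq(1980, 2020, by = 4)
regions <- c("South", "Northeast", "West", "Midwest")

# Simulated panel: 20 hypothetical states, assigned to 4 regions, over 11 elections.
sim <- expand.grid(State = paste0("S", 1:20), Election_Year = years)
sim$Region            <- regions[(as.integer(sim$State) - 1) %% 4 + 1]
sim$PVI               <- rnorm(nrow(sim), 0, 8)
sim$Unemployment_Rate <- runif(nrow(sim), 3, 10)
sim$White_Population  <- runif(nrow(sim), 0.4, 0.95)

# True region and year shifts used to generate the outcome.
region_eff <- setNames(rnorm(4, 0, 2), regions)
year_eff   <- setNames(rnorm(length(years), 0, 3), years)

sim$Vote_Share <- unname(
  48 + 0.5 * sim$PVI - 0.3 * sim$Unemployment_Rate - 4 * sim$White_Population +
    region_eff[sim$Region] + year_eff[as.character(sim$Election_Year)] +
    rnorm(nrow(sim), 0, 2)
)

fit <- lmer(Vote_Share ~ PVI + Unemployment_Rate + White_Population +
              (1 | Region) + (1 | Election_Year), data = sim)

fixef(fit)                # estimated fixed-effect coefficients
VarCorr(fit)              # variances of the region and election-year intercepts
ranef(fit)$Election_Year  # estimated shift for each election year
```

The per-year shifts returned by `ranef()` are the quantities that absorb election-specific swings common to all states.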
Forecasting the 2024 Election
To predict the 2024 election outcome, we:
- Used Monte Carlo simulations to generate multiple possible scenarios.
- Calculated probability distributions of Democratic vote share for each state.
- Determined electoral vote allocations based on simulated outcomes.
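The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions rather than the project's actual simulation: the five-state toy map, the predicted means, the standard deviations, and the shared national shock are all hypothetical, and the winning threshold is a majority of the toy map instead of 270 of 538.

```r
set.seed(42)

# Hypothetical inputs: predicted Democratic share (%), prediction SD, and
# electoral votes for a toy map of five states. Not the project's estimates.
toy <- data.frame(
  state      = c("S1", "S2", "S3", "S4", "S5"),
  mean_share = c(55, 51, 50, 49, 44),
  sd_share   = c(3, 3, 3, 3, 3),
  ev         = c(20, 15, 10, 16, 30)
)
ev_needed <- floor(sum(toy$ev) / 2) + 1  # majority of the toy map

n_sims <- 10000

# A shared national shock makes state errors correlated, loosely mimicking
# an election-year random effect.
national_shock <- rnorm(n_sims, 0, 2)

# One row per simulation, one column per state.
shares <- sapply(seq_len(nrow(toy)), function(i) {
  toy$mean_share[i] + national_shock + rnorm(n_sims, 0, toy$sd_share[i])
})

# Electoral votes won in each simulated election (winner-take-all).
dem_ev <- as.vector(((shares > 50) * 1) %*% toy$ev)

win_prob       <- mean(dem_ev >= ev_needed)  # share of simulations won
state_win_prob <- colMeans(shares > 50)      # per-state win probability

win_prob
setNames(round(state_win_prob, 2), toy$state)
```

Including a shock shared across states matters because independent state errors would understate the chance of many close states breaking the same way.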
Results
- The model predicts a 54.1% probability that the Democratic Party will win the 2024 election.
- The 95% confidence interval for the Democratic vote share is (51.01% – 57.19%).
- Key swing states (e.g., Pennsylvania, Georgia, Arizona) show a high degree of uncertainty, meaning they could be decisive in the final outcome.
Model Performance & Comparison
We compared multiple models to select the best-performing one:
Model | Fixed Effects | Random Effects | AIC/BIC | Best Fit? |
---|---|---|---|---|
Model 1 | PVI, GDP, Unemployment, White Population | Region | High AIC/BIC | ❌ |
Model 2 | PVI, GDP, Unemployment, White Population | Election Year | Medium AIC/BIC | ❌ |
Model 3 | PVI, GDP, Unemployment, White Population | Region + Election Year | Low AIC/BIC | ✅ |
Model 4 | PVI, Unemployment, White Population | Region + Election Year + PVI slope by Election Year | Lowest AIC/BIC | ✅✅ |
Key Findings:
- Model 4 performs best: it accounts for both regional and election-year effects and also lets the effect of PVI vary across election years (a random slope).
- National GDP growth was found not to be statistically significant in predicting vote share, leading to its removal from the final model.
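A comparison of this kind could be set up along the following lines. This is a sketch on simulated data with only PVI as a fixed effect, so the numbers it prints say nothing about the real models. The random-slope term `(1 + PVI | Election_Year)` is an assumed reading of the "Election Year Slope" entry in the table, and all object names are hypothetical.

```r
library(lme4)

set.seed(7)
years   <- seq(1980, 2020, by = 4)
regions <- c("South", "Northeast", "West", "Midwest")

sim <- expand.grid(State = paste0("S", 1:24), Election_Year = years)
sim$Region <- regions[(as.integer(sim$State) - 1) %% 4 + 1]
sim$PVI    <- rnorm(nrow(sim), 0, 8)

# Simulate a PVI effect that differs by election year (the random slope).
yr         <- as.character(sim$Election_Year)
region_int <- setNames(rnorm(4, 0, 2), regions)
year_int   <- setNames(rnorm(length(years), 0, 3), years)
year_slope <- setNames(rnorm(length(years), 0, 0.2), years)
sim$Vote_Share <- unname(
  48 + region_int[sim$Region] + year_int[yr] +
    (0.5 + year_slope[yr]) * sim$PVI + rnorm(nrow(sim), 0, 2)
)

# Fit by maximum likelihood (REML = FALSE), which keeps AIC/BIC comparable
# when fixed effects differ between candidates.
m_region <- lmer(Vote_Share ~ PVI + (1 | Region),
                 data = sim, REML = FALSE)
m_both   <- lmer(Vote_Share ~ PVI + (1 | Region) + (1 | Election_Year),
                 data = sim, REML = FALSE)
m_slope  <- lmer(Vote_Share ~ PVI + (1 | Region) + (1 + PVI | Election_Year),
                 data = sim, REML = FALSE)

comparison <- data.frame(
  model = c("region", "region + year", "region + year + PVI slope"),
  AIC   = c(AIC(m_region), AIC(m_both), AIC(m_slope)),
  BIC   = c(BIC(m_region), BIC(m_both), BIC(m_slope))
)
comparison[order(comparison$AIC), ]  # lower is better
```

Reporting the numeric AIC and BIC values alongside the qualitative ranking makes it easier to judge how large the differences between models are.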
Limitations & Future Research
While our model provides a robust and data-driven prediction, there are limitations:
- Political factors not included: Candidate approval ratings, campaign strategies, and social issues are not directly accounted for.
- Limited demographic granularity: Education levels and gender distributions are not explicitly modeled.
- Potential for external shocks: Sudden economic downturns or scandals can influence voter behavior in unpredictable ways.
Future research can enhance the model by:
- Integrating sentiment analysis from news and social media.
- Applying Bayesian hierarchical models for better uncertainty quantification.
- Exploring mixed-effect logistic regression to directly model the probability of state wins.
Conclusion
The Linear Mixed Model provides a powerful approach to election forecasting by incorporating both fixed and random effects. Our model slightly favors the Democratic Party in 2024, estimating a 54.1% probability of victory.
As election dynamics continue to evolve, data-driven models will play an increasingly important role in understanding voter behavior and making informed political forecasts.