Statistical Analysis

For regression analysis, the merged data set was read and imported in. A variable, ab_change was created to quantify the change in abortion restriction status per state (ab_change = 1 if Roe v. Wade added an abortion restriction that was not previously in place; ab_change = 0 if the overturning of Roe v. Wade did not change abortion restrictions). This variable was created by taking the difference between abortion restriction status in 2022 and 2018.

final_full = 
  read_csv("data_cleaned/finalprojectfinaldataset.csv") %>%
  mutate(
    ab_change = abstatus22 - abstatus18
  )

A regression model was created utilizing the following:

Outcome: The difference in percent turnout rate between 2018 and 2022 (turnout_rate_difference)
Predictor: The difference in abortion restriction status by state between 2018 and 2022 (ab_change)
Predictor: The average difference in closest abortion clinic by state between 2018 and 2022 (dist_change)

Regression Models

regress_turnoutdif_full = 
  lm(turnout_rate_difference ~ ab_change + dist_change, data = final_full) 

summary(regress_turnoutdif_full) %>% broom::tidy() %>% knitr::kable(digits = 2)

term	estimate	std.error	statistic	p.value
(Intercept)	-0.04	0.01	-6.15	0.00
ab_change	0.06	0.02	2.30	0.03
dist_change	0.00	0.00	0.43	0.67

confint(regress_turnoutdif_full) %>% knitr::kable(digits = 2)

	2.5 %	97.5 %
(Intercept)	-0.05	-0.03
ab_change	0.01	0.11
dist_change	0.00	0.00

Upon running the model, we find that the change in abortion status has a significant effect on voter turnout (P = 0.0261). On average, a state that had an abortion restriction put in place after the overturning of Roe v. Wade had a 5.661% greater voter turnout in the 2022 election compared to the 2018 election. We are 95% confident that the true increase in voter turnout falls between 0.703% and 10.619%.

The average change in distance to abortion clinic was not significant.

Running an F-test to determine whether the regression model with coefficients is better than an intercept-only model, we get a P-value of 0.08054. This indicates that this model is not significantly better than the intercept-only model.

regress_turnoutdif_onlyab = 
  lm(turnout_rate_difference ~ ab_change, data = final_full)

summary(regress_turnoutdif_full) %>% broom::tidy() %>%  knitr::kable(digits = 2)

term	estimate	std.error	statistic	p.value
(Intercept)	-0.04	0.01	-6.15	0.00
ab_change	0.06	0.02	2.30	0.03
dist_change	0.00	0.00	0.43	0.67

confint(regress_turnoutdif_full) %>% knitr::kable(digits = 2)

	2.5 %	97.5 %
(Intercept)	-0.05	-0.03
ab_change	0.01	0.11
dist_change	0.00	0.00

In response to the insignificant F-test, it was decided to remove dist_change because it was an insignificant predictor. This leaves a univariate linear regression model, with the change in abortion restriction and model being significant (P-value = 0.0267). On average, a state that had an abortion restriction put in place after the overturning of Roe v. Wade had a 5.559% greater voter turnout in the 2022 election compared to the 2018 election. We are 95% confident that the true increase in voter turnout falls between 0.669% and 10.449%.

Diagnostics

First, we examine the validity of our models for linearity assumption by plotting fitted values against residuals.

turnoutdif_full_plot <- 
  final_full %>% 
  modelr::add_residuals(regress_turnoutdif_full) %>% 
  ggplot(aes(x = turnout_rate_difference, y = resid)) +
  geom_point() +
  labs(
        title = "Multivariate Linear Model",
        x = "Voter Turn Out Difference",
        y = "Residuals") 

turnoutdif_onlyab_plot <- 
  final_full %>% 
  modelr::add_residuals(regress_turnoutdif_onlyab) %>% 
  ggplot(aes(x = turnout_rate_difference, y = resid)) +
  geom_point() +
  labs(
        title = "Univariate Linear Model",
        x = "Voter Turn Out Difference",
        y = "Residuals") 


(turnoutdif_full_plot + turnoutdif_onlyab_plot)

The plots show a strong linear trend with the exception of 4 outliers in the multivariate model and 3 in the univariate model. This confirms that linear regression is most appropriate for our analysis

Next we examine the validity of our models and assess its goodness of fit. Here. we performed this step on the multivariate model only since both models show a linear trend above.

check_model(regress_turnoutdif_full, check = c("linearity", "outliers", "qq", "normality"))

Linearity: As we can see in the top-left chart, the residuals have mean zero and are uncorrelated with the fitted values. Additionally, the best fit line of the residuals regressed on the fitted values has an intercept and slope of zero. This indicate that our model is properly specified.
Homoscedasticity: The top-left chart shows that the residuals are evenly dispersed around the reference line with the exception of one outlier. This indicates that the variance of our residuals is constant across all fitted values.
Preclusion of outliers: All of the points in the top-right chart fall within the dashed curves, thus we can conclude that this assumption is satisfied.
Normality: The bottom-left plot for normality shows a linear trend, with the exception of 5 outliers. This confirms our earlier finding of linearity. In addition, the distribution of the residuals as we can see in the top-right chart closely follows a normal distribution centered at zero.