Data Sources

Source 1: Abortion Clinic Distances

Since the overturn of Roe v Wade we believed it would be useful to identify how distances to the nearest abortion provider within each state changed between the overturn in June to July 2022. This data was provided by the Myers Abortion Facility Database and is public information published on the Open Science Framework website. The database uses public data to as an estimate of travel time and distance for an individual to reach an abortion resource, whether this is a private physician, hospital or a clinic. The distances are calculated at a county level for each month. This information was collected by Caitlin Myers and has information from 2009 to October 1st, 2022.

The dataset from Myers Abortion Facility Database was downloaded and read in. The dataset was filtered to only include distances from June 2022 and July 2022 to reflect the immediate change in average distance to abortion provider immediately before and after the overturning of Roe v. Wade. Variables of interest, such as county, state, year, month, and distance to closest abortion clinic, were kept. After converting numerical months to abbrevations, the dataset was pivoted to a wide format to create a variable for each months’ distance. The dist_change variable was created taking the difference in average distance per county for the two months.

clinicdist =
  read_csv("datasets/2022.10.01_abortionaccess_countyxmonth.csv")

countydist_change = 
  subset(clinicdist, year == 2022 & (month == 6 | month == 7)) %>%
  select(origin_county_name, origin_state, year, month, distance_origintodest) %>%
  mutate(
    month = month.abb[month]
  ) %>%
  pivot_wider(
    names_from = "month", 
    values_from = "distance_origintodest") %>%
  mutate(
     dist_change = Jul - Jun
  )

After selecting variables for analysis and pivoting to wide format, the mean distance for each state was taken by averaging all counties by month and change in average distance. These means were then merged by state to create one finalized distance dataset.

statedist_change = 
  aggregate(dist_change ~ origin_state, countydist_change, mean)

statedist_Jun =
  aggregate(Jun ~ origin_state, countydist_change, mean)

statedist_Jul =
  aggregate(Jul ~ origin_state, countydist_change, mean)

statedist = 
  merge(statedist_Jun, statedist_Jul, by = c("origin_state")) %>%
  merge(statedist_change, by = c("origin_state"))

write.csv(statedist,"data_cleaned/statedist_final.csv", row.names = FALSE)

Source 2: Abortion Laws by State

We are also interested in the abortion laws and restrictions placed in each state. This data comes from Temple University’s Policy Surveillance Program. The data set looks at abortion regulations starting from December 1, 2018 through October 1, 2021. The bans2018 is cleaned and tidied to prepare for merging with other data sets. Note that states that had no abortion restrictions in 2018 were intentionally coded as . by the original coder. These were replaced with 0 to reflect their true status. We are also using data on abortion laws restriction status by state after Roe v. Wade was overturned. This data that include updates through November 2022 comes from the non-profit Lucy Burns Institute Ballotpedia. The data set was not available to download but we were able to scrape it from the website. A year and state_abv variables were created, and Washington DC was renamed to District of Columbia.

bans2018 <-
  read_excel("datasets/abortion_bans_data_Oct 2021.xlsx", sheet = "Statistical Data", range = "A1:H127") %>% 
  janitor::clean_names() %>% 
  rename(
    state = jurisdiction,
    abstatus = prohibit_req) %>% 
  mutate(
    state_abv = state.abb[match(state, state.name)],
    state_abv = ifelse(is.na(state_abv), "DC", state_abv),
    year = lubridate::year(effective_date),
    abstatus = replace(abstatus, abstatus == ".", "0")) %>% 
    select(state, state_abv, year, abstatus) %>% 
  filter(year == 2018)


bans_link <- "https://ballotpedia.org/Abortion_regulations_by_state"
bans_page <- read_html(bans_link)

bans2022 <- 
  bans_page %>%  html_nodes("table") %>% .[2] %>% 
  html_table() %>% .[[1]] %>% 
  janitor::clean_names() %>% 
  rename(
    state = state_abortion_restrictions_based_on_stage_of_pregnancy,
    abstatus = state_abortion_restrictions_based_on_stage_of_pregnancy_2,
    threshold = state_abortion_restrictions_based_on_stage_of_pregnancy_3) %>% 
  mutate(
  state = replace(state, state == "Washington, D.C.", "District of Columbia"),
  state_abv = state.abb[match(state, state.name)],
  state_abv = ifelse(is.na(state_abv), "DC", state_abv),
  year = 2022,
  abstatus = recode(abstatus, "Yes" = "1", "No" = "0")) %>% 
  select(- threshold) %>% 
  slice(-1) %>% 
  head(51) %>% 
  arrange(state)

We then merged both 2018 and 2022 data sets after harmonizing the variable names. The resulting data set was saved as a CSV file for later use in further data processing. The new data set include 4 variables, some of which are listed below:

state: State name
abstatus18: Abortion restriction status in 2018 ( 1 = restricted, 0 = not restricted)
abstatus22: Abortion restriction status in 2022 ( 1 = restricted, 0 = not restricted)

abortion_bans <-
  full_join(bans2018, bans2022)%>% 
  arrange(state, year) %>% 
  pivot_wider(
    names_from = year,
    values_from = abstatus) %>% 
  rename(abstatus18 = '2018', abstatus22 = '2022') 

write_csv(abortion_bans, "data_cleaned/abortion_bans_final.csv")

We also used the data on abortion laws by state from the World Population Review. This dataset offers an overview of state abortion laws including laws that has not yet gone into effect and may need courts to determine hierarchy. Abortion laws status was coded as either prohibited, Legal or Functional ban depending on the state laws in place. The We plan to use this dataset to create a map.

state_bans <-
  read_csv("datasets/state_abortion_laws2022.csv")

Source 3: Voter Turnout Data

The voter turnout data for the 2018 and 2022 elections was pulled from US Elections Project. This source was created by a Professor in the Department of Political Science at the University of Florida. The 2022 election was the most recent election and the 2018 election was chosen as the comparison election because it was also a midterm election. The 2020 election was a Presidential election so we didn’t want to use that as a comparison for voter turnout because Presidential elections typically have higher turnout than midterm elections.

library(tidyverse)
library(tidyr)
library(readxl)

voting2022_df =
  read_xlsx("datasets/2022 November General Election.xlsx", sheet = "Turnout Rates", range = "A2:N54") %>%
  janitor::clean_names() %>% 
  rename(state = x1, turnout_estimate2022 = preliminary_total_turnout_estimate, turnoutRate2022 = preliminary_turnout_rate, voting_eligible_pop2022 = voting_eligible_population_vep, voting_age_pop2022 = voting_age_population_vap) %>% 
  select(state, turnout_estimate2022, turnoutRate2022, voting_eligible_pop2022, voting_age_pop2022, state_abv)

voting2018_df = 
    read_xlsx("datasets/2018 November General Election.xlsx", sheet = "Turnout Rates", range = "A2:P54") %>%
  janitor::clean_names() %>%  
  rename(state = x1, turnout_estimate2018 = estimated_or_actual_2018_total_ballots_counted, turnoutRate2018 = estimated_or_actual_2018_total_ballots_counted_vep_turnout_rate, voting_eligible_pop2018 = voting_eligible_population_vep, voting_age_pop2018 = voting_age_population_vap) %>% 
  select(state, turnout_estimate2018, turnoutRate2018, voting_eligible_pop2018, voting_age_pop2018, state_abv)

The 2018 and 2022 datasets included TurnoutRate, which is the percent voter turnout in each state based on the number of people that are eligible to vote within that state. The other relevant variables that were kept were the turnout_estimate (the number of people who voted), voting_eligible_pop (the number of people who are eligible to vote), and voting_age_pop (the number of people old enough to vote). The two datasets were merged and difference variables were created to show the difference in each of the variables between the two years.

voting_df = 
  full_join(voting2022_df, voting2018_df, by = "state_abv") %>% 
  select(-state.x) %>% 
  mutate(turnout_estimate_difference = turnout_estimate2022 - turnout_estimate2018, turnout_Rate_difference = turnoutRate2022 - turnoutRate2018, voting_eligible_difference = voting_eligible_pop2022 - voting_eligible_pop2018, voting_age_difference = voting_age_pop2022 - voting_age_pop2018) %>% 
  rename(state = state.y)

voting_df <- voting_df[, c(6, 5, 1, 7, 11, 2, 8, 12, 3, 9, 13, 4, 10, 14)]

write.csv(voting_df,"data_cleaned/Voting Turnout - Final.csv", row.names = FALSE)