Since the overturn of Roe v Wade we believed it would be useful to identify how distances to the nearest abortion provider within each state changed between the overturn in June to July 2022. This data was provided by the Myers Abortion Facility Database and is public information published on the Open Science Framework website. The database uses public data to as an estimate of travel time and distance for an individual to reach an abortion resource, whether this is a private physician, hospital or a clinic. The distances are calculated at a county level for each month. This information was collected by Caitlin Myers and has information from 2009 to October 1st, 2022.
The dataset from Myers Abortion Facility Database was downloaded and
read in. The dataset was filtered to only include distances from June
2022 and July 2022 to reflect the immediate change in average distance
to abortion provider immediately before and after the overturning of Roe
v. Wade. Variables of interest, such as county, state, year, month, and
distance to closest abortion clinic, were kept. After converting
numerical months to abbrevations, the dataset was pivoted to a wide
format to create a variable for each months’ distance. The
dist_change
variable was created taking the difference in
average distance per county for the two months.
clinicdist =
read_csv("datasets/2022.10.01_abortionaccess_countyxmonth.csv")
countydist_change =
subset(clinicdist, year == 2022 & (month == 6 | month == 7)) %>%
select(origin_county_name, origin_state, year, month, distance_origintodest) %>%
mutate(
month = month.abb[month]
) %>%
pivot_wider(
names_from = "month",
values_from = "distance_origintodest") %>%
mutate(
dist_change = Jul - Jun
)
After selecting variables for analysis and pivoting to wide format, the mean distance for each state was taken by averaging all counties by month and change in average distance. These means were then merged by state to create one finalized distance dataset.
statedist_change =
aggregate(dist_change ~ origin_state, countydist_change, mean)
statedist_Jun =
aggregate(Jun ~ origin_state, countydist_change, mean)
statedist_Jul =
aggregate(Jul ~ origin_state, countydist_change, mean)
statedist =
merge(statedist_Jun, statedist_Jul, by = c("origin_state")) %>%
merge(statedist_change, by = c("origin_state"))
write.csv(statedist,"data_cleaned/statedist_final.csv", row.names = FALSE)
We are also interested in the abortion laws and restrictions placed
in each state. This data comes from Temple University’s
Policy Surveillance Program. The data set looks at abortion
regulations starting from December 1, 2018 through October 1, 2021. The
bans2018
is cleaned and tidied to prepare for merging with
other data sets. Note that states that had no abortion restrictions in
2018 were intentionally coded as .
by the original coder.
These were replaced with 0
to reflect their true status. We
are also using data on abortion laws restriction status by state after
Roe v. Wade was overturned. This data that include updates through
November 2022 comes from the non-profit Lucy Burns Institute Ballotpedia.
The data set was not available to download but we were able to scrape it
from the website. A year
and state_abv
variables were created, and Washington DC was renamed to
District of Columbia
.
bans2018 <-
read_excel("datasets/abortion_bans_data_Oct 2021.xlsx", sheet = "Statistical Data", range = "A1:H127") %>%
janitor::clean_names() %>%
rename(
state = jurisdiction,
abstatus = prohibit_req) %>%
mutate(
state_abv = state.abb[match(state, state.name)],
state_abv = ifelse(is.na(state_abv), "DC", state_abv),
year = lubridate::year(effective_date),
abstatus = replace(abstatus, abstatus == ".", "0")) %>%
select(state, state_abv, year, abstatus) %>%
filter(year == 2018)
bans_link <- "https://ballotpedia.org/Abortion_regulations_by_state"
bans_page <- read_html(bans_link)
bans2022 <-
bans_page %>% html_nodes("table") %>% .[2] %>%
html_table() %>% .[[1]] %>%
janitor::clean_names() %>%
rename(
state = state_abortion_restrictions_based_on_stage_of_pregnancy,
abstatus = state_abortion_restrictions_based_on_stage_of_pregnancy_2,
threshold = state_abortion_restrictions_based_on_stage_of_pregnancy_3) %>%
mutate(
state = replace(state, state == "Washington, D.C.", "District of Columbia"),
state_abv = state.abb[match(state, state.name)],
state_abv = ifelse(is.na(state_abv), "DC", state_abv),
year = 2022,
abstatus = recode(abstatus, "Yes" = "1", "No" = "0")) %>%
select(- threshold) %>%
slice(-1) %>%
head(51) %>%
arrange(state)
We then merged both 2018 and 2022 data sets after harmonizing the variable names. The resulting data set was saved as a CSV file for later use in further data processing. The new data set include 4 variables, some of which are listed below:
state
: State nameabstatus18
: Abortion restriction status in 2018 ( 1 =
restricted, 0 = not restricted)abstatus22
: Abortion restriction status in 2022 ( 1 =
restricted, 0 = not restricted)abortion_bans <-
full_join(bans2018, bans2022)%>%
arrange(state, year) %>%
pivot_wider(
names_from = year,
values_from = abstatus) %>%
rename(abstatus18 = '2018', abstatus22 = '2022')
write_csv(abortion_bans, "data_cleaned/abortion_bans_final.csv")
We also used the data on abortion laws by state from the World
Population Review. This dataset offers an overview of state abortion
laws including laws that has not yet gone into effect and may need
courts to determine hierarchy. Abortion laws status was coded as either
prohibited
, Legal
or
Functional ban
depending on the state laws in place. The We
plan to use this dataset to create a map.
state_bans <-
read_csv("datasets/state_abortion_laws2022.csv")
The voter turnout data for the 2018 and 2022 elections was pulled from US Elections Project. This source was created by a Professor in the Department of Political Science at the University of Florida. The 2022 election was the most recent election and the 2018 election was chosen as the comparison election because it was also a midterm election. The 2020 election was a Presidential election so we didn’t want to use that as a comparison for voter turnout because Presidential elections typically have higher turnout than midterm elections.
library(tidyverse)
library(tidyr)
library(readxl)
voting2022_df =
read_xlsx("datasets/2022 November General Election.xlsx", sheet = "Turnout Rates", range = "A2:N54") %>%
janitor::clean_names() %>%
rename(state = x1, turnout_estimate2022 = preliminary_total_turnout_estimate, turnoutRate2022 = preliminary_turnout_rate, voting_eligible_pop2022 = voting_eligible_population_vep, voting_age_pop2022 = voting_age_population_vap) %>%
select(state, turnout_estimate2022, turnoutRate2022, voting_eligible_pop2022, voting_age_pop2022, state_abv)
voting2018_df =
read_xlsx("datasets/2018 November General Election.xlsx", sheet = "Turnout Rates", range = "A2:P54") %>%
janitor::clean_names() %>%
rename(state = x1, turnout_estimate2018 = estimated_or_actual_2018_total_ballots_counted, turnoutRate2018 = estimated_or_actual_2018_total_ballots_counted_vep_turnout_rate, voting_eligible_pop2018 = voting_eligible_population_vep, voting_age_pop2018 = voting_age_population_vap) %>%
select(state, turnout_estimate2018, turnoutRate2018, voting_eligible_pop2018, voting_age_pop2018, state_abv)
The 2018 and 2022 datasets included TurnoutRate, which is the percent voter turnout in each state based on the number of people that are eligible to vote within that state. The other relevant variables that were kept were the turnout_estimate (the number of people who voted), voting_eligible_pop (the number of people who are eligible to vote), and voting_age_pop (the number of people old enough to vote). The two datasets were merged and difference variables were created to show the difference in each of the variables between the two years.
voting_df =
full_join(voting2022_df, voting2018_df, by = "state_abv") %>%
select(-state.x) %>%
mutate(turnout_estimate_difference = turnout_estimate2022 - turnout_estimate2018, turnout_Rate_difference = turnoutRate2022 - turnoutRate2018, voting_eligible_difference = voting_eligible_pop2022 - voting_eligible_pop2018, voting_age_difference = voting_age_pop2022 - voting_age_pop2018) %>%
rename(state = state.y)
voting_df <- voting_df[, c(6, 5, 1, 7, 11, 2, 8, 12, 3, 9, 13, 4, 10, 14)]
write.csv(voting_df,"data_cleaned/Voting Turnout - Final.csv", row.names = FALSE)