
Week 5: a Synthesis

  • Respond to the questions raised in the comments and revise your questions in light of those comments.  
  • Include a block of code that reads in data, structures that data, and completes either a chi-square test or a linear regression.
    • If you undertake a chi-square test you must: use a for-loop to simplify a categorical column (e.g. convert towns to a boolean vector of in_cumberland, not_in_cumberland) and use that newly created column in your chi-square test.
    • If you undertake a regression analysis you must: merge one or more tables to produce a table that has at least three numerical columns.  Use those numerical columns in your regression analysis.
  • Interpret the results of that regression or chi-square test
  • Explicitly reference at least one reading from this week, and comment on how one (or both) methods discussed in the readings might help your final project.

My original questions for the final project concerned the ledgers from Bates College’s early history: where the recorded money came from, and what that money was used for. The question compares Bates’ past with Bates’ present, asking whether what we have access to on campus today was funded directly by donations derived from enslaved labor. Pursuing this question as a final project calls for some design justice principles: as Costanza-Chock puts it, ‘sustaining, healing, and empowering our communities’, ‘centering the voices of those who are directly impacted’, and ‘prioritizing the design’s impact on the community over the intentions of the designer’. Because the question centers on how we envision Bates today in light of its past, the purpose of the study should be to connect that history to what we have now, so that buildings and houses are no longer taken for granted or viewed in the same light as before. History adds to a story, and the histories of specific pieces of Bates would help ground those stories.

In this section, I use a chi-squared test of independence to determine whether there is a significant association between where a donation came from (a town inside or outside Cumberland County) and the size of the donation.

cumberland_towns <- c("baldwin", 'bridgton', 'brunswick', 'cape elizabeth', 'casco', 'chebeague island', 'cumberland', 'falmouth', 'freeport', 'frye island', 'gorham', 'gray', 'harpswell', 'harrison', 'long island', 'naples', 'new gloucester', 'north yarmouth', 'portland', 'pownal', 'raymond', 'scarborough', 'sebago', 'south portland', 'standish', 'westbrook', 'windham', 'yarmouth')

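# Start with a logical column so the loop stores TRUE/FALSE
# rather than the strings "TRUE"/"FALSE"
donors$is_cumberland <- logical(nrow(donors))

# Flag each donor whose town appears in cumberland_towns.
# The match is exact, so Location must be lower case like the list above.
for (i in seq_len(nrow(donors))) {
    location_to_check <- donors$Location[i]
    location_test <- location_to_check %in% cumberland_towns
    donors$is_cumberland[i] <- location_test
}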

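# Bin each donation amount into one of four size categories.
# Rows with a missing Amount stay NA and are dropped by chisq.test.
donors$donation_net <- NA_character_
donors$donation_net[donors$Amount <= 1] <- 'Low'
donors$donation_net[donors$Amount <= 5 & donors$Amount > 1] <- 'Medium Low'
donors$donation_net[donors$Amount <= 20 & donors$Amount > 5] <- 'Medium'
donors$donation_net[donors$Amount > 20] <- 'High'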

The code above first defines cumberland_towns, a vector containing the towns in Cumberland County, Maine. It then creates a new column in the donors table called is_cumberland. The for loop goes through each row of the data frame, reads the Location column, and checks whether that town matches any of the values in cumberland_towns. The resulting TRUE or FALSE is stored in that row of the is_cumberland column.
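The same membership check can also be written without stepping through the rows one at a time. The sketch below is only an illustration: toy_donors and toy_cumberland are made-up stand-ins for the real ledger data, and the lower-casing and trimming step assumes that town names in the ledger might not be typed consistently.

```r
# Toy data standing in for the real donors table (values are made up).
toy_donors <- data.frame(
  Location = c("Portland", "lewiston", " Gorham", "Auburn"),
  Amount   = c(2, 10, 0.5, 25),
  stringsAsFactors = FALSE
)

# A shortened, illustrative list of Cumberland County towns.
toy_cumberland <- c("portland", "gorham", "brunswick")

# Normalise case and surrounding whitespace, then test every row at once.
clean_location <- tolower(trimws(toy_donors$Location))
toy_donors$is_cumberland <- clean_location %in% toy_cumberland

print(toy_donors)
```

Because %in% works on whole vectors, this gives the same kind of TRUE/FALSE column as the for loop; the loop is kept in the analysis above because the assignment asks for one.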

A new column called donation_net was also created. It places the donation amount in each row of the data frame into a size category: ‘Low’ for amounts of 1 or less, ‘Medium Low’ for amounts above 1 and up to 5, ‘Medium’ for amounts above 5 and up to 20, and ‘High’ for amounts above 20.
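As a cross-check on the binning rules, the same ranges can be expressed with base R's cut function. This is a sketch on made-up amounts (toy_amounts is hypothetical), using the same cut points of 1, 5, and 20 with each upper bound included in its bin.

```r
# Made-up donation amounts, including values that sit exactly on a boundary.
toy_amounts <- c(0.5, 1, 3, 5, 12, 20, 50)

# right = TRUE makes each interval include its upper bound,
# matching the <= comparisons used for donation_net.
toy_bins <- cut(
  toy_amounts,
  breaks = c(-Inf, 1, 5, 20, Inf),
  labels = c("Low", "Medium Low", "Medium", "High"),
  right  = TRUE
)

print(data.frame(amount = toy_amounts, bin = toy_bins))
```

Checking boundary values such as 1, 5, and 20 this way is a quick test that no amount falls between two categories.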

I then ran a chi-squared test on the is_cumberland and donation_net columns. The test asks whether the spread of donation sizes differs between donors inside and outside Cumberland County by more than chance alone would explain, that is, whether there is an association between county and amount donated.

chisq.test(donors$is_cumberland, donors$donation_net)

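The p-value from this test was 0.002825. This means that, if county and donation size were unrelated, a pattern at least this uneven would be expected only about 0.28% of the time. Because this is well below the conventional 0.05 threshold, I reject the hypothesis that the two are independent: donation size appears to be associated with whether a donor came from Cumberland County. The test alone does not show which donation categories drive that difference.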
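To see which categories contribute most to a result like this, the object returned by chisq.test can be inspected for its expected counts and residuals. The sketch below uses hypothetical counts (toy_counts), not the real ledger totals, so its numbers say nothing about the actual donors.

```r
# Hypothetical counts: rows are outside/inside Cumberland County,
# columns are the four donation size categories.
toy_counts <- matrix(
  c(30, 25, 20, 10,
    20, 25, 30, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    is_cumberland = c("FALSE", "TRUE"),
    donation_net  = c("Low", "Medium Low", "Medium", "High")
  )
)

toy_test <- chisq.test(toy_counts)

# Counts expected in each cell if county and donation size were independent.
print(toy_test$expected)

# Pearson residuals: cells far from zero depart most from independence.
print(round(toy_test$residuals, 2))

print(toy_test$p.value)
```

With the real data, the same fields are available by saving the result first, for example result <- chisq.test(donors$is_cumberland, donors$donation_net), and then reading result$expected and result$residuals.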
