Making sense of student performance and identifying possible predictors of academic success.
Between Fall 2010 and Spring 2018 I taught the two-semester sequence of General Chemistry (GC). The way our curriculum was structured, students majoring in a Bachelor of Sciences in Health Sciences usually took these two semesters during their sophomore year.
During all this time, the chemistry content has not changed significantly, but the forms of delivery and assessment have evolved towards what I hope are better pedagogies of engagement and a clearer assessment of learning objectives. Probably the most remarkable change was flipping the class with videos in the fall of 2014.
Let’s look at how students performed in the two GC semesters by comparing their final grades across semesters.
IMPORTANT: We will see statistically significant differences between years and between demographic groups when analyzing the final percent grade. However, when we analyze the letter grade, those differences are no longer significant. This matters because a disengaged student may score 60% or 5%: the means and medians are affected, but the letter grade is the same either way. Also, during Fall 2011 - Spring 2012 the laboratory was still a separate course, which means that the criterion for a passing grade was not 70%, but lower.
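To make the percent-versus-letter point concrete, here is a minimal sketch with made-up scores. The grade cutoffs (70/80/90) and the point values are assumptions chosen only for illustration; they are not the actual grading scale of the course.

```r
# Hypothetical class: five engaged students plus one disengaged student
engaged   <- c(92, 85, 78, 74, 71)
scores_60 <- c(engaged, 60)   # the disengaged student ends at 60%
scores_05 <- c(engaged, 5)    # the same student ends at 5%

# Assumed scale for illustration only: <70 fails (0 points), then C = 2, B = 3, A = 4
to_points <- function(pct) {
  bins <- cut(pct, breaks = c(-Inf, 70, 80, 90, Inf), right = FALSE, labels = FALSE)
  c(0, 2, 3, 4)[bins]
}

# The class means differ noticeably ...
c(mean_60 = mean(scores_60), mean_05 = mean(scores_05))

# ... but the grade points are identical, since 60% and 5% are both failing
identical(to_points(scores_60), to_points(scores_05))
```

One very low score moves the mean of the percent grade, while the letter-based outcome is unchanged, which is why the two analyses can disagree.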
setwd("~/Gd/Research/StudentData/Discover")
#Load demographics for all years
allGC1 <- read.csv("./genchem1_nosummer_11_16.csv",header=TRUE)
allGC2 <- read.csv("./genchem2_11_17.csv",header=TRUE)
allGC1_ <- read.csv("./genchem1_nosummer_11_16_mergedsex.csv",header = TRUE)
allGC2_ <- read.csv("./genchem2_11_17_mergedsex.csv",header = TRUE)
library(psych)
mata<-describeBy(allGC1$TG_Total.Grade....,allGC1$Semester,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem1")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | Fall 2011 | 68 | 78.43 | 10.11 | 79.50 | 78.98 | 9.79 | 42.50 | 95.90 | 53.40 |
X12 | Fall 2012 | 84 | 73.95 | 12.41 | 73.00 | 73.78 | 14.53 | 43.00 | 96.20 | 53.20 |
X13 | Fall 2013 | 69 | 83.81 | 8.17 | 84.70 | 84.49 | 6.82 | 41.60 | 96.50 | 54.90 |
X14 | Fall 2014 | 105 | 80.95 | 9.73 | 81.52 | 81.78 | 8.61 | 40.52 | 98.78 | 58.27 |
X15 | Fall 2015 | 60 | 81.44 | 9.74 | 82.68 | 82.39 | 7.19 | 43.47 | 96.33 | 52.87 |
X16 | Fall 2016 | 35 | 84.17 | 7.24 | 84.95 | 84.60 | 7.82 | 66.98 | 97.14 | 30.16 |
mata<-describeBy(allGC2$TG_Total.Grade....,allGC2$Semester,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem2")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | Spring 2011 | 16 | 87.38 | 8.34 | 89.84 | 88.07 | 6.55 | 67.70 | 97.38 | 29.68 |
X12 | Spring 2012 | 45 | 69.01 | 12.95 | 70.70 | 69.90 | 10.53 | 35.60 | 94.00 | 58.40 |
X13 | Spring 2013 | 51 | 81.51 | 9.42 | 82.00 | 82.00 | 9.93 | 53.70 | 97.50 | 43.80 |
X14 | Spring 2014 | 55 | 80.31 | 8.51 | 79.30 | 80.55 | 8.75 | 60.10 | 96.70 | 36.60 |
X15 | Spring 2015 | 61 | 78.51 | 10.00 | 77.78 | 78.55 | 11.92 | 57.26 | 96.14 | 38.88 |
X16 | Spring 2016 | 44 | 77.15 | 9.66 | 78.73 | 77.77 | 9.49 | 54.39 | 94.10 | 39.72 |
X17 | Spring 2017 | 37 | 80.01 | 9.80 | 80.78 | 80.64 | 8.01 | 48.11 | 97.22 | 49.11 |
library(ggplot2)  # ggplot() is called here before ggpubr (which attaches ggplot2) is loaded
ggplot(allGC2, aes(x=TG_Total.Grade...., fill=Semester))+geom_histogram()+ggtitle("GenChem2")
a<- TukeyHSD( aov(allGC1$TG_Total.Grade.... ~ allGC1$Semester))
b<-as.data.frame(a$`allGC1$Semester`)
knitr::kable(b, caption = "Anova. GenChem1 Grade among semesters")
comparison | diff | lwr | upr | p adj |
---|---|---|---|---|
Fall 2012-Fall 2011 | -4.4779412 | -9.1423644 | 0.186482 | 0.0681540 |
Fall 2013-Fall 2011 | 5.3836530 | 0.4976747 | 10.269631 | 0.0211987 |
Fall 2014-Fall 2011 | 2.5216131 | -1.9292496 | 6.972476 | 0.5843701 |
Fall 2015-Fall 2011 | 3.0089288 | -2.0556712 | 8.073529 | 0.5318824 |
Fall 2016-Fall 2011 | 5.7435885 | -0.2048146 | 11.691992 | 0.0653478 |
Fall 2013-Fall 2012 | 9.8615942 | 5.2158876 | 14.507301 | 0.0000000 |
Fall 2014-Fall 2012 | 6.9995543 | 2.8138662 | 11.185242 | 0.0000345 |
Fall 2015-Fall 2012 | 7.4868700 | 2.6536537 | 12.320086 | 0.0001708 |
Fall 2016-Fall 2012 | 10.2215296 | 4.4688516 | 15.974208 | 0.0000082 |
Fall 2014-Fall 2013 | -2.8620399 | -7.2932841 | 1.569204 | 0.4353187 |
Fall 2015-Fall 2013 | -2.3747242 | -7.4220918 | 2.672643 | 0.7584349 |
Fall 2016-Fall 2013 | 0.3599354 | -5.5738024 | 6.293673 | 0.9999781 |
Fall 2015-Fall 2014 | 0.4873157 | -4.1401366 | 5.114768 | 0.9996654 |
Fall 2016-Fall 2014 | 3.2219753 | -2.3589421 | 8.802893 | 0.5638475 |
Fall 2016-Fall 2015 | 2.7346596 | -3.3470041 | 8.816323 | 0.7918982 |
a<- TukeyHSD( aov(allGC2$TG_Total.Grade.... ~ allGC2$Semester))
b<-as.data.frame(a$`allGC2$Semester`)
knitr::kable(b, caption = "Anova. GenChem2 Grade among semesters")
comparison | diff | lwr | upr | p adj |
---|---|---|---|---|
Spring 2012-Spring 2011 | -18.3719243 | -27.016530 | -9.7273185 | 0.0000000 |
Spring 2013-Spring 2011 | -5.8749308 | -14.385113 | 2.6352511 | 0.3860347 |
Spring 2014-Spring 2011 | -7.0680859 | -15.504043 | 1.3678712 | 0.1676882 |
Spring 2015-Spring 2011 | -8.8701902 | -17.212128 | -0.5282519 | 0.0288746 |
Spring 2016-Spring 2011 | -10.2332149 | -18.903549 | -1.5628812 | 0.0094423 |
Spring 2017-Spring 2011 | -7.3733537 | -16.259708 | 1.5130002 | 0.1767662 |
Spring 2013-Spring 2012 | 12.4969935 | 6.422770 | 18.5712173 | 0.0000001 |
Spring 2014-Spring 2012 | 11.3038384 | 5.334050 | 17.2736265 | 0.0000009 |
Spring 2015-Spring 2012 | 9.5017341 | 3.665559 | 15.3379087 | 0.0000439 |
Spring 2016-Spring 2012 | 8.1387093 | 1.842068 | 14.4353503 | 0.0028667 |
Spring 2017-Spring 2012 | 10.9985706 | 4.407646 | 17.5894949 | 0.0000250 |
Spring 2014-Spring 2013 | -1.1931551 | -6.966573 | 4.5802632 | 0.9963637 |
Spring 2015-Spring 2013 | -2.9952594 | -8.630410 | 2.6398912 | 0.6966880 |
Spring 2016-Spring 2013 | -4.3582841 | -10.469068 | 1.7524994 | 0.3452669 |
Spring 2017-Spring 2013 | -1.4984229 | -7.912023 | 4.9151776 | 0.9928975 |
Spring 2015-Spring 2014 | -1.8021043 | -7.324522 | 3.7203134 | 0.9603189 |
Spring 2016-Spring 2014 | -3.1651291 | -9.172113 | 2.8418544 | 0.7053683 |
Spring 2017-Spring 2014 | -0.3052678 | -6.620048 | 6.0095122 | 0.9999993 |
Spring 2016-Spring 2015 | -1.3630247 | -7.237241 | 4.5111913 | 0.9931556 |
Spring 2017-Spring 2015 | 1.4968365 | -4.691783 | 7.6854560 | 0.9914449 |
Spring 2017-Spring 2016 | 2.8598613 | -3.764772 | 9.4844944 | 0.8600771 |
#install.packages("ggpubr")
library(ggpubr)
ggboxplot(allGC1, x = "Semester", y = "TG_Total.Grade....", title = "Final grade in GC1",
color = "Semester", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(allGC1$TG_Total.Grade....), linetype = 2) +
stat_compare_means(method = "anova", label.y = 110) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
ggboxplot(allGC2, x = "Semester", y = "TG_Total.Grade....", title = "Final grade in GC2",
color = "Semester", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(allGC2$TG_Total.Grade....), linetype = 2) +
stat_compare_means(method = "anova", label.y = 110) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
I converted the letter grades to the 4.0 scale. The plot should only show the discrete values 4, 3.67, 3.33, 3… but it seems to add more variability…
#need to load this other file, as it contains the letter grades
allGC1_bosco <- read.csv("~/Research/StudentData/XavierData/Clean/allGC1.csv",header = TRUE)
a<- allGC1_bosco$Final.letter
a <- gsub("A\\-", 3.667,a)
a <- gsub("A", 4.000,a)
a <- gsub("B\\+", 3.333,a)
a <- gsub("B\\-", 2.667,a)
a <- gsub("B", 3.000,a)
a <- gsub("C\\+", 2.333,a)
a <- gsub("C\\-", 1.667,a)
a <- gsub("C", 2.000,a)
a <- gsub("D\\+", 1.333,a)
a <- gsub("D", 1.000,a)
a <- gsub("F", 0.000,a)
a <- gsub("I", 0.000,a)
allGC1_bosco$Final.letter.number <- as.numeric(as.character(a))
ggboxplot(allGC1_bosco, x = "Semester", y = "Final.letter.number", title = "Final letter grade in GC1",
color = "Semester", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(allGC1_bosco$Final.letter.number), linetype = 2) +
stat_compare_means(method = "anova", label.y = 5 ) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
#ggplot(data=allGC1_bosco,aes(x=Semester,y=Final.letter)) + geom_bar(stat="identity") + geom_bar(aes(fill = Final.letter))
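As an alternative to the chain of `gsub()` calls, the conversion can be sketched as a lookup in a named vector, which matches whole grades exactly and does not depend on the order of the substitutions. The names `grade_points` and `letter_to_points` are hypothetical, and the value for "D-" is an assumption, since that grade does not appear in the original conversion.

```r
# Grade points per letter; "D-" is an assumed entry, not part of the original mapping
grade_points <- c("A" = 4.000, "A-" = 3.667, "B+" = 3.333, "B" = 3.000,
                  "B-" = 2.667, "C+" = 2.333, "C" = 2.000, "C-" = 1.667,
                  "D+" = 1.333, "D" = 1.000, "D-" = 0.667, "F" = 0.000, "I" = 0.000)

letter_to_points <- function(letters_in) {
  # Exact-name lookup; codes missing from the table (e.g. "W") become NA
  unname(grade_points[trimws(as.character(letters_in))])
}

letter_to_points(c("A", "A-", "B+", "F", "W"))
```

Running `unique(allGC1_bosco$Final.letter.number)` would show whether the converted column really holds unexpected values. If it does not, the extra spread may be purely visual: one possible cause, which I have not verified, is that `add = "jitter"` displaces the points vertically as well as horizontally.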
setwd("~/Gd/Research/StudentData/Discover")
#lets write the prepost file into the discover folder
prePost <- read.csv("/Users/xavier/Gd/Research/StudentData/ExamPrePost.csv",header=TRUE,sep = "\t")
source("~/Gd/Research/R/deid.R")
prePost <- deIdThis(prePost)
write.csv(prePost,file="prePost.csv")
prePost <- read.csv("./prePost.csv", header = TRUE)
prePost$inc1<-prePost$Grade1-prePost$Mid1
prePost$inc2<-prePost$Grade2-prePost$Mid2
prePost$inc3<-prePost$Grade3-prePost$Mid3
prePost$meanInc <- rowMeans( prePost[c('inc1','inc2','inc3')])
prePost$meanExam <- rowMeans( prePost[c('Grade1','Grade2','Grade3')])
The final exam is a second opportunity for students to improve their semester exams. Let’s measure how exam scores and improvement evolved through the years.
ggboxplot(prePost, x = "Semester", y = "meanExam", title = "Average grade in final exams",
color = "Semester", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(prePost$meanExam), linetype = 2) +
stat_compare_means(method = "anova", label.y = 105 ) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
This plots the increment.
ggboxplot(prePost, x = "Semester", y = "meanInc", title = "Average increment from semester exams to final",
color = "Semester", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(prePost$meanInc), linetype = 2) +
stat_compare_means(method = "anova", label.y = 40 ) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
There’s something funky about some of these numbers. Fall 2014 doesn’t seem to apply the >40% rule, which I actually implemented.
So let’s check that I obtain the same result if I plot exam grades from the BoSCO data.
allGC1_bosco$meanExam <- rowMeans( allGC1_bosco[c('Exam1','Exam2','Exam3')], na.rm=TRUE)
ggboxplot(allGC1_bosco, x = "Semester", y = "meanExam", title = "Average grade in final exams (Bosco source)",
color = "Semester", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(allGC1_bosco$meanExam, na.rm = TRUE), linetype = 2) +
stat_compare_means(method = "anova", label.y = 105 ) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
There are different variables that we want to look at: performance factors such as ACT scores, GPA, or high school rank, as well as demographic factors such as ethnicity and first-generation status.
mata<-describeBy(allGC1$DEM_ACT.MATH,allGC1$Semester,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "ACT Math - Fall sophomore")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | Fall 2011 | 64 | 24.75 | 3.50 | 24.5 | 24.67 | 3.71 | 18 | 33 | 15 |
X12 | Fall 2012 | 70 | 24.41 | 4.01 | 24.0 | 24.29 | 4.45 | 17 | 34 | 17 |
X13 | Fall 2013 | 66 | 25.14 | 2.82 | 25.0 | 25.26 | 2.97 | 18 | 31 | 13 |
X14 | Fall 2014 | 101 | 25.47 | 3.21 | 26.0 | 25.40 | 2.97 | 17 | 34 | 17 |
X15 | Fall 2015 | 57 | 24.72 | 3.19 | 24.0 | 24.70 | 2.97 | 18 | 32 | 14 |
X16 | Fall 2016 | 32 | 24.94 | 2.64 | 25.0 | 24.96 | 2.97 | 19 | 30 | 11 |
mata<-describeBy(allGC2$DEM_ACT.MATH,allGC2$Semester,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "ACT Math - Spring sophomore")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | Spring 2011 | 16 | 25.88 | 4.18 | 25.5 | 25.79 | 3.71 | 19 | 34 | 15 |
X12 | Spring 2012 | 42 | 25.57 | 3.47 | 25.0 | 25.56 | 2.97 | 18 | 33 | 15 |
X13 | Spring 2013 | 44 | 26.23 | 4.15 | 26.0 | 26.17 | 4.45 | 18 | 34 | 16 |
X14 | Spring 2014 | 52 | 24.90 | 2.76 | 25.0 | 25.17 | 2.97 | 18 | 29 | 11 |
X15 | Spring 2015 | 58 | 26.10 | 3.36 | 26.0 | 26.04 | 2.97 | 19 | 34 | 15 |
X16 | Spring 2016 | 42 | 25.10 | 3.27 | 25.0 | 25.00 | 2.97 | 18 | 32 | 14 |
X17 | Spring 2017 | 33 | 25.27 | 2.47 | 26.0 | 25.30 | 2.97 | 21 | 30 | 9 |
We see that the second-semester cohort is a subselection of the first-semester cohort with a higher ACT Math score. Therefore, we can just use GenChem1 for the analysis.
As we can see below, there is no significant difference in ACT Math throughout the years.
a<- TukeyHSD( aov(allGC1$DEM_ACT.MATH ~ allGC1$Semester))
b<-as.data.frame(a$`allGC1$Semester`)
knitr::kable(b, caption = "Anova. ACTMath among semesters")
comparison | diff | lwr | upr | p adj |
---|---|---|---|---|
Fall 2012-Fall 2011 | -0.3357143 | -1.9769059 | 1.3054774 | 0.9919345 |
Fall 2013-Fall 2011 | 0.3863636 | -1.2784117 | 2.0511390 | 0.9856326 |
Fall 2014-Fall 2011 | 0.7153465 | -0.8007861 | 2.2314792 | 0.7559117 |
Fall 2015-Fall 2011 | -0.0307018 | -1.7589701 | 1.6975666 | 1.0000000 |
Fall 2016-Fall 2011 | 0.1875000 | -1.8670492 | 2.2420492 | 0.9998340 |
Fall 2013-Fall 2012 | 0.7220779 | -0.9060719 | 2.3502278 | 0.8010990 |
Fall 2014-Fall 2012 | 1.0510608 | -0.4247621 | 2.5268838 | 0.3217601 |
Fall 2015-Fall 2012 | 0.3050125 | -1.3880045 | 1.9980295 | 0.9955408 |
Fall 2016-Fall 2012 | 0.5232143 | -1.5017715 | 2.5482001 | 0.9767972 |
Fall 2014-Fall 2013 | 0.3289829 | -1.1730225 | 1.8309883 | 0.9889559 |
Fall 2015-Fall 2013 | -0.4170654 | -2.1329539 | 1.2988231 | 0.9823133 |
Fall 2016-Fall 2013 | -0.1988636 | -2.2430100 | 1.8452827 | 0.9997726 |
Fall 2015-Fall 2014 | -0.7460483 | -2.3181344 | 0.8260379 | 0.7513444 |
Fall 2016-Fall 2014 | -0.5278465 | -2.4528701 | 1.3971770 | 0.9699093 |
Fall 2016-Fall 2015 | 0.2182018 | -1.8779779 | 2.3143814 | 0.9996831 |
ggboxplot(allGC1, x = "Semester", y = "DEM_ACT.MATH", title = "ACT Math in GC1",
color = "Semester", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(allGC1$DEM_ACT.MATH, na.rm = TRUE), linetype = 2) +
stat_compare_means(method = "anova", label.y = 40) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
We obtain r-squared values of 0.2042773 and 0.1569021, respectively, so we need to find a better predictor. Let’s look at cumulative GPA before enrolling.
While ACT.Math historically seems to correlate well, since we’re teaching sophomores, previous GPA is an even better predictor.
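The regression behind these r-squared values is not shown in this section. A minimal sketch of how such a value could be obtained with `lm()` follows; the data are simulated stand-ins for `DEM_ACT.MATH` and `TG_Total.Grade....`, so the number it prints is unrelated to the values reported above.

```r
set.seed(1)
# Simulated stand-ins for ACT Math and final grade; not real student data
act_math <- round(rnorm(100, mean = 25, sd = 3))
grade    <- 40 + 1.6 * act_math + rnorm(100, sd = 9)

# Simple linear regression of final grade on ACT Math
fit <- lm(grade ~ act_math)
r2  <- summary(fit)$r.squared
r2

# With a single predictor, r-squared equals the squared Pearson correlation
all.equal(r2, cor(grade, act_math)^2)
```

With the real data, the same call would presumably take the form `lm(TG_Total.Grade.... ~ DEM_ACT.MATH, data = allGC1)`; rows with a missing ACT score are dropped by default.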
mata<-describeBy(allGC1$DEM_Cumulative.GPA,allGC1$Semester,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem1")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | Fall 2011 | 65 | 3.11 | 0.46 | 3.14 | 3.13 | 0.52 | 1.88 | 3.93 | 2.05 |
X12 | Fall 2012 | 79 | 2.94 | 0.53 | 2.94 | 2.95 | 0.59 | 1.44 | 3.98 | 2.54 |
X13 | Fall 2013 | 69 | 3.18 | 0.44 | 3.21 | 3.19 | 0.53 | 1.84 | 3.98 | 2.14 |
X14 | Fall 2014 | 105 | 3.00 | 0.48 | 2.98 | 3.00 | 0.47 | 1.33 | 4.00 | 2.67 |
X15 | Fall 2015 | 60 | 3.00 | 0.39 | 3.00 | 2.98 | 0.33 | 2.18 | 3.97 | 1.79 |
X16 | Fall 2016 | 35 | 3.12 | 0.48 | 3.26 | 3.14 | 0.42 | 2.06 | 4.00 | 1.94 |
mata<-describeBy(allGC2$DEM_Cumulative.GPA,allGC2$Semester,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem2")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | Spring 2011 | 16 | 3.46 | 0.43 | 3.54 | 3.47 | 0.46 | 2.73 | 4.00 | 1.27 |
X12 | Spring 2012 | 43 | 3.18 | 0.39 | 3.19 | 3.19 | 0.40 | 2.33 | 3.95 | 1.62 |
X13 | Spring 2013 | 50 | 3.20 | 0.48 | 3.24 | 3.24 | 0.39 | 2.05 | 3.97 | 1.92 |
X14 | Spring 2014 | 53 | 3.25 | 0.44 | 3.27 | 3.26 | 0.50 | 2.18 | 3.98 | 1.80 |
X15 | Spring 2015 | 61 | 3.18 | 0.46 | 3.19 | 3.18 | 0.49 | 2.19 | 4.00 | 1.81 |
X16 | Spring 2016 | 44 | 3.07 | 0.41 | 3.05 | 3.06 | 0.36 | 2.13 | 3.96 | 1.83 |
X17 | Spring 2017 | 36 | 3.23 | 0.40 | 3.26 | 3.25 | 0.34 | 2.13 | 4.00 | 1.87 |
a<- TukeyHSD( aov(allGC1$DEM_Cumulative.GPA ~ allGC1$Semester))
b<-as.data.frame(a$`allGC1$Semester`)
knitr::kable(b, caption = "Anova. Entering GPA among semesters")
comparison | diff | lwr | upr | p adj |
---|---|---|---|---|
Fall 2012-Fall 2011 | -0.1679124 | -0.3930252 | 0.0572004 | 0.2709961 |
Fall 2013-Fall 2011 | 0.0728361 | -0.1595233 | 0.3051956 | 0.9469705 |
Fall 2014-Fall 2011 | -0.1069817 | -0.3191411 | 0.1051777 | 0.7000993 |
Fall 2015-Fall 2011 | -0.1160769 | -0.3567413 | 0.1245875 | 0.7384580 |
Fall 2016-Fall 2011 | 0.0137802 | -0.2680571 | 0.2956175 | 0.9999925 |
Fall 2013-Fall 2012 | 0.2407485 | 0.0192443 | 0.4622527 | 0.0242003 |
Fall 2014-Fall 2012 | 0.0609307 | -0.1392812 | 0.2611426 | 0.9531219 |
Fall 2015-Fall 2012 | 0.0518354 | -0.1783657 | 0.2820366 | 0.9874925 |
Fall 2016-Fall 2012 | 0.1816926 | -0.0912643 | 0.4546495 | 0.3999188 |
Fall 2014-Fall 2013 | -0.1798178 | -0.3881444 | 0.0285087 | 0.1350707 |
Fall 2015-Fall 2013 | -0.1889130 | -0.4262055 | 0.0483794 | 0.2048113 |
Fall 2016-Fall 2013 | -0.0590559 | -0.3380193 | 0.2199075 | 0.9905679 |
Fall 2015-Fall 2014 | -0.0090952 | -0.2266461 | 0.2084557 | 0.9999966 |
Fall 2016-Fall 2014 | 0.1207619 | -0.1416144 | 0.3831382 | 0.7750540 |
Fall 2016-Fall 2015 | 0.1298571 | -0.1560608 | 0.4157750 | 0.7847500 |
ggboxplot(allGC1, x = "Semester", y = "DEM_Cumulative.GPA", title = "Entering GPA in GC1",
color = "Semester", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(allGC1$DEM_Cumulative.GPA, na.rm = TRUE), linetype = 2) +
stat_compare_means(method = "anova", label.y = 5) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
When we plot previous GPA (typically first-year GPA) against final grade, we obtain better r-squared values: 0.656591 and 0.5840838, respectively.
For large schools, high school (HS) ranking can be a better measure than HS GPA. Also, HS GPA is currently unavailable :). The units are percentiles, so the higher the better.
mata<-describeBy(allGC1$DEM_HS.Rank,allGC1$Semester,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem1")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | Fall 2011 | 60 | 79.18 | 14.08 | 80.5 | 80.48 | 15.57 | 46 | 97 | 51 |
X12 | Fall 2012 | 65 | 73.57 | 17.01 | 76.0 | 74.98 | 16.31 | 20 | 99 | 79 |
X13 | Fall 2013 | 57 | 81.21 | 12.33 | 81.0 | 82.04 | 13.34 | 47 | 99 | 52 |
X14 | Fall 2014 | 86 | 79.43 | 16.62 | 84.0 | 81.56 | 13.34 | 26 | 99 | 73 |
X15 | Fall 2015 | 51 | 81.27 | 13.83 | 86.0 | 82.93 | 8.90 | 37 | 99 | 62 |
X16 | Fall 2016 | 25 | 82.28 | 11.75 | 85.0 | 82.95 | 10.38 | 60 | 98 | 38 |
a<- TukeyHSD( aov(allGC1$DEM_HS.Rank ~ allGC1$Semester))
b<-as.data.frame(a$`allGC1$Semester`)
knitr::kable(b, caption = "Anova. HS ranking among semesters")
comparison | diff | lwr | upr | p adj |
---|---|---|---|---|
Fall 2012-Fall 2011 | -5.6141026 | -13.2615244 | 2.033319 | 0.2876124 |
Fall 2013-Fall 2011 | 2.0271930 | -5.8736280 | 9.928014 | 0.9774141 |
Fall 2014-Fall 2011 | 0.2468992 | -6.9383845 | 7.432183 | 0.9999987 |
Fall 2015-Fall 2011 | 2.0911765 | -6.0444897 | 10.226843 | 0.9772356 |
Fall 2016-Fall 2011 | 3.0966667 | -7.0718166 | 13.265150 | 0.9527517 |
Fall 2013-Fall 2012 | 7.6412955 | -0.1100688 | 15.392660 | 0.0558988 |
Fall 2014-Fall 2012 | 5.8610018 | -1.1596093 | 12.881613 | 0.1616772 |
Fall 2015-Fall 2012 | 7.7052790 | -0.2853243 | 15.695882 | 0.0659401 |
Fall 2016-Fall 2012 | 8.7107692 | -1.3420278 | 18.763566 | 0.1318975 |
Fall 2014-Fall 2013 | -1.7802938 | -9.0761070 | 5.515519 | 0.9819290 |
Fall 2015-Fall 2013 | 0.0639835 | -8.1694637 | 8.297431 | 1.0000000 |
Fall 2016-Fall 2013 | 1.0694737 | -9.1774107 | 11.316358 | 0.9996775 |
Fall 2015-Fall 2014 | 1.8442772 | -5.7052249 | 9.393779 | 0.9818377 |
Fall 2016-Fall 2014 | 2.8497674 | -6.8561055 | 12.555640 | 0.9594958 |
Fall 2016-Fall 2015 | 1.0054902 | -9.4235429 | 11.434523 | 0.9997815 |
ggboxplot(allGC1, x = "Semester", y = "DEM_HS.Rank", title = "Highschool Rank in GC1",
color = "Semester", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(allGC1$DEM_HS.Rank, na.rm = TRUE), linetype = 2) +
stat_compare_means(method = "anova", label.y = 110) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
Fall 2012 seems to stand out again.
HS rank gives fairly poor r-squared values: 0.1667476 and 0.0552476, respectively.
Given the good correlation shown above between previous GPA and final grade, let’s analyze how students of different demographics perform in chemistry relative to their incoming GPA. In other words, instead of simply comparing how first-generation and non-first-generation students do, it is more interesting to see how they did in GenChem given their college readiness (as described by GPA).
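One way to express "performance relative to incoming GPA" is to include GPA as a covariate, so that the group coefficient estimates the gap between groups at equal GPA. The sketch below uses simulated data in which, by construction, the groups differ only in incoming GPA. The variable names and every number are assumptions for illustration, and this is not necessarily the approach taken in the rest of the analysis.

```r
set.seed(2)
n     <- 200
group <- factor(sample(c("N", "Y"), n, replace = TRUE))
# Group "Y" enters with a slightly lower GPA (simulated), clipped to the 1-4 range
gpa   <- pmin(4, pmax(1, rnorm(n, mean = ifelse(group == "Y", 2.9, 3.1), sd = 0.45)))
# Final grade depends on GPA only: no extra group effect is built in
grade <- 30 + 16 * gpa + rnorm(n, sd = 7)

# Raw gap (Y minus N) ignores college readiness
raw_gap <- diff(tapply(grade, group, mean))

# Adjusted gap: coefficient of group "Y" once GPA is in the model
fit     <- lm(grade ~ gpa + group)
adj_gap <- coef(fit)[["groupY"]]

c(raw = unname(raw_gap), adjusted = adj_gap)
```

If a raw difference between groups shrinks towards zero once GPA is in the model, the difference is largely accounted for by incoming GPA rather than by something specific to GenChem.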
Let’s look at how previous GPA and GenChem grades compare between self-identified genders. There was no data besides male and female.
# there are some undefined values that mess up the graphs
onlyMF_gc1<- allGC1_[complete.cases(allGC1_$Sex),]
onlyMF_gc2<- allGC2_[complete.cases(allGC2_$Sex),]
mata<-describeBy(onlyMF_gc1$DEM_Cumulative.GPA,onlyMF_gc1$Sex,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "1st year GPA and Sex")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | F | 286 | 3.08 | 0.45 | 3.06 | 3.08 | 0.48 | 1.88 | 4 | 2.12 |
X12 | M | 117 | 3.00 | 0.53 | 3.04 | 3.01 | 0.49 | 1.33 | 4 | 2.67 |
mata<-describeBy(onlyMF_gc1$DEM_ACT.MATH,onlyMF_gc1$Sex,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "ACT math and Sex")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | F | 270 | 24.61 | 3.09 | 25 | 24.61 | 2.97 | 17 | 34 | 17 |
X12 | M | 111 | 25.82 | 3.71 | 26 | 25.83 | 2.97 | 17 | 34 | 17 |
mata<-describeBy(onlyMF_gc1$DEM_HS.Rank,onlyMF_gc1$Sex,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "HS rank and Sex")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | F | 239 | 81.02 | 14.21 | 84.0 | 82.64 | 11.86 | 25 | 99 | 74 |
X12 | M | 96 | 74.40 | 16.42 | 76.5 | 75.46 | 17.05 | 20 | 99 | 79 |
From the above, we can see that females come to GenChem with a very slightly higher GPA and a remarkably better HS ranking, but with a lower ACT Math score. Males also show a broader range of values and a higher standard deviation, which tells us that male performance may not be well described as a single group and may require a finer classification. In any case, how will these three factors affect their performance in GenChem? The number of students is not exactly the same in each table because not all students have ACT or HS data.
mata<-describeBy(onlyMF_gc1$TG_Total.Grade....,onlyMF_gc1$Sex,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem1 grade and Sex")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | F | 289 | 79.74 | 9.84 | 81.0 | 80.49 | 9.49 | 42.50 | 98.78 | 56.28 |
X12 | M | 120 | 80.88 | 12.12 | 82.1 | 82.36 | 11.15 | 40.52 | 97.14 | 56.62 |
mata<-describeBy(onlyMF_gc2$TG_Total.Grade....,onlyMF_gc2$Sex,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem2 grade and Sex")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | F | 196 | 78.18 | 10.51 | 79.28 | 78.88 | 9.62 | 38.20 | 97.50 | 59.30 |
X12 | M | 102 | 80.31 | 10.19 | 80.57 | 80.76 | 13.00 | 56.39 | 97.22 | 40.83 |
While it may look like males do better than females, even though females came in with a better GPA and HS ranking, there is actually no significant difference when the two groups are compared overall.
#install.packages("ggpubr")
library(ggpubr)
p <- ggboxplot(onlyMF_gc1, x = "Sex", y = "TG_Total.Grade....", color = "Sex", palette = "jco", add = "jitter")
#p + stat_compare_means(method = "t.test")
p + stat_compare_means() # default is the Wilcoxon test, a non-parametric comparison of two groups
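The p-value that `stat_compare_means()` prints for two groups can also be obtained outside the plot with base R. The sketch below runs the Wilcoxon rank-sum test and, for comparison, Welch's t-test on simulated grades; the data frame `scores` and its values are invented for illustration.

```r
set.seed(3)
# Simulated grades for two groups of unequal size; not real student data
scores <- data.frame(
  Sex   = rep(c("F", "M"), times = c(60, 40)),
  Grade = c(rnorm(60, mean = 80, sd = 10), rnorm(40, mean = 81, sd = 12))
)

# Non-parametric comparison of the two groups
w  <- wilcox.test(Grade ~ Sex, data = scores)

# Parametric alternative (Welch's t-test, R's default for t.test)
tt <- t.test(Grade ~ Sex, data = scores)

c(wilcoxon = w$p.value, welch_t = tt$p.value)
```

Reporting both is a cheap check: when the two tests lead to the same conclusion, the result does not hinge on the normality assumption.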
However, when the two groups are compared each semester we notice that Fall 2011 is the only semester with a significant difference between genders.
p <- ggboxplot(onlyMF_gc1, x = "Semester.x", y = "TG_Total.Grade....", color = "Sex", palette = "jco", add = "jitter")
#p + stat_compare_means(method = "t.test")
p + stat_compare_means(aes(group=Sex),label="p.format") # default is the Wilcoxon test, a non-parametric comparison of two groups
Before we jump to conclusions, however, we may need to look at how the females in Fall 2011 performed compared to females in other semesters.
# select females and males separately
onlyF_gc1 <- onlyMF_gc1[onlyMF_gc1$Sex=="F",]
onlyM_gc1 <- onlyMF_gc1[onlyMF_gc1$Sex=="M",]
ggboxplot(onlyF_gc1, x = "Semester.x", y = "TG_Total.Grade....", title = "Females in GC1",
color = "Semester.x", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(onlyF_gc1$TG_Total.Grade....), linetype = 2) +
stat_compare_means(method = "anova", label.y = 110) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
ggboxplot(onlyM_gc1, x = "Semester.x", y = "TG_Total.Grade....", title = "Males in GC1",
color = "Semester.x", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(onlyM_gc1$TG_Total.Grade....), linetype = 2) +
stat_compare_means(method = "anova", label.y = 110) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
ggboxplot(onlyF_gc1, x = "Semester.x", y = "DEM_Cumulative.GPA", title = "Incoming GPA for females",
color = "Semester.x", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(onlyF_gc1$DEM_Cumulative.GPA, na.rm = TRUE), linetype = 2) +
stat_compare_means(method = "anova", label.y = 5) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
ggboxplot(onlyM_gc1, x = "Semester.x", y = "DEM_Cumulative.GPA", title = "Incoming GPA for males",
color = "Semester.x", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(onlyM_gc1$DEM_Cumulative.GPA, na.rm = TRUE), linetype = 2) +
stat_compare_means(method = "anova", label.y = 5) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
ggboxplot(onlyF_gc1, x = "Semester.x", y = "DEM_HS.Rank", title = "HS Ranking for females",
color = "Semester.x", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(onlyF_gc1$DEM_HS.Rank, na.rm = TRUE), linetype = 2) +
stat_compare_means(method = "anova", label.y = 110) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
ggboxplot(onlyM_gc1, x = "Semester.x", y = "DEM_HS.Rank", title = "HS Ranking for males",
color = "Semester.x", add = "jitter", legend="none") + rotate_x_text(angle = 45) +
geom_hline( yintercept = mean(onlyM_gc1$DEM_HS.Rank, na.rm = TRUE), linetype = 2) +
stat_compare_means(method = "anova", label.y = 110) +
stat_compare_means(label = "p.format", method = "t.test", ref.group = ".all.")
We saw that females performed significantly lower than males in Fall 2011, and almost significantly higher in Fall 2013. However, these differences may also be explained by differences in incoming GPA, though not by HS ranking. Also, many students lack an HS ranking, so those statistics may be unreliable.
Let’s compare the GPA before enrolling in GenChem by students’ self-identified ethnicity.
mata<-describeBy(allGC1$DEM_Cumulative.GPA,allGC1$DEM_Student.of.Color,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "1st year GPA and Student of Color: Y/N")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | N | 340 | 3.09 | 0.47 | 3.11 | 3.10 | 0.47 | 1.33 | 4.00 | 2.67 |
X12 | Y | 73 | 2.84 | 0.46 | 2.78 | 2.82 | 0.43 | 1.90 | 3.97 | 2.07 |
mata<-describeBy(allGC1$TG_Total.Grade....,allGC1$DEM_Student.of.Color,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem1 grade and Student of Color: Y/N")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | N | 348 | 81.06 | 10.20 | 82.34 | 82.13 | 8.65 | 40.52 | 98.78 | 58.27 |
X12 | Y | 73 | 74.67 | 10.48 | 75.30 | 74.58 | 12.04 | 43.47 | 96.02 | 52.56 |
mata<-describeBy(allGC2$TG_Total.Grade....,allGC2$DEM_Student.of.Color,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem2 grade and Student of Color: Y/N")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | N | 260 | 79.13 | 10.33 | 79.74 | 79.75 | 10.53 | 43.7 | 97.5 | 53.8 |
X12 | Y | 49 | 74.44 | 12.76 | 72.95 | 75.31 | 9.87 | 35.6 | 96.7 | 61.1 |
require(gridExtra)
plotA <-ggplot(allGC1, aes(x=TG_Total.Grade...., fill=DEM_Student.of.Color)) + geom_histogram() + ggtitle("GenChem1 by Ethnicity")
plotB <-ggplot(allGC1, aes(x=DEM_Cumulative.GPA, fill=DEM_Student.of.Color)) + geom_histogram() + ggtitle("Prev GPA by Ethnicity")
grid.arrange(plotA,plotB)
p <- ggboxplot(allGC1, x = "DEM_Student.of.Color", y = "TG_Total.Grade....", color = "DEM_Student.of.Color", palette = "jco", add = "jitter")
#p + stat_compare_means(method = "t.test")
p + stat_compare_means() # default is the Wilcoxon test, a non-parametric comparison of two groups
p <- ggboxplot(allGC1, x = "Semester", y = "TG_Total.Grade....", color = "DEM_Student.of.Color", palette = "jco", add = "jitter")
#p + stat_compare_means(method = "t.test")
p + stat_compare_means(aes(group=DEM_Student.of.Color),label="p.format") # default is the Wilcoxon test, a non-parametric comparison of two groups
mata<-describeBy(allGC1$DEM_Cumulative.GPA,allGC1$DEM_Ethnicity,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "1st year GPA for different ethnicities")
row | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
---|---|---|---|---|---|---|---|---|---|---|
X11 | | 0 | NaN | NA | NA | NaN | NA | Inf | -Inf | -Inf |
X12 | Am. Indian | 5 | 2.74 | 0.36 | 2.62 | 2.74 | 0.43 | 2.33 | 3.23 | 0.90 |
X13 | Asian | 38 | 2.88 | 0.47 | 2.81 | 2.86 | 0.36 | 1.90 | 3.97 | 2.07 |
X14 | Black | 29 | 2.96 | 0.50 | 2.89 | 2.95 | 0.56 | 2.12 | 3.95 | 1.83 |
X15 | Hawaiian | 1 | 3.11 | NA | 3.11 | 3.11 | 0.00 | 3.11 | 3.11 | 0.00 |
X16 | Hispanic | 13 | 2.78 | 0.38 | 2.70 | 2.76 | 0.36 | 2.27 | 3.47 | 1.20 |
X17 | NS | 3 | 2.72 | 0.66 | 2.37 | 2.72 | 0.07 | 2.32 | 3.48 | 1.16 |
X18 | White | 324 | 3.10 | 0.47 | 3.11 | 3.11 | 0.47 | 1.33 | 4.00 | 2.67 |
We can also run an ANOVA among the different ethnicities, but it’s hard to do statistics on such small numbers; maybe only the Black and Asian groups are large enough to be compared with the White group.
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = allGC1$TG_Total.Grade.... ~ allGC1$DEM_Ethnicity)
$`allGC1$DEM_Ethnicity`
diff lwr upr p adj
Am. Indian- -1.2520311 -24.3127791 21.808717 0.9999998
Asian- 1.8944333 -17.0079964 20.796863 0.9999879
Black- 0.2271630 -18.9237458 19.378072 1.0000000
Hawaiian- 6.1165333 -30.3457108 42.578777 0.9996050
Hispanic- -0.4325654 -20.6581794 19.793049 1.0000000
NS- -4.9680000 -30.7507001 20.814700 0.9990190
White- 5.7148568 -12.5997033 24.029417 0.9807262
Asian-Am. Indian 3.1464644 -11.8319308 18.124860 0.9982867
Black-Am. Indian 1.4791941 -13.8115804 16.769969 0.9999905
Hawaiian-Am. Indian 7.3685644 -27.2225576 41.959686 0.9981266
Hispanic-Am. Indian 0.8194657 -15.7975718 17.436503 0.9999999
NS-Am. Indian -3.7159689 -26.7767169 19.344779 0.9996975
White-Am. Indian 6.9668879 -7.2624335 21.196209 0.8117453
Black-Asian -1.6672703 -9.3686684 6.034128 0.9979230
Hawaiian-Asian 4.2221000 -27.7474084 36.191609 0.9999204
Hispanic-Asian -2.3269987 -12.4081536 7.754156 0.9968822
NS-Asian -6.8624333 -25.7648630 12.039996 0.9553407
White-Asian 3.8204235 -1.4689372 9.109784 0.3535646
Hawaiian-Black 5.8893703 -26.2276802 38.006421 0.9992900
Hispanic-Black -0.6597284 -11.1994223 9.879965 0.9999995
NS-Black -5.1951630 -24.3460719 13.955746 0.9915567
White-Black 5.4876938 -0.6305412 11.605929 0.1158453
Hispanic-Hawaiian -6.5490987 -39.3183386 26.220141 0.9987569
NS-Hawaiian -11.0845333 -47.5467775 25.377711 0.9834226
White-Hawaiian -0.4016765 -32.0271526 31.223800 1.0000000
NS-Hispanic -4.5354346 -24.7610486 15.690179 0.9974027
White-Hispanic 6.1474222 -2.7829165 15.077761 0.4184543
White-NS 10.6828568 -7.6317033 28.997417 0.6360504
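The "Fit:" line in the output above shows that the model was an `aov` of grade on ethnicity, followed by a Tukey comparison. Since most groups are tiny and one label is blank, it may be cleaner to repeat the test on the larger groups only. Here is a minimal sketch of that idea on made-up data. The data frame `toy`, its column names, and the cutoff `min_n` are assumptions for illustration, not the course records.

```r
# Toy data only: these values are invented for illustration.
set.seed(1)
toy <- data.frame(
  grade = c(rnorm(40, mean = 78, sd = 10),
            rnorm(30, mean = 76, sd = 10),
            rnorm(60, mean = 81, sd = 10),
            rnorm(3,  mean = 75, sd = 10),
            rnorm(5,  mean = 77, sd = 10)),
  ethnicity = c(rep("Asian", 40), rep("Black", 30), rep("White", 60),
                rep("NS", 3), rep("", 5)),
  stringsAsFactors = FALSE
)

# Keep only groups with at least min_n students and drop the blank label.
min_n  <- 25  # hypothetical cutoff
counts <- table(toy$ethnicity)
keep   <- names(counts)[as.vector(counts) >= min_n & names(counts) != ""]
sub    <- toy[toy$ethnicity %in% keep, ]
sub$ethnicity <- factor(sub$ethnicity)  # rebuild the factor so dropped levels disappear

# One-way ANOVA followed by Tukey's honest significant differences.
fit <- aov(grade ~ ethnicity, data = sub)
tuk <- TukeyHSD(fit)
print(tuk)
```

With three groups left, the Tukey table has only three pairwise rows, which is much easier to read than the full comparison.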
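Let’s compare the GPA before enrolling in GenChem for first-generation students vs. the rest. Notice how many students we have information for (a total of 421 students in GenChem1 and 309 in GenChem2). The row with a blank label corresponds to students whose first-generation status was not recorded.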
mata<-describeBy(allGC1$DEM_Cumulative.GPA,allGC1$DEM_First.Generation,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "1st year GPA and 1st generation: Y/N")
| | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
|---|---|---|---|---|---|---|---|---|---|---|
| X11 | | 20 | 2.85 | 0.58 | 2.70 | 2.79 | 0.45 | 1.88 | 4.00 | 2.12 |
| X12 | N | 248 | 3.05 | 0.48 | 3.09 | 3.07 | 0.50 | 1.33 | 3.98 | 2.65 |
| X13 | Y | 145 | 3.07 | 0.44 | 3.02 | 3.06 | 0.43 | 1.90 | 4.00 | 2.10 |
mata<-describeBy(allGC1$TG_Total.Grade....,allGC1$DEM_First.Generation,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem1 grade and 1st generation: Y/N")
| | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
|---|---|---|---|---|---|---|---|---|---|---|
| X11 | | 24 | 72.63 | 14.01 | 73.62 | 72.63 | 15.20 | 42.50 | 97.14 | 54.64 |
| X12 | N | 252 | 79.81 | 10.51 | 81.07 | 80.73 | 10.29 | 40.52 | 97.07 | 56.55 |
| X13 | Y | 145 | 81.41 | 9.36 | 82.18 | 82.20 | 8.03 | 53.00 | 98.78 | 45.78 |
mata<-describeBy(allGC2$TG_Total.Grade....,allGC2$DEM_First.Generation,mat=TRUE,digits = 2)
knitr::kable(mata[,c(2,4,5,6,7,8,9,10,11,12)] , caption = "GenChem2 grade and 1st generation: Y/N")
| | group1 | n | mean | sd | median | trimmed | mad | min | max | range |
|---|---|---|---|---|---|---|---|---|---|---|
| X11 | | 16 | 78.97 | 11.79 | 79.30 | 79.09 | 14.52 | 60.1 | 96.14 | 36.04 |
| X12 | N | 187 | 78.26 | 10.96 | 79.10 | 78.94 | 11.12 | 38.2 | 97.50 | 59.30 |
| X13 | Y | 106 | 78.52 | 10.64 | 79.42 | 79.24 | 10.49 | 35.6 | 96.70 | 61.10 |
ggplot(allGC1, aes(x=TG_Total.Grade...., fill=DEM_First.Generation )) + geom_histogram() + ggtitle("GenChem1 by First Generation")
ggplot(allGC2, aes(x=TG_Total.Grade...., fill=DEM_First.Generation))+geom_histogram()+ggtitle("GenChem2 by First Generation")
p <- ggboxplot(allGC1, x = "DEM_First.Generation", y = "TG_Total.Grade....", color = "DEM_First.Generation", palette = "jco", add = "jitter")
#p + stat_compare_means(method = "t.test")
p + stat_compare_means() #default is the non-parametric Wilcoxon test for two groups; note that the blank level makes this a three-group comparison
p <- ggboxplot(allGC1, x = "Semester", y = "TG_Total.Grade....", color = "DEM_First.Generation", palette = "jco", add = "jitter")
#p + stat_compare_means(method = "t.test")
p + stat_compare_means(aes(group=DEM_First.Generation),label="p.format") #default is wilcox for comparing non-parametric two groups
p <- ggboxplot(allGC2, x = "Semester", y = "TG_Total.Grade....", color = "DEM_First.Generation", palette = "jco", add = "jitter") + rotate_x_text(angle = 45)
#p + stat_compare_means(method = "t.test")
p + stat_compare_means(aes(group=DEM_First.Generation),label="p.format") #default is wilcox for comparing non-parametric two groups
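The tables above show that first-generation status has three labels (blank, N and Y), so the plotted comparisons are not strictly Y vs. N. As far as I know, `stat_compare_means` switches to a Kruskal-Wallis test when there are more than two groups. A minimal base R sketch of both tests on made-up data follows. The data frame `toy` and its column names are assumptions for illustration, not the course records.

```r
# Toy data only: invented grades, not the students' grades.
set.seed(2)
toy <- data.frame(
  grade     = c(rnorm(50, 80, 10), rnorm(30, 81, 9), rnorm(6, 73, 14)),
  first_gen = c(rep("N", 50), rep("Y", 30), rep("", 6)),
  stringsAsFactors = FALSE
)

# All three labels (blank, N, Y): rank-based test across several groups.
toy$first_gen_f <- factor(toy$first_gen)
kw <- kruskal.test(grade ~ first_gen_f, data = toy)

# Only students with a recorded answer: two-group Wilcoxon rank-sum test.
known <- toy[toy$first_gen %in% c("N", "Y"), ]
known$first_gen <- factor(known$first_gen)
wt <- wilcox.test(grade ~ first_gen, data = known)

kw$p.value  # three-group comparison
wt$p.value  # Y vs. N only
```

Dropping the blank label before testing keeps the p-value tied to the question actually being asked, which is whether first-generation students differ from the rest.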
First-generation students seem to do as well as, or slightly better than, the rest. Are they coming in with equal preparation? We can look at high school (HS) rank to try to answer that.
p <- ggboxplot(allGC1, x = "DEM_First.Generation", y = "DEM_HS.Rank", color = "DEM_First.Generation", palette = "jco", add = "jitter")
#p + stat_compare_means(method = "t.test")
p + stat_compare_means() #default is wilcox for comparing non-parametric two groups
p <- ggboxplot(allGC1, x = "Semester", y = "DEM_HS.Rank", color = "DEM_First.Generation", palette = "jco", add = "jitter")
#p + stat_compare_means(method = "t.test")
p + stat_compare_means(aes(group=DEM_First.Generation),label="p.format") #default is wilcox for comparing non-parametric two groups
Judging by HS rank, it seems that first-generation students already arrive better prepared than non-first-generation students.
For attribution, please cite this work as
Prat-Resina (2018, Aug. 10). Prat-Resina's blog: Analysis of seven years of General Chemistry student data. Retrieved from https://xavierprat.github.io/Blog/posts/analysis_seven_years_genchem/
BibTeX citation
@misc{prat-resina2018analysis, author = {Prat-Resina, Xavier}, title = {Prat-Resina's blog: Analysis of seven years of General Chemistry student data}, url = {https://xavierprat.github.io/Blog/posts/analysis_seven_years_genchem/}, year = {2018} }