I am quite new to R programming and have been given the task of representing some data in a boxplot. We were only provided the five-number summary of the data, i.e. the lowest value, lower quartile, median, upper quartile, and highest value. We are also told the number of samples (n).

I read that bxp() is a function similar to boxplot(), but that it draws the boxplot from this five-number summary rather than from the raw data.

However, I know varwidth can be used to make the width of each box depend on n, yet it does not seem to work here: all boxes come out the same width. This is what I need help with.

MORSEYear1 <- c(18.2,58.5,64.4,73.4,91.1)
MORSEYear2 <- c(22.3,56.4,64.3,75.7,97.4)
MORSEYear3 <- c(29.1,57.9,66.6,73.4,86.0)
MathStatYear1 <- c(46.8,54.8,66.1,71.4,84.1)
MathStatYear2 <- c(35.1,47.8,57.8,65.7,82.8)
MathStatYear3 <- c(32.6,56.3,61.1,75.6,89.4)

# stats must be a 5 x 1 matrix (one column per box), so only ncol is given
MORSE1 <- list(stats = matrix(MORSEYear1, ncol = 1), n = 139)
MORSE2 <- list(stats = matrix(MORSEYear2, ncol = 1), n = 132)
MORSE3 <- list(stats = matrix(MORSEYear3, ncol = 1), n = 131)

MS1 <- list(stats = matrix(MathStatYear1, ncol = 1), n = 21)
MS2 <- list(stats = matrix(MathStatYear2, ncol = 1), n = 20)
MS3 <- list(stats = matrix(MathStatYear3, ncol = 1), n = 14)

bxp(MORSE1, xlim = c(0.5,6.5),ylim = c(0,100),varwidth= TRUE, main = "Graph comparing distribution of marks across different years of MORSE and MathStat",ylab = "Marks", xlab = "Course and year of study (Course,Year)", axes = FALSE)
par(new=T)
bxp(MORSE2, xlim = c(-0.5,5.5), ylim = c(0,100),axes= TRUE, varwidth=TRUE)
par(new=T)
bxp(MORSE3, xlim = c(-1.5,4.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS1, xlim = c(-2.5,3.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS2, xlim = c(-3.5,2.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)
par(new=T)
bxp(MS3, xlim = c(-4.5,1.5), ylim = c(0,100), varwidth=TRUE, axes = FALSE)

NOTE: My supervisor said to use par(new=T) and change the xlim to plot multiple graphs using bxp(). If someone could verify whether this is the best method, that would be great!

Thanks

1 Answer

Stumbled upon the same problem, without much experience with R.

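The varwidth argument of bxp() only has an effect when several boxes are drawn in a single call: each box's width is scaled relative to the other boxes in that call, in proportion to the square root of its n. Adding boxes to an existing plot with par(new=T) does not count, because each call sees only one box and nothing is readjusted after the fact.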
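To make the scaling concrete, here is a small arithmetic sketch using the sample sizes from the question. It assumes the documented square-root rule with widths taken relative to the largest group; it illustrates the idea and is not a copy of bxp()'s internals. The names rel_width and single_call are made up for this example.

```r
# Sample sizes from the question (MORSE years 1-3, MathStat years 1-3)
n <- c(139, 132, 131, 21, 20, 14)

# Relative widths when all six boxes are scaled against the largest group
rel_width <- sqrt(n / max(n))
print(round(rel_width, 2))

# One box per call: the "largest group" is always the box itself,
# so every relative width collapses to 1 and the boxes look identical
single_call <- sapply(n, function(k) sqrt(k / max(k)))
print(single_call)
```

This is why six separate bxp() calls give six boxes of the same width, whatever n each one carries.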

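The question then becomes how to construct a z argument for bxp() that describes several boxes at once. To see the required structure, look at the list returned by something like boxplot(c(c(1,1),c(2,2))~c(c(11,11),c(22,22))): stats is a matrix with one column per box, and n is a vector with one sample size per box.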
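One way to inspect that structure without drawing anything is to pass plot = FALSE and look at the returned list. This is a minimal sketch with the same toy data; the variable name b is arbitrary.

```r
# Same toy call, but only compute the summary instead of plotting it
b <- boxplot(c(1, 1, 2, 2) ~ c(11, 11, 22, 22), plot = FALSE)

# Overview of all components (stats, n, conf, out, group, names)
str(b)

# 5 rows (the five-number summary), one column per group
dim(b$stats)

# One sample size per group
b$n
```

Only stats and n are filled in by hand in the examples that follow.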

First, a generic example with made-up data to aid anyone that lands here:

# data
d1 <- c(1,2,3,4,5)
d2 <- c(1,2,3,5,8,13,21,34)

# summaries (generated with quantile and structured accordingly)
z1 <- list(
    stats=matrix(quantile(d1, c(0.05,0.25,0.5,0.75,0.85))),
    n=length(d1)
)
z2 <- list(
    stats=matrix(quantile(d2, c(0.05,0.25,0.5,0.75,0.85))),
    n=length(d2)
)

# merging the summaries appropriately
z <- list(
    stats=cbind(z1$stats,z2$stats),
    n=c(z1$n,z2$n)
)

# check result
print(z)

# call bxp with needed parameters ("at" can/should also be used here)
bxp(z=z,varwidth=TRUE)

For the original question, merge the MORSE# and MS# lists in the same way. The code is far from optimal (there may be a better way to merge, and a function could be written for it), but the aim here is ugly clarity and simplicity:

z <- list(
    stats=cbind(MORSE1$stats, MORSE2$stats, MORSE3$stats, MS1$stats, MS2$stats, MS3$stats),
    n=c(MORSE1$n, MORSE2$n, MORSE3$n, MS1$n, MS2$n, MS3$n)
)
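As a sketch of what such a function might look like, the helper below builds the merged list directly from the five-number summaries. The name make_bxp_input, its arguments, and the "Course,Year" labels are all made up for this example; only bxp() and the stats/n/names components come from base R. The numbers are the ones given in the question.

```r
# Hypothetical helper: build a bxp()-style list from five-number summaries.
# `summaries` is a named list of length-5 vectors (min, Q1, median, Q3, max);
# `n` holds the matching sample sizes, in the same order.
make_bxp_input <- function(summaries, n) {
  stopifnot(length(summaries) == length(n),
            all(lengths(summaries) == 5))
  list(
    stats = do.call(cbind, unname(summaries)),  # 5 x k matrix, one column per box
    n     = n,                                  # one sample size per box
    names = names(summaries)                    # used by bxp() as axis labels
  )
}

# Five-number summaries from the question
summaries <- list(
  "MORSE,1"    = c(18.2, 58.5, 64.4, 73.4, 91.1),
  "MORSE,2"    = c(22.3, 56.4, 64.3, 75.7, 97.4),
  "MORSE,3"    = c(29.1, 57.9, 66.6, 73.4, 86.0),
  "MathStat,1" = c(46.8, 54.8, 66.1, 71.4, 84.1),
  "MathStat,2" = c(35.1, 47.8, 57.8, 65.7, 82.8),
  "MathStat,3" = c(32.6, 56.3, 61.1, 75.6, 89.4)
)

z_all <- make_bxp_input(summaries, n = c(139, 132, 131, 21, 20, 14))

# A single call draws all six boxes, so varwidth can compare their n values
bxp(z_all, varwidth = TRUE, ylim = c(0, 100),
    ylab = "Marks", xlab = "Course and year of study (Course,Year)")
```

With everything drawn in one call, there should be no need for par(new=T) or shifted xlim values.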