I am doing a Bayesian analysis. I have monthly posterior means of sales data for 100 stores, and I want to visualize them as a boxplot using ggplot2. The problem is that I'm not sure how to tell ggplot that there are 100 columns (stores) and 12 rows (months).

Here is a sample of the data; here is a link to the data. It is very small: 13 rows and 100 columns representing the 100 stores. The first 12 rows are months and the 13th row holds the means.

> head(t.data)
               1          2         3         4          5           6          7          8          9        10
Month1 -26.25917   1.740833 151.74083 67.740833  21.740833  -42.259167  -1.259167  49.740833  -53.25917 -98.25917
Month2 -27.25917  -9.259167 117.74083 74.740833  -2.259167    5.740833  64.740833  59.740833  -41.25917 -59.25917
Month3 -32.25917 -68.259167  42.74083 79.740833 -61.259167 -113.259167  26.740833 -36.259167  -81.25917 -32.25917
Month4 -37.25917  -9.259167 128.74083 -6.259167 -14.259167  -38.259167  69.740833  28.740833  -46.25917 -16.25917
Month5  15.74083   5.740833 102.74083 89.740833 -57.259167  -60.259167 112.740833 -28.259167  -42.25917 -41.25917
Month6 -22.25917  -9.259167  82.74083 11.740833 -42.259167  -56.259167  42.740833   4.740833 -101.25917 -58.25917
               11        12        13         14        15        16         17        18          19         20
Month1   47.74083 160.74083 -63.25917   8.740833  58.74083  20.74083  64.740833 -13.25917 -35.2591667  -78.25917
Month2   38.74083 153.74083 -36.25917 -57.259167  46.74083 -71.25917  22.740833  23.74083 -29.2591667 -108.25917
Month3   37.74083  77.74083 -29.25917 -49.259167 127.74083 -63.25917 -18.259167 -34.25917 -98.2591667 -143.25917
Month4   49.74083 189.74083 -46.25917 -54.259167  97.74083 -27.25917  55.740833 -43.25917 -54.2591667  -82.25917
Month5  -14.25917  78.74083  18.74083 -16.259167  47.74083 -13.25917  19.740833 -22.25917   0.7408333  -91.25917
Month6 -164.25917  83.74083  10.74083 -22.259167 -14.25917 -33.25917  -4.259167 -16.25917 -34.2591667 -105.25917
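Because the linked file is not reproduced here, the following is a hypothetical stand-in with the same layout (12 month rows plus one row of means, 100 store columns). The numbers are random and are not the asker's posterior means; the names `monthly` and `Mean` are assumptions for illustration. It also shows one detail worth checking: if the 13th row of means is still in t.data when plotting, boxplot() and geom_boxplot() will count it as an extra observation for every store.

```r
# Hypothetical stand-in for t.data: random numbers, NOT the real posterior means.
set.seed(42)
monthly <- matrix(rnorm(12 * 100, mean = 0, sd = 60), nrow = 12,
                  dimnames = list(paste0("Month", 1:12), NULL))

# Append a 13th row of per-store means, mimicking the layout described above
t.data <- as.data.frame(rbind(monthly, Mean = colMeans(monthly)))
colnames(t.data) <- as.character(1:100)
dim(t.data)   # 13 rows x 100 columns

# Keep only the monthly rows so the means row is not plotted as a data point
t.data <- t.data[1:12, ]
dim(t.data)   # 12 rows x 100 columns
```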

Essentially, then: how would I replicate base R's boxplot() in ggplot2 with this kind of (time-series) data? Here is a reproducible example using the standard R boxplot, and here is a link to the q.csv data.

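t.data <- read.csv("t.data.csv", header = TRUE)  # add row.names = 1 if the first column holds the month labels
q <- read.csv("q.csv", header = TRUE)
colnames(t.data) <- as.character(1:100)  # label the store columns 1..100

# One box per column (store); outline = FALSE suppresses the outlier points
boxplot(t.data, xlab = "Store Number", ylab = "Effect of Promo on Sales (centered)",
        outline = FALSE, main = "Data versus posterior of the spatial random effect")

# Overlay the posterior summaries from q (one row per store, x = 1..100)
lines(q[1:100, 1], col = 2, lty = 2)  # 95% interval bound (dashed)
lines(q[1:100, 2], col = 2, lty = 1)  # median (solid)
lines(q[1:100, 3], col = 2, lty = 2)  # 95% interval bound (dashed)

legend("topright", c("Median", "95% interval"), lty = 1:2, col = 2, bg = gray(1), inset = 0.05)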

Answer

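We need to melt t.data first, i.e. reshape it from wide to long format: ggplot2 expects one row per observation, with one column identifying the store and another holding the value. Then we can use ggplot2.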

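library(ggplot2)
library(reshape2)  # melt() comes from the reshape2 package

# melt() stacks the 100 store columns into `variable` (store) and `value`
ggplot(melt(t.data), aes(variable, value)) +
  geom_boxplot()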
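For readers who prefer not to load reshape2, here is a minimal base-R sketch of the same wide-to-long step, roughly what melt() produces here. It uses random stand-in data (not the asker's), and the names `long` and `store` are hypothetical. Setting the factor levels explicitly keeps the boxes in store order; a plain factor() on the labels would sort them as text ("1", "10", "100", "11", ...).

```r
library(ggplot2)

# Hypothetical stand-in data: 12 months x 100 stores of random numbers
set.seed(42)
t.data <- as.data.frame(matrix(rnorm(12 * 100, sd = 60), nrow = 12))
colnames(t.data) <- as.character(1:100)

# Wide -> long by hand: unlist() concatenates the columns in order, so each
# store label is repeated nrow(t.data) times to line up with its values.
long <- data.frame(
  store = factor(rep(colnames(t.data), each = nrow(t.data)),
                 levels = colnames(t.data)),
  value = unlist(t.data, use.names = FALSE)
)

head(long, 3)   # first three monthly values of store "1"

p <- ggplot(long, aes(x = store, y = value)) +
  geom_boxplot()
print(p)
```

Each store becomes one level of `store`, so geom_boxplot() draws one box per store, matching boxplot(t.data) in base R.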

Comments

Lines of what, exactly?
The median and 95% interval lines, shown in the example as lines(q[1:100,1], col=2, lty=2), lines(q[1:100,2], col=2, lty=1) and lines(q[1:100,3], col=2, lty=2).
My apologies, I have added a link to the q data and a line to load it.
Never mind, I figured it out! Thank you for your help!
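The asker reports solving the overlay but does not show how, so here is one possible ggplot2 sketch, not the asker's solution. It assumes, as the base-R lines() calls suggest, that q has one row per store in order 1..100, with column 2 the median and columns 1 and 3 the interval bounds. The data below are random stand-ins (q is just empirical quantiles of the fake data), and `long` and `bands` are hypothetical names.

```r
library(ggplot2)

# Hypothetical stand-in data (random numbers, NOT the asker's posterior means)
set.seed(1)
t.data <- as.data.frame(matrix(rnorm(12 * 100, sd = 60), nrow = 12))
colnames(t.data) <- as.character(1:100)

# Placeholder for q: 100 rows, columns used positionally as in q[1:100, 1:3]
q <- as.data.frame(t(apply(t.data, 2, quantile, probs = c(0.025, 0.5, 0.975))))

# Long format for the boxes, with store levels kept in column order
long <- data.frame(
  store = factor(rep(colnames(t.data), each = nrow(t.data)), levels = colnames(t.data)),
  value = unlist(t.data, use.names = FALSE)
)

# One row per store for the overlay lines, on the same discrete x axis
bands <- data.frame(
  store  = factor(colnames(t.data), levels = colnames(t.data)),
  lower  = q[1:100, 1],
  median = q[1:100, 2],
  upper  = q[1:100, 3]
)

p <- ggplot(long, aes(x = store, y = value)) +
  # outlier.shape = NA hides outlier points, similar to outline = FALSE
  geom_boxplot(outlier.shape = NA) +
  # group = 1 joins the stores into a single line; mapping linetype to a
  # constant label creates the legend entries
  geom_line(data = bands, aes(y = median, group = 1, linetype = "Median"), colour = "red") +
  geom_line(data = bands, aes(y = lower, group = 1, linetype = "95% interval"), colour = "red") +
  geom_line(data = bands, aes(y = upper, group = 1, linetype = "95% interval"), colour = "red") +
  scale_linetype_manual(name = NULL, values = c("Median" = "solid", "95% interval" = "dashed")) +
  scale_x_discrete(breaks = as.character(seq(10, 100, by = 10))) +  # thin the 100 labels
  labs(x = "Store Number", y = "Effect of Promo on Sales (centered)",
       title = "Data versus posterior of the spatial random effect")

print(p)
```

One difference from base R: outlier.shape = NA only hides the outlier points, whereas outline = FALSE also leaves them out of the y-axis range, so the ggplot2 version may show a taller y range.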
