
I'm a newbie in R.

I have some values in columns 3 to 6 of a data frame that I want to plot as a dot chart. Each of columns 3 to 6 represents a month, and the rows represent the days of the month, from 1 to 30. The values in the data frame are temperatures.

I want to make a plot with temperature on the y-axis and month on the x-axis. It should have one dot per daily temperature and a line through the monthly means, so that you can follow the mean temperature from month to month.

However, some of the temperatures are identical, so I want to add a very small offset to the duplicates so that you can see many dots at the most common temperature.
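The "small offset for duplicates" idea can be sketched in base R. This is a minimal illustration under assumptions: `tie_offsets()` and `step` are hypothetical names, the temperatures are made up, and the offset is applied to the dot's x position rather than to the temperature, so the plotted values stay exact.

```r
# Hypothetical helper: the first occurrence of a value gets offset 0,
# the second occurrence gets `step`, the third 2 * step, and so on.
tie_offsets <- function(x, step = 0.05) {
  # ave() applies seq_along() within each group of equal values,
  # giving the running occurrence count (1, 2, 3, ...) of each tie
  step * (ave(x, x, FUN = seq_along) - 1)
}

# Made-up temperatures for a single month
temps <- c(50, 60, 50, 50, 65, 60)
offs <- tie_offsets(temps)

# Draw the month at x = 1, with tied temperatures spread sideways
plot(1 + offs, temps, pch = 18, xlim = c(0.5, 1.5),
     xlab = "Month", ylab = "Temperature")
```

Random jitter (for example base R's `jitter()`) is the usual alternative; the deterministic offset above just makes the spacing of the ties predictable.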

I've tried:

boxplot(dat3[,3:6],dat3=mean, geom="point", shape=18,
        size=3, color="red")

However, that doesn't draw a line between the averages, and it plots the temperatures as boxes rather than as individual points. I want only dots and a line between the averages.

Is that at all possible?

Thank you all.
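For reference, base R can draw this kind of chart without `boxplot()`: `stripchart()` plots one column of dots per month (with `method = "jitter"` spreading tied values sideways), and `lines()` can then join the monthly means. A minimal sketch, assuming the temperatures sit in columns 3 to 6 of `dat3` as described; the stand-in `dat3` below, including its `day` and `station` columns, is made up.

```r
# Made-up stand-in for dat3: 30 days, months in columns 3 to 6
dat3 <- data.frame(
  day     = 1:30,
  station = "A",   # hypothetical extra column so the months land in 3:6
  Mar     = rep(c(50, 60, 65, 40, 50), 6),
  Apr     = rep(c(55, 66, 71, 44, 55), 6),
  May     = rep(c(57, 68, 74, 45, 57), 6),
  Jun     = rep(c(68, 81, 88, 54, 68), 6)
)

temps <- dat3[, 3:6]
month_means <- colMeans(temps)

# One vertical column of dots per month; jitter spreads identical values sideways
stripchart(temps, vertical = TRUE, method = "jitter", jitter = 0.15,
           pch = 18, col = "grey40",
           xlab = "Month", ylab = "Temperature")

# Groups are drawn at x = 1, 2, 3, 4, so join the means at those positions
lines(seq_along(month_means), month_means, type = "b",
      col = "red", pch = 19, lwd = 2)
```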

  • Can you please provide us with a small dataset? stackoverflow.com/help/mcve Commented Nov 23, 2015 at 23:28
  • For quickly plotting data frames, I would suggest looking at ggplot2. It includes functionality for plotting scatter plots, line plots, and combined ones, as well as functionality to add jitter and calculate means. Commented Nov 24, 2015 at 0:12

2 Answers


I made up a tiny (and unreal) data frame but you can incorporate your own data.

df <- structure(list(Month = structure(1:4, .Label = c("April", "May", 
"June", "July"), class = "factor"), X1 = c(50, 55, 57, 68), X2 = c(60, 
66, 68.4, 81.6), X3 = c(65, 71.5, 74.1, 88.4), X4 = c(40, 44, 
45.6, 54.4), X5 = c(50, 55, 57, 68), X6 = c(60, 66, 68.4, 81.6
), X7 = c(65, 71.5, 74.1, 88.4), X8 = c(40, 44, 45.6, 54.4), 
    X9 = c(50, 55, 57, 68), X10 = c(60, 66, 68.4, 81.6), X11 = c(65, 
    71.5, 74.1, 88.4), X12 = c(40, 44, 45.6, 54.4), X13 = c(50, 
    55, 57, 68), X14 = c(60, 66, 68.4, 81.6), X15 = c(65, 71.5, 
    74.1, 88.4), X16 = c(40, 44, 45.6, 54.4), X17 = c(50, 55, 
    57, 68), X18 = c(60, 66, 68.4, 81.6), X19 = c(65, 71.5, 74.1, 
    88.4), X20 = c(40, 44, 45.6, 54.4), X21 = c(50, 55, 57, 68
    ), X22 = c(60, 66, 68.4, 81.6), X23 = c(65, 71.5, 74.1, 88.4
    ), X24 = c(40, 44, 45.6, 54.4), X25 = c(50, 55, 57, 68), 
    X26 = c(60, 66, 68.4, 81.6), X27 = c(65, 71.5, 74.1, 88.4
    ), X28 = c(40, 44, 45.6, 54.4), X29 = c(50, 55, 57, 68), 
    X30 = c(50, 55, 57, 68)), .Names = c("Month", "X1", "X2", 
"X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10", "X11", "X12", 
"X13", "X14", "X15", "X16", "X17", "X18", "X19", "X20", "X21", 
"X22", "X23", "X24", "X25", "X26", "X27", "X28", "X29", "X30"
), row.names = c(NA, -4L), class = "data.frame")
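This made-up frame has months in rows and days in columns, the transpose of the layout described in the question (days in rows, months in columns 3 to 6). If your data looks like the question's, here is a sketch of reaching a comparable long format with `tidyr::pivot_longer()` (assumes tidyr 1.0.0 or later); the `temps` frame and its `day` and `station` columns are hypothetical.

```r
library(tidyr)

# Hypothetical data in the question's layout: one row per day,
# months in columns 3 to 6
temps <- data.frame(
  day     = 1:30,
  station = "A",
  April   = rep(c(50, 60, 65, 40, 50), 6),
  May     = rep(c(55, 66, 71, 44, 55), 6),
  June    = rep(c(57, 68, 74, 45, 57), 6),
  July    = rep(c(68, 81, 88, 54, 68), 6)
)

# One row per (day, month) pair
temps.long <- pivot_longer(temps, cols = 3:6,
                           names_to = "Month", values_to = "value")

# Make Month a factor so it keeps calendar order instead of alphabetical
temps.long$Month <- factor(temps.long$Month,
                           levels = c("April", "May", "June", "July"))
```

The result has the same `Month` and `value` columns as `df.m` below, with `day` playing the role of `variable`.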

After some cleanup, there are several ways to plot your data; here is one:

library(reshape2)   # melt()
library(stringr)    # str_replace_all()
library(dplyr)
df$Month <- factor(df$Month, levels = c("April", "May", "June", "July"))    # changed the order from alphabetical
df.m <- melt(df, id.vars = "Month")                        # melted the data frame into long format
# remove the X before the day numbers; as integers, days sort 1, 2, ..., 30 instead of "1", "10", "11", ...
df.m$variable <- as.integer(str_replace_all(string = as.character(df.m$variable), pattern = "X", replacement = ""))

avg.temp <- df.m %>% group_by(Month) %>% summarise(avg = mean(value))       # calculated the monthly mean for plotting

library(ggplot2)
ggplot(df.m, aes(x = factor(variable), y = value)) +
  geom_point() +
  geom_point(data = avg.temp, aes(x = 15, y = avg), size = 7, color = "red") +   # monthly mean drawn mid-month
  facet_wrap(~Month) +
  theme_bw() +
  labs(x = "Days of the Month", y = "Temperature (F)", title = "Distribution of Temperatures -- Monthly Mean in Red")



2 Comments

Wow, cool and impressive answer! Is it possible to make a dot chart with month on the x-axis and the temperatures on the y-axis, where all the days of a month share one column, so that I can draw a line between the averages?
Yes, that is certainly possible. Why don't you take my code and your data and work with it to that end? SO isn't a coding service; we try to answer specific coding problems people submit. And, by the way, if this answers your question, even if not the further refinement you wish, consider accepting it by clicking on the accept arrow.
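One way to get the plot described in the comment above (month on the x-axis, every day's temperature as a dot, a line through the monthly means) is ggplot2's `stat_summary()`, which computes the mean per month inside the plot. A sketch with made-up long-format data; `fun =` assumes ggplot2 3.3.0 or later (older versions used `fun.y =`).

```r
library(ggplot2)

# Made-up long-format data: 30 daily temperatures for each of 4 months
df.m <- data.frame(
  Month = factor(rep(c("April", "May", "June", "July"), each = 30),
                 levels = c("April", "May", "June", "July")),
  value = c(rep(c(50, 60, 65, 40, 50), 6),
            rep(c(55, 66, 71, 44, 55), 6),
            rep(c(57, 68, 74, 45, 57), 6),
            rep(c(68, 81, 88, 54, 68), 6))
)

set.seed(1)   # make the random jitter reproducible

p <- ggplot(df.m, aes(x = Month, y = value)) +
  # Sideways jitter only (height = 0), so the temperatures are not altered
  geom_jitter(width = 0.15, height = 0) +
  # Line through the monthly means; group = 1 joins points across discrete months
  stat_summary(aes(group = 1), fun = mean, geom = "line", colour = "red") +
  # Mark each monthly mean with a larger red dot
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  labs(x = "Month", y = "Temperature (F)")
print(p)
```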

A solution using ggplot2 (for plotting), tidyr (for converting your table into an easier to process data frame), and dplyr (for working with the data frame):

df <- structure(list(Jan = c(50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L,
50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L,
60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 50L), Feb = c(50L, 60L,
65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L,
40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L,
50L, 50L), Mar = c(50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L,
60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L,
65L, 40L, 50L, 60L, 65L, 40L, 50L, 50L), Apr = c(50L, 60L, 65L,
40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L,
50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L,
50L), May = c(50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L,
65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L, 40L, 50L, 60L, 65L,
40L, 50L, 60L, 65L, 40L, 50L, 50L), Jun = c(55L, 66L, 71L, 44L,
55L, 66L, 71L, 44L, 55L, 66L, 71L, 44L, 55L, 66L, 71L, 44L, 55L,
66L, 71L, 44L, 55L, 66L, 71L, 44L, 55L, 66L, 71L, 44L, 55L, 55L
), Jul = c(57L, 68L, 74L, 45L, 57L, 68L, 74L, 45L, 57L, 68L,
74L, 45L, 57L, 68L, 74L, 45L, 57L, 68L, 74L, 45L, 57L, 68L, 74L,
45L, 57L, 68L, 74L, 45L, 57L, 57L), Aug = c(68L, 81L, 88L, 54L,
68L, 81L, 88L, 54L, 68L, 81L, 88L, 54L, 68L, 81L, 88L, 54L, 68L,
81L, 88L, 54L, 68L, 81L, 88L, 54L, 68L, 81L, 88L, 54L, 68L, 68L
)), .Names = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
"Aug"), class = "data.frame", row.names = c(NA, -30L))

library(ggplot2)
library(tidyr)
library(dplyr)

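# Reshape Mar..Jun to long format; make month a factor so it keeps calendar (not alphabetical) order
df.temps <- df %>% select(Mar:Jun) %>% gather(month, temperature) %>%
  mutate(month = factor(month, levels = c("Mar", "Apr", "May", "Jun")))
df.avg <- df.temps %>% group_by(month) %>% summarise(average = mean(temperature))   # one row per month, in level order

ggplot() +
  # horizontal jitter (up to +/- 1 degree) so identical temperatures don't overlap
  geom_point(data=df.temps, aes(x=temperature, y=month), position=position_jitter(width=1, height=0)) +
  geom_point(data=df.avg, aes(x=average, y=month), color="red", size=3) +
  # geom_path joins the means in row order, i.e. month order
  geom_path(data=df.avg, aes(x=average, y=month, group=1)) +
  labs(x = "Temperature (in F)", y = "Month")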

