Given the following data frame (see below), taken from a questionnaire that asked people from different neighborhoods about perceived security, I have managed to create a bar plot that displays perceived security, with results grouped by neighborhood:

questionnaire_raw = read.csv("https://www.dropbox.com/s/l647q2omffnwyrg/local.data.csv?dl=0")

ggplot(data = questionnaire_raw, 
       aes(x = factor(Seguridad.de.tu.barrio..de.día.), # We have to convert x values to categorical data
           y = (..count..)/sum(..count..)*100,
           fill = neighborhoods)) + 
  geom_bar(position="dodge") + 
  ggtitle("Seguridad de día") + 
  labs(x="Grado de seguridad", y="% encuestados", fill="Barrios")


I would like to overlay these results with a line graph representing the mean of each security category (1, 2, 3 or 4) across all neighborhoods (that is, without grouping the results), so that it is easy to see whether a specific neighborhood is above or below the average of all neighborhoods. However, since this is my first project in R, I do not know how to calculate that mean from a data frame and then overlay it on the previous bar plot.

  • What about adding something like + stat_summary(fun.data="mean_cl_normal", geom = "line", mapping = aes(group = 1)) (untested)? Commented Feb 12, 2015 at 11:56
  • results in Error: stat_summary requires the following missing aesthetics: y Commented Feb 12, 2015 at 12:00

1 Answer

Using data.table for the data manipulation, together with lukeA's comment:

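library(ggplot2)
library(data.table)

# Convert to a data.table in place and rename its three columns
setDT(questionnaire_raw)
setnames(questionnaire_raw, c("Timestamp", "Barrios", "Grado"))

# Count the responses for each (neighborhood, security grade) pair
plot_data <- questionnaire_raw[, .N, by = .(Barrios, Grado)]

ggplot(plot_data, aes(x = factor(Grado), y = N, fill = Barrios)) +
  geom_bar(position = "dodge", stat = "identity") +
  # Mean count per grade across neighborhoods; `fun` replaces `fun.y`, deprecated in ggplot2 3.3.0
  stat_summary(fun = mean, geom = "line", mapping = aes(group = 1)) +
  ggtitle("Seguridad de día") +
  labs(x = "Grado de seguridad", y = "Nº encuestados", fill = "Barrios")  # y is a count, not a percentage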

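To make the two steps in the code above easier to follow, here is a minimal sketch on made-up data. The table `toy` and its values are hypothetical and only stand in for the questionnaire; the second step reproduces by hand what `stat_summary` with `mean` and `aes(group = 1)` is expected to draw, namely the mean of the per-neighborhood counts for each grade.

```r
library(data.table)

# Hypothetical toy data standing in for the questionnaire (not the real answers)
toy <- data.table(
  Barrios = c("A", "A", "A", "B", "B", "B", "B", "B"),
  Grado   = c(1, 1, 2, 1, 2, 2, 2, 2)
)

# Same step as in the answer: one row per (Barrios, Grado) pair with its count N
toy_counts <- toy[, .N, by = .(Barrios, Grado)]

# Mean of N across neighborhoods for each Grado: the values the overlaid line connects
toy_means <- toy_counts[, .(mean_N = mean(N)), by = Grado]
print(toy_means)
```

One caveat of this approach: a neighborhood with no responses for a given grade has no row in the counted table, so it does not enter the mean for that grade as a zero.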


4 Comments

Thank you very much for your answer. It works fine, although I still need to understand what you are doing: since the original data frame is far bigger (we have 72 variables, not 3), it seems I can't reproduce the setnames line. I think I need to create a vector with all 72 variable names, but since I had never heard of that function I am not sure. I will try creating a new data frame with just the variables I need.
The setnames line just alters the column names of the data. Have a look at the data before and after. It is not difficult.
I am re-reading your code and, honestly (and to my shame), I understand almost nothing of what it does. I still have a lot to learn about R...
And the line with by counts the occurrences.
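For the 72-variable case raised in the comments, one possible approach is to keep only the columns the plot needs and rename them by name, so no vector of all 72 names is required. This is a sketch under assumptions: the table `wide` and the column names `security_day` and `other_answer` are hypothetical placeholders, and only `setnames(x, old, new)` and the `.N, by` idiom are taken from data.table itself.

```r
library(data.table)

# Hypothetical wide table standing in for the 72-column questionnaire
wide <- data.table(
  Timestamp     = c("t1", "t2", "t3"),
  neighborhoods = c("A", "B", "A"),
  security_day  = c(1, 2, 1),
  other_answer  = c("x", "y", "z")
)

# Keep only the columns the plot needs; this creates a new data.table
small <- wide[, c("neighborhoods", "security_day"), with = FALSE]

# Rename by old name -> new name, so the other columns and their order do not matter
setnames(small,
         old = c("neighborhoods", "security_day"),
         new = c("Barrios", "Grado"))

# Count the occurrences of each (Barrios, Grado) pair, as the `by` line in the answer does
small_counts <- small[, .N, by = .(Barrios, Grado)]
print(small_counts)
```

Because `small` is a separate table, renaming its columns leaves the original `wide` table untouched.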
