
I have three columns in a dataframe: age, gender and income.

I want to loop through these columns and create plots based on the data in them.

I know that in Stata you can loop through variables and run commands on each one. However, the code below does not seem to work. Is there an equivalent way to do this in R?

groups <- c(df$age, df$gender, df$income)
for (i in groups) {
  df %>% group_by(i) %>%
    summarise(n = n()) %>%
    mutate(prop = n/sum(n)) %>%
    ggplot(aes(y = prop, x = i)) +
    geom_col()
}

2 Answers


You can also use the tidyverse. Loop through a vector of grouping-variable names with map(). On each iteration, turn the name into a symbol with sym() and inject it into group_by() with !!, i.e. !!sym(variable). Alternatively, use across(all_of()), which takes strings directly as column names. The rest of the code is much the same as yours.

library(dplyr)
library(purrr)
library(ggplot2)

groups <- c('age', 'gender', 'income')

## with !!sym(.x)

map(groups, ~ 
    df %>% group_by(!!sym(.x)) %>%
    summarise(n = n()) %>%
    mutate(prop = n/sum(n)) %>%
    ggplot(aes(y = prop, x = .data[[.x]])) +  # .x is a string; .data[[.x]] looks up that column
    geom_col()
   )

## with across(all_of(.x))

map(groups, ~ 
    df %>% group_by(across(all_of(.x))) %>%
    summarise(n = n()) %>%
    mutate(prop = n/sum(n)) %>%
    ggplot(aes(y = prop, x = .data[[.x]])) +
    geom_col()
   )
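As a self-contained sketch of the map() approach, the block below builds some hypothetical sample data (the column values and set.seed() are assumptions, not the asker's data), uses the across(all_of()) variant from above, and adds set_names() so the returned list of plots is named after each column. The plots are only built here, not printed.

```r
library(dplyr)
library(purrr)
library(ggplot2)

# Hypothetical example data standing in for the asker's df
set.seed(1)
df <- data.frame(
  age    = sample(c("26-30", "31-35", "36-40"), 50, replace = TRUE),
  gender = sample(c("M", "F"), 50, replace = TRUE),
  income = sample(c("High", "Medium", "Low"), 50, replace = TRUE)
)

groups <- c("age", "gender", "income")

# set_names() with no extra argument names the vector after its own values,
# and map() keeps those names on the resulting list of plots
plots <- groups %>%
  set_names() %>%
  map(~ df %>%
        group_by(across(all_of(.x))) %>%   # group by the column named in .x
        summarise(n = n()) %>%             # one row per level
        mutate(prop = n / sum(n)) %>%      # share of each level
        ggplot(aes(x = .data[[.x]], y = prop)) +
        geom_col())
```

With the names in place, a single plot can be shown with plots$gender or plots[["income"]].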

If you want to use a for loop:

groups <- c('age', 'gender', 'income')

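for (i in groups) {
  # plots created inside a for loop are not printed automatically, so call print()
  p <- df %>% group_by(!!sym(i)) %>%
    summarise(n = n()) %>%
    mutate(prop = n/sum(n)) %>%
    ggplot(aes(y = prop, x = .data[[i]])) +
    geom_col()
  print(p)
}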

8 Comments

The tilde is purrr's shorthand for lambda (anonymous) functions, and other tidyverse packages such as dplyr accept it too. It makes lambda functions a bit easier to write: function(x) x can be simplified to ~ .x. In the answer above, the tilde could be replaced with function(.x).
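A small sketch of that equivalence, assuming only purrr: the formula lambda and the explicit anonymous function produce the same result. The variable names are illustrative.

```r
library(purrr)

# Formula shorthand: purrr turns `~ .x * 2` into a function of .x
doubled_formula <- map_dbl(1:3, ~ .x * 2)

# The same thing written as an ordinary anonymous function
doubled_function <- map_dbl(1:3, function(.x) .x * 2)

doubled_formula                                 # 2 4 6
identical(doubled_formula, doubled_function)    # TRUE
```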
The double bang (!!) comes from the rlang package and is used throughout dplyr. Each element of groups is a string. sym() converts that string into a symbol (a name object), and !! injects (unquotes) the symbol into the call, so group_by(!!sym("age")) behaves exactly like group_by(age). Passing the string itself won't work: group_by() expects bare column names, and a string is treated as a constant value rather than as a reference to a column in the data frame. That is why you have to convert it to a symbol and inject it.
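To make the difference concrete, here is a sketch on a tiny hypothetical data frame: grouping by a variable that merely holds the string "gender" yields a single group (the string is recycled as a constant column), while !!sym() and across(all_of()) both group by the actual gender column.

```r
library(dplyr)
library(rlang)   # sym()

df  <- data.frame(gender = c("M", "F", "F", "M", "F"))
col <- "gender"  # column name held in a string

# `col` is not a column of df, so its value "gender" is used as a constant:
# every row ends up in the same group
g_string <- n_groups(group_by(df, col))               # 1

# sym() makes a symbol and !! injects it, equivalent to group_by(df, gender)
g_sym <- n_groups(group_by(df, !!sym(col)))           # 2

# across(all_of()) accepts the string directly
g_across <- n_groups(group_by(df, across(all_of(col))))  # 2
```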
The same works with other dplyr functions such as select(), mutate() and across(), either with !!sym(x) or with tidyselect helpers that take strings directly as column names, such as all_of() and any_of(). These helpers can also be used inside if_all() and if_any().
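A short sketch with made-up data: all_of() inside select() and if_all(), next to the multi-column counterpart of !!sym(), which is !!! with syms(). The column values and the threshold of 30 are arbitrary.

```r
library(dplyr)
library(rlang)   # syms()

df   <- data.frame(age = c(25, 40), gender = c("M", "F"), income = c(30, 55))
cols <- c("age", "income")   # column names as strings

# all_of() takes the character vector directly
by_all_of <- select(df, all_of(cols))

# !!! splices several symbols at once; same result as above
by_syms <- select(df, !!!syms(cols))

# all_of() also works as the column selection inside if_all():
# keep rows where every selected column is above 30
both_over_30 <- filter(df, if_all(all_of(cols), ~ .x > 30))   # only the second row
```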
I included a version of the answer with across(all_of()). It is clearer for beginners than !!sym(.x) and does essentially the same thing here.
In that case you need the actual value of .x as a string, so you don't have to convert it to a symbol; just use `ggsave(paste0("results/JS05 - Survey/", .x, ".png"))`. If the plot is not the last one displayed, pass it explicitly with the plot argument.
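A sketch of saving one PNG per column, under stated assumptions: the data are hypothetical, tempdir() stands in for the "results/JS05 - Survey/" folder, and walk() is used instead of map() because only the side effect (writing files) matters. Each plot is passed to ggsave() with plot = so nothing depends on what was displayed last.

```r
library(dplyr)
library(purrr)
library(ggplot2)

# Hypothetical data and output folder; swap in your own
df <- data.frame(
  gender = c("M", "F", "F", "M"),
  income = c("High", "Low", "Low", "Medium")
)
groups  <- c("gender", "income")
out_dir <- tempdir()

walk(groups, function(col) {
  p <- df %>%
    group_by(across(all_of(col))) %>%
    summarise(n = n()) %>%
    mutate(prop = n / sum(n)) %>%
    ggplot(aes(x = .data[[col]], y = prop)) +
    geom_col()
  # col is already a string, so it can be pasted straight into the file name
  ggsave(file.path(out_dir, paste0(col, ".png")), plot = p,
         width = 4, height = 3)
})
```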

You can use lapply

library(ggplot2)

df <- data.frame(age = sample(c("26-30", "31-35", "36-40", "41-45"), 20, replace = TRUE),
                 gender = sample(c("M", "F"), 20, replace = TRUE),
                 income = sample(c("High", "Medium", "Low"), 20, replace = TRUE),
                 prop   = runif(20))

# put each of the first three columns on the x axis; prop comes from df
lapply(df[, 1:3], function(x) ggplot(df, aes(x = x, y = prop)) + geom_col())

2 Comments

Thanks so much! If I don't know the column numbers, can I reference the columns by name in the lapply?
@entropy Yes you can.
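A sketch of the by-name version: pass the column names as strings and look each one up with the .data pronoun inside aes(). setNames() is an addition not in the original answer so the returned list is named by column; the sample data mirror the answer's, with set.seed() added for reproducibility.

```r
library(ggplot2)

set.seed(42)
df <- data.frame(age    = sample(c("26-30", "31-35", "36-40", "41-45"), 20, replace = TRUE),
                 gender = sample(c("M", "F"), 20, replace = TRUE),
                 income = sample(c("High", "Medium", "Low"), 20, replace = TRUE),
                 prop   = runif(20))

cols <- c("age", "gender", "income")

# Iterate over names instead of positions; the axis label is the column name
plots <- lapply(setNames(cols, cols), function(col) {
  ggplot(df, aes(x = .data[[col]], y = prop)) + geom_col()
})
```

Because the x aesthetic refers to a named column, each plot gets the correct axis label, which the positional version (with x = x) does not.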
