So essentially I am relatively new to R and my weak spot is writing as little code as possible. I always run into the same problem and I just can't seem to solve it with a loop or a function, so I'd love some help.

Let's say my df looks like this:

a = c(12, 9, 11, 17, 22)
b = c(8, 1, 9, 4, 15)
c = c(2, 4, 1, 8, 4)
d = c(2, 4, 1, 5, 3)

df = data.frame(a, b, c, d)

I want to calculate b, c and d as a percentage of a, with a new column for each result. My code without functions etc. looks like this:

df$b_p = round((df$b / df$a)*100, digits = 2)
df$c_p = round((df$c / df$a)*100, digits = 2)
df$d_p = round((df$d / df$a)*100, digits = 2)

What's the easiest way to get the same output without copy-pasting the code over and over? My real data frame is much bigger, and it's time for me to learn how to do this more efficiently.

Thank you!

  • Can't find a there... Commented Sep 16, 2021 at 6:33
  • oh it seems that was cut off! I'll edit it in there, sorry Commented Sep 16, 2021 at 6:50

2 Answers

You can take advantage of R's vectorization.

cols <- names(df)[-1]
#OR
#cols <- c('b', 'c', 'd')
df[paste0(cols, '_p')] <- round(df[cols]/df$a * 100, 2)
df

#   a  b c d   b_p   c_p   d_p
#1 12  8 2 2 66.67 16.67 16.67
#2  9  1 4 4 11.11 44.44 44.44
#3 11  9 1 1 81.82  9.09  9.09
#4 17  4 8 5 23.53 47.06 29.41
#5 22 15 4 3 68.18 18.18 13.64
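
Since the question asks about a function, the same vectorized step could be wrapped in a small helper so it can be reused on a larger data frame. This is only a sketch: the name add_pct and its arguments (cols, denom, digits) are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (not from the original answer): adds one "<col>_p"
# column per entry in `cols`, holding that column as a percentage of `denom`.
add_pct <- function(data, cols, denom, digits = 2) {
  # Dividing a data frame by a vector of length nrow(data) works row-wise,
  # so every selected column is divided by the denominator column at once.
  data[paste0(cols, "_p")] <- round(data[cols] / data[[denom]] * 100, digits)
  data
}

# Example data taken from the question.
df <- data.frame(
  a = c(12, 9, 11, 17, 22),
  b = c(8, 1, 9, 4, 15),
  c = c(2, 4, 1, 8, 4),
  d = c(2, 4, 1, 5, 3)
)

out <- add_pct(df, cols = c("b", "c", "d"), denom = "a")
out
```

Passing the column names as character vectors keeps the helper in base R, and cols could just as well be built with setdiff(names(df), "a") when there are many columns.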
Sign up to request clarification or add additional context in comments.

1 Comment

That works wonderfully, thank you so much!

An alternative (and elegant) solution is based on dplyr:

library(dplyr)
df %>%
  mutate(across(b:d, ~ . / a * 100)) %>%
  select(-a)
         b         c         d
1 66.66667 16.666667 16.666667
2 11.11111 44.444444 44.444444
3 81.81818  9.090909  9.090909
4 23.52941 47.058824 29.411765
5 68.18182 18.181818 13.636364

Or, with rounding:

df %>%
  mutate(across(b:d, ~ round(. / a * 100, 2))) %>%
  select(-a)

EDIT:

To keep the original columns, use cbind:

df %>%
  mutate(across(b:d, ~ round(. / a * 100, 2))) %>%
  rename(b_p = b, c_p = c, d_p = d) %>%
  select(-a) %>%
  cbind(df, .)
   a  b c d   b_p   c_p   d_p
1 12  8 2 2 66.67 16.67 16.67
2  9  1 4 4 11.11 44.44 44.44
3 11  9 1 1 81.82  9.09  9.09
4 17  4 8 5 23.53 47.06 29.41
5 22 15 4 3 68.18 18.18 13.64
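
A shorter route to the same result might be the .names argument of across(), which writes the results to new columns instead of overwriting the originals. This is a sketch that assumes dplyr 1.0.0 or later; the "{.col}_p" naming pattern is chosen here to match the column names asked for in the question.

```r
library(dplyr)

# Example data taken from the question.
df <- data.frame(
  a = c(12, 9, 11, 17, 22),
  b = c(8, 1, 9, 4, 15),
  c = c(2, 4, 1, 8, 4),
  d = c(2, 4, 1, 5, 3)
)

# `.x` stands for each column selected by b:d in turn; `.names` tells
# across() to store each result in a new column named "<column>_p".
out <- df %>%
  mutate(across(b:d, ~ round(.x / a * 100, 2), .names = "{.col}_p"))
out
```

With .names set, the rename(), select() and cbind() steps are no longer needed, because the original columns are left untouched.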

2 Comments

That's interesting. It doesn't add them as new columns as I would have liked but it's good to know that possibility exists. The dot is a placeholder for the variables then? So it calculates with all the variables that are inside of the across statement?
Correct. For the issue of keeping the original cols, please see edited answer.
