Apply ifelse() to multiple columns in a dataframe to fill NA values using a formula based on another column in the same row

Question

I have a dataframe with multiple columns. I have reduced the data frame size to illustrate my ask.

One column 'A' has a complete set of 6 values. The remaining 5 columns 'v1' to 'v5' randomly have 2 missing values each labelled NA.

df <- data.frame('A' = c(2, 4, 7, 5, 3, 4), 'v1' = c(3, NA, NA, 4, 5, 5),
                 'v2' = c(NA, NA, 6, 4, 5, 5), 'v3' = c(3, 4, NA, NA, 5, 5),
                 'v4' = c(3, 4, 6, 4, NA, NA), 'v5' = c(3, 4, 6, NA, NA, 5))

Desired result:

  A   v1   v2   v3   v4   v5
1 2 3.00 1.75 3.00 3.00 3.00
2 4 3.55 3.55 4.00 4.00 4.00
3 7 6.25 6.00 6.25 6.00 6.00
4 5 4.00 4.00 4.45 4.00 4.45
5 3 5.00 5.00 5.00 2.65 2.65
6 4 5.00 5.00 5.00 3.55 5.00

What I would like to do is fill in all NAs in the dataframe using the equation -0.05 + 0.9*x, where x is the value in column A in the same row.

For example, the first NA in v1 is in row 2, where column A = 4, so this NA should be filled as follows:

-0.05 + 0.9*4 = 3.55

Likewise, for the NA in v1 row 3, where column A = 7, the fill value should be -0.05 + 0.9*7 = 6.25.

I was trying to use the ifelse() function, but I do not know how to apply it to the whole dataframe and link it to an equation that uses a value from another column in the same row.

My attempt is below, which I know is wrong but gives an idea of my approach to it:

ifelse(df$v1:v5 == NA, -0.05 + 0.9*df$A, df$v1:v5)

Please use dput to give us real example data to work with rather than a screenshot of a data frame. Also, you should apply ifelse to each column of the data frame, perhaps using apply, and not to the whole data frame.
– user10191355, Commented Apr 2, 2020 at 14:41
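
For illustration, the per-column approach suggested in the comment could be sketched in base R roughly as below. The test is is.na(col) rather than col == NA, because any comparison with NA itself evaluates to NA. The names fill and filled are not from the question; they are introduced here only for the example.

```r
df <- data.frame(A = c(2, 4, 7, 5, 3, 4), v1 = c(3, NA, NA, 4, 5, 5),
                 v2 = c(NA, NA, 6, 4, 5, 5), v3 = c(3, 4, NA, NA, 5, 5),
                 v4 = c(3, 4, 6, 4, NA, NA), v5 = c(3, 4, 6, NA, NA, 5))

# Replacement value for each row, computed once from column A
fill <- -0.05 + 0.9 * df$A

# Work on a copy so the original df keeps its NAs
filled <- df

# Apply ifelse() to every column except the first (A)
filled[-1] <- lapply(df[-1], function(col) ifelse(is.na(col), fill, col))

print(filled)
```

Since fill has one element per row, ifelse() picks the value from the matching row automatically.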

oszkar · Accepted Answer · 2020-04-02 14:59:51Z

1

A dplyr (tidyverse) based solution:

library(dplyr)

my_df <- data.frame('A' = c(2, 4, 7, 5, 3, 4), 'v1' = c(3, NA, NA, 4, 5, 5),
                    'v2' = c(NA, NA, 6, 4, 5, 5), 'v3' = c(3, 4, NA, NA, 5, 5),
                    'v4' = c(3, 4, 6, 4, NA, NA), 'v5' = c(3, 4, 6, NA, NA, 5))

my_df %>% mutate_at(vars(-A), ~ifelse(is.na(.), -0.05 + 0.9 * A, .))

Result:

  A   v1   v2   v3   v4   v5
1 2 3.00 1.75 3.00 3.00 3.00
2 4 3.55 3.55 4.00 4.00 4.00
3 7 6.25 6.00 6.25 6.00 6.00
4 5 4.00 4.00 4.45 4.00 4.45
5 3 5.00 5.00 5.00 2.65 2.65
6 4 5.00 5.00 5.00 3.55 5.00
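
As a side note, mutate_at() is marked as superseded in dplyr 1.0.0 and later, in favour of across(). Assuming a dplyr version of at least 1.0.0, an equivalent call might look like the sketch below. The name filled is introduced here only for the example.

```r
library(dplyr)

my_df <- data.frame(A = c(2, 4, 7, 5, 3, 4), v1 = c(3, NA, NA, 4, 5, 5),
                    v2 = c(NA, NA, 6, 4, 5, 5), v3 = c(3, 4, NA, NA, 5, 5),
                    v4 = c(3, 4, 6, 4, NA, NA), v5 = c(3, 4, 6, NA, NA, 5))

# across(-A, ...) applies the lambda to every column except A;
# .x is the current column, and A is looked up in the same data frame
filled <- my_df %>%
  mutate(across(-A, ~ ifelse(is.na(.x), -0.05 + 0.9 * A, .x)))

print(filled)
```

The older mutate_at() form still works; this variant simply follows the currently recommended dplyr interface.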

Taufi · Accepted Answer · 2020-04-02 14:22:23Z

1

Below is a loop-based solution. It works, but it is not very elegant, so you may get better responses.

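# Row/column positions of every NA in df
Indizes <- which(is.na(df), arr.ind = TRUE)
# seq_len() skips the loop entirely when there are no NAs
for (i in seq_len(nrow(Indizes))) {
  # Fill each NA from column A (column 1) of the same row
  df[Indizes[i, 1], Indizes[i, 2]] <- -0.05 + 0.9 * df[Indizes[i, 1], 1]
}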

Output:

  A   v1   v2   v3   v4   v5
1 2 3.00 1.75 3.00 3.00 3.00
2 4 3.55 3.55 4.00 4.00 4.00
3 7 6.25 6.00 6.25 6.00 6.00
4 5 4.00 4.00 4.45 4.00 4.45
5 3 5.00 5.00 5.00 2.65 2.65
6 4 5.00 5.00 5.00 3.55 5.00
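
The same idea can probably be written without an explicit loop. The sketch below indexes the data frame with the logical matrix returned by is.na() and uses row() to look up the matching fill value for each NA position. The names fill and na_pos are introduced here only for the example.

```r
df <- data.frame(A = c(2, 4, 7, 5, 3, 4), v1 = c(3, NA, NA, 4, 5, 5),
                 v2 = c(NA, NA, 6, 4, 5, 5), v3 = c(3, 4, NA, NA, 5, 5),
                 v4 = c(3, 4, 6, 4, NA, NA), v5 = c(3, 4, 6, NA, NA, 5))

# Replacement value for each row, computed from column A
fill <- -0.05 + 0.9 * df$A

# Logical matrix with the same shape as df, TRUE where a value is missing
na_pos <- is.na(df)

# row(na_pos)[na_pos] gives the row number of each NA in column-major order,
# which is the order in which the logical-matrix assignment consumes values
df[na_pos] <- fill[row(na_pos)[na_pos]]

print(df)
```

This relies on column A having no missing values of its own, as in the example data.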
