
I have a data frame with multiple columns. I have reduced it to a small example to illustrate my question.

One column, 'A', has a complete set of 6 values. The remaining 5 columns, 'v1' to 'v5', each have 2 missing values (NA) at random positions.

df <- data.frame('A' = c(2, 4, 7, 5, 3, 4), 'v1' = c(3, NA, NA, 4, 5, 5),
                 'v2' = c(NA, NA, 6, 4, 5, 5), 'v3' = c(3, 4, NA, NA, 5, 5),
                 'v4' = c(3, 4, 6, 4, NA, NA), 'v5' = c(3, 4, 6, NA, NA, 5))

With the NAs filled in, the data frame should look like this:

  A   v1   v2   v3   v4   v5
1 2 3.00 1.75 3.00 3.00 3.00
2 4 3.55 3.55 4.00 4.00 4.00
3 7 6.25 6.00 6.25 6.00 6.00
4 5 4.00 4.00 4.45 4.00 4.45
5 3 5.00 5.00 5.00 2.65 2.65
6 4 5.00 5.00 5.00 3.55 5.00

What I would like to do is fill in all NAs in the data frame using the equation -0.05 + 0.9*x, where x is the value in column A in the same row. For example:

The first NA is in v1, row 2, where column A = 4. So I would like this NA to be filled as follows:

-0.05 + 0.9*4 = 3.55, so it is filled with 3.55

Likewise, for the NA in v1, row 3, column A = 7:

-0.05 + 0.9*7 = 6.25, so it is filled with 6.25

I was trying to use the ifelse() function, but I do not know how to apply it to the whole data frame and link it to an equation that uses a value from another column in the same row.

My attempt is below. I know it is wrong, but it gives an idea of my approach:

ifelse(df$v1:v5 == NA, -0.05 + 0.9*df$A, df$v1:v5)

  • Please use dput to give us real example data to work with rather than a screenshot of a data frame. Also, you should apply ifelse to each column of the data frame, perhaps using apply, and not to the whole data frame. Commented Apr 2, 2020 at 14:41
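
The per-column approach suggested in the comment above could be sketched in base R roughly as follows. This is only one possible reading of that suggestion: it uses lapply() rather than apply(), and it tests for missing values with is.na(), because a comparison such as x == NA itself evaluates to NA rather than TRUE or FALSE. The names filled and value_cols are introduced here purely for illustration.

```r
df <- data.frame(A = c(2, 4, 7, 5, 3, 4), v1 = c(3, NA, NA, 4, 5, 5),
                 v2 = c(NA, NA, 6, 4, 5, 5), v3 = c(3, 4, NA, NA, 5, 5),
                 v4 = c(3, 4, 6, 4, NA, NA), v5 = c(3, 4, 6, NA, NA, 5))

filled <- df                              # hypothetical name; df itself stays unchanged
value_cols <- setdiff(names(df), "A")     # every column except A

# Apply ifelse() to one column at a time; df$A supplies x from the same row
filled[value_cols] <- lapply(df[value_cols], function(col) {
  ifelse(is.na(col), -0.05 + 0.9 * df$A, col)
})

filled
```

Because ifelse() is vectorised, each column is processed in a single call and no explicit row loop is needed.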

2 Answers


A dplyr (tidyverse) based solution:

library(dplyr)

my_df <- data.frame('A' = c(2, 4, 7, 5, 3, 4), 'v1' = c(3, NA, NA, 4, 5, 5),
                    'v2' = c(NA, NA, 6, 4, 5, 5), 'v3' = c(3, 4, NA, NA, 5, 5),
                    'v4' = c(3, 4, 6, 4, NA, NA), 'v5' = c(3, 4, 6, NA, NA, 5))

my_df %>% mutate_at(vars(-A), ~ifelse(is.na(.), -0.05 + 0.9 * A, .))

Result:

  A   v1   v2   v3   v4   v5
1 2 3.00 1.75 3.00 3.00 3.00
2 4 3.55 3.55 4.00 4.00 4.00
3 7 6.25 6.00 6.25 6.00 6.00
4 5 4.00 4.00 4.45 4.00 4.45
5 3 5.00 5.00 5.00 2.65 2.65
6 4 5.00 5.00 5.00 3.55 5.00
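
In dplyr 1.0.0 and later, mutate_at() is marked as superseded in favour of across(). A minimal sketch of the same idea with across(), assuming dplyr >= 1.0.0 is installed (the name filled is introduced here for illustration):

```r
library(dplyr)

my_df <- data.frame(A = c(2, 4, 7, 5, 3, 4), v1 = c(3, NA, NA, 4, 5, 5),
                    v2 = c(NA, NA, 6, 4, 5, 5), v3 = c(3, 4, NA, NA, 5, 5),
                    v4 = c(3, 4, 6, 4, NA, NA), v5 = c(3, 4, 6, NA, NA, 5))

# across(-A, ...) selects every column except A;
# inside the lambda, .x is the current column and A is column A of my_df
filled <- my_df %>%
  mutate(across(-A, ~ ifelse(is.na(.x), -0.05 + 0.9 * A, .x)))

filled
```

This should give the same result as the mutate_at() call above. Since both arguments are numeric vectors of equal length, dplyr::coalesce(.x, -0.05 + 0.9 * A) would likely be an equivalent replacement for the ifelse() call.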

Below is a loop-based solution. It works, but it is not very elegant, so other answers may offer a more concise approach.

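# Row and column positions of every NA in df
Indizes <- which(is.na(df), arr.ind = TRUE)
# seq_len() skips the loop entirely when df contains no NAs
for (i in seq_len(nrow(Indizes))) {
  # Column 1 is A; take x from A in the same row as the NA
  df[Indizes[i, 1], Indizes[i, 2]] <- -0.05 + 0.9 * df[Indizes[i, 1], 1]
}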

Output:

  A   v1   v2   v3   v4   v5
1 2 3.00 1.75 3.00 3.00 3.00
2 4 3.55 3.55 4.00 4.00 4.00
3 7 6.25 6.00 6.25 6.00 6.00
4 5 4.00 4.00 4.45 4.00 4.45
5 3 5.00 5.00 5.00 2.65 2.65
6 4 5.00 5.00 5.00 3.55 5.00
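
The loop above can probably be avoided with a logical mask. The following is a sketch, not part of the original answer; na_mask and fill are names introduced here for illustration. It relies on is.na() returning a logical matrix for a data frame and on replacement through that matrix proceeding column by column.

```r
df <- data.frame(A = c(2, 4, 7, 5, 3, 4), v1 = c(3, NA, NA, 4, 5, 5),
                 v2 = c(NA, NA, 6, 4, 5, 5), v3 = c(3, 4, NA, NA, 5, 5),
                 v4 = c(3, 4, 6, 4, NA, NA), v5 = c(3, 4, 6, NA, NA, 5))

na_mask <- is.na(df)          # logical matrix, TRUE where a value is missing
fill <- -0.05 + 0.9 * df$A    # candidate fill value for each row

# row(na_mask)[na_mask] is the row number of each NA, in column-major order,
# which matches the order in which df[na_mask] <- ... consumes its values
df[na_mask] <- fill[row(na_mask)[na_mask]]

df
```

This variant computes the fill value once per row and assigns all missing cells in a single step.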
