Some values in my dataset (df) need to be replaced with the correct values, e.g.:

Height  Disease  Weight>90kg
1.58    1        0
1.64    0        1
1.67    1        0
52      0        1
67      0        0

I want to replace the first three values with 158, 164 and 167, and the next two with 152 and 167 (adding a 1 at the beginning).

I tried the following code but it doesn't work:

data_clean <- function(df) {
df[height==1.58] <- 158
df}
data_clean(df)

Please help!

2 Answers

Using recode you can explicitly recode the values:

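# Numeric names must be wrapped in backticks; bare numbers on the
# left of `=` are a syntax error ("unexpected '='").
df <- mutate(df, height = recode(height,
                                 `1.58` = 158,
                                 `1.64` = 164,
                                 `1.67` = 167,
                                 `52` = 152,
                                 `67` = 167))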

However, this obviously is a manual process and not ideal for a case with many values that need recoding.

Alternatively, you could do something like:

df <- mutate(df, height = case_when(
  height < 2.5 ~ height * 100,  # values recorded in metres
  height < 100 ~ height + 100   # values missing the leading 1
))

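This really depends on the makeup of your data, but it works for the example given. Just be careful about what your assumptions are. A test for whole numbers such as height %% 1 == 0 could also separate the two cases; is.double and is.integer would not, because they report the type of the whole column rather than of individual values.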
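One way to check those assumptions before overwriting anything is to label each row with the rule it would fall under and count the groups. This is only a sketch: the column name Height is taken from the question's table, the 172 row is a hypothetical already-correct value added for illustration, and the 2.5 and 100 cut-offs are the same assumptions as above.

```r
library(dplyr)

# Heights from the question, plus one hypothetical correct value (172).
df <- data.frame(Height = c(1.58, 1.64, 1.67, 52, 67, 172))

# Label each row with the rule that would apply, without changing Height.
checked <- df %>%
  mutate(rule = case_when(
    Height < 2.5 ~ "metres: multiply by 100",
    Height < 100 ~ "missing leading 1: add 100",
    TRUE         ~ "leave as is"
  ))

# How many rows would each rule touch?
count(checked, rule)
```

If a group is larger or smaller than expected, the thresholds probably need adjusting before the real replacement is run.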

4 Comments

Thanks for your help @geoff. I tried using recode but it returns an error saying, unexpected '=' in & unexpected ',' in. I have tidyverse package loaded.
That error most likely comes from the unquoted numeric names: recode needs them in backticks, e.g. `1.58` = 158. A bit more verbose, but essentially the same as recode, is mutate(df, height = case_when(height == 1.58 ~ 158, height == 1.64 ~ 164, ...
Thanks @geoff. This one worked but it changes all the other correct values in height to 'NA'. Height is <dbl>. Can you suggest a solution to this issue?
Yep, when using case_when you need to cover all conditions explicitly, otherwise the remaining values will be NA. The typical way to do this is to add TRUE ~ height at the end of your case_when. In other words, if a value reaches this condition, simply return it unchanged.
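As a minimal sketch of that fallback, assuming the column is called Height as in the question's table and adding a hypothetical correct value of 172 to show that it survives:

```r
library(dplyr)

# Heights from the question, plus one hypothetical correct value (172).
df <- data.frame(Height = c(1.58, 1.64, 1.67, 52, 67, 172))

df_fixed <- df %>%
  mutate(Height = case_when(
    Height == 1.58 ~ 158,
    Height == 1.64 ~ 164,
    Height == 1.67 ~ 167,
    Height == 52   ~ 152,
    Height == 67   ~ 167,
    TRUE           ~ Height  # fallback: keep every other value as it is
  ))

df_fixed$Height
```

Without the final TRUE ~ Height line, the 172 row would come back as NA.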

You could also vectorize a switch (the final NA is the fallback for any value that is not listed):

foo <- Vectorize(FUN = function(x) {
                   switch(as.character(x),
                          "1.58" = 158,
                          "1.64" = 164,
                          "1.67" = 167,
                          "52" = 152,
                          "67" = 167,
                          NA)})

Then just replace as follows:

df$Height <- foo(df$Height)
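If values that are not listed should be kept rather than turned into NA, a named lookup vector is one possible base R alternative. This is a sketch, not part of the answer above: lookup and fix_height are hypothetical names, and the 172 in the usage line is a made-up correct value.

```r
# Wrong value (as text) -> corrected value; same pairs as the switch above.
lookup <- c("1.58" = 158, "1.64" = 164, "1.67" = 167, "52" = 152, "67" = 167)

fix_height <- function(x) {
  key <- as.character(x)            # match on the printed form of each value
  hit <- key %in% names(lookup)     # TRUE where a correction exists
  x[hit] <- lookup[key[hit]]        # overwrite only the matched positions
  x                                 # everything else is returned unchanged
}

fix_height(c(1.58, 1.64, 1.67, 52, 67, 172))
```

It would be applied the same way, for example df$Height <- fix_height(df$Height).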
