This is rather basic, but I can't find an existing answer (or don't know how to search for one).

In R, I'm reading a column that should contain integers (in this case, waiting time in minutes), but for zero minutes the string "No Delay" appears instead of a number. What's the best way to handle this so that I can proceed?

  • Converting the column to integer with as.integer will turn "No Delay" into NA; you can then overwrite those NAs with 0. Alternatively, replace every "No Delay" with 0 before calling as.integer. Commented Feb 20, 2018 at 1:58
  • Before posting here I worked around the problem outside R, pre-processing the file in an editor such as vi to replace all those strings. Within R, since these columns also contain the string "Missing value" (meaning NA), which I forgot to mention, I need a more explicit conversion approach, which I just learned from this thread. Thank you. Commented Feb 20, 2018 at 6:51

1 Answer

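If you just use read.csv and convert that column to integer, the "No Delay" values will become NA. You can then set them to 0. The column has to be read as character for this to work (hence stringsAsFactors = FALSE), because as.integer on a factor returns the level codes rather than the numbers. Note that any other non-numeric string also becomes NA, and therefore 0.

df <- read.csv("thefile.csv", stringsAsFactors = FALSE)
# Non-numeric strings such as "No Delay" become NA (R warns about the coercion)
df$Time <- as.integer(df$Time)
# Replace those NAs with 0
df$Time[is.na(df$Time)] <- 0L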
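
For anyone who wants to try the first approach without a file, here is a minimal sketch. The inline data in csv_text is made up for illustration and only stands in for thefile.csv; the column is read as character so that as.integer sees the strings themselves.

```r
# Hypothetical sample data standing in for thefile.csv
csv_text <- "Time\n5\nNo Delay\n12\nNo Delay\n"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# as.integer() on a character vector turns non-numeric strings into NA
# and emits a coercion warning, which is silenced here on purpose
time_int <- suppressWarnings(as.integer(df$Time))

# Overwrite the NAs produced by "No Delay" with 0
time_int[is.na(time_int)] <- 0L
df$Time <- time_int
```

Keep in mind that this zeroes every value that fails to parse, not only "No Delay".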

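OR, you can convert all "No Delay"s to "0" first, then convert to integer. This also needs the column to be character, not factor.

df <- read.csv("thefile.csv", stringsAsFactors = FALSE)
# Recode the sentinel string while the column is still character
df$Time[df$Time == "No Delay"] <- "0"
df$Time <- as.integer(df$Time)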
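
If the column also contains a second marker that should stay missing (the asker mentions "Missing value"), the explicit approach can be extended as in the sketch below. The inline data is invented for illustration, and %in% is used instead of == only so that the logical index never contains NA.

```r
# Hypothetical sample data: numbers plus two different marker strings
csv_text <- "Time\n5\nNo Delay\nMissing value\n12\n"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

time_chr <- df$Time
# "No Delay" means zero minutes
time_chr[time_chr %in% "No Delay"] <- "0"
# "Missing value" really is missing, so make it an explicit NA
time_chr[time_chr %in% "Missing value"] <- NA

# Every remaining value is numeric text or NA, so this converts cleanly;
# an unexpected string would still trigger a coercion warning
df$Time <- as.integer(time_chr)
```

read.csv also has an na.strings argument that may do the second step at read time, but the explicit version above makes each rule visible.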

4 Comments

I need your 2nd approach, since the column also contains the string "Missing value" for NA. My key learning was the syntax/technique of df$Time[<boolean filter>] <- 0, which I did not know or think of before. Thanks!
Awesome, glad this helped :) It's a very useful technique: the boolean condition selects the matching elements, and the assignment fills in just those :)
When I ran the 2nd line of code, df$Time[df$Time=="No delay"] <- "0", I got the warning below, and instead of assigning 0 it turned the values into NA. Do you know why? Warning message: In [<-.factor(*tmp*, df$Travellers.Flow == "No delay", value = c(NA, : invalid factor level, NA generated
Ah, if that column is of type 'factor' then it can't assign '0', because that is not a valid level. There are a couple of ways to handle this. You could store the column as strings instead of factors, either with df$Time <- as.character(df$Time) or by passing the argument stringsAsFactors = FALSE to read.csv. The other option is to add "0" as a factor level, df$Time <- factor(df$Time, levels = c(levels(df$Time), "0")). Both are fine; it just depends on whether you'd rather have characters or factors. If you stay with a factor, convert with as.integer(as.character(df$Time)) at the end, since as.integer on a factor returns the level codes.
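
To make the factor case concrete, here is a small sketch with a made-up factor (time_fct is a hypothetical stand-in for df$Time when the column was read as a factor). It shows why as.integer alone is not enough, and two ways to get the numbers: going through character, or renaming the level in place.

```r
# Hypothetical factor column, as produced when strings are read as factors
time_fct <- factor(c("5", "No Delay", "12"))

# as.integer() on a factor returns the internal level codes, not the labels
codes <- as.integer(time_fct)

# Option 1: convert to character, recode the string, then convert to integer
time_chr <- as.character(time_fct)
time_chr[time_chr == "No Delay"] <- "0"
time_int <- as.integer(time_chr)

# Option 2: rename the level itself, then convert via character
time_fct2 <- time_fct
levels(time_fct2)[levels(time_fct2) == "No Delay"] <- "0"
time_int2 <- as.integer(as.character(time_fct2))
```

Both options should give the same integers; the first is probably simpler if the factor is not needed afterwards.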
