I am trying to replace missing factor values with NA in a data frame and build a new data frame from the replaced values, but when I do, the factor columns that held character values are all converted to numbers. I cannot figure out what I am doing wrong and cannot find a similar question. Could anybody please help?

Here is my code:

orders <- c('One','Two','Three', '')
ids <- c(1, 2, 3, 4)
values <- c(1.5, 100.6, 19.3, '')

df <- data.frame(orders, ids, values)
new.df <- as.data.frame(matrix( , ncol = ncol(df), nrow = 0))
names(new.df) <- names(df)

for(i in 1:nrow(df)){
    row.df <- df[i, ]
    print(row.df$orders) # "One", "Two", "Three", ""
    print(str(row.df$orders)) # Factor
    # Want to replace "orders" value in each row with NA if it is missing 
    row.df$orders <- ifelse(row.df$orders == "", NA, row.df$orders)
    print(row.df$orders) # Converted to number
    print(str(row.df$orders)) # int or logi
    # Add the row with new value to the new data frame
    new.df[nrow(new.df) + 1, ] <- row.df
    }

and I get this:

> new.df
  orders ids values
1      2   1      2
2      4   2      3
3      3   3      4
4     NA   4      1

but I want this:

> new.df
  orders ids values
1    One   1    1.5
2    Two   2  100.6
3  Three   3   19.3
4     NA   4       
  • Sorry that was a mistake. I corrected it. Commented Jun 16, 2020 at 5:46

2 Answers

Convert empty values to NA and use type.convert to change their class.

df[df == ''] <- NA
df <- type.convert(df)
df
#  orders ids values
#1    One   1    1.5
#2    Two   2  100.6
#3  Three   3   19.3
#4   <NA>   4     NA

str(df)
#'data.frame':  4 obs. of  3 variables:
#$ orders: Factor w/ 4 levels "","One","Three",..: 2 4 3 1
#$ ids   : int  1 2 3 4
#$ values: num  1.5 100.6 19.3 NA

7 Comments

Thanks and sorry for not being clear but I want "orders" to be still a factor after replacing missing with NA. Do you know why these factors are converted to 2, 3, and 4? Where do they come from?
@owl Factors are internally represented as integer codes, which is why you see those numbers. If you want to keep orders as a factor, you can use just df <- type.convert(df)
Thank you! That would give me what I wanted. I did not realize until now that factors are internally represented as numbers.
A column can only have one class. An empty value ('') is a character string, not a number, so if you put an empty value in values it will turn the whole column into character.
If you want to replace the empty value with NA for only one column you can do df$values[is.na(df$values)] <- NA
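The points made in these comments (factors are stored as integer codes, and a vector holds only one class) can be checked with a minimal sketch. It assumes the factor is built explicitly with factor(), so the result does not depend on the stringsAsFactors default, which changed in R 4.0.0. The names codes and labels are illustrative and not from the original post.

```r
# Build the factor explicitly; data.frame() only does this by default before R 4.0.0.
orders <- factor(c("One", "Two", "Three", ""))

levels(orders)      # levels are sorted: "" "One" "Three" "Two"
as.integer(orders)  # underlying integer codes: 2 4 3 1

# ifelse() drops the factor attributes and keeps only the integer codes,
# which is where the 2, 4, 3 in the question come from.
codes <- ifelse(orders == "", NA, orders)

# Converting to character first keeps the labels instead of the codes.
labels <- ifelse(orders == "", NA, as.character(orders))

# A single '' turns the whole vector into character, as noted above.
values <- c(1.5, 100.6, 19.3, "")
class(values)       # "character"
```

If a factor is still wanted afterwards, factor(labels) rebuilds one from the character result.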
Thanks to the hint from Ronak Shah, I did this and it gave me what I wanted.

df$orders[df$orders == ''] <- NA

This will give me:

> df
  orders ids values
1    One   1    1.5
2    Two   2  100.6
3  Three   3   19.3
4   <NA>   4       

> str(df)
'data.frame':   4 obs. of  3 variables:
 $ orders: Factor w/ 4 levels "","One","Three",..: 2 4 3 NA
 $ ids   : num  1 2 3 4
 $ values: Factor w/ 4 levels "","1.5","100.6",..: 2 3 4 1

In case you are curious about the difference between NA and <NA>, as I was, you can find the answer here.

Your suggestion

df$orders[is.na(df$orders)] <- NA

did not work, maybe because the missing entry is not NA?
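A short check seems to support that guess: the empty string '' is an ordinary value, so is.na() never matches it, while comparing against '' does. The sketch below assumes stringsAsFactors = TRUE to mimic the pre-4.0 default behind the output above. The variable na_before and the droplevels() step, which removes the now unused "" level, are additions and not part of the original answer.

```r
df <- data.frame(
  orders = c("One", "Two", "Three", ""),
  ids = c(1, 2, 3, 4),
  stringsAsFactors = TRUE  # assumption: reproduce the pre-4.0 default
)

# '' is a regular level, not a missing value, so nothing is flagged here.
na_before <- is.na(df$orders)   # FALSE FALSE FALSE FALSE

# Comparing against '' finds the empty entry, which can then be set to NA.
df$orders[df$orders == ""] <- NA

# Optional: drop the "" level that no row uses any more.
df$orders <- droplevels(df$orders)
levels(df$orders)               # "One" "Three" "Two"
```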
