I have the following data:

  PassengerId Survived Pclass    Sex Age SibSp Parch    Fare Embarked
1           1        0      3   male  22     1     0  7.2500        S
2           2        1      1 female  38     1     0 71.2833        C
3           3        1      3 female  26     0     0  7.9250        S
4           4        1      1 female  35     1     0 53.1000        S
5           5        0      3   male  35     0     0  8.0500        S
6           6        0      3   male  NA     0     0  8.4583        Q

When I use dummy() or dummy.data.frame() from the dummies package, I can successfully convert factors (here Sex and Embarked) to dummy variables like this:

  PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare Embarked EmbarkedC EmbarkedQ EmbarkedS
1           1        0      3         0       1  22     1     0  7.2500        0         0         0         1
2           2        1      1         1       0  38     1     0 71.2833        0         1         0         0
3           3        1      3         1       0  26     0     0  7.9250        0         0         0         1
4           4        1      1         1       0  35     1     0 53.1000        0         0         0         1
5           5        0      3         0       1  35     0     0  8.0500        0         0         0         1
6           6        0      3         0       1  NA     0     0  8.4583        0         0         1         0

How can I handle the Age column? Applying the same approach to it creates more than 100 dummies, one for each unique age and one for NA. I want the output to look like this:

Age   Age.NA
22    0 
38    0
......
35    0
0     1

For factors, missing values are automatically treated as a separate level and get their own dummy variable. I want to achieve the same for numeric variables without altering the existing values in the column. Please help.

2 Answers

You can just use:

df$Age.NA <- ifelse(is.na(df$Age), 1, 0)

And then:

library(dummies)
dummy.data.frame(df)

Output:

  PassengerId Survived Pclass Sexfemale Sexmale Age SibSp Parch    Fare EmbarkedC EmbarkedQ EmbarkedS Age.NA
1           1        0      3         0       1  22     1     0  7.2500         0         0         1      0
2           2        1      1         1       0  38     1     0 71.2833         1         0         0      0
3           3        1      3         1       0  26     0     0  7.9250         0         0         1      0
4           4        1      1         1       0  35     1     0 53.1000         0         0         1      0
5           5        0      3         0       1  35     0     0  8.0500         0         0         1      0
6           6        0      3         0       1  NA     0     0  8.4583         0         1         0      1

Data:

df <- structure(list(PassengerId = 1:6, Survived = c(0L, 1L, 1L, 1L, 
0L, 0L), Pclass = c(3L, 1L, 3L, 1L, 3L, 3L), Sex = structure(c(2L, 
1L, 1L, 1L, 2L, 2L), .Label = c("female", "male"), class = "factor"), 
    Age = c(22L, 38L, 26L, 35L, 35L, NA), SibSp = c(1L, 1L, 0L, 
    1L, 0L, 0L), Parch = c(0L, 0L, 0L, 0L, 0L, 0L), Fare = c(7.25, 
    71.2833, 7.925, 53.1, 8.05, 8.4583), Embarked = structure(c(3L, 
    1L, 3L, 3L, 3L, 2L), .Label = c("C", "Q", "S"), class = "factor"), 
    Age.NA = c(0, 0, 0, 0, 0, 1)), .Names = c("PassengerId", 
"Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", 
"Embarked", "Age.NA"), row.names = c("1", "2", "3", "4", "5", 
"6"), class = "data.frame")
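
Note that the output in this answer still shows NA in Age, whereas the desired output in the question has 0 there. A minimal base-R sketch of one way to get both columns, assuming a toy data frame with illustrative values (only PassengerId and Age are kept here). The order matters: the indicator has to be built before the NAs are overwritten, because afterwards is.na() no longer finds anything.

```r
# Toy data mirroring the Age column from the question (illustrative values)
df <- data.frame(
  PassengerId = 1:6,
  Age = c(22, 38, 26, 35, 35, NA)
)

# Step 1: record where Age is missing BEFORE overwriting anything
df$Age.NA <- as.integer(is.na(df$Age))

# Step 2: replace the missing ages with 0; other values stay untouched
df$Age[is.na(df$Age)] <- 0

print(df)
```

Whether 0 is a sensible fill value depends on the model; the Age.NA column is what lets a model tell a filled-in 0 apart from a real value.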

Use an ifelse() call to check for NA:

df$Age.NA <- ifelse(is.na(df$Age), 1, 0)

2 Comments

Hi, basically I would like to create two columns instead of just one. I would like to replace the NA values in the original Age column with 0 and create a separate 0/1 column indicating whether the value was missing, similar to what dummy does.
Do it the same way: df$Age <- ifelse(is.na(df$Age), 0, df$Age)
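
If several numeric columns need the same treatment, the two steps from this thread could be wrapped in a small helper. The sketch below is only an illustration: add_na_indicators and its fill argument are hypothetical names, not part of dummies or base R, and the data frame is made up.

```r
# Hypothetical helper (not from any package): for every numeric column that
# contains NA, add a "<name>.NA" 0/1 column and set the NAs to `fill`.
add_na_indicators <- function(data, fill = 0) {
  for (col in names(data)) {
    x <- data[[col]]
    if (is.numeric(x) && anyNA(x)) {
      # Build the indicator first, then overwrite the missing values
      data[[paste0(col, ".NA")]] <- as.integer(is.na(x))
      x[is.na(x)] <- fill
      data[[col]] <- x
    }
  }
  data
}

# Made-up example data: two numeric columns with NAs and one factor
df <- data.frame(
  Age  = c(22, 38, NA, 35),
  Fare = c(7.25, NA, 7.925, 53.1),
  Sex  = factor(c("male", "female", "female", "female"))
)

out <- add_na_indicators(df)
print(out)
```

Factor columns are left alone, so they can still be passed to dummy.data.frame() afterwards.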
