1

Suppose we have the following data.table:

x_dt <- data.table(sexn = c(1, 0, 0, 1, NA, 1, NA), 
                   country = c("CHN", "JPN", "BGR",  "AUT", " ", "TWN", " "), 
                   age = c(35, NA, 40, NA, 70, 18, 36)
)

I am trying to create a variable asia_region, which has a value of 1 when country %chin% c("CHN", "JPN", "KOR", "SGP", "TWN"), a value of 0 when country is not missing and NA when country is missing.

The following code populates 0's when country is missing.

result <- x_dt[, asia_region := ifelse(country %chin% c("CHN", "JPN", "KOR",  "SGP", "TWN"),1 , 0)]

2 Answers 2

1

We can directly coerce the logical to binary with as.integer or +, then change the values to NA where the 'country' is blank ("") by specifying a logical condition in i and assignment (:=) for those corresponding elements in 'asia_region' to NA

x_dt[,  asia_region := +(country %chin% c("CHN", "JPN", "KOR", "SGP", "TWN"))]
x_dt[trimws(country) == "", asia_region := NA_integer_]

-output

> x_dt
   sexn country age asia_region
1:    1     CHN  35           1
2:    0     JPN  NA           1
3:    0     BGR  40           0
4:    1     AUT  NA           0
5:   NA          70          NA
6:    1     TWN  18           1
7:   NA          36          NA

Or if we need a ifelse/fifelse (if/else wouldn't work as it is not vectorized i.e. it expects input expression with length 1 and not more than that)

x_dt[, asia_region := fifelse(trimws(country) == "", NA_integer_,
        fifelse(country %chin% c("CHN", "JPN", "KOR", "SGP", "TWN"), 1, 0))]
Sign up to request clarification or add additional context in comments.

4 Comments

Thanks. This works. Is there a more intuitive solution using "if", "else if", "else" statements?
I suggest that akrun's first solution is more canonical and efficient from a data.table point of view. if is certainly not right, ifelse/fifelse are certainly good alternatives. (I suggest you accept this answer as-is, it is unlikely a better solution can/will be offered, imo.)
The second solution, which uses fifelse does not require chaining expressions. Doesn't this make it more efficient than the first one?
@MKro No, it wouldn't. I think the first one is more direct as it is just a coersion
0

How about a dplyr() solution? I'd make a vector of the countries, for easier referencing:

asia_countries <-  c("CHN", "JPN", "KOR",  "SGP", "TWN")

x_dt |>
  dplyr::mutate(asia_region = ifelse(country %in% asia_countries, 1, 0)) |>
  dplyr::mutate(asia_region = ifelse(country == " ", NA, asia_region))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.