
I created the following data.table object:

        V1         V2      V3       V4
     1: 693 -0.2842529  1.3710 21.64843
     2: 240 -2.6564554 -0.5647 93.37038
     3:  43 -2.4404669  0.3631 92.63883
     4: 140  1.3201133  0.6329 73.67534
     5: 216 -0.3066386  1.3710 33.97413
     6: 479 -1.7813084 -0.5647 51.99127
     7: 197 -0.1719174  0.3631 74.65349
     8: 720  1.2146747  0.6329 62.29676
     9:   7  1.8951935  1.3710 62.99829
    10: 375 -0.4304691 -0.5647 22.49861
    11: 514 -0.2572694  0.3631 22.44016
    12:   1 -1.7631631  0.6329 39.50556

I want to generate a new categorical/group column based on the values in columns V1-V4. For example, I used the values in V1 to generate the categorical column V5 as follows:

DT[V1>0.1, V5 :="A"]
DT[V1>10, V5 :="B"]

Then I get this table:

        V1         V2      V3       V4 V5
    1: 693 -0.2842529  1.3710 21.64843  B
    2: 240 -2.6564554 -0.5647 93.37038  B
    3:  43 -2.4404669  0.3631 92.63883  B
    4: 140  1.3201133  0.6329 73.67534  B
    5: 216 -0.3066386  1.3710 33.97413  B
    6: 479 -1.7813084 -0.5647 51.99127  B
    7: 197 -0.1719174  0.3631 74.65349  B
    8: 720  1.2146747  0.6329 62.29676  B
    9:   7  1.8951935  1.3710 62.99829  A
   10: 375 -0.4304691 -0.5647 22.49861  B
   11: 514 -0.2572694  0.3631 22.44016  B
   12:   1 -1.7631631  0.6329 39.50556  A

Is it possible to combine the two lines above into one? Is it also possible to base the grouping on values in multiple other columns (e.g. V2-V4)?


2 Answers


If we have many levels, it may be better to use cut

DT[, V5 := as.character(cut(V1, breaks = c(0.1, 10, Inf), labels = c("A", "B")))]
DT$V5
#[1] "B" "B" "B" "B" "B" "B" "B" "B" "A" "B" "B" "A"

Or with findInterval

DT[, V5 := LETTERS[findInterval(V1, c(0.1, 10))]]
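The two calls above treat interval boundaries differently, which matters once values sit exactly on a break or below the first one. The following is a minimal sketch on a hypothetical test vector `x` (not part of the question's data); the `c(NA, "A", "B")` lookup is an assumption added here so that index 0 maps to `NA` instead of being dropped by `LETTERS[0]`.

```r
# Hypothetical test values: below the first break, on each break, and in between
x <- c(0.05, 0.1, 5, 10, 11)

# cut() uses right-closed intervals (0.1, 10] and (10, Inf];
# values outside every interval become NA
by_cut <- as.character(cut(x, breaks = c(0.1, 10, Inf), labels = c("A", "B")))
by_cut
#[1] NA  NA  "A" "A" "B"

# findInterval() is left-closed by default, [0.1, 10) and [10, Inf),
# and returns 0 for values below the first break
idx <- findInterval(x, c(0.1, 10))
by_fi <- c(NA, "A", "B")[idx + 1]
by_fi
#[1] NA  "A" "A" "B" "B"

# left.open = TRUE switches to (0.1, 10] and (10, Inf), matching cut()
idx_open <- findInterval(x, c(0.1, 10), left.open = TRUE)
by_fi_open <- c(NA, "A", "B")[idx_open + 1]
by_fi_open
#[1] NA  NA  "A" "A" "B"
```

For the data in the question none of the V1 values falls on a break or below 0.1, so all variants give the same V5 there.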

2 Comments

Thanks. findInterval works great with one variable from a single column. Is it possible to include conditions on other columns (e.g. V2<1, V3>0, etc.)?
@akh22 Yes, you can do DT[, c("V2", "V3", "V5") := lapply(.SD, function(x) LETTERS[findInterval(x, c(0.1, 10))]), .SDcols = c("V2", "V3", "V5")]
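When a single label should depend on conditions across several columns at once, one option that stays within data.table is `fcase()`, which evaluates condition/value pairs in order. This is only a sketch: it assumes data.table 1.13.0 or later (where `fcase()` is available), rebuilds V1-V3 from the question's table, and uses made-up thresholds on V2 and V3 purely for illustration.

```r
library(data.table)

# V1-V3 copied from the question's table
DT <- data.table(
  V1 = c(693, 240, 43, 140, 216, 479, 197, 720, 7, 375, 514, 1),
  V2 = c(-0.2842529, -2.6564554, -2.4404669, 1.3201133, -0.3066386, -1.7813084,
         -0.1719174, 1.2146747, 1.8951935, -0.4304691, -0.2572694, -1.7631631),
  V3 = rep(c(1.3710, -0.5647, 0.3631, 0.6329), times = 3)
)

# Hypothetical rule: the first matching condition wins, like an if / else if chain
DT[, V5 := fcase(
  V1 > 10 & V2 < 1 & V3 > 0, "B",   # illustrative thresholds, not from the question
  V1 > 0.1,                  "A",
  default = NA_character_            # rows matching no condition
)]
DT$V5
#[1] "B" "A" "B" "A" "B" "A" "B" "A" "A" "A" "B" "A"
```

Because the conditions are checked top to bottom, the more specific rule has to come first; this also collapses the two separate `:=` lines from the question into one call.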

If you prefer a tidyverse approach, you can run the following code:

library(tidyverse)
new_data <- your_data %>%
    mutate(V5 = case_when(
        V1 > 0.1 & V1 <= 10 ~ "A",  # same bounds as DT[V1>0.1, ...] in the question
        V1 > 10 ~ "B"               # rows matching neither condition get NA
    ))
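The same `case_when()` pattern extends to conditions on several columns. Below is a minimal sketch that loads only dplyr (which provides `mutate()`, `case_when()` and `%>%`), uses the first four rows of the question's table as a stand-in for `your_data`, and applies hypothetical thresholds on V2 and V3 chosen for illustration.

```r
library(dplyr)

# First four rows of V1-V3 from the question, as a stand-in for your_data
your_data <- data.frame(
  V1 = c(693, 240, 43, 140),
  V2 = c(-0.2842529, -2.6564554, -2.4404669, 1.3201133),
  V3 = c(1.3710, -0.5647, 0.3631, 0.6329)
)

new_data <- your_data %>%
  mutate(V5 = case_when(
    V1 > 10 & V2 < 1 & V3 > 0 ~ "B",            # illustrative multi-column rule
    V1 > 0.1                  ~ "A",            # checked only if the rule above fails
    TRUE                      ~ NA_character_   # explicit fallback
  ))
new_data$V5
#[1] "B" "A" "B" "A"
```

Conditions are evaluated in order and the first match wins, so overlapping rules should be listed from most to least specific.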

1 Comment

Installation went pretty smoothly, but I had some trouble loading it. Other than that, this approach works well, particularly with multiple conditions on other columns (e.g. V1>10 & V2>1 & V3<0, etc.). Thanks a lot!
