creating/updating a column based on rows from multiple columns

Question

I created the following data.table object:

        V1         V2      V3       V4
     1: 693 -0.2842529  1.3710 21.64843
     2: 240 -2.6564554 -0.5647 93.37038
     3:  43 -2.4404669  0.3631 92.63883
     4: 140  1.3201133  0.6329 73.67534
     5: 216 -0.3066386  1.3710 33.97413
     6: 479 -1.7813084 -0.5647 51.99127
     7: 197 -0.1719174  0.3631 74.65349
     8: 720  1.2146747  0.6329 62.29676
     9:   7  1.8951935  1.3710 62.99829
    10: 375 -0.4304691 -0.5647 22.49861
    11: 514 -0.2572694  0.3631 22.44016
    12:   1 -1.7631631  0.6329 39.50556

I want to generate a new categorical (group) column based on the values in columns V1-V4. For example, I used the values in V1 to generate the categorical column V5 as follows:

DT[V1>0.1, V5 :="A"]
DT[V1>10, V5 :="B"]

Then I get this table:

        V1         V2      V3       V4 V5
     1: 693 -0.2842529  1.3710 21.64843  B
     2: 240 -2.6564554 -0.5647 93.37038  B
     3:  43 -2.4404669  0.3631 92.63883  B
     4: 140  1.3201133  0.6329 73.67534  B
     5: 216 -0.3066386  1.3710 33.97413  B
     6: 479 -1.7813084 -0.5647 51.99127  B
     7: 197 -0.1719174  0.3631 74.65349  B
     8: 720  1.2146747  0.6329 62.29676  B
     9:   7  1.8951935  1.3710 62.99829  A
    10: 375 -0.4304691 -0.5647 22.49861  B
    11: 514 -0.2572694  0.3631 22.44016  B
    12:   1 -1.7631631  0.6329 39.50556  A

Is it possible to combine the two lines above into one? And is it possible to build the column from values in several other columns (e.g. V2-V4)?

akrun · Accepted Answer · 2017-12-26 07:08:23Z

2

If we have many levels, it may be better to use cut

DT[, V5 := as.character(cut(V1, breaks = c(0.1, 10, Inf), labels = c("A", "B")))]
DT$V5
#[1] "B" "B" "B" "B" "B" "B" "B" "B" "A" "B" "B" "A"

Or with findInterval

DT[, V5 := LETTERS[findInterval(V1, c(0.1, 10))]]
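The two one-liners above treat the break points slightly differently, which a small side-by-side check can make visible. The sketch below rebuilds only the V1 column from the question; the column names V5_cut and V5_fi are made up for illustration, and the NA-padded lookup vector is an added safeguard, not part of the original answer.

```r
library(data.table)

# Rebuild the V1 column from the question (the other columns are not needed here).
DT <- data.table(V1 = c(693, 240, 43, 140, 216, 479, 197, 720, 7, 375, 514, 1))

# cut(): intervals are right-closed by default, i.e. (0.1, 10] and (10, Inf].
DT[, V5_cut := as.character(cut(V1, breaks = c(0.1, 10, Inf), labels = c("A", "B")))]

# findInterval(): intervals are left-closed, i.e. [0.1, 10) and [10, Inf).
# Adding 1 and prepending NA keeps the result the same length as V1 even
# when a value falls below the first break (where findInterval returns 0).
DT[, V5_fi := c(NA_character_, "A", "B")[findInterval(V1, c(0.1, 10)) + 1L]]

# Both columns agree on this data because no V1 value is exactly 0.1 or 10.
identical(DT$V5_cut, DT$V5_fi)
```

If a value sat exactly on a break (say V1 == 10), cut would label it "A" and findInterval would label it "B", so the choice between them should follow whichever boundary rule is intended.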

edited Dec 26, 2017 at 7:08

answered Dec 26, 2017 at 7:03

akrun

2 Comments

akh22 Over a year ago

Thanks. findInterval works great with a single column. Is it possible to include conditions on other columns (e.g. V2 < 1, V3 > 0, etc.)?

akrun Over a year ago

@akh22 Yes, you can do DT[, c("V2", "V3", "V5") := lapply(.SD, function(x) LETTERS[findInterval(x, c(0.1, 10))]), .SDcols = c("V2", "V3", "V5")]
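When the label depends on conditions across several columns at once, one option that stays within data.table is fcase, which as far as I know is available from data.table 1.13.0 onward. The following is only a sketch: the thresholds and the fallback label "C" are hypothetical, chosen to resemble the conditions mentioned in the comments, and only the first four rows of the question's table are used.

```r
library(data.table)

# First four rows of the question's table, enough to exercise each branch.
DT <- data.table(
  V1 = c(693, 240, 43, 140),
  V2 = c(-0.2842529, -2.6564554, -2.4404669, 1.3201133),
  V3 = c(1.3710, -0.5647, 0.3631, 0.6329)
)

# Hypothetical rules: the first condition that is TRUE decides the label,
# and rows matching no condition fall through to the default.
DT[, V5 := fcase(
  V1 > 10 & V2 > 1, "A",
  V1 > 10 & V3 < 0, "B",
  default = "C"
)]

DT
```

Because the conditions are evaluated in order, a row that satisfies both rules gets the label of the earlier one, so the most specific rules should come first.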

Scipione Sarlo · Accepted Answer · 2017-12-26 17:09:42Z

1

If you prefer a tidyverse approach, you can run the following code:

library(tidyverse)
new_data <- your_data %>%
    mutate(V5 = case_when(
        V1 >= 0.1 & V1 < 10 ~ "A",
        V1 >= 10 ~ "B"
    ))
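The same case_when pattern extends to conditions that span several columns. The sketch below is an illustration under assumptions: the thresholds and the catch-all label "C" are hypothetical, your_data is filled with rows 4, 6, 9 and 12 of the question's table, and only dplyr is loaded (it provides mutate, case_when and the pipe).

```r
library(dplyr)

# Rows 4, 6, 9 and 12 of the question's table, stored under the
# answer's placeholder name.
your_data <- data.frame(
  V1 = c(140, 479, 7, 1),
  V2 = c(1.3201133, -1.7813084, 1.8951935, -1.7631631),
  V3 = c(0.6329, -0.5647, 1.3710, 0.6329)
)

# Hypothetical rules, checked top to bottom; the first match wins.
new_data <- your_data %>%
  mutate(V5 = case_when(
    V1 > 10 & V2 > 1 ~ "A",   # V1 above 10 and V2 above 1
    V1 > 10 & V3 < 0 ~ "B",   # V1 above 10 and negative V3
    TRUE             ~ "C"    # everything else
  ))

new_data
```

Without the final TRUE branch, rows that match none of the conditions would get NA in V5.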

answered Dec 26, 2017 at 17:09

Scipione Sarlo

1 Comment

akh22 Over a year ago

Installation went pretty smoothly, but I had some trouble loading it. Other than that, this approach works well, particularly with conditions on multiple variables from other columns (e.g. V1 > 10 & V2 > 1 & V3 < 0, etc.). Thanks a lot!
