R Data.Table: Dynamically Update a Different Column for each Row

Question

I'm working on some code where I need to find the maximum value over a set of columns and then update that maximum value. Consider this toy example:

test <- data.table(thing1=c('AAA','BBB','CCC','DDD','EEE'),
                        A=c(9,5,4,2,5),
                        B=c(2,7,2,6,3),
                        C=c(6,2,5,4,1),
                      ttl=c(1,1,3,2,1))

where the resulting data.table looks like this:

thing1	A	B	C	ttl
AAA	9	2	6	1
BBB	5	7	2	1
CCC	4	2	5	3
DDD	2	6	4	2
EEE	5	3	1	1

The goal is to find the column (A, B, or C) with the maximum value and replace that value by the current value minus 0.1 times the value in the ttl column (i.e. new_value=old_value - 0.1*ttl). The other columns (not containing the maximum value) should remain the same. The resulting DT should look like this:

thing1	A	B	C	ttl
AAA	8.9	2	6	1
BBB	5	6.9	2	1
CCC	4	2	4.7	3
DDD	2	5.8	4	2
EEE	4.9	3	1	1

The "obvious" way of doing this is to write a for loop and loop through each row of the DT. That's easy enough to do and is what the code I'm adapting this from did. However, the real DT is much larger than my toy example and the for loop takes some time to run, which is why I'm trying to adapt the code to take advantage of vectorization and get rid of the loop.

Here's what I have so far:

test[,max_position:=names(.SD)[apply(.SD,1,function(x) which.max(x))],.SDcols=(2:4)]
test[,newmax:=get(max_position)-ttl*.1,by=1:nrow(test)]

which produces this DT:

thing1	A	B	C	ttl	max_position	newmax
AAA	9	2	6	1	A	8.9
BBB	5	7	2	1	B	6.9
CCC	4	2	5	3	C	4.7
DDD	2	6	4	2	B	5.8
EEE	5	3	1	1	A	4.9

The problem comes in assigning the value of the newmax column back to where it needs to go. I naively tried this, along with some other things, which tells me that "'max_position' not found":

test[,(max_position):=newmax,by=1:nrow(test)]

It's straightforward to solve the problem by reshaping the DT, which is the solution I have in place for now (see below), but I worry that with my full DT two reshapes will be slow as well (though presumably better than the for loop). Any suggestions on how to make this work as intended?

Reshaping solution, for reference:

test[,max_position:=names(.SD)[apply(.SD,1,function(x) which.max(x))],.SDcols=(2:4)]
test[,newmax:=get(max_position)-ttl*.1,by=1:nrow(test)]
test <- setDT(gather(test,idgroup,val,c(A,B,C)))
test[,maxval:=max(val),by='thing1']
test[val==maxval,val:=newmax][,maxval:=NULL]
test <- setDT(spread(test,idgroup,val))

akrun · Accepted Answer · 2022-02-07 18:45:06Z

1

With the OP's code, replace can work

test[, (2:4) := replace(.SD, which.max(.SD), max(.SD, na.rm = TRUE) - 0.1 * ttl), 
    by = 1:nrow(test),.SDcols = 2:4]

-output

> test
   thing1   A   B   C ttl
1:    AAA 8.9 2.0 6.0   1
2:    BBB 5.0 6.9 2.0   1
3:    CCC 4.0 2.0 4.7   3
4:    DDD 2.0 5.8 4.0   2
5:    EEE 4.9 3.0 1.0   1

In base R, this may be faster with row/column indexing

test1 <- as.data.frame(test)
m1 <- cbind(seq_len(nrow(test1)), max.col(test1[2:4], "first"))
test1[2:4][m1] <- test1[2:4][m1] - 0.1 * test1$ttl

edited Feb 7, 2022 at 18:45

answered Feb 7, 2022 at 18:27

akrun

891k38 gold badges590 silver badges700 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

R Data.Table: Dynamically Update a Different Column for each Row

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related