replace values in data-table based on column number and separate index vector

Question

I am given a large data.table and need to set cells to a fixed value (e.g. 0) based on the column number and an index that depends on the row number.

As an example, take a data.table 'dt' full of ones. Additionally, I have a vector 'index' that gives, for each row, the number of leading columns that shall remain unchanged; the remaining columns in that row shall be set to 0.

library(data.table)

dt <- setnames(data.table(matrix(1,nrow=100, ncol=11)),as.character(c(0:10)))

set.seed(1)
index <- sample(c(0:11),100, replace=TRUE)

> dput(index)
c(3L, 4L, 6L, 10L, 2L, 10L, 11L, 7L, 7L, 0L, 2L, 2L, 8L, 4L, 
9L, 5L, 8L, 11L, 4L, 9L, 11L, 2L, 7L, 1L, 3L, 4L, 0L, 4L, 10L, 
4L, 5L, 7L, 5L, 2L, 9L, 8L, 9L, 1L, 8L, 4L, 9L, 7L, 9L, 6L, 6L, 
9L, 0L, 5L, 8L, 8L, 5L, 10L, 5L, 2L, 0L, 1L, 3L, 6L, 7L, 4L, 
10L, 3L, 5L, 3L, 7L, 3L, 5L, 9L, 1L, 10L, 4L, 10L, 4L, 4L, 5L, 
10L, 10L, 4L, 9L, 11L, 5L, 8L, 4L, 3L, 9L, 2L, 8L, 1L, 2L, 1L, 
2L, 0L, 7L, 10L, 9L, 9L, 5L, 4L, 9L, 7L)

For example, in the first row, the first three cells remain unchanged and the others are set to 0. As it is a large data.table, I am looking for an efficient way to do this.

Use set.seed() before creating random data, for reproducibility.
– s_baldur, Commented May 28, 2019 at 10:07
Thanks for the comment. I actually did, but forgot to copy it here ;)
– Strickland, Commented May 28, 2019 at 11:56

chinsoon12 · Accepted Answer · 2019-05-28 10:07:32Z

2

An option using Matrix package:

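library(Matrix)
mat <- as.matrix(dt)
# Multiply by a 0/1 mask that has ones in the first index[r] columns of row r
mat * as.matrix(sparseMatrix(
    i=rep(seq_along(index), index),
    j=unlist(lapply(index, seq_len)),
    x=1,
    dims=dim(mat)))  # fix the shape even if trailing rows/columns hold no ones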
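To make the mask construction easier to follow, here is a minimal sketch on a toy input. The three-row 'index' and the name 'n_col' are made up for illustration and are not part of the original answer.

```r
library(Matrix)

# Hypothetical toy input: 3 rows, 3 columns; keep 2, 0 and 3 leading cells
index <- c(2L, 0L, 3L)
n_col <- 3L

mask <- as.matrix(sparseMatrix(
    i = rep(seq_along(index), index),    # row number, repeated index[r] times
    j = unlist(lapply(index, seq_len)),  # column numbers 1..index[r] for each row
    x = 1,
    dims = c(length(index), n_col)))     # state the shape explicitly

mask  # row 1 is 1 1 0, row 2 is 0 0 0, row 3 is 1 1 1
```

Multiplying a matrix of the same shape by 'mask' element-wise then zeroes every cell outside the kept leading columns.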

Or using data.table::set:

for (j in seq_along(names(dt)))
    set(dt, which(j>index), j, 0)
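
A quick way to convince yourself that the loop does what the question asks is sketched below. It rebuilds the example data from the question, runs the same loop, and compares the result against a mask built with col(). The check (the names 'm' and 'expected') is an illustration only and not part of the original answer.

```r
library(data.table)

# Example data from the question: 100 x 11 table of ones, columns named "0".."10"
dt <- setnames(data.table(matrix(1, nrow = 100, ncol = 11)), as.character(0:10))
set.seed(1)
index <- sample(0:11, 100, replace = TRUE)

# Zero column j (by reference) in every row whose index is smaller than j
for (j in seq_along(names(dt)))
    set(dt, which(j > index), j, 0)

# Hypothetical check: cell [r, c] should be 1 exactly when c <= index[r]
m <- as.matrix(dt)
expected <- (col(m) <= index) * 1
stopifnot(all(m == expected))
```

The loop runs once per column rather than once per cell, which is one plausible reason it scales well on tall tables.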

edited May 28, 2019 at 10:07

answered May 28, 2019 at 10:02


1 Comment

Strickland Over a year ago

I actually took the solution with data.table::set. It turned out to be the fastest one of the current suggestions.

Sven · Answer · 2019-05-28 10:00:53Z

2

To keep things simple, I've taken the reverse approach: start from a table of 0s, then use a double for loop to set the number of columns indicated in index to 1s:

library(data.table)

dt <- setnames(data.table(matrix(0,nrow=100, ncol=11)),as.character(c(0:10)))

set.seed(1)
index <- sample(c(0:11),100, replace=TRUE)

for (i in seq_along(index)) {
  # seq_len(0) is empty, so rows with index 0 are skipped
  for (j in seq_len(index[i])) {
    dt[i,j] <- 1
  }
}

answered May 28, 2019 at 10:00


1 Comment

jangorecki Over a year ago

Replace dt[i,j] <- 1 with set(dt, i, j, 1) and it should be pretty fast; otherwise it will be terribly slow.
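
The suggestion in this comment could look roughly like the sketch below. It keeps the answer's double loop and only swaps the assignment for data.table::set, which updates the table by reference. The set.seed(1) call is an assumption added here so the result is reproducible.

```r
library(data.table)

# Start from an all-zero table, as in the answer
dt <- setnames(data.table(matrix(0, nrow = 100, ncol = 11)), as.character(0:10))
set.seed(1)  # assumption: added for reproducibility
index <- sample(0:11, 100, replace = TRUE)

for (i in seq_along(index)) {
  # seq_len(0) is empty, so rows with index 0 are skipped without an if
  for (j in seq_len(index[i])) {
    set(dt, i = i, j = j, value = 1)  # assign by reference instead of dt[i,j] <- 1
  }
}
```

Afterwards each row r holds index[r] leading ones, which can be checked with rowSums(as.matrix(dt)) == index.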

s_baldur · Answer · 2019-05-28 10:07:51Z

1

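n_col <- ncol(dt)
for (r in seq_len(nrow(dt))) {
  keep <- index[r]  # number of leading columns to leave unchanged in row r
  # Nothing to zero when the whole row is kept
  if (keep < n_col) {
    # Zero columns keep+1 .. n_col (by position) of row r, by reference
    set(dt, i = r, j = (keep + 1L):n_col, value = 0)
  }
}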

answered May 28, 2019 at 10:07


Ronak Shah · Answer · 2019-05-28 10:08:49Z

0

Since dt is full of 1s, you can recreate the entire data.table with:

library(data.table)

cols <- ncol(dt)
data.table(t(sapply(seq_len(nrow(dt)), function(i) 
                   rep(c(1, 0), c(index[i], cols - index[i])))))


#     V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
# 1:  1  1  1  0  0  0  0  0  0   0   0
# 2:  1  1  1  1  0  0  0  0  0   0   0
# 3:  1  1  1  1  1  1  0  0  0   0   0
# 4:  1  1  1  1  1  1  1  1  1   1   0
# 5:  1  1  0  0  0  0  0  0  0   0   0
# 6:  1  1  1  1  1  1  1  1  1   1   0
# 7:  1  1  1  1  1  1  1  1  1   1   1
# 8:  1  1  1  1  1  1  1  0  0   0   0
# 9:  1  1  1  1  1  1  1  0  0   0   0
#10:  0  0  0  0  0  0  0  0  0   0   0
#....

Compare it with the first 10 index values:

index[1:10]
# [1]  3  4  6 10  2 10 11  7  7  0

edited May 28, 2019 at 10:08

answered May 28, 2019 at 10:01


1 Comment

s_baldur Over a year ago

Wasn't it the other way around (the 0/1s)?
