0

I have a data.table and a vector which I would like to add as a column of the dt - specifically, "explode" the values of vector to rows of the dt.

Edit: tidyr::expand_grid and tidyr::crossing give the desired result

The code below gives me what I want but it feels like there should be a much simpler way to do it.

Desired result

(transform example_data to modified_data)

library(data.table)

example_data <- data.table(id = c("a", "b", "c"), value=c(1,2,3))
print(example_data)
#>        id value
#>    <char> <num>
#> 1:      a     1
#> 2:      b     2
#> 3:      c     3

explode_vector <- c("foo", "bar")

dt_list <- list()
for (val in explode_vector) {
    temp_df <- copy(example_data)
    temp_df[, new_val := eval(val)]
    dt_list <- append(dt_list, list(temp_df))
}

modified_data <- rbindlist(dt_list)
print(modified_data)
#>        id value new_val
#>    <char> <num>  <char>
#> 1:      a     1     foo
#> 2:      b     2     foo
#> 3:      c     3     foo
#> 4:      a     1     bar
#> 5:      b     2     bar
#> 6:      c     3     bar

e.g. it feels like one of the two options below should work but they don't quite. Does anyone have a suggestion for doing this?

# Some examples that I feel should work
example_data[, new_val := eval(explode_vector), by=id]
cbind(example_data, data.table(new_val=explode_vector))

Edit: Extending with duplicate id values

For example, there may be multiple identification keys (e.g. time-series observations) so the primary id is duplicated.

library(data.table)

example_data <- data.table(id = c("a", "a", "b", "c"), date=c(0, 1, 0, 1), value=c(1,2,3,1))
print(example_data)
       id  date value
   <char> <num> <num>
1:      a     0     1
2:      a     1     2
3:      b     0     3
4:      c     1     1

explode_vector <- c("foo", "bar")

# A solution using the CJ method
example_data[, CJ(id, explode_vector, unique=TRUE)][example_data, on = .(id)]
       id explode_vector  date value
   <char>         <char> <num> <num>
1:      a            bar     0     1
2:      a            foo     0     1
3:      a            bar     1     2
4:      a            foo     1     2
5:      b            bar     0     3
6:      b            foo     0     3
7:      c            bar     1     1
8:      c            foo     1     1

# A solution using the expand_grid method (simpler in my opinion but requires 
# converting back to `data.table` if trying to stick to `data.table`)
# I also prefer that it retains the original column order.
data.table(expand_grid(example_data, explode_vector))
       id  date value explode_vector
   <char> <num> <num>         <char>
1:      a     0     1            foo
2:      a     0     1            bar
3:      a     1     2            foo
4:      a     1     2            bar
5:      b     0     3            foo
6:      b     0     3            bar
7:      c     1     1            foo
8:      c     1     1            bar

2 Answers 2

3

In data.table an option is CJ

library(data.table)
example_data[, CJ(id, explode_vector)][example_data, on = .(id)]

-output

       id explode_vector value
   <char>         <char> <num>
1:      a            bar     1
2:      a            foo     1
3:      b            bar     2
4:      b            foo     2
5:      c            bar     3
6:      c            foo     3

If it is multiple columns, we may use crossing

library(tidyr)
crossing(example_data[, ..id_cols], explode_vector)
Sign up to request clarification or add additional context in comments.

8 Comments

Is it possible to do this with multiple id columns specified in a vector? e.g. Say example_data contains id and date (id_cols <- c("id", "date")).
@tbbarr do you want do.call(CJ, c(example_data[, ..id_cols], list(explode_vector = explode_vector)))
I updated my question to explain this better. Let me know if you think there's a better way to do it with the CJ approach.
@tbbarr with the CJ in your update, it is using only 'id'? Do you want also with 'date'
Ah, in fact all I want is crossing(example_data, explode_vector)! (the id_cols are irrelevant)
|
2

You can use expand_grid from tidyr package.

library(tidyr)
library(data.table)

example_data <- data.table(id = c("a", "b", "c"), value=c(1,2,3))
explode_vector <- c("foo", "bar")
tidyr::expand_grid(example_data, explode_vector)

# A tibble: 6 x 3
#   id    value explode_vector
#   <chr> <dbl> <chr>
# 1 a         1 foo
# 2 a         1 bar
# 3 b         2 foo
# 4 b         2 bar
# 5 c         3 foo
# 6 c         3 bar

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.