0

I have two data.tables "commuters_old" and "commuter_new":

commuters_old: (1.703.000 observations)

from1 to1 amount1
1001 1001 12
1001 1002 2
1001 1003 23
2000 1002 200
2001 2001 80
2001 2002 98

commuters_new: (1.400.000 observations)

from2 to2 amount2
1000 1001 10
1001 1002 22
1001 1004 2
2000 1002 50
2001 2001 8
2001 2003 100
library(data.table)

commuters_old <- data.table(
  from1 = c(1001L, 1001L, 1001L, 2000L, 2001L, 2001L),
  to1 = c(1001L, 1002L, 1003L, 1002L, 2001L, 2002L),
  amount1 = c(12L, 2L, 23L, 200L, 80L, 98L))

commuters_new <- data.table(
  from2 = c(1000L, 1001L, 1001L, 2000L, 2001L, 2001L),
  to2 = c(1001L, 1002L, 1004L, 1002L, 2001L, 2003L),
  amount2 = c(10L, 22L, 2L, 50L, 8L, 100L))

I would like to add the column "amount1" to data.table "commuters_new" but only if "from1 == from2" AND "to1 == to2". If this condition is not met, I want to write 0, so that:

commuters_new:

from2 to2 amount2 amount1
1000 1001 10 0
1001 1002 22 2
1001 1004 2 0
2000 1002 50 200
2001 2001 8 80
2001 2003 100 0

After that merge, I want to add another column "factor" that gives me the factor of "amount2" and"amount1".

commuters_new:

from2 to2 amount2 amount1 factor
1000 1001 10 0 0
1001 1002 22 2 11
1001 1004 2 0 0
2000 1002 50 200 0.25
2001 2001 8 80 0.1
2001 2003 100 0 0

What I have tried:

commuters_new[, amount1 := ifelse(from1==from2 & to1==to2, commuters_old$amount1, 0)]

OR

commuters_new$amount1[, paste(ifelse(from1==from2 & to1==to2, commuters_old$amount1, 0))]

But I always get the error:

Supplied 1703144 items to be assigned to 1408969 items of column 'amount1'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code

0

3 Answers 3

0

Another way, using dtplyr:

Steps:

  1. Load it as a lazy data table
  2. Perform a left join, joining new on old
  3. Replace the empty values in our new amount1 column with 0s
  4. If amount1 is 0, factor is 0. Otherwise, it is amount2 / amount1
  5. Tell dtplyr to run the code by running as.data.table()
pacman::p_load(data.table, dtplyr, tidyverse)

commuters_new |>
  lazy_dt() |>
  left_join(commuters_old, by = c("from2" = "from1", "to2" = "to1")) |>
  replace_na(list(amount1 = 0L))|>
  mutate(factor = ifelse(!amount1, 0, amount2 / amount1)) |>
  as.data.table()

Equivalent data.table code:

setnames(setcolorder(`_DT34`[`_DT33`, on = .(from1 = from2, to1 = to2), 
    allow.cartesian = TRUE], c(1L, 2L, 4L, 3L)), c("from1", "to1"
), c("from2", "to2"))[, `:=`(amount1 = fcoalesce(amount1, 0L))][, 
    `:=`(factor = fifelse(amount1 == 0, 0, amount2/amount1))]
# or
out <- (commuters_old[commuters_new, on = .(from1 = from2, to1 = to2)]
       [, amount1 := fcoalesce(amount1, 0L)]
       [, factor := ifelse(!amount1, 0, amount2 / amount1)])

Output:

   from1  to1 amount1 amount2 factor
1:  1000 1001       0      10   0.00
2:  1001 1002       2      22  11.00
3:  1001 1004       0       2   0.00
4:  2000 1002     200      50   0.25
5:  2001 2001      80       8   0.10
6:  2001 2003       0     100   0.00
Sign up to request clarification or add additional context in comments.

Comments

0

You want to use data.table's merge to add the data from the second data to the first dataset.

Eg something like this

merge(commuters_old, commuters_new,
      by.x = c("from1", "to1"), by.y = c("from2", "to2"),
      all.x = TRUE)

Also, the error you are seeing regarding recycling indicates that you have duplicate values. This is especially nasty when the second (commuters_new) dataset has duplicate entries as this will duplicate corresponding values in the first dataset.

1 Comment

this is great, but it doesn't create the factor column, nor does it replace the NAs with 0s
0

You can use the right_join and mutate functions from the dplyr package:

library(dplyr)

commuters <- right_join(commuters_old, commuters_new, by = c("from1" = "from2",
                                                            "to1" = "to2"))
# change NAs to 0s
commuters[is.na(commuters)] <- 0 # check if there are no NAs in your from and to

#calculate factor
mutate(rowwise(commuters), factor = amount2/amount1)

2 Comments

I think it is a left_join backboned? I would like to add the column "amount1" to data.table "commuters_new" but only if "from1 == from2" AND "to1 == to2".
You're right, thank you. Except if you have old before new it's a right join.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.