First, a simplified example (which you can easily adapt):
df_points <- data.table::fread("
tag_id date_time other_cols
100 2025-06-30 P1
200 2025-01-31 P2
200 2025-04-01 P3
200 2025-06-01 P4
300 2025-10-01 P5
", data.table=FALSE)
df_ranges <- data.table::fread("
tag_id date_time_start date_time_end other_cols
100 2025-01-01 2025-02-28 R1
200 2025-03-01 2025-06-30 R2
200 2025-05-01 2025-08-31 R3
400 2025-09-01 2025-11-30 R4
", data.table=FALSE)
It seems that what you are describing is a left range join. A join is defined by two things:
1. A rule or combination of rules for deciding which pairs of rows from either side form a match, and therefore join together in a combined row. You're describing an inequality join, and specifically what is known as a range join, where a match exists if a point value on one side falls between a lower and an upper bound on the other. You also have additional equality conditions (represented by only one column, tag_id, in this simplified example).
2. Which sets of joining/non-joining rows to reflect in the result. You want to keep the unjoined rows from the left-hand table but not the right-hand table (point 4), which makes this a left join.
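To make the two parts of that definition concrete, here is a minimal base R sketch that builds the left range join by hand on the toy data. It is only an illustration of the semantics, not how any of the packages below work internally, and it assumes that other_cols uniquely labels each row of the points table (true in this toy data, probably not in yours).

```r
# Toy data mirroring the example above, built with base R only
df_points <- data.frame(
  tag_id     = c(100, 200, 200, 200, 300),
  date_time  = as.Date(c("2025-06-30", "2025-01-31", "2025-04-01",
                         "2025-06-01", "2025-10-01")),
  other_cols = c("P1", "P2", "P3", "P4", "P5")
)
df_ranges <- data.frame(
  tag_id          = c(100, 200, 200, 400),
  date_time_start = as.Date(c("2025-01-01", "2025-03-01", "2025-05-01", "2025-09-01")),
  date_time_end   = as.Date(c("2025-02-28", "2025-06-30", "2025-08-31", "2025-11-30")),
  other_cols      = c("R1", "R2", "R3", "R4")
)

# Part 1a: equality condition on tag_id gives all candidate pairs
pairs <- merge(df_points, df_ranges, by = "tag_id", suffixes = c("", ".R"))
# Part 1b: range condition, the point must lie within [start, end]
matched <- pairs[pairs$date_time >= pairs$date_time_start &
                 pairs$date_time <= pairs$date_time_end, ]

# Part 2: left join, so add back the left rows that found no match
# (assumes other_cols is a unique row label in df_points)
unmatched <- df_points[!df_points$other_cols %in% matched$other_cols, ]
unmatched$date_time_start <- as.Date(NA)
unmatched$date_time_end   <- as.Date(NA)
unmatched$other_cols.R    <- NA_character_

result <- rbind(matched, unmatched[names(matched)])
result <- result[order(result$other_cols), ]
result
```

The result has six rows: P4 appears twice (once per matching interval), and P1, P2 and P5 are kept with missing values on the right-hand side.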
Regarding (1), there is a possible additional issue: whether you want to allow multiple matches when a time point falls into more than one interval. Note that I've rigged the example so that row P4 on the left matches two intervals on the right, R2 and R3. You need to specify whether this is a relevant consideration and, if so, what the policy should be.
(Your use of := (implying an update join) suggests you don't want multiple matches, because an update join has to preserve the dimension of the left-hand table without recycling/expanding any of its rows. On the other hand you might just be echoing the code in the question you reference. If you do want an update join, then there are some complications with multiple matches. But if what I've just said means nothing to you, ignore it!)
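If the update-join remark is unfamiliar, here is a minimal sketch of what one looks like in {data.table} on the toy data. The column name range_label is made up for illustration. The point is that the left-hand table keeps its five rows, so for P4 only one of the two matching intervals can be recorded. My understanding is that the later match overwrites the earlier one, but treat that as an assumption and check it on your own data.

```r
library(data.table)

dt_points <- data.table(
  tag_id     = c(100, 200, 200, 200, 300),
  date_time  = as.IDate(c("2025-06-30", "2025-01-31", "2025-04-01",
                          "2025-06-01", "2025-10-01")),
  other_cols = c("P1", "P2", "P3", "P4", "P5")
)
dt_ranges <- data.table(
  tag_id          = c(100, 200, 200, 400),
  date_time_start = as.IDate(c("2025-01-01", "2025-03-01", "2025-05-01", "2025-09-01")),
  date_time_end   = as.IDate(c("2025-02-28", "2025-06-30", "2025-08-31", "2025-11-30")),
  other_cols      = c("R1", "R2", "R3", "R4")
)

# Update join: adds range_label (hypothetical name) to dt_points by reference.
# The number of rows in dt_points cannot change, so P4 gets a single label
# even though it falls inside both R2 and R3.
dt_points[dt_ranges,
          on = .(tag_id, date_time >= date_time_start, date_time <= date_time_end),
          range_label := i.other_cols]
dt_points
```

Unmatched points (P1, P2, P5) end up with NA in the new column.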
Okay, so assuming you don't need an update join, you have a couple of convenient options. The first I will mention is (my) utility package {fjoin}, which writes and runs {data.table} code for you, adds lots of bells and whistles, and works directly on inputs that are not data.tables (such as plain data frames).
install.packages("fjoin", repos = c("https://trobx.r-universe.dev")) # on CRAN soon
library(fjoin)
fjoin_left(df_points,
df_ranges,
on=c("tag_id", "date_time>=date_time_start", "date_time<=date_time_end"),
indicate=TRUE)
  .join tag_id  date_time other_cols date_time_start date_time_end R.other_cols
1     1    100 2025-06-30         P1            <NA>          <NA>         <NA>
2     1    200 2025-01-31         P2            <NA>          <NA>         <NA>
3     3    200 2025-04-01         P3      2025-03-01    2025-06-30           R2
4     3    200 2025-06-01         P4      2025-03-01    2025-06-30           R2
5     3    200 2025-06-01         P4      2025-05-01    2025-08-31           R3
6     1    300 2025-10-01         P5            <NA>          <NA>         <NA>
What is nice here is that you can set indicate=TRUE to add an upfront column showing which input each row came from (1 for left, 2 for right, 3 for both). This simple but useful feature has existed in Stata since its release in January 1985. It's also been adopted in R by the excellent {collapse} package, but {collapse} doesn't do inequality joins.
If you only want one match per row of the left input (say, the first) then you can set mult.x = "first". That will lose the second match with P4 above (I'll leave you to run it).
However, the mainstream answer is to use {dplyr}, which has supported inequality joins (via join_by()) since version 1.1.0. This is the solution people will naturally point you to because it is such a widely used package. You will lose the indicator column (and other options that might be relevant here), and it won't be quite as fast on large data, though that's very unlikely to matter.
library(dplyr)
left_join(df_points,
df_ranges,
join_by(tag_id, date_time>=date_time_start, date_time<=date_time_end))
{dplyr} has a shorthand for range joins (though the join it does is the same):
left_join(df_points,
df_ranges,
join_by(tag_id, between(date_time, date_time_start, date_time_end)))
NB If needed, the {dplyr} equivalent of {fjoin}'s mult.x is multiple. ({fjoin} also has a mult.y but I don't think it comes into play here.)
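As a rough sketch of how that looks in practice, the following uses multiple = "first" to keep one match per point and then rebuilds a simple indicator column by hand, since {dplyr} has no built-in one. The indicator logic assumes date_time_start is never missing in the ranges table, so that an NA there can only mean "no match". The column name .join is just borrowed from the {fjoin} output.

```r
library(dplyr)

df_points <- data.frame(
  tag_id     = c(100, 200, 200, 200, 300),
  date_time  = as.Date(c("2025-06-30", "2025-01-31", "2025-04-01",
                         "2025-06-01", "2025-10-01")),
  other_cols = c("P1", "P2", "P3", "P4", "P5")
)
df_ranges <- data.frame(
  tag_id          = c(100, 200, 200, 400),
  date_time_start = as.Date(c("2025-01-01", "2025-03-01", "2025-05-01", "2025-09-01")),
  date_time_end   = as.Date(c("2025-02-28", "2025-06-30", "2025-08-31", "2025-11-30")),
  other_cols      = c("R1", "R2", "R3", "R4")
)

res <- left_join(
  df_points, df_ranges,
  by = join_by(tag_id, between(date_time, date_time_start, date_time_end)),
  multiple = "first"   # keep only the first matching range for each point
) |>
  # Hand-made indicator: 1 = left only, 3 = matched on both sides
  # (assumes date_time_start is never NA in df_ranges)
  mutate(.join = if_else(is.na(date_time_start), 1L, 3L), .before = 1)
res
```

With this toy data the result has five rows, and P4 keeps only its match with R2.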
You've used {data.table} in your attempt but, again, you might just be reflecting the code you saw in an answer you consulted. If you really do mean to write the join directly in {data.table}, be warned that it doesn't automatically represent all the join columns - it garbles them in a certain way that I'm not going to explain here. Patching that up is a bit of a pain but you can use {fjoin} to ghostwrite the code for you by telling it just to show the code that it generates by using do=FALSE:
library(fjoin)
fjoin_left(df_points,
df_ranges,
on=c("tag_id", "date_time>=date_time_start", "date_time<=date_time_end"),
indicate=TRUE,
do=FALSE)
.DT : y = df_ranges (cast as data.table)
.i : x = df_points (cast as data.table)
Join: .DT[, fjoin.ind.DT := TRUE][.i, on = c("tag_id", "date_time_start <=
date_time", "date_time_end >= date_time"), data.frame(.join =
fifelse(is.na(fjoin.ind.DT), 1L, 3L), tag_id = i.tag_id, date_time, other_cols =
i.other_cols, date_time_start = x.date_time_start, date_time_end =
x.date_time_end, R.other_cols = other_cols)]
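For anyone curious about the join-column issue with hand-written {data.table} joins, here is a small sketch on the toy data (rebuilt as data.tables). In the plain join, the columns named date_time_start and date_time_end come back holding the point's date_time rather than the interval bounds. Selecting the x. and i. prefixed columns explicitly, much as the generated code above does, recovers the real values. The object names raw and fixed are just for illustration.

```r
library(data.table)

dt_points <- data.table(
  tag_id     = c(100, 200, 200, 200, 300),
  date_time  = as.IDate(c("2025-06-30", "2025-01-31", "2025-04-01",
                          "2025-06-01", "2025-10-01")),
  other_cols = c("P1", "P2", "P3", "P4", "P5")
)
dt_ranges <- data.table(
  tag_id          = c(100, 200, 200, 400),
  date_time_start = as.IDate(c("2025-01-01", "2025-03-01", "2025-05-01", "2025-09-01")),
  date_time_end   = as.IDate(c("2025-02-28", "2025-06-30", "2025-08-31", "2025-11-30")),
  other_cols      = c("R1", "R2", "R3", "R4")
)

# Plain left range join: each row of dt_points (i) is looked up in dt_ranges (x).
# The two bound columns in the result both contain the point's date_time.
raw <- dt_ranges[dt_points,
                 on = .(tag_id, date_time_start <= date_time, date_time_end >= date_time)]

# Same join, but naming every output column explicitly with x. / i. prefixes
fixed <- dt_ranges[dt_points,
                   on = .(tag_id, date_time_start <= date_time, date_time_end >= date_time),
                   .(tag_id          = i.tag_id,
                     date_time       = i.date_time,
                     other_cols      = i.other_cols,
                     date_time_start = x.date_time_start,
                     date_time_end   = x.date_time_end,
                     R.other_cols    = x.other_cols)]
raw
fixed
```

Comparing the two printed tables shows the difference: in raw the two bound columns are identical to each other on every row, whereas fixed shows the actual interval (or NA where there was no match).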