Add Column to R Data Frame from Another Data Frame with Matching Index Column, Only When Values are in A Certain Range

Question

I am trying to add a column to a data frame (df1) from another data frame (df2), but only when the "depth range" from df1 lies within the "depth range" from df2. I'll explain below what the depth intervals represent:

df1 <- data.frame(location_ID = c("Location_01", "Location_01", "Location_01", "Location_02", "Location_02", "Location_02"),
                  start_1 = c(0, 5, 15, 0, 2.5, 5),
                  end_1 = c(5, 15, 25, 2.5, 5, 20),
                  value = c(3.00, 3.75, 3.30, 3.25, 4.15, 4.25))

df2 <- data.frame(location_ID = c("Location_01", "Location_01", "Location_02", "Location_02"),
                  start_2 = c(0, 10, 0, 5),
                  end_2 = c(10, 25, 5, 20),
                  text_description = c("First Description (Location 1)", "Second Description (Location 1)",
                                       "First Description (Location 2)", "Second Description (Location 2)"))

In terms of the example data frames above, I would like to add the data from the "text_description" column in df2 to the rows of df1 with the matching "location_ID", but only where the range from "start_1" to "end_1" is contained within the range from "start_2" to "end_2".

For example, the first row of df1 should be assigned the text description in the first row of df2, since the depth range from df2 associated with that description is 0-10, which includes the 0-5 range in the first row of df1.

However, it gets more complicated in cases like the second row of df1, where the depth range (5-15) is not entirely included in a single depth range from df2. In these cases, I would like to assign the text value based on where the majority of that interval lies.

Any advice on how to go about this would be appreciated.

I have tried this so far, based on a similar post, but it does not work because the two data frames do not have the same number of rows (df2 will have fewer rows, since it has larger depth ranges than df1):

test_df <- df1 %>%
  mutate(text = case_when(df2$start_2 <= start_1 & df2$end_2 >= end_1 ~ df2$text_description ))

I suspect you are looking for dplyr.tidyverse.org/reference/join_by.html. This has been answered in several places on SO. If so, it will be a dupe.
– Friede, Commented Jun 17, 2024 at 21:24
Row 2 of df1 is 50% in Row 1 of df2 (the 5-10 portion of 5-15) and 50% in Row 2 (the 10-15 portion of 5-15). Which should it be assigned to? Same for rows 5 and 6.
– Jon Spring, Commented Jun 17, 2024 at 21:56

Jon Spring · Accepted Answer · 2024-06-17 22:12:28Z

We can join each row in df1 to all the overlapping regions in df2, find the degree of overlap, and then keep the match with the highest overlap, ranked by a) the % of df1's region that falls in the overlap, and then b) the % of the df2 region that falls in the overlap. Rule (b) is one way to break the ties that would arise in this data if we ranked matches by rule (a) alone.
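
To see why rule (a) alone leaves a tie, the overlap arithmetic for row 2 of df1 can be worked by hand. This is only a small base R sketch: the interval values are copied from the question's example data, the variable names mirror the columns computed in the answer's code, and `best` is an illustrative name.

```r
# Row 2 of df1 covers 5-15; the two Location_01 rows of df2 cover 0-10 and 10-25.
start_1 <- 5
end_1   <- 15
start_2 <- c(0, 10)
end_2   <- c(10, 25)

# Length of the overlap between the df1 interval and each df2 interval
range_within <- pmin(end_1, end_2) - pmax(start_1, start_2)

# Rule (a): share of the df1 interval inside each df2 interval (equal here, so a tie)
share_within_A <- range_within / (end_1 - start_1)

# Rule (b): share of each df2 interval covered by the overlap (breaks the tie)
share_within_B <- range_within / (end_2 - start_2)

# Rank by rule (a), then rule (b), both descending, and keep the top candidate
best <- order(-share_within_A, -share_within_B)[1]
```

Both candidates overlap by 5 units, so rule (a) cannot separate them; rule (b) favors the first df2 interval because the overlap covers a larger share of it. The full pipeline applies the same ranking to every row of df1: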

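library(dplyr)  # join_by(), overlaps() and .by need dplyr >= 1.1.0

df1 |>
  # Tag each df1 row so we can keep one match per original row
  mutate(row = row_number()) |>
  # Join to every df2 interval at the same location that overlaps the df1 interval
  left_join(df2, 
            join_by(location_ID, overlaps(start_1, end_1, start_2, end_2))) |>
  # Length of the overlap, and its share of the df1 (A) and df2 (B) intervals
  mutate(range_within = pmin(end_1, end_2) - pmax(start_1, start_2),
         share_within_A = range_within / (end_1 - start_1),
         share_within_B = range_within / (end_2 - start_2)) |>
  # Best match first within each df1 row: rule (a), then rule (b)
  arrange(row, -share_within_A, -share_within_B) |>
  slice(1, .by = row)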

Result

  location_ID start_1 end_1 value row start_2 end_2                text_description range_within share_within_A share_within_B
1 Location_01     0.0   5.0  3.00   1       0    10  First Description (Location 1)          5.0            1.0      0.5000000
2 Location_01     5.0  15.0  3.75   2       0    10  First Description (Location 1)          5.0            0.5      0.5000000
3 Location_01    15.0  25.0  3.30   3      10    25 Second Description (Location 1)         10.0            1.0      0.6666667
4 Location_02     0.0   2.5  3.25   4       0     5  First Description (Location 2)          2.5            1.0      0.5000000
5 Location_02     2.5   5.0  4.15   5       0     5  First Description (Location 2)          2.5            1.0      0.5000000
6 Location_02     5.0  20.0  4.25   6       5    20 Second Description (Location 2)         15.0            1.0      1.0000000
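
If only the text_description column is wanted on df1, or dplyr 1.1.0 is not available, the same ranking can be approximated in base R. The sketch below swaps the overlap join for `merge()` plus a filter; it is not the code from the answer above. The names `cand`, `best`, `share_A` and `share_B` are illustrative, and the sketch assumes every interval has non-zero length and that touching endpoints count as overlapping (as with the default closed bounds of `overlaps()`).

```r
df1 <- data.frame(location_ID = c("Location_01", "Location_01", "Location_01",
                                  "Location_02", "Location_02", "Location_02"),
                  start_1 = c(0, 5, 15, 0, 2.5, 5),
                  end_1 = c(5, 15, 25, 2.5, 5, 20),
                  value = c(3.00, 3.75, 3.30, 3.25, 4.15, 4.25))

df2 <- data.frame(location_ID = c("Location_01", "Location_01", "Location_02", "Location_02"),
                  start_2 = c(0, 10, 0, 5),
                  end_2 = c(10, 25, 5, 20),
                  text_description = c("First Description (Location 1)", "Second Description (Location 1)",
                                       "First Description (Location 2)", "Second Description (Location 2)"))

# Tag each df1 row, because merge() does not preserve the original row order
df1$row <- seq_len(nrow(df1))

# All df1/df2 pairs that share a location
cand <- merge(df1, df2, by = "location_ID")

# Overlap length; negative values mean the intervals do not overlap at all
cand$range_within <- pmin(cand$end_1, cand$end_2) - pmax(cand$start_1, cand$start_2)
cand <- cand[cand$range_within >= 0, ]

# Share of the df1 interval (A) and of the df2 interval (B) inside the overlap
cand$share_A <- cand$range_within / (cand$end_1 - cand$start_1)
cand$share_B <- cand$range_within / (cand$end_2 - cand$start_2)

# Best candidate first within each df1 row, then keep one candidate per row
cand <- cand[order(cand$row, -cand$share_A, -cand$share_B), ]
best <- cand[!duplicated(cand$row), ]

# Attach only the description to df1 and drop the helper column
df1$text_description <- best$text_description[match(df1$row, best$row)]
df1$row <- NULL
```

After this runs, df1 keeps its original four columns plus text_description. A df1 row with no overlapping df2 interval would get NA, which is analogous to what the left join does.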
