I am trying to add a column to a data frame (df1) from another data frame (df2), but only when the "depth range" from df1 lies within the "depth range" from df2. I'll explain below what the depth intervals represent:

df1 <-data.frame(location_ID = c("Location_01", "Location_01","Location_01", "Location_02", "Location_02","Location_02"),
                start_1 = c(0,5, 15, 0, 2.5, 5),
                end_1 = c(5,15, 25, 2.5,5, 20),
                value = c(3.00, 3.75, 3.30, 3.25, 4.15, 4.25))

df2 <-data.frame(location_ID = c("Location_01", "Location_01", "Location_02", "Location_02"),
                 start_2 = c(0, 10, 0, 5),
                 end_2 = c(10, 25, 5, 20),
                 text_description = c("First Description (Location 1)", "Second Description (Location 1)", 
                                      "First Description (Location 2)", "Second Description (Location 2)"))

In terms of the example data frames above, I would like to add the data from the "text_description" column in df2 to its matching "location_ID" in df1, but only where the "start_1" and "end_1" columns are contained within the "start_2" and "end_2" columns.

For example, the first row of df1 should be assigned the text description in the first row of df2, since the depth range in df2 associated with that description is 0-10, which includes the 0-5 range in the first row of df1.

However, it gets more complicated in cases like the second row of df1, where the depth range (5-15) is not entirely included in a single depth range from df2. In these cases, I would like to assign the text value based on where the majority of that interval lies.

Any advice on how to go about this would be appreciated.

I have tried this so far, based on a similar post, but it does not work because the two data frames do not have the same number of rows (df2 will have fewer rows, since it has larger depth ranges than df1):

test_df <- df1 %>%
  mutate(text = case_when(df2$start_2 <= start_1 & df2$end_2 >= end_1 ~ df2$text_description ))
  • I suspect you are looking for dplyr.tidyverse.org/reference/join_by.html. This has been answered in several places on SO. If so, it will be a dupe. Commented Jun 17, 2024 at 21:24
  • Row 2 of df1 is 50% in row 1 of df2 (the 5-10 portion of 5-15) and 50% in row 2 (the 10-15 portion of 5-15). Which should it be assigned to? The same goes for rows 5 and 6. Commented Jun 17, 2024 at 21:56

1 Answer

We can join each row in df1 to all the overlapping regions in df2, compute the degree of overlap, and then keep the match with the highest overlap, ranked by (a) the share of df1's region that falls inside the overlap and then (b) the share of df2's region that falls inside the overlap. Rule (b) is one way to break the ties that would arise in this data if we ranked matches by rule (a) alone.
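To make the tie-break concrete, here is a small base R sketch of the overlap arithmetic for row 2 of df1 (5-15) against the two Location_01 rows of df2. The standalone vectors and the `best` variable are illustrative only and not part of the pipeline itself.

```r
# Row 2 of df1 (5-15) compared with both Location_01 intervals in df2
start_1 <- 5
end_1   <- 15
start_2 <- c(0, 10)
end_2   <- c(10, 25)

# Length of the depth range shared with each df2 interval
range_within <- pmin(end_1, end_2) - pmax(start_1, start_2)

# Rule (a): share of the df1 interval that lies inside each df2 interval
share_within_A <- range_within / (end_1 - start_1)

# Rule (b): share of each df2 interval covered by the overlap
share_within_B <- range_within / (end_2 - start_2)

# Rank by rule (a), then rule (b), both descending; keep the top candidate
best <- order(-share_within_A, -share_within_B)[1]
```

Both candidates cover half of the df1 interval, so rule (a) alone cannot separate them. The overlap makes up a larger share of the 0-10 interval than of the 10-25 interval, so rule (b) picks the first description.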

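library(dplyr)  # join_by() and overlaps() need dplyr >= 1.1.0

df1 |>
  mutate(row = row_number()) |>  # row ID so each df1 row keeps exactly one match
  left_join(df2, 
            join_by(location_ID, overlaps(start_1, end_1, start_2, end_2))) |>
  mutate(range_within = pmin(end_1, end_2) - pmax(start_1, start_2),  # overlap length
         share_within_A = range_within / (end_1 - start_1),  # share of the df1 interval
         share_within_B = range_within / (end_2 - start_2)) |>  # share of the df2 interval
  arrange(row, -share_within_A, -share_within_B) |>  # best match first within each row
  slice(1, .by = row)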

Result

  location_ID start_1 end_1 value row start_2 end_2                text_description range_within share_within_A share_within_B
1 Location_01     0.0   5.0  3.00   1       0    10  First Description (Location 1)          5.0            1.0      0.5000000
2 Location_01     5.0  15.0  3.75   2       0    10  First Description (Location 1)          5.0            0.5      0.5000000
3 Location_01    15.0  25.0  3.30   3      10    25 Second Description (Location 1)         10.0            1.0      0.6666667
4 Location_02     0.0   2.5  3.25   4       0     5  First Description (Location 2)          2.5            1.0      0.5000000
5 Location_02     2.5   5.0  4.15   5       0     5  First Description (Location 2)          2.5            1.0      0.5000000
6 Location_02     5.0  20.0  4.25   6       5    20 Second Description (Location 2)         15.0            1.0      1.0000000
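If dplyr 1.1.0 or later is not available, the same ranking idea can be sketched in base R. This is a swapped-in technique, not the join above: it pairs rows with `merge()` and filters them afterwards, and it assumes that intervals which only touch at an endpoint (zero-length overlap) should not count as matches. The names `pairs`, `best` and `result` are illustrative.

```r
df1 <- data.frame(location_ID = c("Location_01", "Location_01", "Location_01",
                                  "Location_02", "Location_02", "Location_02"),
                  start_1 = c(0, 5, 15, 0, 2.5, 5),
                  end_1 = c(5, 15, 25, 2.5, 5, 20),
                  value = c(3.00, 3.75, 3.30, 3.25, 4.15, 4.25))

df2 <- data.frame(location_ID = c("Location_01", "Location_01", "Location_02", "Location_02"),
                  start_2 = c(0, 10, 0, 5),
                  end_2 = c(10, 25, 5, 20),
                  text_description = c("First Description (Location 1)", "Second Description (Location 1)",
                                       "First Description (Location 2)", "Second Description (Location 2)"))

df1$row <- seq_len(nrow(df1))

# Every df1/df2 pair that shares a location_ID
pairs <- merge(df1, df2, by = "location_ID")

# Overlap length; assumption: pairs with no positive overlap are dropped
pairs$range_within <- pmin(pairs$end_1, pairs$end_2) - pmax(pairs$start_1, pairs$start_2)
pairs <- pairs[pairs$range_within > 0, ]

pairs$share_within_A <- pairs$range_within / (pairs$end_1 - pairs$start_1)
pairs$share_within_B <- pairs$range_within / (pairs$end_2 - pairs$start_2)

# Best candidate first within each df1 row, then keep one candidate per row
pairs <- pairs[order(pairs$row, -pairs$share_within_A, -pairs$share_within_B), ]
best <- pairs[!duplicated(pairs$row), c("row", "text_description")]

# Attach the description to df1; rows without any overlap would get NA
result <- merge(df1, best, by = "row", all.x = TRUE)
```

On the example data this assigns the same descriptions as the table above, and `result` holds only the original df1 columns plus `row` and `text_description`.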