
I want to add a seasonal_factor column to my D1 data frame. The seasonal factors come from another source in matrix format: one matrix per year, four matrices covering 2021 to 2024.

I get errors when matching on year, Month, and Weekday because these variables are spread across row names, column names, and column values. (I added year as a new column so that the four yearly matrices could be combined into one.) Any suggestions on how to rewrite the code?

D1 <- structure(list(year = c("2021", "2021", "2021", "2021", "2021"), Month = c("Apr", "Apr", "Apr", "Apr", "Apr"), Weekday = c("Monday","Monday", "Monday", "Monday", "Thursday"), hour = c("07", "08", "16", "17", "07"), avg_speed = c(40.2, 38.3, 40, 40.1, 39.6), avg_volume = c(2612, 2389, 2108, 1948, 2612)), row.names = c(NA, 5L), class = c("tbl_df", "tbl", "data.frame"))

The matrices have been converted into one data frame, with year as the last column:

lookup_df <- structure(list(Weekday = c("Sunday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday"), Jan = c(1.816, 1.123, 1.15, 1.089, 1.065, 
1.03), Feb = c(1.617, 1.122, 1.171, 1.05, 1.045, 1.031), Mar = c(1.428, 
1.018, 0.988, 0.96, 0.97, 0.919), Apr = c(1.354, 1, 0.966, 0.956, 
0.956, 0.907), May = c(1.233, 0.949, 0.917, 0.897, 0.88, 0.863
), Jun = c(1.158, 0.919, 0.896, 0.878, 0.852, 0.848), Jul = c(1.177, 
0.957, 0.896, 0.866, 0.855, 0.852), Aug = c(1.145, 0.909, 0.881, 
0.868, 0.853, 0.831), Sep = c(1.168, 0.91, 0.907, 0.889, 0.86, 
0.82), Oct = c(1.234, 0.942, 0.899, 0.883, 0.871, 0.845), Nov = c(1.364, 
0.944, 0.922, 0.901, 0.905, 0.855), Dec = c(1.358, 1.011, 0.971, 
0.945, 0.938, 0.931), year = c(2021, 2021, 2021, 2021, 2021, 
2021)), row.names = c(NA, 6L), class = "data.frame")

D2 <- left_join(D1, lookup_df, by = c("year", "Weekday", "Month"))

Error in left_join(): Join columns in y must be present in the data. Problem with Month. Run rlang::last_trace() to see where the error occurred.
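A quick way to see what this error is pointing at is to compare the join keys with the column names on each side. The sketch below uses small excerpts of the data above; the excerpts and the keys variable are illustrative, not part of the original code.

```r
# Small excerpts of the question's data (same column names and types)
D1 <- data.frame(
  year = c("2021", "2021"), Month = c("Apr", "Apr"),
  Weekday = c("Monday", "Thursday"), hour = c("07", "07"),
  stringsAsFactors = FALSE
)
lookup_df <- data.frame(
  Weekday = c("Monday", "Thursday"), Apr = c(1, 0.956), year = c(2021, 2021)
)

keys <- c("year", "Weekday", "Month")
# Join keys that each table lacks
missing_in_D1 <- setdiff(keys, names(D1))             # none: D1 has all three
missing_in_lookup <- setdiff(keys, names(lookup_df))  # "Month": months are column names
print(missing_in_lookup)

# The shared year key also has a different type on each side
print(c(D1 = class(D1$year), lookup_df = class(lookup_df$year)))
```

Month is missing from lookup_df because the months only exist as column names, and year is character in one table but numeric in the other, so both need attention before a join can work.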

  • Hello and welcome to SO, Amy Z! I would like to see the four matrices; please consider sharing a portion of each via dput. There might be a more streamlined way to go from them to D2. Commented Oct 30 at 14:41
  • Amy Z, welcome to SO! I agree with @Friede that if you mention the matrices, they should be included, along with the code you used to combine them. If you think your matrix-to-frame conversion is solid, there is no need to mention the matrices at all (they are a red herring, a distraction from the real issue of a problematic join). Commented Oct 30 at 14:43
  • Amy Z, it isn't clear what the downvote is for; I think the question is reasonable. The solution is a combination of often-asked questions: reshaping (stackoverflow.com/q/2185252/3358272, stackoverflow.com/q/76280929/3358272, though you didn't know that reshaping was necessary) and merging (see stackoverflow.com/q/38549/3358272, stackoverflow.com/q/1299871/3358272, stackoverflow.com/q/5706437/3358272, though again those don't address the need to reshape before joining). The overlap seems a plausible reason for the downvote, though not, I feel, a sufficient one. Commented Oct 30 at 15:58
  • Please ask 1 specific, researched, non-duplicate question. Either ask about 1 bad query/function, with the obligatory minimal reproducible example, including why you think it should return something else at the first subexpression where you don't get what you expect, justified by reference to authoritative documentation; or ask about your overall goal, giving the working parts you can do, with justification and a minimal reproducible example. Then misunderstood code doesn't belong. But please ask about the unexpected behaviour first, because misconceptions get in the way of your goal. See the tour, How to Ask, and the Help center. Basic questions are FAQs. Commented Oct 30 at 16:03
  • @philipxy, I don't understand how this is more than one question. OP tried to left_join(.) something and did not understand the error. They provided sample data, the code they ran, and the error that came of it. Previously stated was the expectation of bringing values from one matrix (which is no longer a matrix, but I digress) into the target frame. Other than being a little distracting with "matrix" and no matrix, this has most of the components of a single question and a minimal reprex. The fact that some of us immediately see the pivot/join need is part of the learning, isn't it? Commented Oct 31 at 3:54

1 Answer


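Because you are joining on Month, the month columns (Jan through Dec) in your lookup_df need to be pivoted/reshaped into a "long" format first. left_join() looks for a column named Month in lookup_df, but there the months exist only as column names, which is exactly what the error reports. After pivoting, each row holds one Weekday/Month/year combination and its factor.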
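If the seasonal factors are still in their original matrix form, the same long shape can be produced in base R without building lookup_df first. This is a sketch assuming each yearly matrix has weekdays as row names and months as column names (the layout lookup_df suggests); m2021 and mats are hypothetical names, and the values are copied from lookup_df.

```r
# Hypothetical excerpt of one yearly matrix: weekdays as row names,
# months as column names (values copied from lookup_df)
m2021 <- matrix(
  c(1.354, 1, 0.956,      # Apr: Sunday, Monday, Thursday
    1.233, 0.949, 0.88),  # May: Sunday, Monday, Thursday
  nrow = 3,
  dimnames = list(Weekday = c("Sunday", "Monday", "Thursday"),
                  Month = c("Apr", "May"))
)
# Hypothetical: one matrix per year, in a list named by year
mats <- list("2021" = m2021)

# as.table() + as.data.frame() gives one row per matrix cell, using the
# names of the dimnames (Weekday, Month) as column names
long <- do.call(rbind, lapply(names(mats), function(yr) {
  df <- as.data.frame(as.table(mats[[yr]]),
                      responseName = "seasonal_factor",
                      stringsAsFactors = FALSE)
  df$year <- yr  # character, matching D1$year
  df
}))
print(long)
```

Naming the dimnames is what produces the Weekday and Month columns directly; with unnamed dimnames, as.data.frame() would fall back to Var1 and Var2. The result has the same key columns as D1 and can be joined to it on year, Month, and Weekday.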

Small point: year is character in D1 but numeric in lookup_df, and the key types must match before joining. In the code below I chose to convert lookup_df's year to character, but it's not the only patch; converting D1's year to numeric works too.
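The other patch, converting D1's year to numeric instead, might look like the sketch below. It uses small excerpts of the question's data and the same dplyr/tidyr functions as the answer (join_by() needs dplyr 1.1.0 or later); lookup_long is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Small excerpts of the question's data
D1 <- tibble(
  year = c("2021", "2021"), Month = c("Apr", "Apr"),
  Weekday = c("Monday", "Thursday"), hour = c("07", "07"),
  avg_speed = c(40.2, 39.6)
)
lookup_df <- data.frame(
  Weekday = c("Monday", "Thursday"), Apr = c(1, 0.956), May = c(0.949, 0.88),
  year = c(2021, 2021)
)

# Month columns -> one Month/value row per Weekday, year, and month
lookup_long <- lookup_df |>
  pivot_longer(cols = -c(Weekday, year), names_to = "Month")

# Make D1's year numeric so it matches lookup_df's numeric year
D2 <- D1 |>
  mutate(year = as.numeric(year)) |>
  left_join(lookup_long, join_by(year, Month, Weekday))
print(D2)
```

Either direction works as long as both year columns end up with the same type; keeping year numeric may be more convenient if you later filter or compare years.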

Try this:

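library(dplyr)
library(tidyr) # pivot_longer
lookup_df |>
  # match D1's character year
  mutate(year = as.character(year)) |>
  # month columns (Jan..Dec) -> one Month/value row each
  pivot_longer(cols = -c(Weekday, year), names_to = "Month") |>
  # keep every row of D1
  right_join(D1, join_by(year, Month, Weekday))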
# # A tibble: 5 × 7
#   Weekday  year  Month value hour  avg_speed avg_volume
#   <chr>    <chr> <chr> <dbl> <chr>     <dbl>      <dbl>
# 1 Monday   2021  Apr   1     07         40.2       2612
# 2 Monday   2021  Apr   1     08         38.3       2389
# 3 Monday   2021  Apr   1     16         40         2108
# 4 Monday   2021  Apr   1     17         40.1       1948
# 5 Thursday 2021  Apr   0.956 07         39.6       2612

(In case it isn't clear, value is the column that brings in the data from your lookup_df. You can rename it if you like using pivot_longer(..., values_to="another_name").)
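Since the question asks for a column named seasonal_factor, a sketch of that rename via values_to is below, again on small excerpts of the data; seasonal is an illustrative name. Starting the join from D1 with left_join(), instead of right_join() from the lookup side, keeps D1's columns first and appends seasonal_factor at the end.

```r
library(dplyr)
library(tidyr)

# Small excerpts of the question's data
D1 <- tibble(
  year = c("2021", "2021"), Month = c("Apr", "Apr"),
  Weekday = c("Monday", "Thursday"), hour = c("07", "07")
)
lookup_df <- data.frame(
  Weekday = c("Monday", "Thursday"), Apr = c(1, 0.956), year = c(2021, 2021)
)

seasonal <- lookup_df |>
  mutate(year = as.character(year)) |>
  # values_to names the column that receives the month values
  pivot_longer(cols = -c(Weekday, year), names_to = "Month",
               values_to = "seasonal_factor")

# D1's columns come first; seasonal_factor is appended last
D2 <- D1 |> left_join(seasonal, join_by(year, Month, Weekday))
print(D2)
```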


1 Comment

Amy Z, can you give some feedback? If the suggested code does not address your issue, or if it produces another error, you will need to comment as such and/or edit your original question to update it based on the conversation and clarifications to the problem statement.
