
Say I have two data frames, Data_A and Data_B, produced like this:

```r
library(dplyr)
# Example Data A
{
  set.seed(123)

  index = rep(c(1:30),
              each = 15*360)

  # month and day have length 15*360 = 5400; cbind() recycles them once
  # per index value, so each index/month pair ends up with 15*30 = 450 rows
  month = rep(c(1:12),
              each = 15,
              times = 30)

  day = rep(c(1:15),
            each = 1,
            times = 360)

  variable_of_interest = runif(n = 15*360*30,
                               min = 0,
                               max = 100)

  Data_A = as.data.frame(cbind(index,
                               month,
                               day,
                               variable_of_interest))
}

# Example Data B: one threshold (the group mean) per index/month pair
{
  Data_B = Data_A %>% group_by(index,
                               month) %>% summarise(classification_threshold = mean(variable_of_interest))
}
```

Data_A and Data_B have two columns in common, index and month, but a different number of rows.
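As a quick check of that difference, here is a sketch that rebuilds Data_A in base R (using data.frame() instead of cbind(), so it runs without dplyr). It shows that each index/month pair covers 15 × 30 = 450 rows of Data_A, while Data_B has one row per pair, 30 × 12 = 360 rows in total. That many-to-one relationship is what lets a lookup on index and month attach exactly one threshold to every row of Data_A.

```r
set.seed(123)

# Same vectors as in the question; month and day have length 5400
# and data.frame() recycles them 30 times (once per index value)
index <- rep(1:30, each = 15 * 360)
month <- rep(1:12, each = 15, times = 30)
day <- rep(1:15, times = 360)
variable_of_interest <- runif(n = 15 * 360 * 30, min = 0, max = 100)
Data_A <- data.frame(index, month, day, variable_of_interest)

nrow(Data_A)                                 # 162000 rows
nrow(unique(Data_A[, c("index", "month")]))  # 360 index/month pairs

# Every index/month pair appears the same number of times
rows_per_pair <- table(Data_A$index, Data_A$month)
all(rows_per_pair == 450)                    # TRUE
```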

I want to use the classification_threshold column of Data_B to add a new column to Data_A that indicates whether each observation of variable_of_interest exceeds its own threshold (value = 1) or not (value = 0).

The index and month columns should be used to look up the matching classification_threshold value for each row of Data_A.
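To make the intended result concrete, here is a tiny hand-made example together with the flags it should produce. The values in toy_A and toy_B are invented for illustration and are not taken from the data above. This sketch uses a base-R lookup, match() on a pasted index/month key, instead of a join, which keeps the rows of toy_A in their original order. A strict > is used, so a value equal to its threshold would be flagged 0.

```r
# Hypothetical toy data: four observations and three thresholds
toy_A <- data.frame(
  index = c(1, 1, 1, 2),
  month = c(1, 1, 2, 1),
  variable_of_interest = c(10, 30, 50, 5)
)
toy_B <- data.frame(
  index = c(1, 1, 2),
  month = c(1, 2, 1),
  classification_threshold = c(20, 60, 1)
)

# Build a composite key from index and month in both tables
key_A <- paste(toy_A$index, toy_A$month, sep = "_")
key_B <- paste(toy_B$index, toy_B$month, sep = "_")

# For each row of toy_A, pick the threshold whose key matches
toy_A$classification_threshold <- toy_B$classification_threshold[match(key_A, key_B)]

# 1 if the observation exceeds its own threshold, 0 otherwise
toy_A$flag <- as.integer(toy_A$variable_of_interest > toy_A$classification_threshold)
toy_A$flag  # 0 1 0 1
```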

1 Answer

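Do a left join of Data_A with the summarised Data_B on index and month, then create the new column by comparing variable_of_interest with classification_threshold. ungroup() removes the grouping by index that summarise() leaves on Data_B, and the unary + turns the logical TRUE/FALSE result of the comparison into 1/0: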

```r
library(dplyr)
Data_A_new <- left_join(Data_A, ungroup(Data_B), by = c("index", "month")) %>%
  mutate(flag = +(variable_of_interest > classification_threshold))
```
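If dplyr is not available, base R's merge() with all.x = TRUE performs the same left join. The sketch below runs it on small invented data frames (small_A and small_B are hypothetical stand-ins for Data_A and Data_B) and then applies two checks that are also worth running on the real result: the join should neither add nor drop rows, and every row should have found a threshold. Note that merge() returns rows sorted by the key columns, not in the original row order of the first data frame.

```r
# Hypothetical stand-ins for Data_A and Data_B
small_A <- data.frame(
  index = c(2, 1, 1, 1),
  month = c(1, 1, 2, 1),
  variable_of_interest = c(5, 10, 50, 30)
)
small_B <- data.frame(
  index = c(1, 1, 2),
  month = c(1, 2, 1),
  classification_threshold = c(20, 60, 1)
)

# Left join: all.x = TRUE keeps every row of small_A, even without a match
small_A_new <- merge(small_A, small_B, by = c("index", "month"), all.x = TRUE)

# Same 1/0 flag as in the dplyr version (unary + converts logical to integer)
small_A_new$flag <- +(small_A_new$variable_of_interest > small_A_new$classification_threshold)

# Checks: no rows duplicated or dropped, and no missing thresholds
stopifnot(nrow(small_A_new) == nrow(small_A))
stopifnot(!anyNA(small_A_new$classification_threshold))
```

On the question's data, the equivalent checks would be nrow(Data_A_new) == nrow(Data_A) and !anyNA(Data_A_new$classification_threshold).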