Create a new column in a data frame by matching two values between data frames

Question

I am trying to create a new column in a data frame using mutate(). It should match the values of two columns, an ID and a step number, between two different data frames and then return the value from a third column in my second data frame. Hopefully my code below makes it clearer what I'm trying to achieve!

Is this the right way to go about it? I've looked into using merge but don't think that quite does what I need.

Step1 <- iData %>%
        filter(IndicatorID == 43) %>%
        mutate(Step = 1) %>%
        mutate(iresult = InputA + InputB) %>%
        mutate(stepname = ifelse(IndicatorID == Step$IndicatorID & Step == Step$Step, Step$StepName, ""))

Basically, it should find the row in Step where IndicatorID is 43 and Step is 1, then put that row's StepName in the new column; in this case that would be "Gross value added". Any help will be really appreciated!
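One thing to watch in the attempt above: inside mutate(), the name Step refers to the Step column that was just created, not to the lookup data frame, so Step$IndicatorID is evaluated on a numeric vector and fails. ifelse() also compares the two sides element by element rather than searching the lookup table. If you would rather keep the lookup inside mutate(), one option is base R's match() on a combined key. This is only a sketch: the lookup table is renamed to the hypothetical `steps` to avoid the name clash, and both tables are small dummy data frames.

```r
library(dplyr)

# Dummy lookup table, named `steps` (hypothetical) so it cannot be
# confused with the `Step` column created inside mutate()
steps <- data.frame(
  IndicatorID = c(41, 42, 43, 44),
  Step        = c(1, 2, 1, 4),
  StepName    = c("left", "right", "Gross value added", "down"),
  stringsAsFactors = FALSE
)

# Dummy input data
iData <- data.frame(IndicatorID = c(42, 43), InputA = c(0.5, 0.2), InputB = c(0.1, 0.3))

Step1 <- iData %>%
  filter(IndicatorID == 43) %>%
  mutate(
    Step     = 1,
    iresult  = InputA + InputB,
    # Build an "ID_step" key on both sides; match() returns the position of
    # the matching lookup row (NA if there is none)
    stepname = steps$StepName[match(paste(IndicatorID, Step, sep = "_"),
                                    paste(steps$IndicatorID, steps$Step, sep = "_"))]
  )
```

Rows with no matching IndicatorID/Step pair get NA in stepname rather than "".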

Can you make this post reproducible by adding data and showing the expected output? – Ronak Shah, commented Oct 2, 2019 at 10:54

JFlynn · Accepted Answer · 2019-10-02 11:22:05Z


If I'm interpreting correctly, thinking about this as a join rather than a mutate might make it a lot easier.

I've created dummy data; hopefully that makes clear the assumptions I'm making about the data.

So we have two tables, and both have IndicatorID and Step. The step data frame also has a variable, StepName, and we want to bring those values into a third table, Step1, by matching on IndicatorID and Step.

library(dplyr)  # dplyr also re-exports tibble()

step <- tibble(
        IndicatorID = c(41, 42, 43, 44, 45, 46), 
        Step = c(1, 2, 1, 4, 5, 6), 
        StepName = c('left', 'right', 'up', 'down', 'under', 'over'))

iData <- tibble(
        IndicatorID = seq(from = 1, to = 43),   # IDs 1 to 43
        InputA = runif(43),                     # random dummy inputs
        InputB = runif(43)) %>%
        mutate(iresult = InputA + InputB)

Step1 <- iData %>%
        filter(IndicatorID == 43) %>%
        mutate(Step = 1) %>%
        left_join(step, by = c('IndicatorID', 'Step'))

IndicatorID InputA InputB iresult  Step StepName
        <dbl>  <dbl>  <dbl>   <dbl> <dbl> <chr>   
          43  0.773  0.124   0.898     1 up   
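Because left_join() keeps every row of the left table, a row whose IndicatorID/Step pair has no match in step is not dropped; it simply gets NA in StepName. A minimal sketch with a hypothetical two-row table, where (43, 1) has a partner in step and (42, 1) does not:

```r
library(dplyr)

step <- data.frame(
  IndicatorID = c(41, 42, 43),
  Step        = c(1, 2, 1),
  StepName    = c("left", "right", "up"),
  stringsAsFactors = FALSE
)

# Hypothetical rows: (43, 1) matches a row in `step`, (42, 1) does not
rows <- data.frame(IndicatorID = c(43, 42), Step = c(1, 1))

# Both rows are kept; the unmatched one gets NA in StepName
joined <- rows %>%
  left_join(step, by = c("IndicatorID", "Step"))
```

If step ever contained more than one row for the same IndicatorID/Step pair, the join would return one row per match, so it is worth checking that the key is unique.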


### Example where we select only the columns from step 
### that we are interested in keeping, without doing a semi_join

Step1 <- iData %>%
        filter(IndicatorID == 43) %>%
        mutate(Step = 1) %>%
        left_join(step %>%
             select(IndicatorID, Step, StepName), 
             by = c('IndicatorID', 'Step'))
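On the merge() point from the question: base R's merge() can do the same match on both columns, and all.x = TRUE gives left-join behaviour. A sketch using the same dummy tables in plain data frames; set.seed() is added only to make the random inputs reproducible.

```r
step <- data.frame(
  IndicatorID = c(41, 42, 43, 44, 45, 46),
  Step        = c(1, 2, 1, 4, 5, 6),
  StepName    = c("left", "right", "up", "down", "under", "over"),
  stringsAsFactors = FALSE
)

set.seed(1)  # reproducible dummy inputs
iData <- data.frame(IndicatorID = 1:43, InputA = runif(43), InputB = runif(43))
iData$iresult <- iData$InputA + iData$InputB

# Keep IndicatorID 43 and tag it as step 1
Step1 <- iData[iData$IndicatorID == 43, ]
Step1$Step <- 1

# Match on both key columns; all.x = TRUE keeps every row of Step1
Step1 <- merge(Step1, step[, c("IndicatorID", "Step", "StepName")],
               by = c("IndicatorID", "Step"), all.x = TRUE)
```

merge() puts the `by` columns first and, by default (sort = TRUE), sorts the result on them, so the column order differs from the left_join() version.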

edited Oct 2, 2019 at 11:22

answered Oct 2, 2019 at 10:56

JFlynn



2 Comments

Megan Critchley Over a year ago

That worked perfectly, thank you! Out of curiosity, if the step data frame had more columns than just the one I wanted, is there a way to tell left_join to take only certain columns? Thanks!

JFlynn Over a year ago

semi_join would be the way to go in that case. This link is a great resource: stat545.com/… Also I've found that in lots of cases remembering the names and meanings of joins can be tricky. You can always do a select() within the left_join to make what you're doing a little more obvious. I'll add an example in the original post with this option.
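For the column-selection question, it may help to compare the two verbs directly. semi_join() is a filtering join: it returns only the rows of the first table that have a match in the second and never adds columns from the second, so on its own it does not pick columns out of step. Selecting the wanted columns inside left_join(), as in the second example in the answer, does. A sketch with hypothetical tables, where `Unwanted` stands in for any extra column:

```r
library(dplyr)

step <- data.frame(
  IndicatorID = c(41, 43),
  Step        = c(1, 1),
  StepName    = c("left", "up"),
  Unwanted    = c("a", "b"),   # hypothetical extra column we do not want
  stringsAsFactors = FALSE
)

rows <- data.frame(IndicatorID = c(43, 42), Step = c(1, 1), iresult = c(0.9, 0.4))

# semi_join(): keeps only rows of `rows` with a match in `step`; adds no columns
filtered <- semi_join(rows, step, by = c("IndicatorID", "Step"))

# left_join() + select(): keeps all rows and adds only StepName
labelled <- left_join(rows, select(step, IndicatorID, Step, StepName),
                      by = c("IndicatorID", "Step"))
```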
