Is there an elegant way to mutate a selection of columns in a dataframe (call it df1) based on the same selection of columns in another dataframe (df2) without doing a join?

The selected columns in df2 have the same names as in df1, and both dataframes have the same number of rows (same id columns).

In this code, please replace 'elegant_function' with your elegant function. The selection of columns is 'a' and 'b'. The 'ignore_me' column is the id column in both, which might tempt you to join the dataframes, however please ignore it instead.

df1 <- data.frame(ignore_me = 1:5, a = 1:5, b = 11:15)
df2 <- data.frame(ignore_me = 1:5, a = c(0, 1, 1, 0, 2), b = c(1, 0, 1, 2, 0))

fn <- function(x1, x2){
  if(x2 == 1){
    return(x1 - x2)
  }
  if(x2 == 2){
    return(x1 + x2)
  }
  x1
}
fn <- Vectorize(fn)

df <- elegant_function(
  df1 
  , df2
  , c("a", "b")
  , fn
  )

The output looks like this:

> df
  ignore_me a  b
1         1 1 10
2         2 1 12
3         3 2 12
4         4 4 16
5         5 7 15

Here is an example of an inelegant way to do this:

library(dplyr)  # provides %>%, select() and mutate()

df <- df1 %>% select(ignore_me) %>%
  mutate(
    a = fn(df1$a, df2$a)
    , b = fn(df1$b, df2$b)
    )

This is inelegant because each selected column requires its own line in the mutate call. It would be elegant if the selected columns could be passed to the function as a character vector, so that they can vary at run time.

There may be other columns in df1 and df2 that should also be ignored; I've only included the 'ignore_me' column as an example of these.

  • You just talked of value of column a in row 2. What of row 1? What of column b in row 1? Commented May 27, 2020 at 0:26
  • Corrected column b in df. For that example 10 = fn(11, 1). Commented May 27, 2020 at 0:28
  • I still don't get what you are trying to do here. For the c("a", "b") columns in df1 and df2, how do you want to apply the function? Commented May 27, 2020 at 0:36
  • First, the function fn is applied to column 'a' in df1 and column 'a' in df2 (as the inputs x1, x2 to fn) to create column 'a' in df; then the function is applied to column 'b' ... Commented May 27, 2020 at 0:41

1 Answer

Since we are to ignore the ignore_me column, we could do:

(-1)^df2 * df2 + df1
  ignore_me a  b
1         0 1 10
2         4 1 12
3         0 2 12
4         8 4 16
5         0 7 15

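Check the columns other than ignore_me: a and b match the expected output. The ignore_me column is altered too, because the expression is applied to the whole data frame.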
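
If the arithmetic shortcut is wanted without touching ignore_me, one option is to restrict it to the selected columns. This is only a sketch: it assumes the values in df2 are always 0, 1 or 2, since the shortcut does not reproduce fn for other values. The names cols and res are introduced here for illustration and are not from the question.

```r
df1 <- data.frame(ignore_me = 1:5, a = 1:5, b = 11:15)
df2 <- data.frame(ignore_me = 1:5, a = c(0, 1, 1, 0, 2), b = c(1, 0, 1, 2, 0))

# Hypothetical name for the run-time selection of columns
cols <- c("a", "b")

# Copy df1, then overwrite only the selected columns.
# (-1)^x * x is 0 for x = 0, -1 for x = 1 and +2 for x = 2,
# which matches fn only for those three values.
res <- df1
res[cols] <- (-1)^df2[cols] * df2[cols] + df1[cols]
res
```

Because only res[cols] is assigned, ignore_me and any other unselected columns are carried over from df1 unchanged.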

Update:

elegant_function <- function(dat1,dat2,colNames,FUN)
{
  dat1[colNames] <- data.frame(Map(Vectorize(FUN),dat1[colNames],dat2[colNames]))
  dat1
}
elegant_function(df1, df2, c("a", "b"), fn)

  ignore_me a  b
1         1 1 10
2         2 1 12
3         3 2 12
4         4 4 16
5         5 7 15
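
For something closer to the dplyr pipeline in the question, across() together with cur_column() can look up the matching column in df2 by name. Treat this as a sketch rather than a drop-in answer: it assumes dplyr 1.0.0 or later and that the rows of df1 and df2 are already in the same order. The name cols is introduced here for illustration.

```r
library(dplyr)

df1 <- data.frame(ignore_me = 1:5, a = 1:5, b = 11:15)
df2 <- data.frame(ignore_me = 1:5, a = c(0, 1, 1, 0, 2), b = c(1, 0, 1, 2, 0))

# fn as defined in the question, vectorized with Vectorize()
fn <- function(x1, x2) {
  if (x2 == 1) {
    return(x1 - x2)
  }
  if (x2 == 2) {
    return(x1 + x2)
  }
  x1
}
fn <- Vectorize(fn)

# Hypothetical name for the run-time selection of columns
cols <- c("a", "b")

# For each selected column of df1, cur_column() gives its name,
# which is used to pull the same-named column out of df2.
df <- df1 %>%
  mutate(across(all_of(cols), ~ fn(.x, df2[[cur_column()]])))
df
```

Columns not listed in cols, including ignore_me, pass through mutate() untouched.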

5 Comments

@Vlad so what exactly is the issue? Does this not solve your problem?
I've provided an example of an inelegant solution. Your idea is hard coding the function fn and not providing it as a parameter which doesn't solve my problem.
@Vlad, I reread your question. elegant_function cannot be elegant if fn is not vectorized in the first place
Almost there - just missing the ignore_me column which can be taken from df1.
Just a small typo - replace df1 with dat1 in the function definition - otherwise works perfectly.
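
On the vectorization point above: if fn is rewritten with ifelse() it is vectorized natively, so Vectorize() can be dropped and Map() can assign straight into the selected columns. This is a sketch under that assumption; fn_vec and apply_cols are illustrative names, not from the original post.

```r
df1 <- data.frame(ignore_me = 1:5, a = 1:5, b = 11:15)
df2 <- data.frame(ignore_me = 1:5, a = c(0, 1, 1, 0, 2), b = c(1, 0, 1, 2, 0))

# ifelse() evaluates the conditions element-wise over whole columns
fn_vec <- function(x1, x2) {
  ifelse(x2 == 1, x1 - x2,
         ifelse(x2 == 2, x1 + x2, x1))
}

# Apply FUN pairwise to the same-named columns of dat1 and dat2,
# writing the results back into dat1 and leaving other columns alone.
apply_cols <- function(dat1, dat2, col_names, FUN) {
  dat1[col_names] <- Map(FUN, dat1[col_names], dat2[col_names])
  dat1
}

out <- apply_cols(df1, df2, c("a", "b"), fn_vec)
out
```

Working on whole columns at once also avoids the per-element function calls that Vectorize() makes through mapply().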
