I have a dataframe, df, with several columns in it. I would like to write a function that creates new columns dynamically from existing column names, using the last four characters of an existing column name. For example, I would like to create a variable named df$rev_2002 like so:

df$rev_2002 <- df$avg_2002 * df$quantity

The problem is I would like to be able to run the function every time a new column (say, df$avg_2003) is appended to the dataframe.

To this end, I used the following function to extract the last 4 characters of the df$avg_2002 variable:

substRight <- function (x,n) {
  substr(x, nchar(x)-n+1, nchar(x))
}

I tried putting together another function to create the columns:

revved <- function(x, y, z){
  z = x * y
  names(z) <- paste('revenue', substRight(x,4), sep = "_")
  return x
}

But when I try it on actual data I don't get new columns in my df. The desired result is a series of variables in my df such as:

df$rev_2002, df$rev_2003...df$rev_2020 or whatever is the largest value of the last four characters of the x variable (df$avg_2002 in example above).

Any help or advice would be truly appreciated. I'm really in the woods here.

    Hello, could you show how you are using revved with a small example data set? Also, an easy way to programmatically make new columns with strings is the [[ operator. Commented May 6, 2021 at 20:34
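
A minimal sketch of what that comment suggests, assuming a made-up toy data frame (only the column naming pattern comes from the question): build the new name as a string from the existing column name, then assign with `[[`. Note that substRight is applied to the column name, not to the column's values.

```r
# Hypothetical toy data; the values are invented for illustration
df <- data.frame(quantity = 3:4, avg_2002 = 5:6)

# Helper from the question: last n characters of a string
substRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}

old_name <- "avg_2002"
# Apply substRight to the column *name*, not to the column's values
new_name <- paste("rev", substRight(old_name, 4), sep = "_")

# [[ accepts a string, so the column name can be computed at run time
df[[new_name]] <- df[[old_name]] * df$quantity
```

Because `[[` takes an ordinary character string, the same two lines could sit inside a loop over all matching column names.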

1 Answer

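dat <- data.frame(id = 1:2, quantity = 3:4, avg_2002 = 5:6, avg_2003 = 7:8, avg_2020 = 9:10)
func <- function(dat, overwrite = FALSE) {
  # find the avg_<digits> columns and derive the matching rev_<digits> names
  nms <- grep("^avg_[0-9]+$", names(dat), value = TRUE)
  revnms <- sub("^avg_", "rev_", nms)
  # unless overwriting, drop every avg/rev pair whose rev_* column already exists,
  # so that nms and revnms stay the same length
  if (!overwrite) {
    keep <- !revnms %in% names(dat)
    nms <- nms[keep]
    revnms <- revnms[keep]
  }
  if (length(nms) == 0) return(dat)
  # dat[nms] stays a data.frame even when only one column matches
  dat[revnms] <- lapply(dat[nms], `*`, dat$quantity)
  dat
}

func(dat)
#   id quantity avg_2002 avg_2003 avg_2020 rev_2002 rev_2003 rev_2020
# 1  1        3        5        7        9       15       21       27
# 2  2        4        6        8       10       24       32       40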
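
To illustrate the "run it again whenever a column is appended" part of the question, here is a sketch of the same idea written as a loop with `[[`. `add_rev` is a hypothetical name, not part of the answer above, and the data are made up.

```r
# add_rev is a hypothetical helper: a loop-based variant of the same idea
add_rev <- function(dat) {
  for (nm in grep("^avg_[0-9]+$", names(dat), value = TRUE)) {
    revnm <- sub("^avg_", "rev_", nm)
    # only create rev_* columns that do not exist yet
    if (!revnm %in% names(dat)) {
      dat[[revnm]] <- dat[[nm]] * dat$quantity
    }
  }
  dat
}

dat2 <- data.frame(id = 1:2, quantity = 3:4, avg_2002 = 5:6)
dat2 <- add_rev(dat2)    # adds rev_2002
dat2$avg_2003 <- 7:8     # a new year is appended later
dat2 <- add_rev(dat2)    # adds rev_2003, leaves rev_2002 untouched
names(dat2)
```

Calling the function a second time only fills in the missing rev_* column, so it should be safe to re-run after every append.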
7 Comments

What if there are other columns in the data frame?
They aren't affected. The new columns are appended. Please try it! I promise this function is only additive, and only when specific conditions are met. If no avg_* columns exist, nothing is added/changed. (Okay, not 100% ... I just added the overwrite= argument. With the default of FALSE, existing rev_* columns will not be touched. That's the only "change" part of the function.)
this is great...one more question: how would one use a vector of column names and substitutions?
The function works on whatever data.frame you send to it. Whatever the frame is named outside the function does not matter. If you were to do func(mtcars), all references inside the function will see the data as dat. If there is a dat outside the function, that frame is completely separate from the one the function sees inside. If you call func(dat), the fact that dat$quantity is referenced inside is only coincidentally the same name.
If instead you mean that the name to use as the quantity changes, then change the function definition to be function(dat, overwrite=FALSE, value.var="quantity") and then change dat$quantity to dat[[value.var]]. From there, if you have a frame with a different base-value field, call func(otherdat, value.var="otherfield"), and it should work similarly.
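
A sketch of the change described in that last comment. The regex anchors (`^`), the early return, and the filtering of `nms` alongside `revnms` are assumptions added here to keep the example robust; `otherdat` and `otherfield` are the hypothetical names from the comment, with made-up values.

```r
# value.var names the column that every avg_* column is multiplied by
func <- function(dat, overwrite = FALSE, value.var = "quantity") {
  nms <- grep("^avg_[0-9]+$", names(dat), value = TRUE)
  revnms <- sub("^avg_", "rev_", nms)
  if (!overwrite) {
    # keep only the avg/rev pairs whose rev_* column does not exist yet
    keep <- !revnms %in% names(dat)
    nms <- nms[keep]
    revnms <- revnms[keep]
  }
  if (length(nms) == 0) return(dat)
  # dat[[value.var]] looks the multiplier column up by its string name
  dat[revnms] <- lapply(dat[nms], `*`, dat[[value.var]])
  dat
}

# Hypothetical frame whose base-value column is not called "quantity"
otherdat <- data.frame(otherfield = c(2, 10), avg_2002 = c(5, 6))
res <- func(otherdat, value.var = "otherfield")
res
```

With the default `value.var = "quantity"`, calls such as func(dat) should behave as before.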