0

I have a dataframe (called train) that contains a YOB (year of birth) column. I'd like to compute the Age in a separate column, like so:

train$Age = 2016 - train$YOB

This works fine.

The problem is that I would also like to do this operation (along with other preprocessing operations) to a number of other dataframes. So, I was thinking to extract the common parts in a function and pass the dataframes to be processed as parameters to the function:

preprocess = function(d) {
  d$Age = 2016 - d$YOB
  # other transformations...
} 

After defining the function above, I expected that calling preprocess(train) would perform the aforementioned transformations on my dataframe. But it doesn't. For example, train$Age is NULL after the call.

Why doesn't the preprocess function transform the dataframe as expected? Is there a way to fix this?

2
  • preprocess = function(d) d$Age <<- 2016 - d$YOB or preprocess = function(d) 2016 - d$YOB; d$age <- preprocess(d). An object created inside a function does not exist outside it unless it is assigned with <<-. Commented Jun 13, 2016 at 9:32
  • @crayfish44 Now I get an error saying that object of type 'closure' is not subsettable Commented Jun 13, 2016 at 9:37

2 Answers

2

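In R (as in almost all languages), calling a function creates a new environment (a "scope") that determines which variables are visible inside it. In R, arguments are bound to local names in that environment, so assigning to an argument changes only the function's local copy, not the caller's variable.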

Consider the variables a and b and the function "preprocess":

> a <- 2
> b <- 3
> preprocess <- function(a){a <- a + b; cat("value of a=", a, "\n")}
> preprocess(a)
value of a= 5 
> cat("value of a=", a, "\n")
value of a= 2

Here, "b" was found in the global environment, while "a" was the function's own local variable, initialized from the argument. The assignment changed only that local "a". As soon as the function returned, its environment was discarded and the updated value was lost.

The global "a", which was 2 before the call, remained unchanged.

However, if you return the value of "a" from the function and assign the result back to "a", the global "a" is updated, as in this example:

> a <- 2
> b <- 3
> preprocess <- function(a){a <- a + b; cat("value of a=", a, "\n"); return(a)}
> a <- preprocess(a)
value of a= 5 
> cat("value of a=", a, "\n")
value of a= 5

For more information, see the help page ?environment in your R session.
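The same reasoning applies to the data frame in the question. Here is a minimal sketch, assuming a small made-up train data frame (only the YOB column name comes from the question; the values and the name preprocess_local are hypothetical):

```r
# Hypothetical sample data; only the column name YOB is from the question.
train <- data.frame(YOB = c(1980, 1990, 2000))

# The question's version: it modifies a local copy of d and never
# returns the data frame, so the caller's object is left untouched.
preprocess_local <- function(d) {
  d$Age <- 2016 - d$YOB
}

preprocess_local(train)
age_missing <- is.null(train$Age)  # TRUE: the global train has no Age column

# Returning the modified copy and assigning it back updates train.
preprocess <- function(d) {
  d$Age <- 2016 - d$YOB
  d
}

train <- preprocess(train)
print(train$Age)  # 36 26 16
```

The only change that matters is that the function ends with d and the caller writes train <- preprocess(train).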

1

You add the new column only inside the function, and functions normally do not change values outside of themselves. There is a quick and dirty way to do that via <<-, but you should really never use it, because your function would then change values outside of itself, which functions are not supposed to do. It is very bad style. Values should enter functions as arguments and leave them as return values.

So modify the dataframe inside your function and return it:

preprocess = function(d) {
  d$Age = 2016 - d$YOB
  return(d)
} 

test <- data.frame(YOB=2017:2020)

test <- preprocess(test)

print(test)
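Since the question mentions several dataframes, one possible way to reuse the function is to put them in a named list and apply it with lapply. This is only a sketch: the data frames train and holdout and their values are made up for illustration.

```r
preprocess <- function(d) {
  d$Age <- 2016 - d$YOB
  d
}

# Hypothetical data frames; names and values are assumptions.
train   <- data.frame(YOB = c(1980, 1990))
holdout <- data.frame(YOB = c(1975, 2001))

# lapply calls preprocess on each element and collects the returned
# data frames in a new list; the originals are not modified.
frames <- lapply(list(train = train, holdout = holdout), preprocess)

print(frames$train$Age)    # 36 26
print(frames$holdout$Age)  # 41 15
```

Each processed dataframe is then available as frames$train, frames$holdout, and so on, or can be assigned back to its original name.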
