More efficient method of recoding one column in a data.frame conditional on other column entries

Question

I am looking for a more efficient method of re-coding column entries in a dataframe, where the recoding is conditional on the entries in other columns.

Take this simple example, which demonstrates my current procedure of creating a new column for the recoded data, converting it to character, and then using the subset square brackets to recode the data (is there an official name for this procedure?).

## example data frame
df = data.frame( id = seq( 1 , 100 , by=1 ) ,
                 x = rep( c("W", "Z") , each=50),
                 y = c( rep( c("A","B","C","D") , 25 ) ) )

# add a new column based on column y; convert to character 
df$newY = as.character( df$y ) 

# change newY entries to numbers based on conditions in other columns
df$newY[ df$x == "W" & df$newY == "B" ] <- 1
df$newY[ df$x == "Z" & df$newY == "D" ] <- 3

This procedure is fine for recoding variables with a small number of conditions, but it becomes cumbersome with a larger number of conditions or when there are many distinct variables to recode.

Could anyone help me with finding a more efficient method of doing this?

Thanks!

Is there some kind of logic or pattern in the recoding? By efficiency, do you mean a method that requires less typing or do you mean faster performance / memory efficiency?
– talat, Commented Feb 24, 2016 at 10:17
Would something like this solve your problem: df$newY = as.factor(paste0(df$y, df$x)); as.numeric(df$newY)
– Raad, Commented Feb 24, 2016 at 10:21
@MaxPD In my data, recoding is conditional on one other column in the dataframe (as in the example), but there are up to four multiples of the same variable needing to be converted to the same new coding (e.g. imagine if there was A1, A2, A3 etc. in the above example needing to be recoded to 1, conditional on "W"). There are also 8 distinct variables in the conditional column (e.g. the "x" column above), and up to 11 different variables in the y column, meaning 8 blocks of ~11 lines of recoding. I hope that is clear.
– user3237820, Commented Feb 24, 2016 at 10:29
@docendodiscimus By efficiency, I do mean just less typing, sorry. For patterning, it's difficult. For instance, a variable, e.g. A, conditional on Z, may have to be recoded as 1, but A conditional on W needs recoding as 2. Perhaps a better approach would be to reshape the data frame from long to wide format, and recode each variable as a separate column...
– user3237820, Commented Feb 24, 2016 at 10:31
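
The mapping described in the comments above (for example, A recoded as 1 when x is Z but as 2 when x is W) can be written down once as a lookup table instead of one assignment line per rule. The following is a minimal base R sketch, not something posted in the thread; the `lookup` table and the codes in it are hypothetical and only illustrate the idea.

```r
# Example data from the question
df <- data.frame(id = 1:100,
                 x  = rep(c("W", "Z"), each = 50),
                 y  = rep(c("A", "B", "C", "D"), 25),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: one row per (x, y) pair that needs a new code.
# The codes are illustrative, not taken from the asker's real data.
lookup <- data.frame(x    = c("W", "Z", "W", "Z"),
                     y    = c("B", "D", "A", "A"),
                     code = c(1, 3, 2, 1),
                     stringsAsFactors = FALSE)

# Match each row's (x, y) pair against the table; unmatched pairs give NA
idx <- match(paste(df$x, df$y), paste(lookup$x, lookup$y))

# Keep the original y where no rule applies, as in the question's code
df$newY <- ifelse(is.na(idx), df$y, as.character(lookup$code[idx]))
```

Adding a rule then means adding a row to `lookup` rather than another assignment line. Several y values that share a code (A1, A2, A3 in the asker's description) would simply become several rows with the same `code`.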

Raad · Accepted Answer · 2016-02-24 10:48:03Z

Some approaches to this:

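df <- data.frame(id = seq(1, 100, by = 1),
                 x = rep(c("W", "Z"), each = 50),
                 y = rep(c("A", "B", "C", "D"), 25),
                 stringsAsFactors = TRUE)  # needed in R >= 4.0.0, where strings are no longer factors by default

# Take the product of the factor codes (my preference)
# Note: distinct pairs can share a product, e.g. W*B = 1*2 and Z*A = 2*1
as.numeric(df$x) * as.numeric(df$y)

# Create a new factor from the pasted x and y values and convert it to numeric
as.numeric(as.factor(paste0(df$x, df$y)))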
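
To see which code each approach assigns to which combination, the factor levels can be inspected directly. This is a small sketch that goes beyond the answer as posted; the variable names `prod_code`, `pair` and `pair_code` are made up for illustration. It assumes x and y are factors, which has to be requested explicitly in R 4.0.0 and later.

```r
# Example data; stringsAsFactors = TRUE is set explicitly because
# R >= 4.0.0 no longer converts strings to factors by default
df <- data.frame(id = 1:100,
                 x  = rep(c("W", "Z"), each = 50),
                 y  = rep(c("A", "B", "C", "D"), 25),
                 stringsAsFactors = TRUE)

# Product of the integer codes: different pairs can share a value
prod_code <- as.numeric(df$x) * as.numeric(df$y)

# Pasted factor: one code per distinct (x, y) pair
pair      <- factor(paste0(df$x, df$y))
pair_code <- as.numeric(pair)

# Which pair belongs to which code
data.frame(pair = levels(pair), code = seq_along(levels(pair)))

# Number of distinct pairs sharing each product value
tapply(as.character(pair), prod_code, function(p) length(unique(p)))
```

With this example data the pasted factor separates all eight combinations, while the product merges some of them, so the second approach is the safer choice when every combination needs its own code.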

1 Comment

user3237820 Over a year ago

Thanks! This helps a lot. I missed your second example in the comment to the OP.
