Create a new data frame column based on another column

Question

I have a data frame with two columns, and want to create a third column that is essentially a boolean indicating whether or not column two contains any of a specified set of values.

f <- data.frame(name=c("John", "Sara", "David", "Chad"),
                 car=c("Honda|Ford", "BMW", "Toyota|Chevy|Ford", 
                 "Toyota|Chevy|Ford|Honda"))

The first thing I did was remove the | from each string in the second column and place those values in a third column:

library(stringr)
g = str_replace_all(f$car, "[^[:alnum:]]", " ")
f$make = c(g)
f

What I want to do now is create another column, which will be a boolean: 1 if make contains a common car, and 0 if it contains an uncommon car.

common = c("Honda", "Ford", "Toyota", "Chevy")
not_common = c("BMW", "Lexus", "Acura")

I've tried a few things, including the stringr package and ifelse, to produce the following output:

   name                     car                    make       common   
1  John              Honda|Ford              Honda Ford           1
2  Sara                     BMW                     BMW           0
3 David       Toyota|Chevy|Ford       Toyota Chevy Ford           1
4  Chad Toyota|Chevy|Ford|Honda Toyota Chevy Ford Honda           1

Since it's possible to have both a common and uncommon car as an entry, the uncommon make should override the common make and that row should take the value 0 in the common column. So if an entry had both BMW and Ford, that entry should take a 0 in the common column.

Can anyone help with this task?

Oh, and here's what I tried with the stringr package, but it doesn't work.

common = c("Honda", "Ford", "Toyota", "Chevy")
not_common = c("BMW", "Lexus", "Acura")
common_match <- str_c(common)
not_match <- str_c(not_common)

main <- function(df) {
  f$new_make <- str_detect(f$make, common_match)
  df
}

main(f)

Thanks!

Julius Vainora · Accepted Answer · 2012-07-09 21:16:03Z


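Another way, along with a timing comparison against the grep/ifelse approach:

f2 <- f[rep(1:4, 50000), ]  # 200,000 rows for timing
system.time({
  # split each make string into its individual makes
  v <- sapply(f2$make, strsplit, " ")
  # min() on the first factor: it drops to 0 as soon as any uncommon make is present
  sapply(v, function(x) min(1 - not_common %in% x) * max(common %in% x))
})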
 user  system elapsed 
 7.94    0.01    8.00 

system.time(sapply(f2$car,function(x) ifelse(length(grep("BMW|Lexus|Acura",x))>0,0,1)))
 user  system elapsed 
28.72    0.04   28.87
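The override rule from the question (an uncommon make forces a 0 even when a common make is also present) can be sketched with exact matching on the individual makes. This is only a minimal sketch: the car vector below, including the mixed "BMW|Ford" entry, is made-up test data, and is_common is a hypothetical name.

```r
common <- c("Honda", "Ford", "Toyota", "Chevy")
not_common <- c("BMW", "Lexus", "Acura")

# Hypothetical input: the last entry mixes an uncommon and a common make.
car <- c("Honda|Ford", "BMW", "Toyota|Chevy|Ford", "BMW|Ford")

# Split on the literal "|" (fixed = TRUE stops it being read as regex "or").
makes <- strsplit(car, "|", fixed = TRUE)

# 1 only if at least one common make is present and no uncommon make is.
is_common <- vapply(
  makes,
  function(x) as.integer(any(x %in% common) && !any(x %in% not_common)),
  integer(1)
)
is_common
```

Splitting the original car column directly also means the intermediate make column is not needed for this step.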



thelatemail · Accepted Answer · 2012-07-09 21:07:39Z


Not sure if this is the most efficient way, but try this one, which applies grep and ifelse to each value of f$car. The | characters in the pattern just mean "or" for combining search terms inside grep; they have nothing to do with the separator in your data.

f$common <- sapply(f$car,function(x) ifelse(length(grep("BMW|Lexus|Acura",x))>0,0,1))

Result:

> f
   name                     car common
1  John              Honda|Ford      1
2  Sara                     BMW      0
3 David       Toyota|Chevy|Ford      1
4  Chad Toyota|Chevy|Ford|Honda      1
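Since grepl is already vectorized over its input, the sapply/ifelse wrapper is probably not needed. A minimal sketch, assuming the same f and not_common as in the question (the name pattern is made up for illustration):

```r
f <- data.frame(name = c("John", "Sara", "David", "Chad"),
                car = c("Honda|Ford", "BMW", "Toyota|Chevy|Ford",
                        "Toyota|Chevy|Ford|Honda"),
                stringsAsFactors = FALSE)
not_common <- c("BMW", "Lexus", "Acura")

# Build the alternation "BMW|Lexus|Acura" from the vector instead of typing it.
pattern <- paste(not_common, collapse = "|")

# grepl() returns one TRUE/FALSE per row; negate it and convert to 0/1.
f$common <- as.integer(!grepl(pattern, f$car))
f$common
```

Note that this is substring matching, so a make whose name merely contains "BMW" would be flagged as well. If that matters, splitting on the separator and matching whole makes is the safer route.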

