2

My data is organized as follows:

year    company color   car_total
2000    toyota  red     873
2013    honda   red     737
2012    nissan  green   809
2002    toyota  blue    429
2000    nissan  green   861
2012    honda   red     742
2009    toyota  red     320
2010    ford    yellow  319
2000    ford    green   587
2011    nissan  blue    777
2014    ford    blue    32

I am trying to replace the values in columns given multiple conditions. Two situations:

  1. I would like to replace each car_total for rows of company == ford OR company == nissan with 0. What command would accomplish this?

  2. What if my constraints came from different columns? e.g. What if I wanted to replace any car_total which has its company == ford OR color == red with 0?

7
  • Thanks for the response. And yes, I'm a relative R newbie. However, I asked the question because I'd previously tried your exact command you suggested, and it turned all my car_total values in my entire data set to 0. One slight issue that might be causing our different results: I'm actually doing a conditional NOT. Thus the code I'm trying (which makes all my values 0) is: dat$car_total[dat$company != "ford" | dat$color != "red"] = 0 Commented Feb 4, 2016 at 0:09
  • So, only red fords would be left untouched. !ford | !red is the same as !(ford & red) Commented Feb 4, 2016 at 0:11
  • 1
    Oh wait, I'm a moron. I realized I should be using ands rather than ors when doing nots. Ok, I got it to work now. Thanks for the help! Commented Feb 4, 2016 at 0:15
  • @thelatemail You gave me negative points for a question my code solves correctly. Why are you being ridiculous? Commented Feb 4, 2016 at 0:16
  • 1
    @Jim - you might have to convert the variables via dat$color <- as.character(dat$color) first - R won't allow you to assign a factor a value which isn't already present. Commented Feb 4, 2016 at 0:55
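
The logic slip discussed in these comments is De Morgan's law: !a | !b is the same as !(a & b), and !a & !b is the same as !(a | b). As a minimal sketch, assuming the question's sample rows are loaded into a data frame named dat (the name used in the comments) with character columns, the two masks can be compared directly. The sample contains no red ford, which is consistent with the OR version zeroing every row. The last lines illustrate the factor caveat; color_f, color_chr and the value "purple" are hypothetical and exist only for this illustration.

```r
# Sample data from the question, as character columns (not factors).
dat <- data.frame(
  year      = c(2000, 2013, 2012, 2002, 2000, 2012, 2009, 2010, 2000, 2011, 2014),
  company   = c("toyota", "honda", "nissan", "toyota", "nissan", "honda",
                "toyota", "ford", "ford", "nissan", "ford"),
  color     = c("red", "red", "green", "blue", "green", "red",
                "red", "yellow", "green", "blue", "blue"),
  car_total = c(873, 737, 809, 429, 861, 742, 320, 319, 587, 777, 32),
  stringsAsFactors = FALSE
)

# "not ford OR not red" spares only red fords, i.e. !(ford & red).
not_both <- dat$company != "ford" | dat$color != "red"

# "not ford AND not red" spares every ford and every red car, i.e. !(ford | red).
neither <- dat$company != "ford" & dat$color != "red"

sum(not_both)  # number of rows the OR version would overwrite
sum(neither)   # number of rows the AND version would overwrite

# Factor caveat: assigning a value that is not an existing level yields NA
# (with a warning), so convert to character before writing new values.
color_f <- factor(dat$color)
suppressWarnings(color_f[1] <- "purple")  # becomes NA
color_chr <- as.character(dat$color)
color_chr[1] <- "purple"                  # works as expected
```

Counting the TRUE values in a mask before assigning is a cheap way to check that a condition selects the rows you expect.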

3 Answers

2

As you have seen from the comments, this can be done compactly as a standard selection. But sometimes named logical vectors make things clearer.

Assuming your data frame is called df:

redcars <- df$color == "red"
fords <- df$company == "ford"
ford_or_nissan <- fords | df$company == "nissan"
# or, equivalently
ford_or_nissan <- df$company %in% c("ford", "nissan")

This gives you three logical vectors you can use to select the desired rows:

df$car_total[ford_or_nissan] <- 0
df$car_total[fords | redcars] <- 0

With logical operators, you can build up as complex a selection as you want.
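
To see the whole approach end to end, here is a minimal self-contained sketch that rebuilds the question's sample data and applies both selections. The names case1 and case2 are hypothetical copies, used only so that each replacement starts from the original values; stringsAsFactors = FALSE is an assumption that keeps the text columns as plain character vectors.

```r
# Sample data from the question.
df <- data.frame(
  year      = c(2000, 2013, 2012, 2002, 2000, 2012, 2009, 2010, 2000, 2011, 2014),
  company   = c("toyota", "honda", "nissan", "toyota", "nissan", "honda",
                "toyota", "ford", "ford", "nissan", "ford"),
  color     = c("red", "red", "green", "blue", "green", "red",
                "red", "yellow", "green", "blue", "blue"),
  car_total = c(873, 737, 809, 429, 861, 742, 320, 319, 587, 777, 32),
  stringsAsFactors = FALSE
)

# Named logical vectors, one TRUE/FALSE per row.
redcars        <- df$color == "red"
fords          <- df$company == "ford"
ford_or_nissan <- df$company %in% c("ford", "nissan")

# Situation 1: company is ford OR nissan.
case1 <- df
case1$car_total[ford_or_nissan] <- 0

# Situation 2: conditions from different columns (ford OR red).
case2 <- df
case2$car_total[fords | redcars] <- 0

case1$car_total
case2$car_total
```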

1

I like working with the data.table library. Assuming your data is a data.table called dt:

# Replace car total with 0 when company = ford OR company = nissan
dt[company %in% c("ford","nissan"), car_total := 0]

# Replace any car_total with 0 when company = ford OR color = red
dt[company == "ford" | color == "red", car_total := 0]
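
A minimal self-contained sketch of the same two replacements, assuming the data.table package is installed and the question's sample rows are loaded into dt. Because := modifies a data.table by reference, the sketch works on copies; dt1 and dt2 are hypothetical names introduced only for that purpose.

```r
library(data.table)

# Sample data from the question.
dt <- data.table(
  year      = c(2000, 2013, 2012, 2002, 2000, 2012, 2009, 2010, 2000, 2011, 2014),
  company   = c("toyota", "honda", "nissan", "toyota", "nissan", "honda",
                "toyota", "ford", "ford", "nissan", "ford"),
  color     = c("red", "red", "green", "blue", "green", "red",
                "red", "yellow", "green", "blue", "blue"),
  car_total = c(873, 737, 809, 429, 861, 742, 320, 319, 587, 777, 32)
)

# := updates in place, so copy() keeps the original dt untouched.
dt1 <- copy(dt)
dt1[company %in% c("ford", "nissan"), car_total := 0]

dt2 <- copy(dt)
dt2[company == "ford" | color == "red", car_total := 0]

dt1$car_total
dt2$car_total
```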

-3

For your first question:

year <- c(2000, 2013, 2012, 2002, 2000, 2012, 2009, 2010, 2000, 2011, 2014)
company <- c('toyota', 'honda', 'nissan', 'toyota', 'nissan', 'honda', 'toyota', 'ford', 'ford', 'nissan', 'ford')
color <- c('red', 'red', 'green', 'blue', 'green', 'red', 'red', 'yellow', 'green', 'blue', 'blue')
car_total <- as.integer(c(873, 737, 809, 429, 861, 742, 320, 319, 587, 777, 32))
df <- data.frame(year, company, color, car_total)
# Loop over the rows one at a time; the assignment runs only when the test is TRUE
for (i in 1:nrow(df)) {
  ifelse(df$company[i] == 'ford', df$car_total[i] <- 0, NA)
  ifelse(df$company[i] == 'nissan', df$car_total[i] <- 0, NA)
}

3 Comments

That is really not the way to go, just use - dat$car_total[dat$company %in% c("ford","nissan")] <- 0 and be done with it.
Performance would be a major reason for using %in%. Using a==b | a==c needs three full O(n) operations (two compares, and an or). Using a %in% c(b,c) is done in a single pass with either one or two comparisons per row. In a million+ row table, that's significant.
@MaxPD - no need for personal insults - I have not attacked you directly as a person. I'm sure you're well intentioned. Your code works, yes, but as kdopen has pointed out, it is unusual and I do not believe useful to a new R user trying to understand selections. Looping over rows is typically very slow, especially when ifelse is already vectorised to do so. Also, ifelse is usually used to return a sequence of values as opposed to running assignments.
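
As a sketch of the vectorised use of ifelse mentioned in the last comment, the loop can be replaced by a single call that returns a whole column of values. This assumes the same sample data in a data frame named df, as in the answer; it is one possible form, not the only one.

```r
# Sample data from the question.
df <- data.frame(
  year      = c(2000, 2013, 2012, 2002, 2000, 2012, 2009, 2010, 2000, 2011, 2014),
  company   = c("toyota", "honda", "nissan", "toyota", "nissan", "honda",
                "toyota", "ford", "ford", "nissan", "ford"),
  color     = c("red", "red", "green", "blue", "green", "red",
                "red", "yellow", "green", "blue", "blue"),
  car_total = c(873, 737, 809, 429, 861, 742, 320, 319, 587, 777, 32),
  stringsAsFactors = FALSE
)

# ifelse() tests every row at once and returns one value per row:
# 0 where the company matches, the existing car_total otherwise.
df$car_total <- ifelse(df$company %in% c("ford", "nissan"), 0, df$car_total)

df$car_total
```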
