
I have a factor variable with countries. I have to use the ! and %in% operators so that I keep "United States", "Switzerland" and "United Kingdom" and transform the rest to "Others". But the code I am using is not working:

country <- c(rep(x = "United States", 466), rep(x = "United Kingdom", 250), rep(x = "Switzerland", 520), 
             rep(x = "France", 97), rep(x = "Italy", 85), rep(x = "Germant", 39), rep(x = "Canada", 25), 
             rep(x = "Singapore", 2), rep(x = "South Africa", 9))
country

bulk <- c("United States", "Switzerland", "United Kingdom")
if(! bulk %in% country) country <- "Others"

I am expecting it to make four categories: United States, Switzerland, United Kingdom and Others. But I don't want the solution out of context of "!" and "%in%" operators.

  • "But I don't want the solution out of context of "!" and "%in%" operators" I don't understand what you mean here. What context? Are you looking for a solution that only uses ! and/or %in%? Commented Feb 18, 2019 at 6:02
  • You probably should be using ifelse. Commented Feb 18, 2019 at 6:02
  • Are you looking for country[country %in% bulk] <- "Others"? Commented Feb 18, 2019 at 6:04
  • No, it is the opposite of that. I want United States, United Kingdom and Switzerland kept as they are and the rest of the countries changed to Others. Commented Feb 18, 2019 at 6:20

3 Answers


Solution for a vector:

country[!(country %in% bulk)] <- "Others"

Solution for a data frame:

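# Build a data frame; emptycolumn is just a placeholder second column
df <- data.frame(country = country, emptycolumn = NA)
# Before R 4.0, data.frame() turned strings into factors, so convert back to character
df$country <- as.character(df$country)
df$country[!(df$country %in% bulk)] <- "Others"
View(df)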

5 Comments

It gives the following warning and turns the other countries into NA. Warning message: In `[<-.factor`(`*tmp*`, !(data$country %in% bulk), value = c(10L, : invalid factor level, NA generated
My actual variable is in a dataframe.
You should give fully reproducible examples to avoid this kind of unrelated issue. I have added an example for a data frame; maybe edit your question so readers don't get lost. The warning occurs because your character vector was converted to a factor when it was added to a data frame.
I suggest you simply add this line of code to your question: df<-data.frame(country=country, emptycolumn=NA)
Yes that is the case. My character vector has been changed to factor. Is there a solution now?
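
For the factor case raised in the comment above, here is a minimal sketch of two ways to avoid the "invalid factor level" warning. The small data frame and the names f1 and f2 are illustrative stand-ins, not the asker's real data. The warning appears because "Others" is not yet a level of the factor, so either relabel the levels directly or go through character first.

```r
# Toy factor column standing in for the real data (illustrative values only)
df <- data.frame(
  country = factor(c("United States", "France", "Switzerland", "Italy", "United Kingdom"))
)
bulk <- c("United States", "Switzerland", "United Kingdom")

# Option 1: relabel the levels themselves; every level not in bulk is merged into "Others"
f1 <- df$country
levels(f1)[!(levels(f1) %in% bulk)] <- "Others"

# Option 2: convert to character, recode, then convert back to a factor
f2 <- as.character(df$country)
f2[!(f2 %in% bulk)] <- "Others"
f2 <- factor(f2)

table(f1)
```

Both options still rely only on ! and %in%; option 1 works on the handful of levels rather than on every row.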

Try

country[ ! country %in% bulk ] <- "Other"
table(country)
#-------------------------
country
         Other    Switzerland United Kingdom  United States 
           257            520            250            466 

R accepts logical indices for conditional assignments.
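
To make the logical-index idea concrete, here is a small sketch on a shortened vector (the four-element country vector is an illustrative stand-in for the full data). It also shows why the if() attempt in the question did not work: bulk %in% country tests the wrong direction and yields one value per element of bulk, while if() expects a single TRUE or FALSE. With the question's data every element of bulk is present in country, so the negated condition is all FALSE and nothing is reassigned; recent versions of R reject a condition of length greater than one with an error.

```r
country <- c("United States", "France", "Switzerland", "Italy")
bulk <- c("United States", "Switzerland", "United Kingdom")

# One TRUE/FALSE per element of country: is it one of the countries to keep?
keep <- country %in% bulk
keep               # TRUE FALSE TRUE FALSE

# The question's code tested the other direction: one value per element of bulk
bulk %in% country  # TRUE TRUE FALSE

# Negate the mask and assign only at the positions where it is TRUE
country[!keep] <- "Others"
country            # "United States" "Others" "Switzerland" "Others"
```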

country <- as.data.frame(c(rep(x = "United States", 466), rep(x = "United Kingdom", 250), rep(x = "Switzerland", 520), 
             rep(x = "France", 97), rep(x = "Italy", 85), rep(x = "Germant", 39), rep(x = "Canada", 25), 
             rep(x = "Singapore", 2), rep(x = "South Africa", 9)), stringsAsFactors = FALSE)

colnames(country) <- "country"

bulk <- c("United States", "Switzerland", "United Kingdom")

country$country[!country$country %in% bulk] <- "Other"

unique(country)

            country
1     United States
467  United Kingdom
717     Switzerland
1237          Other
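
The ifelse suggestion from the comments can be combined with ! and %in% as well. This is only a sketch on a shortened character vector; recoded is an illustrative name, and a factor column would need as.character() first, because ifelse() does not preserve factor labels.

```r
country <- c("United States", "France", "Switzerland", "Italy")
bulk <- c("United States", "Switzerland", "United Kingdom")

# Where the country is NOT in bulk use "Others", otherwise keep the original value
recoded <- ifelse(!(country %in% bulk), "Others", country)
recoded  # "United States" "Others" "Switzerland" "Others"
```

Unlike the indexed assignment, this returns a new vector and leaves country unchanged.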
