Condition with multiple strings in columns R

Question

I have this dataframe:

a <- c(1, 2, 3, 4, 5)
b <- c('Software Engineer', 'Data Engineer', 'HR Officer', 'Marketing Manager', 'Computer Engineer')
df <- data.frame(a, b)
names(df) <- c('ID', 'Jobs')

I want to group the jobs into categories. If a job description contains "Software", "Data", or "Computer", the category for that job is "IT"; otherwise it is "OTH". The result should look like this:

 ID               Jobs  Category
  1  Software Engineer        IT
  2      Data Engineer        IT
  3         HR Officer       OTH
  4  Marketing Manager       OTH
  5  Computer Engineer        IT

In Python I can use df["Jobs"].str.contains("Software|Data|Computer", na = False) combined with np.select to get the Category. However, I don't know how to do this in R. Please give me some advice on how to solve this problem.

akrun · Accepted Answer · 2020-02-27 17:59:53Z

1

We can use grepl to get a logical vector that is TRUE wherever the 'Jobs' column matches 'Software', 'Data', or 'Computer'. Adding 1 converts it to a numeric index (FALSE becomes 1, TRUE becomes 2), which then picks 'OTH' or 'IT' from a two-element vector:

df$Category <- c("OTH", "IT")[(grepl("Software|Data|Computer", df$Jobs) + 1)]
df$Category
#[1] "IT"  "IT"  "OTH" "OTH" "IT"
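To make the indexing trick easier to follow, here is a minimal sketch that splits the one-liner into its intermediate steps. The variable names `jobs`, `is_it`, `idx`, and `labels` are illustrative only and are not part of the answer above.

```r
# Sample data mirroring the question (assumed for illustration)
jobs <- c("Software Engineer", "Data Engineer", "HR Officer",
          "Marketing Manager", "Computer Engineer")

# Step 1: logical vector, TRUE where any keyword matches
is_it <- grepl("Software|Data|Computer", jobs)

# Step 2: arithmetic coerces FALSE/TRUE to 0/1, so adding 1 gives 1/2
idx <- is_it + 1

# Step 3: use 1/2 as positions into a two-element lookup vector
labels <- c("OTH", "IT")   # position 1 = no match, position 2 = match
category <- labels[idx]
print(category)
```

The order of `labels` matters: the non-matching label has to come first because FALSE maps to position 1.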

Or use ifelse with grepl

ifelse(grepl("Software|Data|Computer", df$Jobs), "IT", "OTH")
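The question's pandas call passes na = False, so it may be worth checking how missing values behave here. The sketch below assumes a hypothetical `jobs` vector with one NA; since grepl returns FALSE for NA input, the missing entry should land in "OTH" without extra handling.

```r
# Hypothetical input containing a missing job title
jobs <- c("Software Engineer", NA, "HR Officer", "Data Engineer")

# grepl() yields FALSE (not NA) for the missing element
matched <- grepl("Software|Data|Computer", jobs)

# ifelse() therefore never sees an NA condition here
category <- ifelse(matched, "IT", "OTH")
print(category)
```

To store the result in the data frame, the same expression can be assigned to a new column, for example `df$Category <- ifelse(...)`.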

data

df <- structure(list(ID = c(1, 2, 3, 4, 5), Jobs = structure(c(5L, 
2L, 3L, 4L, 1L), .Label = c("Computer Engineer", "Data Engineer", 
"HR Officer", "Marketing Manager", "Software Engineer"), 
class = "factor")), class = "data.frame", row.names = c(NA, 
-5L))

edited Feb 27, 2020 at 17:59

answered Feb 27, 2020 at 17:53

akrun


Farzad Minooei · Accepted Answer · 2020-02-27 20:12:33Z

1

Here is my solution:

a <- c(1,2,3,4, 5)
b <- c('Software Engineer', 'Data Engineer', 'HR Officer', 'Marketing Manager', 'Computer Engineer')
df <- data.frame(a,b)
names(df) <- c('ID', 'Jobs')
df

  ID              Jobs
1  1 Software Engineer
2  2     Data Engineer
3  3        HR Officer
4  4 Marketing Manager
5  5 Computer Engineer

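# Add job category
# Create the column first so that assigning to a subset of rows is always safe
df$Category <- NA_character_
df$Category[grep("Software|Data|Computer", df$Jobs)] <- "IT"
df$Category[is.na(df$Category)] <- "OTH"
df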

  ID              Jobs Category
1  1 Software Engineer       IT
2  2     Data Engineer       IT
3  3        HR Officer      OTH
4  4 Marketing Manager      OTH
5  5 Computer Engineer       IT
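This approach relies on grep returning the integer positions of the matching rows, rather than the logical vector that grepl produces. A minimal sketch of a variation that sets the default label up front and then overwrites only the matched positions (the name `hits` is illustrative, and this is not the answerer's exact code):

```r
# Rebuild the example data frame from the question
df <- data.frame(ID = 1:5,
                 Jobs = c("Software Engineer", "Data Engineer", "HR Officer",
                          "Marketing Manager", "Computer Engineer"))

# grep() returns the row positions whose Jobs value matches any keyword
hits <- grep("Software|Data|Computer", df$Jobs)

# Default every row to "OTH", then overwrite the matched rows
df$Category <- "OTH"
df$Category[hits] <- "IT"
print(df)
```

Setting the default first means the column always has one value per row, even if the last rows of the data frame do not match.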

answered Feb 27, 2020 at 20:12

Farzad Minooei
