I have this data frame:

a <- c(1, 2, 3, 4, 5)
b <- c('Software Engineer', 'Data Engineer', 'HR Officer', 'Marketing Manager', 'Computer Engineer')
df <- data.frame(a, b)
names(df) <- c('ID', 'Jobs')

I want to group the jobs into categories. If a job title contains "Software", "Data" or "Computer", its category is "IT"; otherwise the category is "OTH". The result should look like this:

 ID               Jobs  Category
  1  Software Engineer        IT
  2      Data Engineer        IT
  3         HR Officer       OTH
  4  Marketing Manager       OTH
  5  Computer Engineer        IT

In Python I can combine df["Jobs"].str.contains("Software|Data|Computer", na = False) with np.select to get the Category. However, I don't know how to do this in R. Please give me some advice on how to solve this problem.

2 Answers

1

We can use grepl to get a logical vector that is TRUE wherever the 'Jobs' column matches 'Software', 'Data', or 'Computer'. Adding 1 converts that vector to a numeric index (1 for FALSE, 2 for TRUE), which we then use to pick 'OTH' or 'IT':

df$Category <- c("OTH", "IT")[(grepl("Software|Data|Computer", df$Jobs) + 1)]
df$Category
#[1] "IT"  "IT"  "OTH" "OTH" "IT"
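To make the indexing trick above easier to follow, here is a minimal step-by-step sketch. The vector `jobs` and the intermediate names `is_it` and `idx` are made up for illustration; they are not part of the original answer.

```r
# Hypothetical sample vector, used only for this illustration
jobs <- c("Software Engineer", "HR Officer", "Computer Engineer")

# Step 1: grepl() returns TRUE/FALSE for each element
is_it <- grepl("Software|Data|Computer", jobs)

# Step 2: adding 1 coerces FALSE/TRUE to the positions 1/2
idx <- is_it + 1

# Step 3: position 1 picks "OTH", position 2 picks "IT"
category <- c("OTH", "IT")[idx]
category
# [1] "IT"  "OTH" "IT"
```

The order of the labels matters: "OTH" has to come first because FALSE maps to position 1.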

Or use ifelse with grepl and assign the result to the new column:

df$Category <- ifelse(grepl("Software|Data|Computer", df$Jobs), "IT", "OTH")
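The Python snippet in the question passes na = False so that missing values count as non-matches. As far as I know, grepl already returns FALSE for NA input, so no extra argument is needed for that. The sketch below checks this on a made-up vector `jobs`. It also uses ignore.case = TRUE, which is an optional addition that goes beyond the original answer; drop it if matching should stay case-sensitive.

```r
# Hypothetical vector with a missing value and a lower-case title
jobs <- c("Software Engineer", NA, "data analyst", "HR Officer")

# grepl() yields FALSE for the NA element, so it ends up in "OTH";
# ignore.case = TRUE (an assumption, not in the original) lets "data" match "Data"
is_it <- grepl("Software|Data|Computer", jobs, ignore.case = TRUE)

category <- ifelse(is_it, "IT", "OTH")
category
# [1] "IT"  "OTH" "IT"  "OTH"
```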

data

df <- structure(list(ID = c(1, 2, 3, 4, 5), Jobs = structure(c(5L, 
2L, 3L, 4L, 1L), .Label = c("Computer Engineer", "Data Engineer", 
"HR Officer", "Marketing Manager", "Software Engineer"), 
class = "factor")), class = "data.frame", row.names = c(NA, 
-5L))

1

Here is my solution:

a <- c(1,2,3,4, 5)
b <- c('Software Engineer', 'Data Engineer', 'HR Officer', 'Marketing Manager', 'Computer Engineer')
df <- data.frame(a,b)
names(df) <- c('ID', 'Jobs')
df

  ID              Jobs
1  1 Software Engineer
2  2     Data Engineer
3  3        HR Officer
4  4 Marketing Manager
5  5 Computer Engineer

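# Add the job category. Create the column first so the indexed
# assignment still works when the last row is not a match.
df$Category <- NA_character_
df$Category[grep("Software|Data|Computer", df$Jobs)] <- "IT"
df$Category[is.na(df$Category)] <- "OTH"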
df

  ID              Jobs Category
1  1 Software Engineer       IT
2  2     Data Engineer       IT
3  3        HR Officer      OTH
4  4 Marketing Manager      OTH
5  5 Computer Engineer       IT
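This answer relies on grep, which returns the positions of the matching elements, whereas grepl returns one TRUE/FALSE per element. Here is a small sketch of the difference on a made-up vector `jobs`, together with a variant that fills in the default label first and then overwrites the matches. The names `hits`, `flags` and `category` are only for illustration.

```r
# Hypothetical sample vector, used only for this illustration
jobs <- c("Software Engineer", "Data Engineer", "HR Officer")

# grep() gives integer positions of the matches
hits <- grep("Software|Data|Computer", jobs)

# grepl() gives a logical vector of the same length as the input
flags <- grepl("Software|Data|Computer", jobs)

# Start with the default label everywhere, then overwrite the matched positions
category <- rep("OTH", length(jobs))
category[hits] <- "IT"
category
# [1] "IT"  "IT"  "OTH"
```

Starting from a full-length vector avoids relying on the last element being a match to give the result the right length.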
