In a data.table I have a column of company names that sometimes include the company's city. Given a vector of all existing cities, I would like to detect whether a city name is part of the company name and, if so, extract the city into a new column. I currently use a for loop in R that goes through every row of my data.table and, for each row, over all cities in my vector of cities. This takes a very long time. Is there a way to vectorize this operation to make it computationally more efficient?

Current data:

Company_name              Location
Company 1 Berlin Gmbh.    NA
Dresden Company 2 Gmbh.   NA
Company 3 in Hamburg      NA
Company 4 Ldt             NA

Desired result:

Company_name              Location
Company 1 Berlin Gmbh.    Berlin
Dresden Company 2 Gmbh.   Dresden
Company 3 in Hamburg      Hamburg
Company 4 Ldt             NA
  • Greetings! Instead of sharing a table, it would make things easier if people could easily work with the data you have. Please provide a reproducible dataset by providing the dput or subset of your data. Here is a guide for doing so: youtu.be/3EID3P1oisg Commented Feb 27, 2022 at 1:13

1 Answer

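# collapse the cities into one pattern ("Berlin|Dresden|Hamburg") and extract the first match in each name
df[, city := stringr::str_extract(Company, paste0(cities, collapse = "|"))]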

OR

# this also works (the \(x) lambda needs R >= 4.1), but it is evaluated once per row
df[, city := cities[sapply(cities, \(x) grepl(x, Company))], by = 1:nrow(df)]

Output:

                   Company    city
1:  Company 1 Berlin Gmbh.  Berlin
2: Dresden Company 2 Gmbh. Dresden
3:    Company 3 in Hamburg Hamburg
4:           Company 4 Ldt    <NA>

Input:

library(data.table)
df = data.table(
  Company = c(
    "Company 1 Berlin Gmbh.",
    "Dresden Company 2 Gmbh.",
    "Company 3 in Hamburg",
    "Company 4 Ldt"
  )
)
cities = c('Berlin', 'Dresden', 'Hamburg')
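
If the city names might contain regex metacharacters, or if you would rather avoid the stringr dependency, one possible variant is to loop over the cities instead of the rows. This is only a sketch, not part of the original answer: it assumes the city vector is much shorter than the table, and the loop variable `ct` is a name chosen here for illustration. Each `grepl()` call is vectorized over all rows, and `fixed = TRUE` makes it match the city name literally.

```r
library(data.table)

df <- data.table(
  Company = c(
    "Company 1 Berlin Gmbh.",
    "Dresden Company 2 Gmbh.",
    "Company 3 in Hamburg",
    "Company 4 Ldt"
  )
)
cities <- c("Berlin", "Dresden", "Hamburg")

# Start with an all-NA character column
df[, city := NA_character_]

# One vectorized pass over all rows per city; rows that already
# received a city are skipped, so the first city in `cities` wins
for (ct in cities) {
  df[is.na(city) & grepl(ct, Company, fixed = TRUE), city := ct]
}

df
```

Unlike `str_extract()`, which returns whichever city appears first in the company name, this version returns whichever matching city comes first in `cities`. The two agree when each name contains at most one city.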

1 Comment

Great answer! I posted a related question if you are interested: stackoverflow.com/questions/74161558/…
