I have this loop that I have tried to improve as much as I can, but I don't know how to make it any better.

Do you have any ideas for improving it?

# partial is a data frame that looks like this
partial = data.frame(
  partial.regex = c("european construction industry federation",
                    " zentralverband des deutschen baugewerbes",
                    "hauptverband der deutschen bauindustrie",
                    "1 1 drillisch ag 439568220616 04"))

> summary(partial)
 partial.name          regex            full.name        
 Length:13202       Length:13202       Length:13202      
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character

# full is also a data frame
full = data.frame(
  full.name = c("International Lead Association (ILA)", "Airborne Wind Europe", "Sazka Group a.s"),
  regex = c("international lead association (ila)", "airborne wind europe", "sazka group a.s."))

> summary(full)
  full.name            regex          
 Length:9779        Length:9779       
 Class :character   Class :character  
 Mode  :character   Mode  :character  

And here is the loop. Sorry if this is a dumb question; I am a real beginner!


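library(stringi)

# Initialise the result column so rows without any shared word stay NA
partial$full = NA_character_

for (y in seq_len(nrow(partial))) {
  a = 0  # highest number of shared words found so far for row y
  for (i in seq_len(nrow(full))) {
    vec = c(partial$regex[y], full$regex[i])
    # Count the distinct words the two strings have in common (computed once)
    n = length(Reduce(intersect, stri_extract_all_regex(vec, "\\w+")))
    if (n > a) {
      a = n
      partial$full[y] = full$full.name[i]
    }
  }
}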

Thank you in advance for all the help you can give me!

Best regards,

PS : partial.csv = https://github.com/JMcrocs/MeetingMEPs/blob/main/partial.csv

full.csv = https://github.com/JMcrocs/MeetingMEPs/blob/main/full.csv

1 Answer

Here is a base R option using `outer` + `max.col`:

partial$full <- full$full.name[
  max.col(
    lengths(outer(
      regmatches(partial$partial.regex, gregexpr("\\w+", partial$partial.regex)),
      regmatches(full$regex, gregexpr("\\w+", full$regex)),
      FUN = Vectorize(intersect)
    )),
    ties.method = "first"
  )
]
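
To make the idea easier to follow, here is a minimal sketch on toy data. The vectors `partial_regex`, `full_regex` and `full_name` and the helper `tokenize` are made up for illustration and are not taken from the linked CSV files. It builds the matrix of shared-word counts explicitly, picks the best column per row with `max.col`, and then sets rows with no shared word to `NA`. That last step is an assumption about the desired result: the original loop never assigns a match when no word is shared, whereas `max.col` on an all-zero row still returns the first column.

```r
# Toy data (hypothetical, only for illustration)
partial_regex <- c("airborne wind", "lead association europe", "1 1 drillisch ag")
full_regex    <- c("international lead association ila", "airborne wind europe")
full_name     <- c("International Lead Association (ILA)", "Airborne Wind Europe")

# Split every string into its word tokens once, up front
tokenize <- function(x) regmatches(x, gregexpr("\\w+", x))
partial_tokens <- tokenize(partial_regex)
full_tokens    <- tokenize(full_regex)

# overlap[i, j] = number of distinct words shared by partial i and full j
overlap <- matrix(0L, nrow = length(partial_tokens), ncol = length(full_tokens))
for (i in seq_along(partial_tokens)) {
  overlap[i, ] <- vapply(
    full_tokens,
    function(f) length(intersect(partial_tokens[[i]], f)),
    integer(1)
  )
}

# Index of the best-matching full entry per row; ties go to the first one
best <- max.col(overlap, ties.method = "first")
best_match <- full_name[best]

# Assumption: a row that shares no word with any entry should stay unmatched
best_match[overlap[cbind(seq_along(best), best)] == 0] <- NA
```

With `ties.method = "first"` the earliest entry wins a tie, which matches the strict `>` comparison in the loop from the question.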

1 Comment

Thank you very much! This is 3x faster! I still have a lot to learn.
