Improving computation speed of a nested loop in R

Question

I have a loop that I have tried to optimize as much as I could, but I don't know how to make it any faster.

Do you have any ideas for improving it?

# partial is a data frame that looks like this
partial = data.frame(
partial.regex = c("european construction industry federation",
" zentralverband des deutschen baugewerbes",
"hauptverband der deutschen bauindustrie",
"1 1 drillisch ag 439568220616 04"))

> summary(partial)
 partial.name          regex            full.name        
 Length:13202       Length:13202       Length:13202      
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character

# full is also a data frame
full = data.frame(
full.name = c("International Lead Association (ILA)", "Airborne Wind Europe", "Sazka Group a.s"),
regex = c("international lead association (ila)", "airborne wind europe", "sazka group a.s."))
> summary(full)
  full.name            regex          
 Length:9779        Length:9779       
 Class :character   Class :character  
 Mode  :character   Mode  :character

And here is the loop. Sorry if this is a dumb question, I am a real beginner!


library(stringi)  # provides stri_extract_all_regex()

for(y in 1:dim(partial)[1]){
  a = 0  # highest number of shared words found so far for row y
  for(i in 1:dim(full)[1]){
    vec = c(partial$regex[y], full$regex[i])
    # count the words common to both strings
    if(length(Reduce(`intersect`,stri_extract_all_regex(vec,"\\w+"))) > a){
      a = length(Reduce(`intersect`,stri_extract_all_regex(vec,"\\w+")))
      partial$full[y] = full$full.name[i]
    }
  }
}

Thank you in advance for all the help you can give me!

Best regards,

PS : partial.csv = https://github.com/JMcrocs/MeetingMEPs/blob/main/partial.csv

full.csv = https://github.com/JMcrocs/MeetingMEPs/blob/main/full.csv

ThomasIsCoding · Accepted Answer · 2021-05-09 22:22:52Z

Here is a base R option using `outer` + `max.col`:

partial$full <- full$full.name[
  max.col(
    lengths(outer(
      regmatches(partial$partial.regex, gregexpr("\\w+", partial$partial.regex)),
      regmatches(full$regex, gregexpr("\\w+", full$regex)),
      FUN = Vectorize(intersect)
    )),
    ties.method = "first"
  )
]
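
To see what each step of this one-liner produces, here is a minimal walk-through on toy data. The vectors `partial_regex`, `full_regex`, and `full_name` and the helper `tokenize` are made up for illustration and are not taken from partial.csv or full.csv.

```r
# Toy data, invented for this sketch
partial_regex <- c("airborne wind europe association", "sazka group")
full_regex <- c("international lead association (ila)",
                "airborne wind europe",
                "sazka group a.s.")
full_name <- c("International Lead Association (ILA)",
               "Airborne Wind Europe",
               "Sazka Group a.s")

# Hypothetical helper: split each string into its word tokens
# (returns a list with one character vector per input string)
tokenize <- function(x) regmatches(x, gregexpr("\\w+", x))
p_tokens <- tokenize(partial_regex)
f_tokens <- tokenize(full_regex)

# outer() pairs every partial entry with every full entry; cell [i, j]
# of the resulting list-matrix holds the words shared by partial i and full j
shared <- outer(p_tokens, f_tokens, FUN = Vectorize(intersect))

# lengths() turns each cell into the number of shared words
counts <- lengths(shared)

# max.col() returns, for each row, the column with the highest count
best <- max.col(counts, ties.method = "first")
matched <- full_name[best]
```

With `ties.method = "first"`, the earliest entry of `full` wins when several share the same number of words, which mirrors the strict `>` comparison in the original loop.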

answered May 9, 2021 at 22:22

1 Comment

JMCrocs Over a year ago

Thank you very much! This is 3x faster! I still have a lot to learn.
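
One behaviour that may be worth checking against the original loop: the loop only assigns a name when at least one word is shared, whereas `max.col` always returns a column, so a row with no overlap falls back to the first entry of `full`. Below is a minimal sketch of one way to mark such rows as `NA`. The toy vectors and the `tokenize` helper are hypothetical and only serve the illustration.

```r
# Toy data, invented for this sketch; the second partial entry
# shares no word with any full entry
partial_regex <- c("sazka group", "drillisch ag")
full_regex <- c("airborne wind europe", "sazka group a.s.")
full_name <- c("Airborne Wind Europe", "Sazka Group a.s")

# Hypothetical helper: split each string into its word tokens
tokenize <- function(x) regmatches(x, gregexpr("\\w+", x))

# Matrix of shared-word counts: rows are partial entries, columns are full entries
counts <- lengths(outer(tokenize(partial_regex), tokenize(full_regex),
                        FUN = Vectorize(intersect)))
best <- max.col(counts, ties.method = "first")

# A best count of zero means no shared word at all, so report NA
# instead of accepting the first column by default
has_match <- counts[cbind(seq_len(nrow(counts)), best)] > 0
matched <- ifelse(has_match, full_name[best], NA_character_)
```

Whether `NA` is the right result for unmatched rows depends on how the column is used afterwards, so treat this as an assumption to adapt.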
