I have one table that contains strings
a = pd.DataFrame({"strings_to_search" : ["AA1 BB2 CVC GF2","AR1 KP1","PL3 4OR 91K GZ3"]})
and one with search parameters as regular expressions
re = pd.DataFrame({"regex_search" : ["^(?=.*AA1).*$", "^(?=.*AR1)(?=.*PL3).*$", "^(?=.*4OR)(?=.*GZ3).*$"]})
My goal is to pair each string with every search pattern that matches it. I want to compare each string against each pattern and keep the string-pattern combinations that match, like this:
| strings_to_search | regex_search
| AA1 BB2 CVC GF2 | ^(?=.*AA1).*$
| PL3 4OR 91K GZ3 | ^(?=.*4OR)(?=.*GZ3).*$
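Written out as a DataFrame, just to make the target explicit (I have not produced this yet, it is the output I am hoping for):

expected = pd.DataFrame({  # hand-written expected result, name "expected" is just for illustration
    "strings_to_search": ["AA1 BB2 CVC GF2", "PL3 4OR 91K GZ3"],
    "regex_search": ["^(?=.*AA1).*$", "^(?=.*4OR)(?=.*GZ3).*$"],
})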
Is there any way to do this in pandas? I have implemented something similar in Spark SQL using the rlike function, but Spark does not perform well when joining large tables.
Since pandas does not have an rlike function, my approach was to do a cross join of both tables and then compare the columns:
a["key"] = 0
re["key"] = 0
res = a.merge(re, on="key")
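(As an aside, I believe pandas 1.2 and newer can express the cross join directly, which would avoid the dummy key:

res = a.merge(re, how="cross")  # cross join without the helper column, pandas >= 1.2 if I recall correctly

but that is a side point.)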
But how do I search column strings_to_search with the regex in column regex_search?
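The closest I have come up with is a row-wise apply over the cross-joined frame, roughly like the sketch below (regexlib is just an alias I picked because my pattern frame is already named re), but it feels clumsy and I am not sure it is the right approach:

import re as regexlib  # aliased because the DataFrame above is already named re

# keep only the rows of the cross join where the pattern matches the string
mask = res.apply(
    lambda row: bool(regexlib.search(row["regex_search"], row["strings_to_search"])),
    axis=1,
)
res = res[mask].drop(columns="key")

Is there a cleaner or faster way to do this in pandas?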