What is the best way to use regex in a for loop while testing for data type?
For context, I'm looping over large, unclean data sets with multiple data types and need to find the extensions of strings, if they exist. Small changes to my code, such as converting values to strings, cost me minutes.
I read through the question "Python: How to use RegEx in an if statement?" but couldn't find a way to test for a match without first converting to a string.
Values:
import re

vals = [444444, '555555-Z01']
pattern = re.compile('[-]*[A-Z]{1}[0-9]{2}$')
# new_vals = [444444, 555555]
Slow method: (2.4 µs ± 93.6 ns per loop)
new_vals = []
for v in vals:
    if type(v)==str:
        if pattern.search(v) is not None:
            new_v = pattern.findall(v)[0].replace('-','')
            new_vals.append(new_v)
    else:
        new_vals.append(v)
Fast method: (1.84 µs ± 34.7 ns per loop)
f = lambda x: x if type(x)!=str else pattern.findall(x)[0].replace('-','')
new_vals = []
for v in vals:
    new_vals.append(f(v))
Unsuccessful method:
new_vals = []
for v in vals:
    if ((type(v)==str) & (pattern.search(v) is not None)):
        new_vals.append(v)
Error:
TypeError: expected string or bytes-like object
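Use if ((type(v)==str) and (pattern.search(v) is not None)): instead. & doesn't short-circuit, so pattern.search(v) still runs on the integer and raises the TypeError. Also, avoid type(v) in comparisons; use isinstance(v, str) instead.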
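To make the suggestion concrete, here is a minimal sketch of the loop using and plus isinstance, run on the sample values from the question. The second loop is only an assumption about what the asker may ultimately want (pulling out the extension with a single search call per value); the names extensions and m are hypothetical and not from the original post.

```python
import re

vals = [444444, '555555-Z01']
pattern = re.compile('[-]*[A-Z]{1}[0-9]{2}$')

# `and` short-circuits: pattern.search(v) is only evaluated when v is a str,
# so the integer never reaches the regex and no TypeError is raised.
new_vals = []
for v in vals:
    if isinstance(v, str) and pattern.search(v) is not None:
        new_vals.append(v)

# Hypothetical variant: search once per string and reuse the match object
# instead of calling search() and then findall() on the same value.
extensions = []
for v in vals:
    m = pattern.search(v) if isinstance(v, str) else None
    extensions.append(m.group().replace('-', '') if m else v)

print(new_vals)    # ['555555-Z01']
print(extensions)  # [444444, 'Z01']
```

Reusing the match object avoids scanning each string twice, which may matter on large data sets, though it is worth timing on the real data before relying on it.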