I have a very large string of key-value pairs (old_string) formatted like so:
"visitorid"="gh43k9sk-gj49-92ks-jgjs-j2ks-j29slgj952ks", "customer_name"="larry", "customer_state"="alabama",..."visitorid"="..."
This string can be very large, since it may cover up to 30k customers. I am using it to write a file for upload to an online segmentation tool that requires exactly this format, with one modification: the primary key (visitorid) must be tab-separated and not quoted. The end result needs to look like this (note: the 4 spaces below represent a tab):
gh43k9sk-gj49-92ks-jgjs-j2ks-j29slgj952ks "customer_name"="larry", "customer_state"="alabama",...ABC3k9sk-gj49-92ks-dgjs-j2ks-j29slgj9bbbb
I wrote the following function that does this fine, but I've noticed that this portion of the script runs the slowest (I am assuming because regex is generally slow).
import re

def getGUIDS(old_string):
    '''
    Finds GUIDs in the string and formats them as the PK for the sync file
    @param old_string the string created from the nested dict
    @return old_string_fmt the formatted version
    '''
    print('getting ids')
    # look for quoted GUIDs followed by a comma
    ids = re.findall(r'("\w{8}-\w{4}-\w{4}-\w{4}-\w{12}",)', old_string)
    old_string_fmt = old_string
    for element in ids:
        # drop the surrounding quotes and trailing comma, append a tab
        new_str = element.strip('",') + '\t'
        old_string_fmt = old_string_fmt.replace(element, new_str)
    return old_string_fmt
Is there a way to do this without regex that might speed it up?
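For what it's worth, here is one regex-free sketch I have been considering, assuming every record introduces its GUID with the literal text "visitorid"=" and that the GUID value itself never contains ", (the function name format_ids is just a placeholder):

```python
def format_ids(old_string):
    # Split on the literal key so each following chunk starts with the GUID value.
    parts = old_string.split('"visitorid"="')
    out = [parts[0]]  # anything before the first record is kept as-is
    for chunk in parts[1:]:
        # The GUID runs up to the closing quote-comma; the rest of the record follows.
        guid, rest = chunk.split('",', 1)
        out.append(guid + '\t' + rest.lstrip())
    return ''.join(out)
```

This walks the string once with plain string methods instead of scanning it with a pattern, but it only works if the delimiters above really are that regular in the data.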