I have the following df and function (see below). I might be overcomplicating this. A fresh set of eyes would be deeply appreciated.
df:
Site Name    Plan Unique ID    Atlas Placement ID
Affectv      we11080301        11087207850894
Mashable     we14880202        11087208009031
Alphr        uk10790301        11087208005229
Alphr        uk19350201        11087208005228
The goal is to:
- First iterate through df['Plan Unique ID'] and search for a specific value (we_match or uk_match).
- If there is a match, check that the string value is bigger than a certain value in that group (we12720203 or uk11350200).
- If the value is greater, add that we or uk value to a new column, df['Consolidated ID'].
- If the value is lower or there is no match, search df['Atlas Placement ID'] with new_id_search.
- If there is a match, add that to df['Consolidated ID'].
- If not, return 0 to df['Consolidated ID'].
The current problem is that it returns an empty column.
import re

def placement_extract(df="mediaplan_df", we_search=r"we\d{8}", uk_search=r"uk\d{8}", new_id_search=r"(\d{14})"):
    if type(df['Plan Unique ID']) is str:
        we_match = re.search(we_search, df['Plan Unique ID'])
        if we_match:
            if we_match > "we12720203":
                return we_match.group(0)
        else:
            uk_match = re.search(uk_search, df['Plan Unique ID'])
            if uk_match:
                if uk_match > "uk11350200":
                    return uk_match.group(0)
            else:
                match_new = re.search(new_id_search, df['Atlas Placement ID'])
                if match_new:
                    return match_new.group(0)
    return 0

mediaplan_df['Consolidated ID'] = mediaplan_df.apply(placement_extract, axis=1)
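For comparison, the steps listed above can be sketched roughly as follows. This is only a sketch under a few assumptions: the helper name consolidated_id is made up for illustration, the 'Atlas Placement ID' column may hold integers (hence the str() call), and "bigger" means a plain string comparison, which works here because every ID in a group shares the same prefix and length. Note that it compares match.group(0) to the threshold; comparing the Match object itself to a string (as in we_match > "we12720203") raises a TypeError in Python 3.

```python
import re

import pandas as pd

# Thresholds taken from the description above.
WE_MIN = "we12720203"
UK_MIN = "uk11350200"


def consolidated_id(row):
    """Hypothetical helper: pick the ID to store in 'Consolidated ID' for one row."""
    plan_id = row["Plan Unique ID"]
    if isinstance(plan_id, str):
        for pattern, minimum in ((r"we\d{8}", WE_MIN), (r"uk\d{8}", UK_MIN)):
            match = re.search(pattern, plan_id)
            # Compare the matched text, not the Match object. Same prefix and
            # same length means string order agrees with numeric order.
            if match and match.group(0) > minimum:
                return match.group(0)
    # No match, or value too low: fall back to the 14-digit ID.
    # str() is an assumption to cover an integer-typed column.
    fallback = re.search(r"\d{14}", str(row["Atlas Placement ID"]))
    return fallback.group(0) if fallback else 0


# The sample rows from the df shown earlier.
mediaplan_df = pd.DataFrame(
    {
        "Site Name": ["Affectv", "Mashable", "Alphr", "Alphr"],
        "Plan Unique ID": ["we11080301", "we14880202", "uk10790301", "uk19350201"],
        "Atlas Placement ID": [
            11087207850894,
            11087208009031,
            11087208005229,
            11087208005228,
        ],
    }
)
mediaplan_df["Consolidated ID"] = mediaplan_df.apply(consolidated_id, axis=1)
print(mediaplan_df)
```

With the four sample rows, the first and third fall back to the 14-digit ID because their plan IDs sort below the thresholds, while the second and fourth keep their we/uk value.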
Edit: Cleaned the formula
I modified gzl's function in the following way (see below): first, check whether df1 contains a 14-digit number. If so, add that.
The next step, ideally, would be to grab the column MediaPlanUnique from df2 and turn it into a series, filtered_placements:
we11080301
we12880304
we14880202
uk19350201
uk11560205
uk11560305
And see if any of the values in filtered_placements are present in df['Plan Unique ID']. If there is a match, then add df['Plan Unique ID'] to our end column, df['ConsolidatedID'].
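The check described here is a one-to-many membership test, and one possible way to express it without a row-wise function is Series.isin. The following is a minimal sketch, assuming filtered_placements holds the six values listed above and that a df with only the 'Plan Unique ID' column is enough to illustrate the idea; the intermediate names plan_ids and is_known are made up.

```python
import pandas as pd

# Values copied from the list above; in practice this would be built from
# df2's MediaPlanUnique column.
filtered_placements = pd.Series(
    ["we11080301", "we12880304", "we14880202", "uk19350201", "uk11560205", "uk11560305"]
)

df = pd.DataFrame(
    {"Plan Unique ID": ["we11080301", "we14880202", "uk10790301", "uk19350201"]}
)

# Pull the two-letter + eight-digit ID out of each cell (NaN if absent).
plan_ids = df["Plan Unique ID"].str.extract(r"([a-zA-Z]{2}\d{8})", expand=False)

# isin tests every row against ALL values in filtered_placements at once.
is_known = plan_ids.isin(filtered_placements)

# Keep the ID where it is known, otherwise store 0.
# astype(object) lets strings and the integer 0 share one column.
df["ConsolidatedID"] = plan_ids.astype(object).where(is_known, 0)
print(df)
```

Only uk10790301 is missing from filtered_placements in this sample, so it is the only row that ends up as 0.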
The current problem is that it results in all 0s. I think it's because the comparison is being done 1 to 1 (first result of new_match vs first result of filtered_placements) rather than 1 to many (first result of new_match vs all results of filtered_placements).
Any ideas?
import re

def placement_extract(df="mediaplan_df", new_id_search=r"[a-zA-Z]{2}\d{8}", old_id_search=r"(\d{14})"):
    if type(df['PlacementID']) is str:
        old_match = re.search(old_id_search, df['PlacementID'])
        if old_match:
            return old_match.group(0)
        else:
            if type(df['Plan Unique ID']) is str:
                if type(filtered_placements) is str:
                    new_match = re.search(new_id_search, df['Plan Unique ID'])
                    if new_match:
                        if filtered_placements.str.contains(new_match.group(0)):
                            return new_match.group(0)
    return 0

mediaplan_df['ConsolidatedID'] = mediaplan_df.apply(placement_extract, axis=1)
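Two things in the function above look like likely causes of the all-0 result. type(filtered_placements) is str is always False when filtered_placements is a pandas Series, so the inner branch never runs. And even if it did, filtered_placements.str.contains(...) returns a boolean Series, which cannot be used directly in an if (pandas raises a ValueError about ambiguous truth values). A possible row-wise fix is to test membership against a set. The sketch below assumes the column names from the function above; the PlacementID sample values ("n/a", None) are invented purely to exercise each branch.

```python
import re

import pandas as pd

# Values copied from the post; in practice built from df2's MediaPlanUnique.
filtered_placements = pd.Series(
    ["we11080301", "we12880304", "we14880202", "uk19350201", "uk11560205", "uk11560305"]
)
# A set gives a fast one-to-many membership test for each row.
known_ids = set(filtered_placements)


def placement_extract(row, new_id_search=r"[a-zA-Z]{2}\d{8}", old_id_search=r"\d{14}"):
    # Step 1: prefer a 14-digit ID if PlacementID contains one.
    # str() is an assumption so numeric or missing cells do not break re.search.
    old_match = re.search(old_id_search, str(row["PlacementID"]))
    if old_match:
        return old_match.group(0)
    # Step 2: otherwise keep the plan ID only if it is one of the known IDs.
    new_match = re.search(new_id_search, str(row["Plan Unique ID"]))
    if new_match and new_match.group(0) in known_ids:
        return new_match.group(0)
    return 0


# Hypothetical sample rows, not real data.
mediaplan_df = pd.DataFrame(
    {
        "PlacementID": ["11087207850894", "n/a", "n/a", None],
        "Plan Unique ID": ["we11080301", "we14880202", "uk10790301", "uk19350201"],
    }
)
mediaplan_df["ConsolidatedID"] = mediaplan_df.apply(placement_extract, axis=1)
print(mediaplan_df)
```

Building the set once outside the function avoids rescanning the whole Series for every row that apply processes.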