I have a column of influenza virus names in my DataFrame. Here is a representative sample of the name formats present:
- (A/Egypt/84/2001(H1N2))
- A/Brazil/1759/2004(H3N2)
- A/Argentina/126/2004
I am only interested in extracting the A/COUNTRY/NUMBER/YEAR part of each strain name, e.g. A/Brazil/1759/2004. I have tried:
df['Strain Name'] = df['Original Name'].str.split("(")
However, if I then take .str[0], I lose case 1, and if I take .str[1], I lose cases 2 and 3.
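To see why neither index works on its own, it helps to look at what the split actually returns for each of the three formats. The sketch below rebuilds a small DataFrame from the sample names above (the three-row frame is just an illustration). Splitting on "(" leaves an empty string first for case 1 because that name starts with "(", and leaves only one element for case 3 because it has no "(" at all.

```python
import pandas as pd

# Small frame built from the three sample name formats in the question
df = pd.DataFrame({"Original Name": [
    "(A/Egypt/84/2001(H1N2))",
    "A/Brazil/1759/2004(H3N2)",
    "A/Argentina/126/2004",
]})

parts = df["Original Name"].str.split("(")
print(parts.tolist())
# [['', 'A/Egypt/84/2001', 'H1N2))'], ['A/Brazil/1759/2004', 'H3N2)'], ['A/Argentina/126/2004']]

# Index 0 is empty for case 1, because the name begins with "("
print(parts.str[0].tolist())
# ['', 'A/Brazil/1759/2004', 'A/Argentina/126/2004']

# Index 1 is the subtype for case 2 and NaN (out of range) for case 3
print(parts.str[1].tolist())
# ['A/Egypt/84/2001', 'H3N2)', nan]
```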
Is there a solution that works for all three cases? Or is there some way to apply a condition to the split without iterating over each row of the DataFrame?
.apply(lambda x: max(x, key=len)) did the trick, since I was essentially looking for the longest string in the split.
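For reference, here is the longest-piece approach written out in full, as a minimal sketch on the three sample names. It relies on the assumption that the strain name is always longer than the subtype suffix (e.g. "H1N2))"), which holds for these formats. Note that Series.apply still calls the lambda once per element; it just avoids an explicit loop. As an alternative not taken from the post, Series.str.extract with a regex pulls the A/COUNTRY/NUMBER/YEAR part directly. The pattern here is an assumption: it expects each field to contain no "/" or parentheses and the year to be four digits, and the variable name strain_regex is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"Original Name": [
    "(A/Egypt/84/2001(H1N2))",
    "A/Brazil/1759/2004(H3N2)",
    "A/Argentina/126/2004",
]})

# Split on "(" and keep the longest piece of each list.
# Assumes the strain name is always longer than the subtype suffix.
df["Strain Name"] = (
    df["Original Name"]
    .str.split("(")
    .apply(lambda x: max(x, key=len))
)

# Alternative (assumption, not from the post): match A/COUNTRY/NUMBER/YEAR directly.
# [^/()]+ allows spaces in the country field; \d{4} expects a four-digit year.
# expand=False returns a Series because the pattern has a single capture group.
strain_regex = df["Original Name"].str.extract(r"(A/[^/()]+/\d+/\d{4})", expand=False)

print(df["Strain Name"].tolist())
# ['A/Egypt/84/2001', 'A/Brazil/1759/2004', 'A/Argentina/126/2004']
print(strain_regex.tolist())
# ['A/Egypt/84/2001', 'A/Brazil/1759/2004', 'A/Argentina/126/2004']
```

The regex version does not depend on piece lengths, so it may be the safer choice if other name formats turn up; rows that do not match the pattern come back as NaN rather than as a wrong substring.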