Say we have this df:
import pandas as pd
df = pd.DataFrame({'a': ['hair color other family, friends ', 'family, friends hair color']})
                                   a
0  hair color other family, friends
1         family, friends hair color
I want to extract strings using my own list of items:
items = ['hair color', 'other', 'family, friends']
I want to do this because there is no consistent delimiter or pattern in the raw data.
Desired output:
import numpy as np
desired_output = pd.DataFrame({
    'a': ['hair color other family, friends ', 'family, friends hair color'],
    'hair color': ['hair color', 'hair color'],
    'other': ['other', np.nan],
    'family, friends': ['family, friends', 'family, friends'],
})
                                   a  hair color  other  family, friends
0  hair color other family, friends   hair color  other  family, friends
1         family, friends hair color  hair color    NaN  family, friends
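
For reference, here is a minimal sketch of one way this might be done, assuming each item only needs to be found as a literal substring of column a. It loops over items and uses Series.str.extract with an escaped pattern, so a row that does not contain the item ends up as NaN. The name result is just an illustrative variable, not part of the original data.

```python
import re

import pandas as pd

df = pd.DataFrame({'a': ['hair color other family, friends ', 'family, friends hair color']})
items = ['hair color', 'other', 'family, friends']

# Work on a copy so the original df is left untouched.
result = df.copy()

for item in items:
    # re.escape makes the item match literally, even if it contains
    # characters that are special in a regular expression.
    pattern = f'({re.escape(item)})'
    # With a single capture group and expand=False, str.extract returns a
    # Series holding the matched text, or NaN where there is no match.
    result[item] = result['a'].str.extract(pattern, expand=False)

print(result)
```

Note that this is plain substring matching, so an item like 'other' would also be found inside a longer word such as 'mother'. If that matters for the real data, word boundaries (\b) around the escaped item might be worth trying.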