Suppose I have a simple dataframe with four columns: Food, Kitchen, City, and Detail.
import pandas as pd

d = {'Food': ['P1|0', 'P2', 'P3|45', 'P1', 'P2', 'P4', 'P1|1', 'P3|7', 'P5', 'P1||23'],
     'Kitchen': ['L1', 'L2', 'L9', 'L4', 'L5', 'L6', 'L1', 'L9', 'L10', 'L1'],
     'City': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
     'Detail': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd0']}
df = pd.DataFrame(data=d)
My goal is to take the part of each Food value before the first | and build a new dataframe that shows which kitchens produce similar foods. I define two foods as similar when that substring matches and they come from the same Kitchen.
df['Food'] = df['Food'].apply(str)
df.insert(0,'subFood',df['Food'].str.split('|').str[0])
df.iloc[: , :2]
   subFood    Food
0       P1    P1|0
1       P2      P2
2       P3   P3|45
3       P1      P1
4       P2      P2
5       P4      P4
6       P1    P1|1
7       P3    P3|7
8       P5      P5
9       P1  P1||23
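To make the substring step easier to check in isolation, here is a minimal sketch on three sample values taken from the Food column. The names `food` and `sub_food` are only illustrative. It shows that values without a | pass through unchanged and that a doubled | still yields the leading part.

```python
import pandas as pd

# Three shapes that occur in the Food column: one '|', no '|', and '||'.
food = pd.Series(['P1|0', 'P2', 'P1||23'])

# Split on the literal '|' and keep the first piece of each value.
sub_food = food.str.split('|').str[0]

print(sub_food.tolist())  # ['P1', 'P2', 'P1']
```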
To do so, I merge the dataframe with itself on subFood and Kitchen, then use query to keep only the rows where the two cities differ.
df.merge(df, on=['subFood', 'Kitchen'], suffixes=('_1', '_2')).query('City_1 != City_2')
    subFood  Food_1  Kitchen  City_1  Detail_1  Food_2  City_2  Detail_2
 1       P1    P1|0       L1       A        d1    P1|1       C        d7
 2       P1    P1|0       L1       A        d1  P1||23       D        d0
 3       P1    P1|1       L1       C        d7    P1|0       A        d1
 5       P1    P1|1       L1       C        d7  P1||23       D        d0
 6       P1  P1||23       L1       D        d0    P1|0       A        d1
 7       P1  P1||23       L1       D        d0    P1|1       C        d7
11       P3   P3|45       L9       A        d3    P3|7       C        d8
12       P3    P3|7       L9       C        d8   P3|45       A        d3
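One thing that stands out in this result is that every match appears twice, once in each direction (A with C, and C with A). The sketch below rebuilds the self-merge end to end and compares the original filter with a variant that keeps each unordered pair once. The ordering filter `City_1 < City_2` and the names `pairs`, `both_directions`, and `one_direction` are my own assumptions for illustration, not necessarily the shape I ultimately need. Boolean masks are used in place of query here, with the same condition.

```python
import pandas as pd

d = {'Food': ['P1|0', 'P2', 'P3|45', 'P1', 'P2', 'P4', 'P1|1', 'P3|7', 'P5', 'P1||23'],
     'Kitchen': ['L1', 'L2', 'L9', 'L4', 'L5', 'L6', 'L1', 'L9', 'L10', 'L1'],
     'City': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
     'Detail': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd0']}
df = pd.DataFrame(data=d)
df.insert(0, 'subFood', df['Food'].str.split('|').str[0])

# Self-merge: every row is paired with every row sharing subFood and Kitchen.
pairs = df.merge(df, on=['subFood', 'Kitchen'], suffixes=('_1', '_2'))

# Same condition as the query above: drops self-matches, keeps mirrored pairs.
both_directions = pairs[pairs['City_1'] != pairs['City_2']]

# Assumed variant: order the cities so each unordered pair appears once.
one_direction = pairs[pairs['City_1'] < pairs['City_2']]

print(len(both_directions), len(one_direction))  # 8 4
```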
This is where I got stuck. I would like to end up with a dataframe similar to the one shown below. I would appreciate any help or hints.
subFood  Food_1  Food_2  Kitchen  City  Detail
     P1    P1|0    P1|0       L1     A      d1
     P1    P1|0    P1|1       L1     C      d1
....