Suppose I have a simple dataframe with four columns: Food, Kitchen, City, and Detail.

import pandas as pd

d = {'Food': ['P1|0', 'P2', 'P3|45', 'P1', 'P2', 'P4', 'P1|1', 'P3|7', 'P5', 'P1||23'],
     'Kitchen': ['L1', 'L2', 'L9', 'L4', 'L5', 'L6', 'L1', 'L9', 'L10', 'L1'],
     'City': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
     'Detail': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd0']}
df = pd.DataFrame(data=d)

My goal is to take the part of each Food value before the first | and build a new dataframe that shows which kitchens produce similar foods. I consider two foods similar when their substrings match and they come from the same Kitchen.

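# Make sure every Food value is a string before using the .str accessor
df['Food'] = df['Food'].astype(str)

# Keep everything before the first '|' as a new leading column
df.insert(0, 'subFood', df['Food'].str.split('|').str[0])
df.iloc[:, :2]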
  subFood    Food
0      P1    P1|0
1      P2      P2
2      P3   P3|45
3      P1      P1
4      P2      P2
5      P4      P4
6      P1    P1|1
7      P3    P3|7
8      P5      P5
9      P1  P1||23
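
For reference, here is a minimal sketch of how that prefix extraction behaves on the trickier values, assuming the separator is always a literal `|`. The names `foods` and `sub_food` are only for illustration and are not part of the original code.

```python
import pandas as pd

# A few Food values from the question, including one with a doubled separator
foods = pd.Series(['P1|0', 'P2', 'P3|45', 'P1', 'P1||23'])

# A single-character pattern is treated as a literal string, so '|' is not
# read as regex alternation. Values without '|' come back unchanged.
sub_food = foods.str.split('|').str[0]
print(sub_food.tolist())
```

Values such as `P1||23` split into `['P1', '', '23']`, so taking element 0 still gives `P1`.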

To do so, I merge the dataframe with itself on subFood and Kitchen, then use query to keep only the pairs that come from different cities.

df.merge(df, on=['subFood', 'Kitchen'], suffixes=('_1', '_2')).query('City_1 != City_2')

   subFood  Food_1 Kitchen City_1 Detail_1  Food_2 City_2 Detail_2
1       P1    P1|0      L1      A       d1    P1|1      C       d7
2       P1    P1|0      L1      A       d1  P1||23      D       d0
3       P1    P1|1      L1      C       d7    P1|0      A       d1
5       P1    P1|1      L1      C       d7  P1||23      D       d0
6       P1  P1||23      L1      D       d0    P1|0      A       d1
7       P1  P1||23      L1      D       d0    P1|1      C       d7
11      P3   P3|45      L9      A       d3    P3|7      C       d8
12      P3    P3|7      L9      C       d8   P3|45      A       d3
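
Note that the self-merge returns every cross-city pair twice, once in each order (for example rows 1 and 3 above). The sketch below shows one possible way to keep each pair only once, assuming a single row per unordered pair is what is wanted. It uses boolean masks instead of query, and the names `pairs`, `both` and `once` are illustrative.

```python
import pandas as pd

d = {'Food': ['P1|0', 'P2', 'P3|45', 'P1', 'P2', 'P4', 'P1|1', 'P3|7', 'P5', 'P1||23'],
     'Kitchen': ['L1', 'L2', 'L9', 'L4', 'L5', 'L6', 'L1', 'L9', 'L10', 'L1'],
     'City': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
     'Detail': ['d1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'd0']}
df = pd.DataFrame(data=d)
df.insert(0, 'subFood', df['Food'].str.split('|').str[0])

# Every combination of rows that share subFood and Kitchen
pairs = df.merge(df, on=['subFood', 'Kitchen'], suffixes=('_1', '_2'))

# Same filter as the query above: both orderings of each cross-city pair
both = pairs[pairs['City_1'] != pairs['City_2']]

# Assumption: one row per unordered pair is enough, so keep only the
# ordering where City_1 sorts before City_2
once = pairs[pairs['City_1'] < pairs['City_2']]

print(len(both), len(once))
```

With the sample data this halves the result, because each pair of cities appears in exactly two orderings.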

This is where I got stuck. I would like to end up with a dataframe that looks like the one below. I would appreciate any help or hints.

subFood Food_1  Food_2 Kitchen City Detail
P1       P1|0    P1|0    L1       A   d1
P1       P1|0    P1|1    L1       C   d1  
....

1 Answer

If I understand correctly, you can split each merged row into two rows by combining the two city names into a list and then using explode:

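# Self-merge on subFood and Kitchen, keeping only pairs from different cities
merged = df.merge(df, on=["subFood", "Kitchen"], suffixes=("_1", "_2")).query("City_1 != City_2")
# Combine the two city columns into a single list-valued column
merged["City"] = merged[["City_1", "City_2"]].to_numpy().tolist()
# Drop the redundant columns, then explode the list so each city gets its own row
output = merged.drop(["City_1", "City_2", "Detail_2"], axis=1).explode("City").rename(columns={"Detail_1": "Detail"})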

>>> output
   subFood  Food_1 Kitchen Detail  Food_2 City
1       P1    P1|0      L1     d1    P1|1    A
1       P1    P1|0      L1     d1    P1|1    C
2       P1    P1|0      L1     d1  P1||23    A
2       P1    P1|0      L1     d1  P1||23    D
3       P1    P1|1      L1     d7    P1|0    C
3       P1    P1|1      L1     d7    P1|0    A
5       P1    P1|1      L1     d7  P1||23    C
5       P1    P1|1      L1     d7  P1||23    D
6       P1  P1||23      L1     d0    P1|0    D
6       P1  P1||23      L1     d0    P1|0    A
7       P1  P1||23      L1     d0    P1|1    D
7       P1  P1||23      L1     d0    P1|1    C
11      P3   P3|45      L9     d3    P3|7    A
11      P3   P3|45      L9     d3    P3|7    C
12      P3    P3|7      L9     d8   P3|45    C
12      P3    P3|7      L9     d8   P3|45    A
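
The repeated index labels in this output (1, 1, 2, 2, ...) come from explode, which copies the original row label onto every row it produces. If a fresh 0..n-1 index is preferred, a self-contained sketch of the same steps could look like the following. It assumes pandas 1.1 or later, where explode accepts `ignore_index`, and it uses a boolean mask with `.copy()` in place of query.

```python
import pandas as pd

d = {"Food": ["P1|0", "P2", "P3|45", "P1", "P2", "P4", "P1|1", "P3|7", "P5", "P1||23"],
     "Kitchen": ["L1", "L2", "L9", "L4", "L5", "L6", "L1", "L9", "L10", "L1"],
     "City": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D"],
     "Detail": ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d0"]}
df = pd.DataFrame(data=d)
df.insert(0, "subFood", df["Food"].str.split("|").str[0])

# Self-merge, then keep only pairs whose cities differ
merged = df.merge(df, on=["subFood", "Kitchen"], suffixes=("_1", "_2"))
merged = merged[merged["City_1"] != merged["City_2"]].copy()

# One list per row holding both cities
merged["City"] = merged[["City_1", "City_2"]].to_numpy().tolist()

output = (
    merged.drop(columns=["City_1", "City_2", "Detail_2"])
    .rename(columns={"Detail_1": "Detail"})
    # ignore_index=True renumbers the exploded rows from 0
    .explode("City", ignore_index=True)
)
print(output.shape)
```

The rows and columns are the same as in the output above; only the index labels change.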