3

I have a Pandas Dataframe that stores a food item on each row in the following format -

Id   Calories   Protein   IsBreakfast   IsLunch   IsDinner
1      300        6           0           1          0    
2      400        12          1           1          0
.
.
.   
100    700        25          0           1          1            

I want to print all three-row combinations with the following conditions -

  1. The combinations should contain at least one of the breakfast, lunch, and dinner.
  2. sum of calories should be between certain range (say minCal < sum of calories in three rows < maxCal)
  3. similar condition for proteins too.

Right now, I am first iterating over all breakfast items, choosing lunch items. Then iterating over all dinner items. After selecting a combination, I am adding relevant columns and checking if values are within the desired range

1
  • 2
    What do you mean when you say three rows? Can you give an example of the desired output? Commented Aug 27, 2019 at 16:40

2 Answers 2

1

You can use the approach described in this answer to generate a new DataFrame containing all the combinations of three rows from your original data:

from itertools import combinations
import pandas as pd

# Using skbrhmn's df
df = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                   "Protein": [10, 20, 30, 40, 50],
                   "IsBreakfast": [1, 1, 0, 0, 0],
                   "IsLunch": [1, 0, 0, 0, 1],
                   "IsDinner": [1, 1, 1, 0, 1]})

comb_rows = list(combinations(df.index, 3))
comb_rows

Output:

[(0, 1, 2),
 (0, 1, 3),
 (0, 1, 4),
 (0, 2, 3),
 (0, 2, 4),
 (0, 3, 4),
 (1, 2, 3),
 (1, 2, 4),
 (1, 3, 4),
 (2, 3, 4)]

Then create a new DataFrame containing the sum of all numeric fields in your original frame, over all the possible combinations of three rows:

combinations = pd.DataFrame([df.loc[c,:].sum() for c in comb_rows], index=comb_rows)

print(combinations)

           Calories  Protein  IsBreakfast  IsLunch  IsDinner
(0, 1, 2)       600       60            2        1         3
(0, 1, 3)       700       70            2        1         2
(0, 1, 4)       800       80            2        2         3
(0, 2, 3)       800       80            1        1         2
(0, 2, 4)       900       90            1        2         3
(0, 3, 4)      1000      100            1        2         2
(1, 2, 3)       900       90            1        0         2
(1, 2, 4)      1000      100            1        1         3
(1, 3, 4)      1100      110            1        1         2
(2, 3, 4)      1200      120            0        1         2

Finally you can apply any filters you need:

filtered = combinations[
    (combinations.IsBreakfast>0) &
    (combinations.IsLunch>0) &
    (combinations.IsDinner>0) &
    (combinations.Calories>600) &
    (combinations.Calories<1000) &
    (combinations.Protein>=80) &
    (combinations.Protein<120)
]
print(filtered)

           Calories  Protein  IsBreakfast  IsLunch  IsDinner
(0, 1, 4)       800       80            2        2         3
(0, 2, 3)       800       80            1        1         2
(0, 2, 4)       900       90            1        2         3
Sign up to request clarification or add additional context in comments.

Comments

0

You can add combinations of filters to a dataframe using the | and & operators. Creating a dummy dataframe for example:

df1 = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                    "Protein": [10, 20, 30, 40, 50],
                    "IsBreakfast": [1, 1, 0, 0, 0],
                    "IsLunch": [1, 0, 0, 0, 1],
                    "IsDinner": [1, 1, 1, 0, 1]})
print(df1)

Output:

   Calories  Protein  IsBreakfast  IsLunch  IsDinner
0       100       10            1        1         1
1       200       20            1        0         1
2       300       30            0        0         1
3       400       40            0        0         0
4       500       50            0        1         1

Now add all the conditions:

min_cal = 100
max_cal = 600
min_prot = 10
max_prot = 40
df_filtered = df1[
    ((df1['IsBreakfast']==1) | (df1['IsLunch']==1) | (df1['IsDinner']==1)) &
    ((df1['Calories'] > min_cal) & (df1['Calories'] < max_cal)) &
    ((df1['Protein'] > min_prot) & (df1['Protein'] < max_prot))
]

print(df_filtered)

Output:

   Calories  Protein  IsBreakfast  IsLunch  IsDinner
1       200       20            1        0         1
2       300       30            0        0         1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.