Here's a very simple recreation, although the real DataFrame has many more columns.
My dataframe:
    length  width  height  age
0        1      5       8   12
1        1      5       8   12
2        1      5       8   21
3        1      5       8   15
4        1      5       8   15
5        1      6       9   12
6        2      6       9   32
7        2      6       9   32
8        2      6       7   98
9        3      4       7   12
10       3      4       7   54
11       3      4       7   21
I want to get the rows where width == 6 and age == 32.
Easy enough:
d[(d['width']==6) & (d['age']==32)]
   length  width  height  age
6       2      6       9   32
7       2      6       9   32
Is there a way to automate this further? Say I have a list of columns and a list of values. Here there are still only two column/value pairs, but I'm expecting to deal with 15 or more:
cols = ['width','age']
vals = [6,32]
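For comparison with the attempts below, here is a minimal sketch of one way to combine any number of column/value conditions with AND: build one boolean mask per pair and fold them together with functools.reduce and operator.and_. It assumes every entry in cols is a column of df and that vals lines up with cols by position; the frame is rebuilt from the data above so the snippet runs on its own.

```python
import operator
from functools import reduce

import pandas as pd

# Same data as the table above
df = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
})

cols = ['width', 'age']
vals = [6, 32]

# One boolean Series per (column, value) pair
masks = [df[col] == val for col, val in zip(cols, vals)]

# Fold the masks together with & so a row must satisfy every condition
combined = reduce(operator.and_, masks)

result = df[combined]
print(result)  # rows 6 and 7
```

If cols could ever be empty, reduce needs a starting value, e.g. reduce(operator.and_, masks, pd.Series(True, index=df.index)), so that no rows are filtered out instead of a TypeError being raised.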
Next, I build an empty DataFrame and add the matching rows to it with append:
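df_temp = pd.DataFrame()
for col,val in zip(cols,vals):
    if df_temp.empty:
        df_temp = df[df[col]==val]
    else:
        # DataFrame.append returns a new frame instead of modifying df_temp, so this
        # result is discarded (append was also removed entirely in pandas 2.0)
        df_temp.append(df[df[col]==val])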
   length  width  height  age
5       1      6       9   12
6       2      6       9   32
7       2      6       9   32
8       2      6       7   98
This is equivalent to combining the conditions with the OR operator |:
d[(d['width']==6) | (d['age']==32)]
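For contrast, a minimal sketch of a loop that gives AND semantics (an illustration, not the original code): instead of appending each filter's matches to a growing result, narrow the same frame on every iteration, so each condition is applied only to the rows that survived the previous ones.

```python
import pandas as pd

# Same data as the table above
df = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
})

cols = ['width', 'age']
vals = [6, 32]

# Start from the full frame; each pass keeps only rows that also match the next condition
df_temp = df
for col, val in zip(cols, vals):
    df_temp = df_temp[df_temp[col] == val]

print(df_temp)  # rows 6 and 7
```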
How can I automate this so the conditions are combined with AND (&) rather than OR (|)?
I've tried something admittedly outrageous, but it doesn't work; it still seems to behave like | instead of &:
[d[(d[col]==val) & (d[col]==val)] for col, val in zip(cols,vals)][0]
   length  width  height  age
5       1      6       9   12
6       2      6       9   32
7       2      6       9   32
8       2      6       7   98
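A short check of why this attempt behaves the way it does, using the same data: each element of the list filters on a single column, because (d[col]==val) & (d[col]==val) is just d[col]==val combined with itself, and [0] then keeps only the first element, i.e. the width == 6 filter on its own. It happens to match the | version here because both age == 32 rows also have width == 6.

```python
import pandas as pd

# Same data as the table above
d = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
})

cols = ['width', 'age']
vals = [6, 32]

# One filtered frame per column; the conditions are never combined across columns
per_column = [d[(d[col] == val) & (d[col] == val)] for col, val in zip(cols, vals)]

# [0] is simply the width == 6 filter
first = per_column[0]
print(first.equals(d[d['width'] == 6]))  # True
```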
My reproducible dataframe:
import pandas as pd

df = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
}, index=pd.RangeIndex(start=0, stop=12, step=1))
d = df  # the snippets above refer to this frame as both d and df
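With the reproducible frame, another possible sketch (one option among several; cols and vals are redefined here so it runs standalone) compares all selected columns at once: DataFrame.eq with a Series aligns the target values on the column labels, and .all(axis=1) keeps only the rows where every comparison holds.

```python
import pandas as pd

df = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
})

cols = ['width', 'age']
vals = [6, 32]

# Target values labelled by column name, so the comparison aligns on column labels
targets = pd.Series(vals, index=cols)

# Element-wise comparison of the selected columns, then AND across each row
mask = df[cols].eq(targets).all(axis=1)
result = df[mask]
print(result)  # rows 6 and 7
```

Because the targets are matched by label rather than position, this stays correct even if cols is reordered, as long as vals is reordered with it.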