Here's a very simple recreation; the real DataFrame has many more columns.

My dataframe:

    length  width  height  age
0        1      5       8   12
1        1      5       8   12
2        1      5       8   21
3        1      5       8   15
4        1      5       8   15
5        1      6       9   12
6        2      6       9   32
7        2      6       9   32
8        2      6       7   98
9        3      4       7   12
10       3      4       7   54
11       3      4       7   21

I want to get the rows where width == 6 and age == 32.

Easy enough:

df[(df['width'] == 6) & (df['age'] == 32)]

   length  width  height  age
6       2      6       9   32
7       2      6       9   32

Is there a way to automate this even more? Let's say I have a list of columns and values. In this case it's still only two columns/values, but I'm thinking about dealing with 15 or more:

cols = ['width','age']
vals = [6,32]

Now to build an empty dataframe and update the rows with append:

df_temp = pd.DataFrame()

for col, val in zip(cols, vals):
    if df_temp.empty:
        df_temp = df[df[col] == val]
    else:
        df_temp.append(df[df[col] == val])


   length  width  height  age
5       1      6       9   12
6       2      6       9   32
7       2      6       9   32
8       2      6       7   98

What this does is equivalent to using the or symbol |:

df[(df['width'] == 6) | (df['age'] == 32)]

How can I automate this so that it is AND rather than OR?

I've tried something completely outrageous, but it doesn't work; it still seems to be equivalent to | instead of &.

[df[(df[col] == val) & (df[col] == val)] for col, val in zip(cols, vals)][0]

   length  width  height  age
5       1      6       9   12
6       2      6       9   32
7       2      6       9   32
8       2      6       7   98
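
A note on the two attempts above, offered as a sketch rather than a definitive fix. DataFrame.append returns a new frame instead of modifying df_temp in place (and was removed in pandas 2.0), and the trailing [0] keeps only the first filtered frame, so both snippets end up returning just the width == 6 rows. One way to make a loop behave as AND is to accumulate a boolean mask with &=. The names mask and result are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
})
cols = ['width', 'age']
vals = [6, 32]

# Start with "every row passes", then narrow the mask one condition at a time.
mask = pd.Series(True, index=df.index)
for col, val in zip(cols, vals):
    mask &= df[col] == val  # AND the new condition into the running mask

result = df[mask]
print(result)
```

Replacing &= with |= (and starting from False) would give the OR behaviour instead.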

My reproducible dataframe:

import pandas as pd

df = pd.DataFrame({'length': pd.Series([1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],dtype='int64',index=pd.RangeIndex(start=0, stop=12, step=1)), 'width': pd.Series([5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],dtype='int64',index=pd.RangeIndex(start=0, stop=12, step=1)), 'height': pd.Series([8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],dtype='int64',index=pd.RangeIndex(start=0, stop=12, step=1)), 'age': pd.Series([12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],dtype='int64',index=pd.RangeIndex(start=0, stop=12, step=1))}, index=pd.RangeIndex(start=0, stop=12, step=1))

2 Answers

Here is a way using assign and df.eq, followed by loc and all:

df[df.eq(df.assign(**dict(zip(cols,vals)))).loc[:,cols].all(1)]

   length  width  height  age
6       2      6       9   32
7       2      6       9   32
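
The one-liner above can be hard to read, so here is the same chain split into steps. This is a sketch of how the expression appears to work, not a different method; the intermediate names target, matches and mask are illustrative and do not appear in the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
})
cols = ['width', 'age']
vals = [6, 32]

# 1. Copy of df in which every row holds the wanted values in `cols`.
target = df.assign(**dict(zip(cols, vals)))

# 2. Element-wise comparison of the original frame against that copy.
matches = df.eq(target)

# 3. Look only at the filtered columns; a row qualifies when all of them match.
mask = matches.loc[:, cols].all(axis=1)

result = df[mask]
print(result)
```

Note that passing the column names as keyword arguments to assign only works when they are valid Python identifiers.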

We can simplify this by working with the underlying numpy arrays here:

df[(df[cols].values == vals).all(1)] 

   length  width  height  age
6       2      6       9   32
7       2      6       9   32
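
To make the NumPy version easier to follow, the expression can be unpacked as below. This is a minimal sketch assuming all filtered columns are numeric, as in the sample data; arr, hits and mask are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
})
cols = ['width', 'age']
vals = [6, 32]

# 2-D array holding only the filtered columns: one row per record.
arr = df[cols].values

# Broadcasting compares every row against vals, column by column.
hits = arr == vals

# A row is kept only when every column in it matched (logical AND).
mask = hits.all(axis=1)

result = df[mask]
print(result)
```

Using hits.any(axis=1) instead would reproduce the OR behaviour from the question.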

6 Comments

Thank you. I've read the all() documentation but I'm still not sure what is happening here. What exactly is it doing? Is it checking for NaN values, and returning False if a value along the row is NaN?
all returns True if all values in the same iterable are True, otherwise False. Here it does so along axis 1, i.e. across the columns of each row @scool Here are the docs
It works for my test dataset above, but unfortunately it does not work in the "real world": it returns an empty dataframe.
I just removed the .values and now it's working: df[(df[cols] == vals).all(1)]
Well, we need to know what the "real world" is to get it to work :) Glad it's working now @scool
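
The thread never says why the .values version came back empty on the real data. One possible cause, and this is only an assumption, is a mix of numeric and text values in vals: NumPy converts such a list to an array of strings, so 6 becomes '6' and no longer equals the integer in the column, whereas pandas compares each column with its own value. The frame below is hypothetical and only stands in for the unseen data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a text column (not from the original question).
df = pd.DataFrame({
    'width': [5, 6, 6],
    'name':  ['a', 'b', 'c'],
    'age':   [12, 32, 32],
})
cols = ['width', 'name']
vals = [6, 'b']

# NumPy coerces the mixed list to a string array, so the integer 6 is lost.
coerced = np.array(vals)
print(coerced.dtype.kind)  # 'U' means unicode strings

# pandas keeps the original types and compares column by column.
hits = df[cols] == vals
print(hits)

# all(axis=1) is True for a row only when every column in that row is True.
mask = hits.all(axis=1)
result = df[mask]
print(result)
```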