Get rows from dataframe based on many columns and values

Question

Here's a very simple recreation, although the real DataFrame has many more columns.

My dataframe:

    length  width  height  age
0        1      5       8   12
1        1      5       8   12
2        1      5       8   21
3        1      5       8   15
4        1      5       8   15
5        1      6       9   12
6        2      6       9   32
7        2      6       9   32
8        2      6       7   98
9        3      4       7   12
10       3      4       7   54
11       3      4       7   21

I want to get the rows where width == 6 and age == 32.

Easy enough:

df[(df['width']==6) & (df['age']==32)]

   length  width  height  age
6       2      6       9   32
7       2      6       9   32

Is there a way to automate this even more? Let's say I have a list of columns and values. In this case it's still only two columns/values, but I'm thinking about dealing with 15 or more:

cols = ['width','age']
vals = [6,32]

Now to build an empty dataframe and update the rows with append:

df_temp = pd.DataFrame()

for col, val in zip(cols, vals):
    if df_temp.empty:
        df_temp = df[df[col] == val]
    else:
        df_temp.append(df[df[col] == val])


   length  width  height  age
5       1      6       9   12
6       2      6       9   32
7       2      6       9   32
8       2      6       7   98

What this does is equivalent to using the or symbol |:

df[(df['width']==6) | (df['age']==32)]

How can I automate this so that it uses AND rather than OR?

I've tried something completely outrageous, but it doesn't work; it still seems to be equivalent to | instead of &.

[df[(df[col]==val) & (df[col]==val)] for col, val in zip(cols,vals)][0]

   length  width  height  age
5       1      6       9   12
6       2      6       9   32
7       2      6       9   32
8       2      6       7   98

My reproducible dataframe:

import pandas as pd

df = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
})

anky · Accepted Answer · 2020-01-22 16:03:27Z

3

Here is a way using assign with df.eq, followed by loc and all:

df[df.eq(df.assign(**dict(zip(cols,vals)))).loc[:,cols].all(1)]

   length  width  height  age
6       2      6       9   32
7       2      6       9   32
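
For readers who find the one-liner dense, here is a minimal sketch that splits it into named steps. The intermediate names (target, matches, mask) are introduced here for illustration only and are not part of the answer; the data, cols and vals are copied from the question.

```python
import pandas as pd

# Data and filter lists copied from the question.
df = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
})
cols = ['width', 'age']
vals = [6, 32]

# Step 1: a copy of df where each filter column is overwritten with its target value.
target = df.assign(**dict(zip(cols, vals)))

# Step 2: element-wise equality against that copy, restricted to the filter columns.
matches = df.eq(target).loc[:, cols]

# Step 3: a row qualifies only if every filter column matched (logical AND).
mask = matches.all(axis=1)
result = df[mask]
print(result)
```

One thing to keep in mind with this approach: because the column names are passed to assign as keyword arguments, it presumably only suits frames whose filter columns have string names.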

answered Jan 22, 2020 at 16:03


yatu · Accepted Answer · 2020-01-22 16:15:26Z

2

We can simplify this by working with the underlying numpy arrays here:

df[(df[cols].values == vals).all(1)] 

   length  width  height  age
6       2      6       9   32
7       2      6       9   32
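
As a rough illustration of what the expression above does, the sketch below breaks it into steps and contrasts all with any. The names arr, hits, and_mask and or_mask are made up for this example, and to_numpy() is used in place of .values; the data is copied from the question.

```python
import pandas as pd

# Data and filter lists copied from the question.
df = pd.DataFrame({
    'length': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'width':  [5, 5, 5, 5, 5, 6, 6, 6, 6, 4, 4, 4],
    'height': [8, 8, 8, 8, 8, 9, 9, 9, 7, 7, 7, 7],
    'age':    [12, 12, 21, 15, 15, 12, 32, 32, 98, 12, 54, 21],
})
cols = ['width', 'age']
vals = [6, 32]

# 2-D array holding only the filter columns, one row per DataFrame row.
arr = df[cols].to_numpy()

# Broadcasting compares every row against vals, column by column.
hits = arr == vals

# all(axis=1): True only where every column matched (AND).
and_mask = hits.all(axis=1)

# any(axis=1): True where at least one column matched (OR), for contrast.
or_mask = hits.any(axis=1)

print(df[and_mask])
print(df[or_mask])
```

Swapping all for any reproduces the OR-style result from the question, which may help show that the reduction along axis 1 is what decides between AND and OR.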

edited Jan 22, 2020 at 16:15

answered Jan 22, 2020 at 16:04


6 Comments

SCool Over a year ago

Thank you. I've read the all() documentation but I'm still not sure what is happening here. What exactly is it doing? Is it checking for NaN values, and returning False if a value along the rows is NaN?

yatu Over a year ago

all returns True if all values in the same iterable are True, otherwise False. Here it does so along axis 1, i.e. across the columns of each row, @scool. Here are the docs

SCool Over a year ago

It works for my test dataset above, but unfortunately it does not work in the "real world". It is returning an empty dataframe.

SCool Over a year ago

I just removed the .values and now it's working. df[(df[cols] == vals).all(1)]

yatu Over a year ago

Well, we need to know what the "real world" is to get it to work :) Glad it's working now @scool
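
The thread never says why dropping .values fixed the real dataset. One possible cause, and this is only an assumption, is a mix of numeric and string values in vals: NumPy converts such a list to an array of strings, whereas pandas compares column by column. The sketch below uses a small hypothetical frame (the label column is invented, not from the question) to show the pandas-only form from the comment handling that case.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one numeric and one string column (not from the thread).
df = pd.DataFrame({
    'width': [5, 6, 6],
    'label': ['a', 'b', 'a'],
})
cols = ['width', 'label']
vals = [6, 'a']

# NumPy coerces the mixed list to a single string dtype, so 6 becomes '6'.
coerced = np.array(vals)
print(coerced.dtype.kind)

# pandas aligns vals with the columns, so each value keeps its own type.
mask = (df[cols] == vals).all(axis=1)
print(df[mask])
```

If the real data looks different from this, the explanation may well be something else; checking df[cols].dtypes and the types in vals is a reasonable first step.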
