4

I'm filtering my rows from dataframe named joined on multiple criterias:

joined[(joined.gvkey_x==gvkey) & (year-3<=joined.year_y<=year )]

I want all the rows from the dataframe which satisfy the given conditions. I'm returned this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

I was reading this answer which states that I should use .any()/ .all() but this isnt exactly clear.

Can someone explain to me how I should apply multiple conditions and what is exactly the purpose and use of a.any() or a.all() in my operation (&).

2
  • 1
    You need (joined.year_y >= year-3) & (joined.year_y <= year) Commented Dec 28, 2016 at 16:07
  • Can you show us the head of your dataframe? Commented Dec 28, 2016 at 16:10

2 Answers 2

2

You could use query

joined.query('gvkey_x == @gvkey & @year - 3 <= year_y <= @year')

demo

year = 2002
gvkey = 12345

joined = pd.DataFrame(dict(
        gvkey_x=[12343] * 6 + [12345] * 6 + [12346] * 6,
        year_y=list(range(2000, 2006)) * 3
    ))
print(joined)

    gvkey_x  year_y
0     12343    2000
1     12343    2001
2     12343    2002
3     12343    2003
4     12343    2004
5     12343    2005
6     12345    2000
7     12345    2001
8     12345    2002
9     12345    2003
10    12345    2004
11    12345    2005
12    12346    2000
13    12346    2001
14    12346    2002
15    12346    2003
16    12346    2004
17    12346    2005

print(joined.query('gvkey_x == @gvkey & @year - 3 <= year_y <= @year'))

   gvkey_x  year_y
6    12345    2000
7    12345    2001
8    12345    2002

The reason you were getting your error is because of (year-3<=joined.year_y<=year )
pandas needs to evaluate each comparison separately.

As @Psidom suggested in the comments:

(joined.year_y >= year-3) & (joined.year_y <= year)

Would fix it.

Sign up to request clarification or add additional context in comments.

Comments

1

You can try numpy.where (using sample data):

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({"year": [2013, 2014, 2014, 2015, 2013, 2013], "class": ['a', 'b', 'b', 'b', 'b', 'a'], "value": [13, 25, 32, 12, 23, 54]})
>>> df
  class  value  year
0     a     13  2013
1     b     25  2014
2     b     32  2014
3     b     12  2015
4     b     23  2013
5     a     54  2013
>>> df.ix[np.where((df['class'] == 'a') & (df['year'] == 2013))[0]]
  class  value  year
0     a     13  2013
5     a     54  2013

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.