1

I'd like to be able to figure out if I can get the following to work (Pandas 0.23.4). Any help would be most appreciated.

import numpy as np
import pandas as pd

rows = 12
rng = pd.date_range('2011-01', periods=rows, freq='M')

df = pd.DataFrame(np.arange(rows), index=rng)

print(df.loc['2011-01'])
print(df.loc[np.datetime64('2011-01')])

The first print does what I would expect: it shows all the rows that fall in January 2011. However, the second one throws a KeyError because the value is not in the index. I was hoping it would produce the same output, but after some testing I realized that it looks for an exact match on 2011-01-01, which is not in the index (the only January entry is 2011-01-31). I'd like the second one to work so that I can use numpy.arange or pandas.date_range to easily generate arrays of dates to loop through. Has anyone got this to work? (It seems to work, but only if you have an exact match for the dates.)

1
  • Thanks for the help, cryptonome and jpp. Unfortunately, it seems the answer for this particular version of Pandas is "No, you can't do this exactly." I marked jpp's answer as correct because it doesn't require another loop. Commented Oct 21, 2018 at 12:52

2 Answers

2

Use DatetimeIndex.to_period() and Period.month:

import numpy as np
import pandas as pd

rows = 12
rng = pd.date_range('2011-01', periods=rows, freq='M')

df = pd.DataFrame(np.arange(rows), index=rng)

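# print(df.loc['2011-01'])
target = np.datetime64('2011-01').item()  # datetime.date(2011, 1, 1)
for idx, di in enumerate(df.index.to_period()):
    # Compare the year as well, so a January from another year is not matched
    if (di.year, di.month) == (target.year, target.month):
        print(f'loc: [{idx}] == {df.index[idx]}')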

output:

# loc: [0] == 2011-01-31 00:00:00
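
If the explicit loop is a concern, the same period comparison can probably be done as a boolean mask. This is only a sketch: `rows_in_month` is a hypothetical helper name (not a pandas function), and the index is built with `pd.offsets.MonthEnd()` instead of the 'M' alias purely as an assumption to avoid alias differences between pandas versions.

```python
import numpy as np
import pandas as pd

# Month-end index like the one in the question. Passing the offset object
# instead of the 'M' alias is an assumption made for version portability.
rng = pd.date_range('2011-01-31', periods=12, freq=pd.offsets.MonthEnd())
df = pd.DataFrame(np.arange(12), index=rng)


def rows_in_month(frame, month):
    """Return the rows of `frame` whose index falls in the month of `month`.

    Hypothetical helper: `month` may be a np.datetime64, a Timestamp,
    or anything else pd.Timestamp accepts.
    """
    target = pd.Period(pd.Timestamp(month), freq='M')
    # Convert the whole index to monthly periods once, then compare elementwise
    mask = frame.index.to_period('M') == target
    return frame[mask]


result = rows_in_month(df, np.datetime64('2011-01'))
print(result)
```

Because the comparison is on whole periods, a value such as np.datetime64('2011-03-15') should select the March row as well, without needing an exact match on the day.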

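Since your DataFrame's index consists of month-end dates, you can use this trick with df.loc to get a row: take the first day of the following month and subtract one day, which gives an exact match on the month-end date.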

>>> df.loc[df.index == np.datetime64('2011-03', 'D') - 1]
            0
2011-02-28  1

>>> df.loc[df.index == np.datetime64('2011-04', 'D') - 1]
            0
2011-03-31  2

>>> df[df.index == np.datetime64('2011-12', 'D') - 1]
             0
2011-11-30  10

# use 2012 January 1st minus one day to get 2011 Dec 31st
>>> df[df.index == np.datetime64('2012-01', 'D') - 1]
             0
2011-12-31  11

4 Comments

Thanks @cryptonome. The to_period method is interesting; I'll have to consider that. However, I was hoping for a way to do this without adding another explicit loop, if possible. The implicit looping in Numpy/Pandas is much more efficient...
Since your index is always the end of the month and your np.datetime64 is in year-month format, there's a trick that you can use for that. Let me edit my answer.
Thanks again, @cryptonome. Unfortunately, your new code only works for exact matches. I was hoping to do a search for the entire month. I appreciate your help, though.
That's alright @Ryan, maybe I misunderstood your question.
1

You can write a function to convert np.datetime64 to Pandas-compatible strings:

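def stringify(x):
    # As integers, datetime64[Y] counts years since 1970
    # and datetime64[M] counts months since 1970-01
    year = x.astype('datetime64[Y]').astype(int) + 1970
    month = x.astype('datetime64[M]').astype(int) % 12 + 1
    return f'{year}-{month:02}'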

a = df.loc['2011-01']
b = df.loc[stringify(np.datetime64('2011-01'))]

assert a.equals(b)
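
As an alternative to the manual arithmetic, NumPy's `np.datetime_as_string` with `unit='M'` should give the same 'YYYY-MM' strings, and it works on a whole array from `np.arange` at once. The sketch below assumes a month-end index like the one in the question (built with `pd.offsets.MonthEnd()` only to avoid alias differences between pandas versions); `selected` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Month-end index like the one in the question (offset object used instead
# of the 'M' alias, as an assumption for version portability).
rng = pd.date_range('2011-01-31', periods=12, freq=pd.offsets.MonthEnd())
df = pd.DataFrame(np.arange(12), index=rng)

# Generate the months to loop over, then turn them into 'YYYY-MM' strings
months = np.arange('2011-01', '2011-04', dtype='datetime64[M]')
keys = np.datetime_as_string(months, unit='M').tolist()  # plain Python strings

selected = {}
for key in keys:
    # A 'YYYY-MM' string triggers partial string indexing on the DatetimeIndex
    selected[key] = df.loc[key]

for key, rows in selected.items():
    print(key, rows.iloc[:, 0].tolist())
```

The loop here is only over the months being requested; each `df.loc[key]` lookup is still handled by pandas itself.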
