
Consider an input file, b.dat:

string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0

I can group monthly totals like so:

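import pandas as pd

b=pd.read_csv('b.dat')
b['date']=pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.index=b['date']
# pd.groupby() is no longer a top-level function; use the DataFrame method
bg=b.groupby([b.index.year,b.index.month])
# sum only the numeric column so the string and date columns are left out
bgs=bg[['number']].sum()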

The index of the grouped totals looks like:

bgs

            number
2011 2       1
     3       2
     4       8
     5      14
2014 5      13

bgs.index

MultiIndex(levels=[[2011, 2014], [2, 3, 4, 5]],
       labels=[[0, 0, 0, 0, 1], [0, 1, 2, 3, 3]])

I'd like to convert the (year, month) index into a datetime index (the day can be the first of the month).

I've tried the following:

bgs.index = pd.to_datetime(bgs.index)

and

bgs.index = pd.DatetimeIndex(bgs.index)

Both fail. Does anyone know how I can do this?

  • I get an error if I use this code directly with Pandas 0.13. It breaks on the pd.to_datetime call, claiming that the use of %p is incorrect via KeyError: 'p' in /pandas/tslib.so in pandas.tslib.array_strptime (pandas/tslib.c:20989). Commented Jun 6, 2014 at 21:27
  • In fact, I can reproduce the pandas error with any string needing to parse the 'am' or 'pm'. There must be a bug in handling how that gets passed to strftime or whatever. Commented Jun 6, 2014 at 21:33
  • Opened a pandas issue here. Commented Jun 6, 2014 at 21:35
  • @EMS for info I'm on version 0.13.1 (it works for me) Commented Jun 6, 2014 at 21:36

2 Answers


Consider resampling by 'M' (month end) rather than grouping by attributes of the DatetimeIndex:

In [11]: b.resample('M', how='sum').dropna()
Out[11]:
            number
date
2011-02-28       1
2011-03-31       2
2011-04-30       8
2011-05-31      14
2014-05-31      13

Note: resampling produces a row for every month in the range, so you have to drop the NaN rows if you don't want the months in between.
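
On recent pandas releases the how= argument to resample is no longer available, and empty months sum to 0 rather than NaN. A minimal sketch of an equivalent, assuming a modern pandas version and with the question's b.dat inlined through io.StringIO so it runs on its own. It uses the 'MS' (month start) alias so the labels fall on the first of each month, and min_count=1 so that empty months become NaN and can still be dropped:

```python
import io
import pandas as pd

# The question's b.dat, inlined so the example is self-contained.
raw = """string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0
"""

b = pd.read_csv(io.StringIO(raw))
b['date'] = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')
b = b.set_index('date')

# 'MS' labels each bin with the first day of the month.
# min_count=1 makes months with no rows sum to NaN instead of 0,
# so dropna() removes the empty months in between.
monthly = b.resample('MS')['number'].sum(min_count=1).dropna()
print(monthly)
```

The result is a Series with a DatetimeIndex, so no conversion from a (year, month) MultiIndex is needed afterwards.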


2 Comments

That's great, thanks. I'm trying to find more info on the 'rule' parameter. How do you know that 'M' groups by month? I'd like to know what else it can do. Is there a search term I'm missing that would help me find it in the docs?
The keyword is "offset" pandas.pydata.org/pandas-docs/stable/… :)
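
To get a feel for what the offset aliases mean, one option is to feed a few of them to pd.date_range and look at the dates that come back. This is only an illustrative sketch: the helper name first_dates and the start date are made up for the example, and the aliases shown ('D', 'W', 'MS', 'QS') are a small sample of those listed in the pandas documentation.

```python
import pandas as pd

def first_dates(alias, start='2011-01-01', periods=3):
    """Return the first few dates generated for an offset alias (hypothetical helper)."""
    return list(pd.date_range(start, periods=periods, freq=alias).strftime('%Y-%m-%d'))

# 'D' = calendar day, 'W' = weekly, 'MS' = month start, 'QS' = quarter start
for alias in ['D', 'W', 'MS', 'QS']:
    print(alias, first_dates(alias))
```

The same alias strings are accepted as the rule argument of resample.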

You can create a column from the index via the date calculation you want, then set that as the index:

import datetime
# each element of the MultiIndex is a (year, month) tuple
bgs['expanded_date'] = bgs.index.map(lambda x: datetime.date(x[0], x[1], 1))
bgs = bgs.set_index('expanded_date')
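
If a true DatetimeIndex is wanted rather than an index of datetime.date objects, a variation on the same idea is to build a Timestamp from each (year, month) pair. A minimal sketch, assuming the grouped totals from the question; here bgs is rebuilt by hand as a stand-in so the example runs on its own:

```python
import pandas as pd

# Stand-in for the question's grouped totals, indexed by (year, month).
bgs = pd.DataFrame(
    {'number': [1.0, 2.0, 8.0, 14.0, 13.0]},
    index=pd.MultiIndex.from_tuples(
        [(2011, 2), (2011, 3), (2011, 4), (2011, 5), (2014, 5)]
    ),
)

# Iterating a MultiIndex yields one tuple per row; use the first
# day of each month as the timestamp.
bgs.index = pd.DatetimeIndex(
    [pd.Timestamp(year=int(y), month=int(m), day=1) for y, m in bgs.index]
)
print(bgs)
```

With a DatetimeIndex in place, date-based slicing such as bgs.loc['2011'] becomes available.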
