I have a large, chronologically ordered array of datetime.date objects. Many of the dates in the array repeat, but some dates are missing (it's a time series of 'real data', so it's messy).
I want to count how many data points there are for each date. Currently I do it like this:
import datetime as dt
import numpy as np

# Example data: four samples per day for 31 days.
t = np.array([dt.date(2012, 12, 1) + dt.timedelta(n) for n in np.arange(0, 31, 0.25)])

# Count matches for each day offset; this scans the whole array once per day.
Ndays = (t[-1] - t[0]).days
data_per_day = np.array([sum(t == t[0] + dt.timedelta(d)) for d in range(Ndays)])
However, I find this to be very slow (more than 10 minutes for approximately 400,000 data points). Is there a faster way of doing this?
Comment: timedelta is slowing you down. Consider comparing d with tLen = t - t[0], which you compute beforehand instead. How big is Ndays when you have 400k dates?
Reply: Ndays is of order 2000 (200,000 dates spanning 2,000 days). The solution by @root below sped things up by several orders of magnitude.
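The @root answer itself isn't reproduced here, so purely as an illustration: a minimal sketch of one vectorized way to get the same per-day counts (the variable names offsets, dates and counts are mine, and np.unique's return_counts argument needs NumPy 1.9 or later):

import datetime as dt
import numpy as np

t = np.array([dt.date(2012, 12, 1) + dt.timedelta(n) for n in np.arange(0, 31, 0.25)])

# One pass to turn each date into an integer day offset from the first date,
# then np.bincount tallies every offset at once (missing days stay at zero).
offsets = np.array([(d - t[0]).days for d in t])
data_per_day = np.bincount(offsets)

# If only the dates that actually occur are needed:
dates, counts = np.unique(t, return_counts=True)

This replaces the Ndays separate full-array comparisons with a single pass over t, which is why it scales far better on 200,000 dates.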