Starting with a pandas DataFrame that has datetimes as its index, I create two lists:

  • datetimes of all rows
  • dates of all rows

My code below works but is slow. Is there a faster way of doing this? I have tried multiple approaches but all are slow.

import datetime

dateTimeFormat = '%Y-%m-%dT%H:%M:%S'
barDates = []
barDateTimes = []
for numpyDatetime64 in bars.index.values: # 'bars' is a pandas dataframe
    dateTime = datetime.datetime.strptime(str(numpyDatetime64).split('.')[0], dateTimeFormat)
    date = dateTime.date()
    barDateTimes.append(dateTime)
    barDates.append(date)

Per Andrej's suggestion, here is an example of the input dataframe:

                            A         B        C  ...         T          U         V
dateTime                                          ...                               
2010-05-13 09:31:00   117.130   117.240   117.13  ...  121.2400   121.2500  172429.0
2010-05-13 09:32:00   117.180   117.220   117.16  ...  121.1800   121.2700   98480.0
2010-05-13 09:33:00   117.200   117.280   117.19  ...  121.2701   121.3100   41255.0
2010-05-13 09:34:00   117.200   117.220   117.01  ...  121.2700   121.3999  250893.0
2010-05-13 09:35:00   117.100   117.130   116.83  ...  121.2500   121.2505   69952.0
...                       ...       ...      ...  ...       ...        ...       ...
2019-10-04 15:56:00   294.330   294.560   294.33  ...  141.7713   141.7800   15407.0
2019-10-04 15:57:00   294.550   294.630   294.50  ...  141.7750   141.7900   16815.0
2019-10-04 15:58:00   294.515   294.520   294.40  ...  141.7950   141.8700   39316.0
2019-10-04 15:59:00   294.485   294.530   294.38  ...  141.8600   141.8800   46623.0
2019-10-04 16:00:00   294.515   294.515   294.31  ...  141.8500   141.9300   89639.0
  • Can you edit your question and add some sample of the input dataframe? Maybe you can do the conversion inside the dataframe and then call .tolist() at the end. Commented Dec 28, 2019 at 20:22
  • @AndrejKesely. Yes, thanks, I added an example dataframe above. Commented Dec 28, 2019 at 20:39

2 Answers

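The following should work. It converts the whole DatetimeIndex at once with to_pydatetime() and the .date attribute instead of parsing each value in a Python loop: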

import pandas as pd
ix = pd.date_range(start="2010-05-13 09:31:00", end="2010-05-13 09:35:00", freq='min')
ix
DatetimeIndex(['2010-05-13 09:31:00', '2010-05-13 09:32:00',
               '2010-05-13 09:33:00', '2010-05-13 09:34:00',
               '2010-05-13 09:35:00'],
              dtype='datetime64[ns]', freq='T')
barDateTimes, barDates = ix.to_pydatetime().tolist(), ix.date.tolist()
barDateTimes, barDates
([datetime.datetime(2010, 5, 13, 9, 31),
  datetime.datetime(2010, 5, 13, 9, 32),
  datetime.datetime(2010, 5, 13, 9, 33),
  datetime.datetime(2010, 5, 13, 9, 34),
  datetime.datetime(2010, 5, 13, 9, 35)],
 [datetime.date(2010, 5, 13),
  datetime.date(2010, 5, 13),
  datetime.date(2010, 5, 13),
  datetime.date(2010, 5, 13),
  datetime.date(2010, 5, 13)])

You may also want to consider whether you actually need Python datetime objects at all, or whether pandas Timestamps would do.
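If pandas Timestamps are acceptable, a minimal sketch of staying inside pandas might look like the following. The sample index and the names bar_timestamps and bar_days are assumptions made for illustration; in the question, bars.index would take the place of ix.

```python
import datetime

import pandas as pd

# Hypothetical sample index standing in for bars.index
ix = pd.date_range(start="2010-05-13 09:31:00", periods=5, freq="min")

# tolist() on a DatetimeIndex yields pandas Timestamps,
# which are a subclass of datetime.datetime
bar_timestamps = ix.tolist()

# normalize() keeps the dates as a DatetimeIndex with the time set to midnight,
# so no Python date objects are created
bar_days = ix.normalize()
```

Whether this is faster for a given workload is something to measure; it mainly avoids building Python date objects when downstream code can work with Timestamps directly.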


We can use the .dt accessor, which is handy for datetime-related tasks in pandas. Because the datetimes are the index of your dataframe, you can either call reset_index on the dataframe or convert the index to a pd.Series, as shown below:

import pandas as pd

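# Convert the DatetimeIndex to a Series so the .dt accessor is available
dates = pd.Series(bars.index)

# List of datetime.date objects
barDates = dates.dt.date.tolist()
# Note: strftime returns formatted strings, not datetime objects
barDateTimes = dates.dt.strftime('%Y-%m-%dT%H:%M:%S').tolist()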
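The reset_index route mentioned above could be sketched as follows. The small bars frame here is a made-up stand-in for the real data, and the column name dateTime is taken from the index name shown in the question.

```python
import datetime

import pandas as pd

# Hypothetical small frame standing in for `bars`
index = pd.date_range("2010-05-13 09:31:00", periods=3, freq="min", name="dateTime")
bars = pd.DataFrame({"A": [117.13, 117.18, 117.20]}, index=index)

# reset_index moves the named index into an ordinary column called "dateTime"
flat = bars.reset_index()

# The column is a datetime64 Series, so the .dt accessor applies
bar_dates = flat["dateTime"].dt.date.tolist()
```

Converting only the index to a Series avoids copying the other columns, so reset_index is mostly useful when the datetime column is needed alongside the data anyway.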

Hope this gives you some idea.
