I have a DataFrame that contains the price of a ticker like this:

date                    price
2006-01-02 09:00:00     6548.34
2006-01-02 09:01:00     6457.61
2006-01-02 09:02:00     6452.82
2006-01-02 09:03:00     6452.39
2006-01-02 09:04:00     6451.61
                         ...
2021-01-29 13:29:40    15229.54
2021-01-29 13:29:45    15229.54
2021-01-29 13:29:50    15229.54
2021-01-29 13:29:55    15229.54
2021-01-29 13:30:00    15138.31

How can I remove the hour, minute, and second parts in the date?

I mean I want something like this.

date          price
2006-01-02    6577.33
2006-01-03    6488.62
2006-01-04    6469.82
2006-01-05    6555.39
2006-01-06    6448.65
                         ...
2021-01-25    15229.53
2021-01-26    15129.34
2021-01-27    14129.64
2021-01-28    15228.34
2021-01-29    15138.31

I only want the last price of each day.

So the thing I want to do is to remove all the other prices of each day.

How can I achieve this?

My df.index returns the following

>>> df.index
DatetimeIndex(['2006-01-02 09:00:00', '2006-01-02 09:01:00',
               '2006-01-02 09:02:00', '2006-01-02 09:03:00',
               '2006-01-02 09:04:00', '2006-01-02 09:05:00',
               '2006-01-02 09:06:00', '2006-01-02 09:07:00',
               '2006-01-02 09:08:00', '2006-01-02 09:09:00',
               ...
               '2021-01-29 13:29:15', '2021-01-29 13:29:20',
               '2021-01-29 13:29:25', '2021-01-29 13:29:30',
               '2021-01-29 13:29:35', '2021-01-29 13:29:40',
               '2021-01-29 13:29:45', '2021-01-29 13:29:50',
               '2021-01-29 13:29:55', '2021-01-29 13:30:00'],
              dtype='datetime64[ns]', name='date', length=6345485, freq=None)

df.columns returns

>>> df.columns
Index(['price'], dtype='object', name='stock_id')

My pd.to_datetime(df.index, format='%Y%m%d') returns

>>> pd.to_datetime(df.index)
DatetimeIndex(['2006-01-02 09:00:00', '2006-01-02 09:01:00',
               '2006-01-02 09:02:00', '2006-01-02 09:03:00',
               '2006-01-02 09:04:00', '2006-01-02 09:05:00',
               '2006-01-02 09:06:00', '2006-01-02 09:07:00',
               '2006-01-02 09:08:00', '2006-01-02 09:09:00',
               ...
               '2021-01-29 13:29:15', '2021-01-29 13:29:20',
               '2021-01-29 13:29:25', '2021-01-29 13:29:30',
               '2021-01-29 13:29:35', '2021-01-29 13:29:40',
               '2021-01-29 13:29:45', '2021-01-29 13:29:50',
               '2021-01-29 13:29:55', '2021-01-29 13:30:00'],
              dtype='datetime64[ns]', name='date', length=6345485, freq=None)

This is strange, because the to_datetime() conversion doesn't seem to change anything.

2 Answers

You can keep only the date part of each datetime, then group by that date and keep the last value in each group:

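# Truncate each timestamp to midnight; the column stays datetime64
df["date"] = pd.to_datetime(df["date"]).dt.normalize()
# last() takes no column name; it returns the last non-null value per group
df = df.groupby("date").last()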

I added pd.to_datetime() in case your date column is not already in datetime format.
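
As a rough, self-contained sketch of the approach above: the sample rows and the name daily are made up for illustration, and the code assumes "date" is an ordinary column and that rows are already in chronological order within each day.

```python
import pandas as pd

# Hypothetical sample data: "date" is a regular column of strings here.
df = pd.DataFrame({
    "date": ["2006-01-02 09:00:00", "2006-01-02 09:01:00",
             "2006-01-03 09:00:00", "2006-01-03 13:30:00"],
    "price": [6548.34, 6457.61, 6480.10, 6488.62],
})

# Parse the strings and truncate every timestamp to midnight.
df["date"] = pd.to_datetime(df["date"]).dt.normalize()

# Keep the last non-null price seen for each calendar day.
daily = df.groupby("date").last()
print(daily)
```

If the rows might not be in time order, sorting by the original timestamps before normalizing would be a sensible precaution, since last() follows row order within each group.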

5 Comments

Don't use dt.date in a Pandas DataFrame. The resulting values are stored with object dtype and do not play well with many other Pandas functions.
Thank you for pointing that out. I changed dt.date to dt.normalize().
@XavierBrt. Thank you for replying. But the format of my DataFrame seems to be different than yours. Do you know how to convert in my case?
I've added my DataFrame format in my post.
OK, your date was already in datetime format, so pd.to_datetime() was not necessary, as you found out.

Thanks to XavierBrt's answer, I've figured out the conversion for the format of my DataFrame.

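# Truncate the DatetimeIndex to midnight, then keep the last row per day
df.index = df.index.normalize()
# last() takes no column name; it returns the last non-null value per group
df = df.groupby(df.index).last()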
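
For reference, here is a minimal runnable sketch of the same idea on a DataFrame whose dates live in the index, as in the question. The sample values and the name daily are invented for illustration; the sort_index() call is an added precaution, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data with a DatetimeIndex named "date", as in the question.
idx = pd.DatetimeIndex(
    ["2006-01-02 09:00:00", "2006-01-02 09:01:00",
     "2006-01-03 09:00:00", "2006-01-03 13:30:00"],
    name="date",
)
df = pd.DataFrame({"price": [6548.34, 6457.61, 6480.10, 6488.62]}, index=idx)

# Make sure rows are in time order so "last" really means the latest tick.
df = df.sort_index()

# Group by the index truncated to midnight and keep the last row of each day.
daily = df.groupby(df.index.normalize()).last()
print(daily)
```

Grouping by df.index.normalize() directly leaves the original intraday index on df untouched, which may be convenient if the minute-level data is still needed afterwards.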
