2
dft = pd.DataFrame(randn(100000,1), columns=['A'],
     index=pd.date_range('20130101',periods=100000,freq='T'))

As you can see, I initialize a Dateframe from '2013-01-01' to ‘2013-03-11’ with 10 min interval. How can I select specific data from specific conditions?

1) Date in a list eg: If there is a list ['2013-01-02', '2013-01-04', '2013-01-06'] How can I select the data on this list date? Or how can I select the data that are not on this list date? More specifically, '2013-01-02' means all the data from '2013-01-02 00:00:00' to '2013-01-02 23:50:00'.

2) multiple slices choose eg: I wish I can select the data in multiple slices like the following: ['2013-01-02':'2013-01-03'] & ['2013-01-05':'2013-01-07'] & ['2013-01-09':'2013-01-11'] More specifically, this slice should be same as python slice which means including left but not including right.

1 Answer 1

3

Assuming this is the raw data (with a Datetime index)

dft = pd.DataFrame(np.random.randn(100000,1), columns=['A'],
     index=pd.date_range('20130101',periods=100000,freq='T'))
dft.head()
                            A
2013-01-01 00:00:00  0.313644
2013-01-01 00:01:00  0.458860
2013-01-01 00:02:00  0.841434
2013-01-01 00:03:00 -0.135846
2013-01-01 00:04:00 -0.881316

For 1), just use .isin()

myDates = ['2013-01-02', '2013-01-04', '2013-01-06']

# to get data in list
df_in = dft[pd.to_datetime(dft.index.date).isin(myDates)]
df_in.head()
                            A
2013-01-02 00:00:00  0.444005
2013-01-02 00:01:00 -0.073561
2013-01-02 00:02:00  0.256737
2013-01-02 00:03:00  1.304807
2013-01-02 00:04:00 -0.741956

# to get data not in list
df_not_in = dft[~pd.to_datetime(dft.index.date).isin(myDates)]
df_not_in_list.head()
                            A
2013-01-01 00:00:00 -0.944070
2013-01-01 00:01:00  0.225456
2013-01-01 00:02:00  0.571424
2013-01-01 00:03:00 -0.004389
2013-01-01 00:04:00  0.933229

For 2), if I understand correctly, you want to select the data using multiple datetime slices. To do this, you can use multiple index masks, from a nested list, to filter by date

myDates = [['2013-01-02','2013-01-03'],
           ['2013-01-05','2013-01-07'],
           ['2013-01-09','2013-01-11']]
df_masked = dft[
      (dft.index >= myDates[0][0]) & (dft.index <= myDates[0][1]) & \
      (dft.index >= myDates[1][0]) & (dft.index <= myDates[1][1]) & \
      (dft.index >= myDates[2][0]) & (dft.index <= myDates[2][1])
      ]
Sign up to request clarification or add additional context in comments.

6 Comments

Hi WR, I have tested this method before posting the question. It didn't work.
That is strange. I have used the same sample data you posted and it worked. Your sample data has a Datetime index. Check that your real data also has a Datetime index in the same format as the sample data. That could be part of the problem....
Also, if you have made an attempt, you should post your work and show the problem/error message.
For 1) it only returns 2013-01-02 00:00:00, 2013-01-04 00:00:00, 2013-01-06 00:00:00 data. Sorry I didn't make it clear. I wish I can get all the data in 2013-01-02, 2013-01-04 and 2013-01-06.
For 2), this solution is what I want. Thanks.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.