
I am trying to extract data within a date range (for example, 06/20/2021 to 06/30/2021). Right now my code reads the CSV file, sorts the data by date, and finds any duplicates. The next step is to extract all rows within a date timeframe, and I am wondering how I can do this. Any help is much appreciated :). This is what I have below:

import pandas as pd

#df = pd.read_excel(r"/Users/britevoxops2/Desktop/sample_date.xlsx") #reading Excel File

df = pd.read_csv(r"/Users/filename/Desktop/sample_date.csv")
print(df) #print original data
print(df.head()) #print the first five rows

Final_result = df.sort_values('Joining Date') #sorting by date
print(Final_result)

duplicate = df[df['Name'].duplicated()] #finding duplicate names
print('Here are the Duplicates: \n', duplicate)

  • Can you show in what format your 'Joining Date' is? You can convert it to the pandas datetime format and extract your required dates using df_date = df[(df['Joining Date'] < '23-03-21') & (df['Joining Date'] > '03-03-21')] Commented Jul 7, 2021 at 19:17
  • You can accept my answer if it worked :) Commented Jul 8, 2021 at 7:16
  • @Dana7371, you can upvote and accept whichever answer fits your needs best. Commented Jul 8, 2021 at 9:23

3 Answers

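You can convert the column to pandas datetime format and then extract the required dates with a boolean filter. Both bounds below are exclusive, and writing them as YYYY-MM-DD strings avoids any day/month ambiguity:

df['Joining Date'] = pd.to_datetime(df['Joining Date'])
df_date = df[(df['Joining Date'] < '2021-03-23') & (df['Joining Date'] > '2021-03-03')]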
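Applied to the question's own columns and its 06/20/2021 to 06/30/2021 range, the convert-then-filter approach could look like the sketch below. The CSV contents are hypothetical stand-ins for sample_date.csv, and the MM/DD/YYYY layout is an assumption based on the dates written in the question. Using `>=` and `<=` keeps both endpoints.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample_date.csv; the real file's rows will differ.
csv_text = """Name,Joining Date
Alice,06/18/2021
Bob,06/20/2021
Carol,06/25/2021
Bob,06/30/2021
Dave,07/02/2021
"""

df = pd.read_csv(io.StringIO(csv_text))

# Parse the strings as real dates (assumes MM/DD/YYYY), then sort chronologically.
df["Joining Date"] = pd.to_datetime(df["Joining Date"], format="%m/%d/%Y")
df = df.sort_values("Joining Date")

# Keep rows from 06/20/2021 through 06/30/2021, both ends included.
start, end = pd.Timestamp("2021-06-20"), pd.Timestamp("2021-06-30")
in_range = df[(df["Joining Date"] >= start) & (df["Joining Date"] <= end)]
print(in_range)
```

Sorting after the conversion also fixes ordering across years, which sorting the raw MM/DD/YYYY strings would get wrong.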

df.loc['2021-06-20' : '2021-06-30']

1 Comment

You need to specify the column here for it to work.
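As the comment says, this slice only works when the dates are the DataFrame's index. A minimal sketch, assuming a 'Joining Date' column like the question's (the sample rows are hypothetical): move the column into a sorted DatetimeIndex with `set_index` and `sort_index`, then slice with `.loc`, which includes both endpoints.

```python
import pandas as pd

# Hypothetical sample data with the dates in a regular column.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Joining Date": ["06/18/2021", "06/20/2021", "06/25/2021", "07/02/2021"],
})
df["Joining Date"] = pd.to_datetime(df["Joining Date"], format="%m/%d/%Y")

# Label slicing with date strings needs the dates in a sorted DatetimeIndex.
by_date = df.set_index("Joining Date").sort_index()
subset = by_date.loc["2021-06-20":"2021-06-30"]
print(subset)
```

With a DatetimeIndex, pandas treats the date strings in the slice as partial dates, so an end label of '2021-06-30' should also cover any times later on that day.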

The following should work for you.

Sample DataFrame:

>>> df
          Date
0   06/10/2021
1   06/11/2021
2   06/12/2021
3   06/13/2021
4   06/14/2021
5   06/15/2021
6   06/16/2021
7   06/17/2021
8   06/18/2021
9   06/19/2021
10  06/20/2021
11  06/21/2021
12  06/22/2021
13  06/23/2021
14  06/24/2021
15  06/25/2021
16  06/26/2021
17  06/27/2021
18  06/28/2021
19  06/29/2021
20  06/30/2021

Convert the Date column to datetime format:

>>> df['Date'] = pd.to_datetime(df['Date'])
>>> df
         Date
0  2021-06-10
1  2021-06-11
2  2021-06-12
3  2021-06-13
4  2021-06-14
5  2021-06-15
6  2021-06-16
7  2021-06-17
8  2021-06-18
9  2021-06-19
10 2021-06-20
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
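Without a `format` argument, `pd.to_datetime` has to infer the layout of the strings. If you know the file stores month/day/year, as this sample does, passing `format` makes that explicit, which matters for dates such as 06/07/2021 that would also be valid as day/month/year (for day-first data, "%d/%m/%Y" would be the pattern instead). A minimal sketch with hypothetical sample values:

```python
import pandas as pd

# Hypothetical sample strings in month/day/year order.
df = pd.DataFrame({"Date": ["06/07/2021", "06/20/2021", "06/30/2021"]})

# An explicit format fixes the month/day order instead of leaving it to inference;
# a string that does not match the format raises a ValueError.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
print(df["Date"].dt.day.tolist())
```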

Now select the date range you want. Because the lower bound uses `>`, 06/21/2021 itself is excluded:

>>> df[(df['Date'] > '06/21/2021') & (df['Date'] <= '06/30/2021')]
         Date
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30

Another way is to build a boolean mask and then use df.loc[mask]:

>>> mask = (df['Date'] > '06/21/2021') & (df['Date'] <= '06/30/2021')

>>> print(df.loc[mask])
         Date
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
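If your date column also carries a time of day, an upper bound like `<= '06/30/2021'` compares against midnight, so later times on June 30 are dropped. One way around this is a half-open mask that ends at the next midnight. A minimal sketch, assuming hypothetical timestamps:

```python
import pandas as pd

# Hypothetical timestamps that carry a time of day.
df = pd.DataFrame({"Date": pd.to_datetime([
    "2021-06-20 09:00", "2021-06-30 00:00", "2021-06-30 14:00", "2021-07-01 08:00",
])})

start, end = pd.Timestamp("2021-06-20"), pd.Timestamp("2021-06-30")

# '<= end' compares against 2021-06-30 00:00, so 14:00 on June 30 is dropped.
naive = df[df["Date"] <= end]

# Ending the range just before the next midnight keeps the whole last day.
mask = (df["Date"] >= start) & (df["Date"] < end + pd.Timedelta(days=1))
whole_day = df.loc[mask]
print(whole_day)
```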

Third method:

Using pandas.Series.between, which includes both endpoints by default:

>>> df[df.Date.between("06/21/2021", "06/30/2021")]
         Date
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
# df[df['Date'].between("06/21/2021", "06/30/2021")]
# df.loc[df['Date'].between('06/21/2021', '06/30/2021', inclusive='both')] <-- `inclusive` accepts 'both' (default), 'neither', 'left' or 'right'; pandas versions before 1.3 took True/False instead.
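To see what the string options of `inclusive` do, here is a small check on hypothetical daily data. It assumes pandas 1.3 or later, where `inclusive` takes strings.

```python
import pandas as pd

# Hypothetical daily dates from 06/19/2021 through 07/01/2021 (13 rows).
df = pd.DataFrame({"Date": pd.date_range("2021-06-19", "2021-07-01", freq="D")})

start, end = "2021-06-21", "2021-06-30"

# Default behaviour (inclusive="both"): both endpoints are kept.
both = df[df["Date"].between(start, end)]

# inclusive="neither" drops both endpoints; "left" keeps only the start date.
neither = df[df["Date"].between(start, end, inclusive="neither")]
left_only = df[df["Date"].between(start, end, inclusive="left")]

print(len(both), len(neither), len(left_only))
```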

With df.query, you can refer to variables in the environment by prefixing them with an '@' character, as below:

>>> start_date, end_date = "06/21/2021", "06/30/2021"

>>> print(df.query('Date >= @start_date and Date <= @end_date'))
         Date
11 2021-06-21
12 2021-06-22
13 2021-06-23
14 2021-06-24
15 2021-06-25
16 2021-06-26
17 2021-06-27
18 2021-06-28
19 2021-06-29
20 2021-06-30
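The question's column is called 'Joining Date', and a name containing a space cannot be written bare inside a query string; pandas lets you wrap it in backticks instead. A minimal sketch with hypothetical rows, using Timestamp bounds:

```python
import pandas as pd

# Hypothetical rows mirroring the question's column names.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol", "Dave"],
    "Joining Date": pd.to_datetime(["2021-06-18", "2021-06-20", "2021-06-30", "2021-07-02"]),
})

start_date, end_date = pd.Timestamp("2021-06-20"), pd.Timestamp("2021-06-30")

# Column names containing spaces must be wrapped in backticks inside query().
result = df.query("`Joining Date` >= @start_date and `Joining Date` <= @end_date")
print(result)
```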
