1

I want my loop to go through 3 dfs with the same format, and do 3 things:

  1. Transform "Date" object column to datetime.
  2. Use loc to filter only the dates after year 2021.
  3. Set date as index.

Say I have 3 dataframes named df1, df2 and df3, they are all shaped like this:

Index Date Information
1 2020-01-01 Blablabla
2 2021-01-01 Blablabla
3 2022-01-01 Blablabla

After running my code, I want all dfs to be shaped like this:

Date Information
2021-01-01 Blablabla
2022-01-01 Blablabla

The code I'm running:

dfs = [df1, df2, df3]

for i in dfs:
    i['Date'] = pd.to_datetime(i['Date'])
    i = i.loc[i['Date'].dt.year >= 2021]
    i.set_index('Date', inplace=True)

I can't seem to get it to work. The first part of the loop is working, it transformms to datetime, but the filter is not working neither the indexing.

2 Answers 2

1

Let's this using a few tricks of pandas and python:

df1, df2, df3 = [i.assign(Date=pd.to_datetime(i['Date'])).set_index('Date').loc['2021':] 
                 for i in [df1, df2, df3]]
  1. Unpacking
  2. pd.Dataframe.assign
  3. With datetime in index you can use partial string indexing with loc
  4. List comprehension

Output:

print(f'{df1=}\n\n{df2=}\n\n{df3=}')

df1=            Index Information
Date                         
2021-01-01      2   Blablabla
2022-01-01      3   Blablabla

df2=            Index Information
Date                         
2021-01-01      2   Blablabla
2022-01-01      3   Blablabla

df3=            Index Information
Date                         
2021-01-01      2   Blablabla
2022-01-01      3   Blablabla

Fixing your for loop can be done like this:

dfs = [df1,df2,df3]
for i in dfs:
    i.assign(Date=pd.to_datetime(i['Date']), inplace=True)
    i.query('Date >= "2021-01-01"', inplace=True)
    i.set_index('Date', inplace=True)

df1
Sign up to request clarification or add additional context in comments.

Comments

0

step1: transform date column to 'real date'

d1['date'] = pd.to_datetime(d1['date'])
d2['date'] = pd.to_datetime(d2['date'])
d3['date'] = pd.to_datetime(d3['date'])

step2: select >= 2021 (year) rows

d1_s = d1.loc[d1['date'].dt.year >= 2021,:]
d2_s = d2.loc[d2['date'].dt.year >= 2021,:]
d3_s = d2.loc[d2['date'].dt.year >= 2021,:]

step3: pd.concat(d1_s,d2_s,d3_s) named a new dataframe 'data_all'

data_all = pd.concat([d1_s,d2_s,d3_s],ignore_index=True)

step4:set 'date' column as index

data_all.set_index(['date'], inplace=True)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.