1

I got this code below which works fine but takes too much time to execute (>30s):

df['Date_engagement'] = np.nan
for i in range(df.shape[0]):
if not(pd.isna(df.loc[i,'Engagement'])):
    df.loc[i, 'Date_engagement'] = np.busday_offset(
        df.loc[i, 'Date_réception'].date(),
        df.loc[i,'Engagement'], roll='backward', busdaycal=france_bdc
    )

I guess this is because I loop through the Pandas dataframe, and I wonder if there is a way to achieve the same result, not using any loop.

Basically what this code does is creating the 'Date_engagement' column which will be the result of adding 'Engagement' days to 'Date_réception' if *'Engagement' *is not empty. I use *'np.busday_offset()' *to avoid holidays.

Thanks in advance for your help...

I tried to use Pandas features for dataframes (like DataFrame.apply()), but can't find how to do it correctly to get the same result as explained.

2
  • 2
    "using for loop with Pandas dataframe" is an immediate red flag. A major advantage of the dataframe is to vectorise Commented May 12, 2023 at 9:12
  • 1
    if you post what you are trying to achieve, than it will be more than likely that we (SO contributors) can help you increase speed > 10x... stackoverflow.com/help/minimal-reproducible-example Commented May 12, 2023 at 9:14

3 Answers 3

3

Suppose the input dataframe:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Date_réception': ['2023-05-12', '2023-05-13'],
                   'Engagement': [10, np.nan]})
print(df)

# Output
  Date_réception  Engagement
0     2023-05-12        10.0
1     2023-05-13         NaN

You can vectorize your code:

france_bdc = np.busdaycalendar(weekmask='1111100', holidays=['2023-05-18'])
m = df['Engagement'].notna()

df.loc[m, 'Date_engagement'] = (
    np.busday_offset(df.loc[m, 'Date_réception'].values.astype('datetime64[D]'),
                     df.loc[m, 'Engagement'], roll='backward', busdaycal=france_bdc)
)

Output:

>>> df
  Date_réception  Engagement Date_engagement
0     2023-05-12        10.0      2023-05-29
1     2023-05-13         NaN             NaT
Sign up to request clarification or add additional context in comments.

1 Comment

This works just great, in much much less time! (<0.2s) Thanks so much!
0

Maybe this will work:

mask = df['Engagement'].notnull()

df.loc[mask, 'Date_engagement'] = np.busday_offset(
    df.loc[mask, 'Date_réception'].dt.date,
    df.loc[mask, 'Engagement'],
    roll='backward',
    busdaycal=france_bdc
)

1 Comment

Here 'dt.date' issues a TypeError: TypeError: Cannot cast array data from dtype('O') to dtype('<M8[D]') according to the rule 'safe'. Using values.astype('datetime64[D]') instead works fine. Thank you so much!
0

Not the most efficient one but you can still try it :

dt_arr = np.busday_offset(
            df["Date_réception"].to_numpy().astype("datetime64[D]"),
            df["Engagement"].fillna(0), roll="backward", busdaycal=france_bdc)

df["Date_engagement"] = pd.Series(dt_arr).where(df["Engagement"].notna())

#620 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Example :

np.random.seed(1)

df = pd.DataFrame({
    "Date_réception": pd.date_range("20230401", periods=30, freq="W"),
    "Engagement": np.random.choice(np.append(np.random.randint(5, 26, 25), [np.nan]),
                                   size=30, replace=True)}
)

france_bdc = np.busdaycalendar(
                holidays=["2023-01-01", "2023-04-17", "2023-05-01",
                          "2023-05-08", "2023-07-14", "2023-08-15",
                          "2023-11-01", "2023-11-11", "2023-12-25"]
)

Output :

print(df)

   Date_réception  Engagement Date_engagement
0      2023-04-02        23.0      2023-05-05
1      2023-04-09        23.0      2023-05-15
2      2023-04-16        21.0      2023-05-18
3      2023-04-23        10.0      2023-05-09
4      2023-04-30        23.0      2023-06-02
5      2023-05-07        10.0      2023-05-22
6      2023-05-14        19.0      2023-06-08
7      2023-05-21        18.0      2023-06-14
8      2023-05-28        21.0      2023-06-26
9      2023-06-04        21.0      2023-07-03
10     2023-06-11        20.0      2023-07-07
11     2023-06-18        19.0      2023-07-13
12     2023-06-25         NaN             NaT
13     2023-07-02        16.0      2023-07-25
14     2023-07-09        10.0      2023-07-24
15     2023-07-16        10.0      2023-07-28
16     2023-07-23         5.0      2023-07-28
17     2023-07-30         9.0      2023-08-10
18     2023-08-06        18.0      2023-08-31
19     2023-08-13        25.0      2023-09-18
20     2023-08-20        23.0      2023-09-20
21     2023-08-27         6.0      2023-09-04
22     2023-09-03         NaN             NaT
23     2023-09-10         5.0      2023-09-15
24     2023-09-17        20.0      2023-10-13
25     2023-09-24        13.0      2023-10-11
26     2023-10-01        10.0      2023-10-13
27     2023-10-08        15.0      2023-10-27
28     2023-10-15        10.0      2023-10-27
29     2023-10-22         NaN             NaT

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.