Add a new record for each missing row in a DataFrame with TimeStamp without replacing the original records

Question

Consider the following pandas DataFrame:

|      date                           |     counter      |
|-------------------------------------|------------------|
|          2022-01-01 10:00:01        |        1         |
|          2022-01-01 10:00:04        |        1         |
|          2022-01-01 10:00:04        |        1         |
|          2022-02-01 11:30:06        |        1         |

First, I collapse repeated timestamps into a single row by summing their counters:

    df['date'] = pd.to_datetime(df['date'])
    df = df.groupby(df['date']).sum().reset_index()

|      date                           |     counter      |
|-------------------------------------|------------------|
|          2022-01-01 10:00:01        |        1         |
|          2022-01-01 10:00:04        |        2         |
|          2022-02-01 11:30:06        |        1         |

Now I need to create a new DataFrame covering the interval from 2022-01-01 10:00:01 to 2022-02-01 11:30:06 that keeps every record from the DataFrame above. In addition, it should include records with a counter value of 0 at a fixed step, given in seconds, counted from the first row.

For example, for 60 seconds:

|      date                           |     counter      |
|-------------------------------------|------------------|
|          2022-01-01 10:00:01        |        1         |
|          2022-01-01 10:00:04        |        2         |
|          2022-01-01 10:01:01        |        0         |
|          2022-01-01 10:02:01        |        0         |
|          2022-01-01 10:03:01        |        0         |
...
|          2022-02-01 11:29:01        |        0         |
|          2022-02-01 11:30:01        |        0         |
|          2022-02-01 11:30:06        |        1         |

I was advised to use the following command, but it doesn't work: it drops the original records that don't fall exactly on the generated time grid.

    df = df.set_index('date').asfreq('1S', fill_value=0).reset_index()

I am grateful for your help.
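
The behaviour described in the question can be reproduced on a smaller frame. This is only a minimal sketch: the sample timestamps and the 60-second step (`"60s"`) are assumptions chosen for illustration, not the original data. `asfreq` builds a regular grid starting at the first timestamp and keeps only the rows that land exactly on it, so any original record between grid points is lost.

```python
import pandas as pd

# Illustrative data (not the original frame): two of the three
# timestamps do not sit on a 60-second grid anchored at 10:00:01.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2022-01-01 10:00:01", "2022-01-01 10:00:04", "2022-01-01 10:02:30"]
        ),
        "counter": [1, 2, 1],
    }
)

# asfreq reindexes onto the grid 10:00:01, 10:01:01, 10:02:01
# and fills grid points that have no matching row with 0.
resampled = df.set_index("date").asfreq("60s", fill_value=0)

print(resampled)
# The rows at 10:00:04 and 10:02:30 are gone; only 10:00:01 survives.
```

This is why a plain `asfreq` call cannot satisfy the requirement on its own: the original index has to be merged with the grid rather than replaced by it.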

I think what you're looking for is .reindex.

– Elias Mi, Commented May 13, 2022 at 14:54

Andrej Kesely · Accepted Answer · 2022-05-13 21:09:49Z

Try:

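    import pandas as pd

    # if necessary, convert it:
    # df["date"] = pd.to_datetime(df["date"])

    # sum the counters of duplicated timestamps
    x = df.groupby("date").sum()

    # outer-join the original timestamps with a 1-minute grid so that
    # no original record is dropped, then fill the new rows with 0
    x = x.reindex(
        x.index.join(
            pd.date_range(x.index.min(), x.index.max(), freq="1min"),
            how="outer",
        ),
        fill_value=0,
    )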

print(x)

Prints:

                     counter
2022-01-01 10:00:01        1
2022-01-01 10:00:04        2
2022-01-01 10:01:01        0
2022-01-01 10:02:01        0
2022-01-01 10:03:01        0

...

2022-02-01 11:29:01        0
2022-02-01 11:30:01        0
2022-02-01 11:30:06        1
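
Since the question asks for a step given in seconds, the same idea can be wrapped in a small function. This is a sketch under stated assumptions, not part of the original answer: the helper name `fill_missing_intervals` and its `seconds` parameter are hypothetical, the column names `date` and `counter` are taken from the question, and `Index.union` is used here in place of the outer `join`.

```python
import pandas as pd


def fill_missing_intervals(df, seconds):
    """Hypothetical helper: keep every original timestamp and add
    zero-count rows on a grid of `seconds` starting at the first row."""
    df = df.assign(date=pd.to_datetime(df["date"]))

    # Sum the counters of duplicated timestamps.
    totals = df.groupby("date")["counter"].sum()

    # Regular grid from the first to the last timestamp.
    grid = pd.date_range(
        totals.index.min(), totals.index.max(), freq=pd.Timedelta(seconds=seconds)
    )

    # Union keeps the original timestamps alongside the grid points (sorted).
    full_index = totals.index.union(grid)

    return (
        totals.reindex(full_index, fill_value=0)
        .rename_axis("date")
        .reset_index()
    )


# Small illustrative frame (shorter range than in the question).
sample = pd.DataFrame(
    {
        "date": [
            "2022-01-01 10:00:01",
            "2022-01-01 10:00:04",
            "2022-01-01 10:00:04",
            "2022-01-01 10:03:06",
        ],
        "counter": [1, 1, 1, 1],
    }
)

result = fill_missing_intervals(sample, seconds=60)
print(result)
# counter column: 1, 2, 0, 0, 0, 1
```

Passing the step as a `pd.Timedelta` avoids building frequency strings by hand, so the same call should work for any whole number of seconds.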
