
Consider the following pandas DataFrame:

|      date                           |     counter      |
|-------------------------------------|------------------|
|          2022-01-01 10:00:01        |        1         |
|          2022-01-01 10:00:04        |        1         |
|          2022-01-01 10:00:04        |        1         |
|          2022-02-01 11:30:06        |        1         |

First, I collapse repeated timestamps into a single row by summing their counters:

    df['date'] = pd.to_datetime(df['date'])
    df = df.groupby(df['date']).sum().reset_index()

|      date                           |     counter      |
|-------------------------------------|------------------|
|          2022-01-01 10:00:01        |        1         |
|          2022-01-01 10:00:04        |        2         |
|          2022-02-01 11:30:06        |        1         |
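
For anyone who wants to reproduce this step, here is a minimal sketch. The way `df` is constructed is an assumption based on the table shown above, and the name `grouped` is only illustrative.

```python
import pandas as pd

# Sample data copied from the table in the question (construction is assumed).
df = pd.DataFrame(
    {
        "date": [
            "2022-01-01 10:00:01",
            "2022-01-01 10:00:04",
            "2022-01-01 10:00:04",
            "2022-02-01 11:30:06",
        ],
        "counter": [1, 1, 1, 1],
    }
)

df["date"] = pd.to_datetime(df["date"])

# Rows sharing a timestamp collapse into one row; their counters are summed.
grouped = df.groupby("date", as_index=False)["counter"].sum()
print(grouped)
```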

Now I need to create a new DataFrame covering the interval from 2022-01-01 10:00:01 to 2022-02-01 11:30:06 that keeps every record from the DataFrame above. In addition, it should contain rows with a counter value of 0 at a fixed step, given in seconds, counted from the first row.

For example, for 60 seconds:

|      date                           |     counter      |
|-------------------------------------|------------------|
|          2022-01-01 10:00:01        |        1         |
|          2022-01-01 10:00:04        |        2         |
|          2022-01-01 10:01:01        |        0         |
|          2022-01-01 10:02:01        |        0         |
|          2022-01-01 10:03:01        |        0         |
...
|          2022-02-01 11:29:01        |        0         |
|          2022-02-01 11:30:01        |        0         |
|          2022-02-01 11:30:06        |        1         |

I was advised to use the following command, but it does not work: it drops any original record whose timestamp does not fall exactly on the generated time grid.

    df = df.set_index('date').asfreq('1S', fill_value=0).reset_index()
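
The data loss is easiest to see with a 60-second step. The sketch below is only an illustration: it rebuilds the grouped data from the table above (an assumption), uses `"1min"` as the frequency, and the names `regular` and `lost` are made up for the example.

```python
import pandas as pd

# Grouped data as shown in the question (rebuilt here so the example runs on its own).
grouped = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2022-01-01 10:00:01", "2022-01-01 10:00:04", "2022-02-01 11:30:06"]
        ),
        "counter": [1, 2, 1],
    }
)

# asfreq builds a regular grid starting at the first timestamp and keeps only
# the rows whose timestamp lies exactly on that grid.
regular = grouped.set_index("date").asfreq("1min", fill_value=0)

# Original timestamps that did not survive the conversion.
lost = grouped.loc[~grouped["date"].isin(regular.index), "date"]
print(lost.tolist())
```

With this step, the rows at 10:00:04 and 11:30:06 are not on the grid, so their counters disappear from the result.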

I am grateful for your help.

  • I think what you're looking for is .reindex. (Commented May 13, 2022 at 14:54)

1 Answer


Try:

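import pandas as pd

# if necessary, convert the column to datetime first:
# df["date"] = pd.to_datetime(df["date"])

# sum the counters of rows that share the same timestamp
x = df.groupby("date").sum()

# outer-join the existing timestamps with a 1-minute grid, then reindex;
# timestamps that exist only in the grid get counter 0
x = x.reindex(
    x.index.join(
        pd.date_range(x.index.min(), x.index.max(), freq="1min"),
        how="outer",
    ),
    fill_value=0,
)

print(x)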

Prints:

                     counter
2022-01-01 10:00:01        1
2022-01-01 10:00:04        2
2022-01-01 10:01:01        0
2022-01-01 10:02:01        0
2022-01-01 10:03:01        0

...

2022-02-01 11:29:01        0
2022-02-01 11:30:01        0
2022-02-01 11:30:06        1
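
Since the question asks for a step given in seconds, the same idea can be wrapped in a small function. This is a sketch under assumptions, not part of the original answer: `fill_gaps` is a hypothetical helper name, the column names `date` and `counter` are taken from the question, and `Index.union` is used in place of the outer join, which should give the same sorted set of timestamps.

```python
import pandas as pd


def fill_gaps(df, seconds):
    """Hypothetical helper: sum duplicate timestamps, then add
    zero-counter rows every `seconds` seconds from the first timestamp."""
    df = df.assign(date=pd.to_datetime(df["date"]))
    x = df.groupby("date")["counter"].sum()

    # Regular grid anchored at the first timestamp.
    grid = pd.date_range(
        x.index.min(), x.index.max(), freq=pd.Timedelta(seconds=seconds)
    )

    # Union keeps every original timestamp and adds the grid points.
    full_index = x.index.union(grid)
    out = x.reindex(full_index, fill_value=0)
    return out.rename_axis("date").reset_index()


# Sample data from the question (construction is assumed).
df = pd.DataFrame(
    {
        "date": [
            "2022-01-01 10:00:01",
            "2022-01-01 10:00:04",
            "2022-01-01 10:00:04",
            "2022-02-01 11:30:06",
        ],
        "counter": [1, 1, 1, 1],
    }
)

result = fill_gaps(df, 60)
print(result.head())
print(result.tail(3))
```

Because the original timestamps are part of the union, none of them are dropped, and the total of `counter` stays the same as in the input.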