0

I have been trying to get this to work and cannot find a solution. I have data that looks like this in dataframe (df):

index   plant_name  business_name           power_kwh   mos_time        day month   year
0       PROVIDENCE HEIGHTS  UNITED STATES   7805.7  2023-02-25 08:00:00 56  2   2023
1       PROVIDENCE HEIGHTS  UNITED STATES   9943.7  2023-02-25 07:00:00 56  2   2023
2       PROVIDENCE HEIGHTS  UNITED STATES   9509.8  2023-02-25 06:00:00 56  2   2023
3       PROVIDENCE HEIGHTS  UNITED STATES   8333    2023-02-25 05:00:00 56  2   2023
2993    PROVIDENCE HEIGHTS  UNITED STATES   14560   2022-10-25 17:00:00 298 10  2022
2994    PROVIDENCE HEIGHTS  UNITED STATES   9260.4  2022-10-25 16:00:00 298 10  2022
2995    PROVIDENCE HEIGHTS  UNITED STATES   7327.7  2022-10-25 15:00:00 298 10  2022
2996    PROVIDENCE HEIGHTS  UNITED STATES   5579.1  2022-10-25 14:00:00 298 10  2022
2997    PROVIDENCE HEIGHTS  UNITED STATES   4507    2022-10-25 13:00:00 298 10  2022
13993   PROVIDENCE HEIGHTS  UNITED STATES   1655.3  2021-07-19 14:00:00 200 7   2021
13994   PROVIDENCE HEIGHTS  UNITED STATES   1686.1  2021-07-19 13:00:00 200 7   2021
13995   PROVIDENCE HEIGHTS  UNITED STATES   2243.7  2021-07-19 12:00:00 200 7   2021
13996   PROVIDENCE HEIGHTS  UNITED STATES   3577.9  2021-07-19 11:00:00 200 7   2021
33995   PROVIDENCE HEIGHTS  UNITED STATES   2220.2  2019-04-05 20:00:00 95  4   2019
33996   PROVIDENCE HEIGHTS  UNITED STATES   2266.7  2019-04-05 19:00:00 95  4   2019
33997   PROVIDENCE HEIGHTS  UNITED STATES   2292.4  2019-04-05 18:00:00 95  4   2019
33998   PROVIDENCE HEIGHTS  UNITED STATES   2197    2019-04-05 17:00:00 95  4   2019

The dict that I need to use to assign week numbers looks like this:

weeks = {
    1: [1, 2, 3, 4, 5],
    2: [6, 7, 8, 9],
    3: [10, 11, 12, 13],
    4: [14, 15, 16, 17],
    5: [18, 19, 20, 21, 22],
    6: [23, 24, 25, 26],
    7: [27, 28, 29, 30, 31],
    8: [32, 33, 34, 35],
    9: [36, 37, 38, 39],
    10: [40, 41, 42, 43, 44],
    11: [45, 46, 47, 48],
    12: [49, 50, 51, 52]
}

And, my answer looks like this with the added "week" column on the far right column. thank you for any help here.

index   plant_name  business_name           power_kwh   mos_time    day month   year    week
0       PROVIDENCE HEIGHTS  UNITED STATES   7805.7  2023-02-25 08:00:00 56  2   2023    8
1       PROVIDENCE HEIGHTS  UNITED STATES   9943.7  2023-02-25 07:00:00 56  2   2023    8
2       PROVIDENCE HEIGHTS  UNITED STATES   9509.8  2023-02-25 06:00:00 56  2   2023    8
3       PROVIDENCE HEIGHTS  UNITED STATES   8333    2023-02-25 05:00:00 56  2   2023    8
2993    PROVIDENCE HEIGHTS  UNITED STATES   14560   2022-10-25 17:00:00 298 10  2022    43
2994    PROVIDENCE HEIGHTS  UNITED STATES   9260.4  2022-10-25 16:00:00 298 10  2022    43
2995    PROVIDENCE HEIGHTS  UNITED STATES   7327.7  2022-10-25 15:00:00 298 10  2022    43
2996    PROVIDENCE HEIGHTS  UNITED STATES   5579.1  2022-10-25 14:00:00 298 10  2022    43
2997    PROVIDENCE HEIGHTS  UNITED STATES   4507    2022-10-25 13:00:00 298 10  2022    43
13993   PROVIDENCE HEIGHTS  UNITED STATES   1655.3  2021-07-19 14:00:00 200 7   2021    30
13994   PROVIDENCE HEIGHTS  UNITED STATES   1686.1  2021-07-19 13:00:00 200 7   2021    30
13995   PROVIDENCE HEIGHTS  UNITED STATES   2243.7  2021-07-19 12:00:00 200 7   2021    30
13996   PROVIDENCE HEIGHTS  UNITED STATES   3577.9  2021-07-19 11:00:00 200 7   2021    30
33995   PROVIDENCE HEIGHTS  UNITED STATES   2220.2  2019-04-05 20:00:00 95  4   2019    14
33996   PROVIDENCE HEIGHTS  UNITED STATES   2266.7  2019-04-05 19:00:00 95  4   2019    14
33997   PROVIDENCE HEIGHTS  UNITED STATES   2292.4  2019-04-05 18:00:00 95  4   2019    14
33998   PROVIDENCE HEIGHTS  UNITED STATES   2197    2019-04-05 17:00:00 95  4   2019    14

I have tried things like the function below but get the ValueError: Week number not found in dictionary:

weeks = {
    1: [1, 2, 3, 4, 5],
    2: [6, 7, 8, 9],
    3: [10, 11, 12, 13],
    4: [14, 15, 16, 17],
    5: [18, 19, 20, 21, 22],
    6: [23, 24, 25, 26],
    7: [27, 28, 29, 30, 31],
    8: [32, 33, 34, 35],
    9: [36, 37, 38, 39],
    10: [40, 41, 42, 43, 44],
    11: [45, 46, 47, 48],
    12: [49, 50, 51, 52]
}
def get_week_number(day, month):
    for num, days in weeks.items():
        if day in days and num <= 12:
            if month == num:
                return num
    raise ValueError("Week number not found in dictionary.")

# add a column for week number
df['week'] = ncData.apply(lambda row: get_week_number(row['day'], row['month']), axis=1)

1 Answer 1

0

Why not simply use isocalendar().week ?

df["mos_time"] = pd.to_datetime(df["mos_time"])

df["week"] = df["mos_time"].dt.isocalendar().week

Output :

print(df)

               plant_name  business_name  power_kwh            mos_time  day  month  year  week
0      PROVIDENCE HEIGHTS  UNITED STATES     7805.7 2023-02-25 08:00:00   56      2  2023     8
1      PROVIDENCE HEIGHTS  UNITED STATES     9943.7 2023-02-25 07:00:00   56      2  2023     8
2      PROVIDENCE HEIGHTS  UNITED STATES     9509.8 2023-02-25 06:00:00   56      2  2023     8
3      PROVIDENCE HEIGHTS  UNITED STATES     8333.0 2023-02-25 05:00:00   56      2  2023     8
2993   PROVIDENCE HEIGHTS  UNITED STATES    14560.0 2022-10-25 17:00:00  298     10  2022    43
2994   PROVIDENCE HEIGHTS  UNITED STATES     9260.4 2022-10-25 16:00:00  298     10  2022    43
2995   PROVIDENCE HEIGHTS  UNITED STATES     7327.7 2022-10-25 15:00:00  298     10  2022    43
2996   PROVIDENCE HEIGHTS  UNITED STATES     5579.1 2022-10-25 14:00:00  298     10  2022    43
2997   PROVIDENCE HEIGHTS  UNITED STATES     4507.0 2022-10-25 13:00:00  298     10  2022    43
13993  PROVIDENCE HEIGHTS  UNITED STATES     1655.3 2021-07-19 14:00:00  200      7  2021    29
13994  PROVIDENCE HEIGHTS  UNITED STATES     1686.1 2021-07-19 13:00:00  200      7  2021    29
13995  PROVIDENCE HEIGHTS  UNITED STATES     2243.7 2021-07-19 12:00:00  200      7  2021    29
13996  PROVIDENCE HEIGHTS  UNITED STATES     3577.9 2021-07-19 11:00:00  200      7  2021    29
33995  PROVIDENCE HEIGHTS  UNITED STATES     2220.2 2019-04-05 20:00:00   95      4  2019    14
33996  PROVIDENCE HEIGHTS  UNITED STATES     2266.7 2019-04-05 19:00:00   95      4  2019    14
33997  PROVIDENCE HEIGHTS  UNITED STATES     2292.4 2019-04-05 18:00:00   95      4  2019    14
33998  PROVIDENCE HEIGHTS  UNITED STATES     2197.0 2019-04-05 17:00:00   95      4  2019    14
Sign up to request clarification or add additional context in comments.

3 Comments

thank you for this. however, i need to use the dictionary that I've included to define the weeks assigned to each month.
It makes no sense to make a dictionnary of months/weeks in your case. Can you explain why ?
i have a large amount of data that spans many years including leap years and I need the week numbers 1-52 to be consistent for each year. Jan, Apr, Jul, Oct are typically assigned 5 weeks and the other months 4 weeks as the dict above suggests.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.