0

I have a dataframe with a column of datetime type that I'd like to create a 2nd categorical column based upon. The second column should consist of string identifiers using the following logic:

ON: [16:00-09:30)
FH: [09:30-10:00)
M: [10:00-15:00)
SLH: [15:00-15:30)
LH: [15:30-16:00) 

For instance, for dataframe d:

date_rng = pd.date_range(start='2024-01-01', periods=48, freq='30min')  
d = pd.DataFrame({'datetime': date_rng})
    

I'd like the output below:

d.head(40)
Out[186]: 
              datetime part
0  2024-01-01 00:00:00   ON
1  2024-01-01 00:30:00   ON
2  2024-01-01 01:00:00   ON
3  2024-01-01 01:30:00   ON
4  2024-01-01 02:00:00   ON
5  2024-01-01 02:30:00   ON
6  2024-01-01 03:00:00   ON
7  2024-01-01 03:30:00   ON
8  2024-01-01 04:00:00   ON
9  2024-01-01 04:30:00   ON
10 2024-01-01 05:00:00   ON
11 2024-01-01 05:30:00   ON
12 2024-01-01 06:00:00   ON
13 2024-01-01 06:30:00   ON
14 2024-01-01 07:00:00   ON
15 2024-01-01 07:30:00   ON
16 2024-01-01 08:00:00   ON
17 2024-01-01 08:30:00   ON
18 2024-01-01 09:00:00   FH
19 2024-01-01 09:30:00   FH
20 2024-01-01 10:00:00    M
21 2024-01-01 10:30:00    M
22 2024-01-01 11:00:00    M
23 2024-01-01 11:30:00    M
24 2024-01-01 12:00:00    M
25 2024-01-01 12:30:00    M
26 2024-01-01 13:00:00    M
27 2024-01-01 13:30:00    M
28 2024-01-01 14:00:00    M
29 2024-01-01 14:30:00    M
30 2024-01-01 15:00:00  SLH
31 2024-01-01 15:30:00   LH
32 2024-01-01 16:00:00   ON
33 2024-01-01 16:30:00   ON
34 2024-01-01 17:00:00   ON
35 2024-01-01 17:30:00   ON
36 2024-01-01 18:00:00   ON
37 2024-01-01 18:30:00   ON
38 2024-01-01 19:00:00   ON
39 2024-01-01 19:30:00   ON
3
  • 1
    you can create function which gets one value and convert it to one of strings ON,FH, etc. and later you could use df['part'] = df['datetime'].apply(your_function) Commented Feb 22 at 1:27
  • 2
    Incidentally, for 2024-01-01 09:00:00 the above incorrectly has 'FH'. Should be 'ON'. Commented Feb 22 at 10:54
  • @ouroboros1, you're right, mistyped :) Commented Feb 24 at 17:53

2 Answers 2

3

Option 1: pd.cut

base_time = d['datetime'].dt.normalize() + pd.Timedelta(hours=16)

offset = (d['datetime'] - base_time).dt.total_seconds().mod(86400) # 24 * 3600

bins = [hours * 3600 for hours in [0, 17.5, 18, 23, 23.5, 24]]

labels = ['ON', 'FH', 'M', 'SLH', 'LH']

d['part'] = pd.cut(offset, bins=bins, labels=labels, right=False)

Output:

# showing first row per continuous value
d[d['part'].ne(d['part'].shift())]

              datetime part
0  2024-01-01 00:00:00   ON
19 2024-01-01 09:30:00   FH
20 2024-01-01 10:00:00    M
30 2024-01-01 15:00:00  SLH
31 2024-01-01 15:30:00   LH
32 2024-01-01 16:00:00   ON

with:

d['part'].dtype

CategoricalDtype(categories=['ON', 'FH', 'M', 'SLH', 'LH'], ordered=True, 
                 categories_dtype=object)

Explanation

  • Use Series.dt.normalize + offset 16 hours with pd.Timedelta to set a base_time at 16:00.
  • Subtract from d['datetime'] and get Series.dt.total_seconds + Series.mod with 24 * 3600 (day in seconds).
  • Create bins with endpoints set against 16:00 as 0. E.g., 17.5 = 8 hours until end day + 9.5 hours next day (09:30).
  • Set labels and apply pd.cut with right=False.

Option 2: np.select

import numpy as np

delta = (d['datetime'] - d['datetime'].dt.normalize()).dt.total_seconds()

condlist = [
    (delta < hours * 3600) for hours in [9.5, 10, 15, 15.5, 16]
    ]

d['part2'] = np.select(condlist=condlist, choicelist=labels, 
                       default='ON')

d['part2'] = pd.Categorical(d['part2'], categories=labels, ordered=True)

Equality check:

d['part'].equals(d['part2'])
# True

Explanation

  • Get timedelta in seconds with Series.dt.normalize + df.total_seconds.
  • Set up condlist ("bins") with delta smaller than 09:30 (9.5), 10, etc.
  • Pass condlist to np.select and use default='ON' to get >= 16:00 as 'ON' as well.
  • Finally, apply pd.Categorical with categories=labels and ordered=True to get: ['ON' < 'FH' < 'M' < 'SLH' < 'LH'].
Sign up to request clarification or add additional context in comments.

1 Comment

Could have been more precise, used 'categorical' as a descriptor rather than a pandas datatype. What I'd landed on is a version your 2nd answer (but going to borrow the creation of condlist via list comprehension to tighten it up a bit). Thanks bud!
1

You can create function which gets one value and convert it to one of strings ON,FH, etc.
and later you could use

df['part'] = df['datetime'].apply(your_function)

Minimal working code.

I get time converted to string and compare with string
but you can also compare directly objects datetime.time.

You may also rewrite it without some >=

import pandas as pd

date_rng = pd.date_range(start='2024-01-01', periods=48, freq='30min')  
df = pd.DataFrame({'datetime': date_rng})

def my_function(item):
    text = str(item.time()) 
    #print(text)
    if text >= '16:00:00' or text < '09:30:00':  # here is `or` instead of `and` because value has to be in one of ranges [16:00-23:59] or [00:00-09:30)
        return "ON"
    elif text >= '09:30:00' and text < '10:00:00':
        return "FH"
    elif text >= '10:00:00' and text < '15:00:00':
        return "M"
    elif text >= '15:00:00' and text < '15:30:00':
        return "SLH"
    elif text >= '15:30:00' and text < '16:00:00':
        return "LH"
    return "???"

df['part'] = df['datetime'].apply(my_function)

print(df)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.