0

I'm working on dataset that has a column that tells you how many days it took for a building permit to be approved (time_range column gives you this info). I'm trying to create another column (time_frame) that will break those approval times into categories like 1-29 days, 30 - 59 days, etc. The dataset also has permits that were denied and I already filled some of the time_frame column with denied. For the rest of the entries, I'm trying to fill it in with a category that I've created. When I run the cell in jupyter notebook, it is stuck running and hasn't outputted anything. How should I rewrite my code to use less if-else statements and possibly remove the for-loop?

Here is my code:

for i in range(0,len(df['time_range'])):

    if df.loc[i,'time_frame'] != 'denied':

    if df.loc[i,'time_range'] == 0.0:
        df.loc[i,'time_frame'] = 'instant approval'

    elif (df.loc[i,'time_range'] >= 1.0 and df.loc[i,'time_range'] <= 29.0):
        df.loc[i,'time_frame'] = '1 - 29 days'

    elif (df.loc[i,'time_range'] >= 30.0 and df.loc[i,'time_range'] <= 59.0):
        df.loc[i,'time_frame'] = '30 - 59 days'

    elif (df.loc[i,'time_range'] >= 60.0 and df.loc[i,'time_range'] <= 89.0):
        df.loc[i,'time_frame'] = '60 - 89 days'

    elif (df.loc[i,'time_range'] >= 90.0 and df.loc[i,'time_range'] <= 119.0):
        df.loc[i,'time_frame'] = '90 - 119 days'

    elif (df.loc[i,'time_range'] >= 120.0 and df.loc[i,'time_range'] <= 149.0):
        df.loc[i,'time_frame'] = '120 - 150 days'

    elif (df.loc[i,'time_range'] >= 150.0 and df.loc[i,'time_range'] <= 179.0):
        df.loc[i,'time_frame'] = '150 - 179 days'

    else:
        df.loc[i,'time_frame'] = '180+ days'

1 Answer 1

2

Setup

df = pd.DataFrame({
        'time_frame': {0: nan, 1: nan, 2: nan, 3: 'denied', 4: nan, 5: nan, 6: nan},
        'time_range': {0: 0, 1: 10, 2: 120, 3: 10, 4: 50, 5: 175, 6: 250}})

df
   time_range time_frame
0           0        NaN
1          10        NaN
2         120        NaN
3          10     denied
4          50        NaN
5         175        NaN
6         250        NaN

Use pd.cut and mask those rows where "time_frame" is "denied":

bins = [-np.inf, 0, 29, 59, 89, 119, 149, 179, np.inf]
labels = [
    'instant', '1-29 days', '30-59 days', '60-89 days', 
    '90-119 days', '120-149 days', '150-179 days', '180+ days']

df['time_frame'] = (
    pd.cut(df['time_range'], bins=bins, labels=labels, right=True)
      .where(df['time_frame'].ne('denied'), 'denied'))

print(df)
   time_range    time_frame
0           0       instant
1          10     1-29 days
2         120  120-149 days
3          10        denied
4          50    30-59 days
5         175  150-179 days
6         250     180+ days
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.