1

I am trying to write a nested if/else statement using pandas, but I am not very experienced with if statements in pandas. Below are the sample CSV data being processed and the code snippet I've written so far.

df:

t1  
8
1134
0
119
122
446
21
0
138 
0

Current if/else statement logic:

import pandas as pd

df = pd.read_csv('file.csv', sep=';')

def get_cost(df):
    t_zone = 720
    max_rate = 5.5
    rate = 0.0208
    duration = df['t1']

    if duration < t_zone:
        if(duration * rate) >= max_rate:
            return max_rate
        else:
            return(duration * rate)
    else:
        if duration >= 720:
            x = int(duration/720)
            y = ((duration%720) * rate)
            if y >= max_rate:
                return((x * max_rate) + max_rate)
            else:
                return((x * max_rate) + y)

cost = get_cost(df)

This snippet raises ValueError: The truth value of a Series is ambiguous. If anyone has a better solution or could help translate this if/else statement into a more pandas-like form, that would be amazing!
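For reference, the error can be reproduced in isolation. This is a minimal sketch that assumes a small hand-built Series as a stand-in for df['t1'] (no CSV file involved); the names mask, raised and message are illustrative only.

```python
import pandas as pd

duration = pd.Series([8, 1134, 0])  # stand-in for df['t1']
mask = duration < 720               # element-wise comparison gives a boolean Series

try:
    if mask:                        # `if` needs a single bool, so pandas raises here
        pass
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised, message)
```

The comparison itself is fine; the problem is only that a plain if cannot decide between several True/False values at once.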

3
  • Add print(duration) and I'm sure you will figure this out. Commented Aug 21, 2018 at 14:46
  • Hi @TomWojcik, are you referring to finding where the error is coming from? Commented Aug 21, 2018 at 14:55
  • Yes. And a full stack trace would help. Commented Aug 21, 2018 at 14:57

3 Answers

4

Loops and if statements are inefficient in pandas and are best avoided unless absolutely necessary. Here is a fully vectorized solution using pandas and NumPy:

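import numpy as np  # np.where and np.minimum come from NumPy

# Constants from the question
t_zone = 720
max_rate = 5.5
rate = 0.0208

x = df['t1'] // t_zone * max_rate  # each full t_zone block costs max_rate; note the use of //!
y = df['t1'] % t_zone * rate       # cost of the leftover duration, before capping
df['cost'] = np.where(df['t1'] < t_zone,
                      np.minimum(df['t1'] * rate, max_rate),
                      np.minimum(y, max_rate) + x)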
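As a side note, for non-negative durations the two branches of np.where reduce to the same expression, because t1 // 720 is 0 and t1 % 720 is t1 whenever t1 < 720. The sketch below relies on that assumption (non-negative t1) and builds the sample column by hand instead of reading the CSV; full_zones and remainder are illustrative names.

```python
import numpy as np
import pandas as pd

t_zone, max_rate, rate = 720, 5.5, 0.0208  # constants from the question

# Sample values from the question, built by hand instead of read from file.csv
df = pd.DataFrame({'t1': [8, 1134, 0, 119, 122, 446, 21, 0, 138, 0]})

full_zones = df['t1'] // t_zone  # number of complete t_zone blocks
remainder = df['t1'] % t_zone    # leftover duration after the complete blocks

# Each complete block costs max_rate; the leftover is charged by rate, capped at max_rate.
df['cost'] = full_zones * max_rate + np.minimum(remainder * rate, max_rate)
print(df)
```

If negative durations can occur, the explicit np.where version is the safer choice, since // and % round toward negative infinity.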

2

Try this solution.

import pandas as pd

df = pd.read_csv('file.csv', sep=';')  # same separator as in the question

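def get_cost(row):
    # Called once per row by df.apply(..., axis=1), so row['t1'] is a single number.
    t_zone = 720
    max_rate = 5.5
    rate = 0.0208
    duration = row['t1']
    if duration < t_zone:
        # Shorter than one t_zone block: charge by rate, capped at max_rate.
        return min(duration * rate, max_rate)
    # Otherwise every complete block costs max_rate and the leftover is capped too.
    full_zones = int(duration // t_zone)
    remainder_cost = (duration % t_zone) * rate
    return full_zones * max_rate + min(remainder_cost, max_rate)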

df['cost'] = df.apply(get_cost, axis=1)

You could also assign the result back to the same column. Here I have assigned it to a new column called 'cost'.

Output:

     t1     cost
0     8   0.1664
1  1134  11.0000
2     0   0.0000
3   119   2.4752
4   122   2.5376
5   446   5.5000
6    21   0.4368
7     0   0.0000
8   138   2.8704
9     0   0.0000
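Since only one column is involved, the function can also take a plain number and be applied to that column directly, which avoids axis=1. This is a rough sketch rather than part of the answer above: cost_for is a hypothetical helper name, and the three-row frame is a hand-built stand-in for the CSV.

```python
import pandas as pd

def cost_for(duration, t_zone=720, max_rate=5.5, rate=0.0208):
    """Cost for a single duration value (a scalar, not a Series)."""
    full_zones, remainder = divmod(duration, t_zone)
    # Complete blocks cost max_rate each; the leftover is capped at max_rate.
    return full_zones * max_rate + min(remainder * rate, max_rate)

df = pd.DataFrame({'t1': [8, 1134, 0]})  # first three sample values
df['cost'] = df['t1'].apply(cost_for)    # Series.apply calls cost_for per element
print(df)
```

Writing df['t1'] = df['t1'].apply(cost_for) instead would overwrite the original column.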

2 Comments

Thank you so much for your help. This was exactly what I needed!
Kindly accept the answer in case this was what you were looking for. :)

1

duration is a whole Series, so you should iterate over its values rather than compare it directly to a number. You could do this:

import pandas as pd

df = pd.read_csv('file.csv', sep=';')

def get_cost(df):
    t_zone = 720
    max_rate = 5.5
    rate = 0.0208
    duration = df['t1']
    ratecol = []
    for i in duration:
        if i < t_zone:
            if(i * rate) >= max_rate:
                ratecol.append(max_rate)
            else:
                ratecol.append(i * rate)
        else:
            if i >= 720:
                x = int(i/720)
                y = ((i%720) * rate)
                if y >= max_rate:
                    ratecol.append((x * max_rate) + max_rate)
                else:
                    ratecol.append((x * max_rate) + y)
    return ratecol

df['cost'] = get_cost(df)

This code produces exactly the same result as the answer posted before.
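One way to check a claim like this is to run a loop-based and a vectorized version side by side. The sketch below is an assumption-laden illustration: cost_loop and cost_vectorized are hypothetical names, the loop is condensed with min(), and 719, 720 and 1440 are boundary values added here that do not appear in the original data.

```python
import numpy as np
import pandas as pd

T_ZONE, MAX_RATE, RATE = 720, 5.5, 0.0208  # constants from the question

def cost_loop(durations):
    """Element-by-element version, mirroring the nested if/else."""
    costs = []
    for d in durations:
        if d < T_ZONE:
            costs.append(min(d * RATE, MAX_RATE))
        else:
            full_zones = d // T_ZONE
            costs.append(full_zones * MAX_RATE + min((d % T_ZONE) * RATE, MAX_RATE))
    return costs

def cost_vectorized(durations):
    """Whole-column version built on np.where and np.minimum."""
    return np.where(
        durations < T_ZONE,
        np.minimum(durations * RATE, MAX_RATE),
        durations // T_ZONE * MAX_RATE + np.minimum(durations % T_ZONE * RATE, MAX_RATE),
    )

# Sample values from the question plus added boundary cases (719, 720, 1440).
t1 = pd.Series([8, 1134, 0, 119, 122, 446, 21, 0, 138, 0, 719, 720, 1440])
same = np.allclose(cost_loop(t1), cost_vectorized(t1))
print(same)
```

np.allclose is used instead of == so that tiny floating-point differences between the two code paths do not count as a mismatch.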
