I wrote my own function in Python. It is very simple; the data and the function are shown below:

    import numpy as np
    import pandas as pd

    data_1 = {'id': ['1', '2', '3', '4', '5'],
              'name': ['Company1', 'Company1', 'Company3', 'Company4', 'Company5'],
              'employee': [10, 3, 5, 1, 0],
              'sales': [100, 30, 50, 200, 0],
             }
    df = pd.DataFrame(data_1, columns=['id', 'name', 'employee', 'sales'])

    threshold_1 = 40
    threshold_2 = 50

And the function is written below:

    def my_function(employee, sales):
        conditions = [
            (sales == 0),
            (sales < threshold_1),
            (sales >= threshold_1 & employee <= threshold_2)]
        values = [0, sales*2, sales*4]
        sales_estimation = np.select(conditions, values)
        return sales_estimation

    df['new_column'] = df.apply(lambda x: my_function(x.employee, x.sales), axis=1)
    df

So this function works well and gives the expected result.

Now I want to write the same function as a vectorized operation on pandas Series, because vectorized operations reduce execution time. I wrote the function below, but it does not work.

    def my_function1(
            pandas_series: pd.Series
            ) -> pd.Series:
        """
        Vectorized operation across Pandas Series
        """
        conditions = [
            (sales == 0),
            (sales < threshold_1),
            (sales >= threshold_1 & employee <= threshold_2)]
        values = [0, sales*2, sales*4]
        sales_estimation = np.select(conditions, values)
        return sales_estimation

    df['new_column_1'] = my_function1(data['employee','sales'])

My error is probably related to the input parameters of this function. Can anybody help me fix this and make my_function1 work?

2 Answers

You need to slightly change one condition to be able to pass Series:

(sales >= threshold_1 & employee <= threshold_2)
# equivalent to
# sales >= (threshold_1 & employee) <= threshold_2

into:

(sales >= threshold_1) & (employee <= threshold_2)

because & binds more tightly than the comparison operators, so the original expression was not grouped the way it was intended.
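A minimal sketch of the precedence issue, using the thresholds and data from the question. The variable names `masked`, `chained`, `sales_s`, `employee_s`, `error_message` and `mask` are only illustrative. With plain numbers the unparenthesized expression silently evaluates to something unintended; with Series the chained comparison has to convert a Series to a single bool, which pandas refuses to do.

```python
import pandas as pd

threshold_1 = 40
threshold_2 = 50

# Scalars: `&` is applied first, then Python chains the two comparisons.
sales, employee = 100, 10
masked = threshold_1 & employee  # bitwise AND: 0b101000 & 0b001010 == 0b001000
unparenthesized = sales >= threshold_1 & employee <= threshold_2
chained = (sales >= masked) and (masked <= threshold_2)
print(masked, unparenthesized, chained)

# Series: the chained comparison needs bool(Series), which raises ValueError.
sales_s = pd.Series([100, 30, 50, 200, 0])
employee_s = pd.Series([10, 3, 5, 1, 0])
try:
    sales_s >= threshold_1 & employee_s <= threshold_2
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# Parenthesized: each comparison yields a boolean Series,
# and `&` then combines them element-wise.
mask = (sales_s >= threshold_1) & (employee_s <= threshold_2)
print(mask.tolist())
```

This also suggests why the row-wise apply version appeared to work: with scalar inputs the expression is legal Python, even though it does not test what it seems to test.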

def my_function(employee,sales):
        conditions = [
        (sales == 0 ),
        (sales < threshold_1), 
        (sales >= threshold_1) & (employee <= threshold_2)]
        values = [0, sales*2, sales*4]
        sales_estimation = np.select(conditions, values)    
        return (sales_estimation)

df['new_column'] = my_function(df['employee'], df['sales'])

output:

  id      name  employee  sales  new_column
0  1  Company1        10    100         400
1  2  Company1         3     30          60
2  3  Company3         5     50         200
3  4  Company4         1    200         800
4  5  Company5         0      0           0

You can also pass the whole DataFrame and select the columns inside the function:

def my_function(df):
    employee = df['employee']
    sales = df['sales']
    conditions = [
    (sales == 0 ),
    (sales < threshold_1), 
    (sales >= threshold_1) & (employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)    
    return (sales_estimation)

df['new_column'] = my_function(df)
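One detail of np.select that may matter here: rows that match none of the conditions receive the `default` value, which is 0 unless specified. The sketch below uses a hypothetical two-row frame (`sample`, not part of the question's data) in which the second row has more employees than threshold_2, so no condition applies to it.

```python
import numpy as np
import pandas as pd

threshold_1 = 40
threshold_2 = 50

# Hypothetical rows: the second one exceeds threshold_2 employees.
sample = pd.DataFrame({"employee": [10, 60], "sales": [100, 100]})
employee = sample["employee"]
sales = sample["sales"]

conditions = [
    sales == 0,
    sales < threshold_1,
    (sales >= threshold_1) & (employee <= threshold_2),
]
values = [0, sales * 2, sales * 4]

# No condition matches the second row, so it falls back to `default`.
with_default_zero = np.select(conditions, values)
with_explicit_default = np.select(conditions, values, default=-1)
print(with_default_zero, with_explicit_default)
```

Passing an explicit `default` such as -1 makes unmatched rows distinguishable from rows where sales is genuinely 0; whether that is desirable depends on the intended business rule.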

3 Comments

Does this function convert the dataset into Pandas series? I am asking this because I need to have vectorized series
@silent_hunter this code processes the data in a vectorized way. It outputs a NumPy array, but assigning it to the DataFrame converts it into a Series automatically
@silent_hunter it should be much faster than the apply approach if this is your concern ;)
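The point about types raised in these comments can be checked with a short sketch. It rebuilds only the two numeric columns from the question and reuses the corrected function; the name `result` is illustrative.

```python
import numpy as np
import pandas as pd

threshold_1 = 40
threshold_2 = 50
df = pd.DataFrame({"employee": [10, 3, 5, 1, 0],
                   "sales": [100, 30, 50, 200, 0]})

def my_function(employee, sales):
    conditions = [
        sales == 0,
        sales < threshold_1,
        (sales >= threshold_1) & (employee <= threshold_2),
    ]
    values = [0, sales * 2, sales * 4]
    return np.select(conditions, values)

# The function receives two Series and returns a NumPy array.
result = my_function(df["employee"], df["sales"])
print(type(result))

# Assigning the array to a column stores it in the DataFrame;
# selecting that column back gives a Series.
df["new_column"] = result
print(type(df["new_column"]))
```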
Pass the columns to the function as Series, and wrap each comparison in parentheses. Without the parentheses, operator precedence leads to ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

def my_function1(employee, sales):
      conditions = [
      (sales == 0 ),
      (sales < threshold_1), 
      (sales >= threshold_1) & (employee <= threshold_2)] #<- here
      values = [0, sales*2, sales*4]
      sales_estimation = np.select(conditions, values)    
      return sales_estimation
    
df['new_column_1']= my_function1(df['employee'],df['sales'])
print (df)
  id      name  employee  sales  new_column_1
0  1  Company1        10    100           400
1  2  Company1         3     30            60
2  3  Company3         5     50           200
3  4  Company4         1    200           800
4  5  Company5         0      0             0

4 Comments

Does this function convert dataset into Pandas series? I am asking this because I need to have vectorized series
@silent_hunter - If you select a column like df['employee'], it is a Series. The same goes for sales
@silent_hunter - you can check it with print(type(df['employee']))
@silent_hunter - and inside np.select the values are converted to NumPy arrays (a Series is backed by an array), so it runs fast. After assignment, the array becomes the new column, and selecting it with df['new_column_1'] gives a Series
