I wrote my own function in Python. The function is very simple; the data and the function are shown below:
import numpy as np
import pandas as pd

data_1 = {
    'id': ['1', '2', '3', '4', '5'],
    'name': ['Company1', 'Company1', 'Company3', 'Company4', 'Company5'],
    'employee': [10, 3, 5, 1, 0],
    'sales': [100, 30, 50, 200, 0],
}
df = pd.DataFrame(data_1, columns=['id', 'name', 'employee', 'sales'])
threshold_1 = 40
threshold_2 = 50
And the function is written below:
def my_function(employee, sales):
    conditions = [
        (sales == 0),
        (sales < threshold_1),
        # Each comparison is parenthesized because & binds more tightly than >= and <=
        (sales >= threshold_1) & (employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

# Apply the function row by row
df['new_column'] = df.apply(lambda x: my_function(x.employee, x.sales), axis=1)
df
This function works well and gives the expected result.
Now I want to write the same function as a vectorized operation on pandas Series, because vectorized operations reduce execution time. I wrote the function below, but it does not work.
def my_function1(
        pandas_series: pd.Series
        ) -> pd.Series:
    """
    Vectorized operation across Pandas Series
    """
    conditions = [
        (sales == 0 ),
        (sales < threshold_1),
        (sales >= threshold_1 & employee <= threshold_2)]
    values = [0, sales*2, sales*4]
    sales_estimation = np.select(conditions, values)
    return sales_estimation

df['new_column_1']=my_function1(data['employee','sales'])
My error is probably related to the input parameters of this function. Can anybody help me solve this problem and make my_function1 work?
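
For reference, here is a minimal sketch of one way the vectorized version could look. It assumes the two columns are passed in as separate Series (the body of my_function1 refers to `sales` and `employee`, which are never defined from its single `pandas_series` argument) and that each comparison is wrapped in parentheses, since `&` binds more tightly than `>=` and `<=`. The name `my_function_vectorized` and the explicit threshold parameters are hypothetical and not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '2', '3', '4', '5'],
    'name': ['Company1', 'Company1', 'Company3', 'Company4', 'Company5'],
    'employee': [10, 3, 5, 1, 0],
    'sales': [100, 30, 50, 200, 0],
})

# Hypothetical name and signature: thresholds are passed explicitly
# instead of being read from module-level globals.
def my_function_vectorized(employee: pd.Series, sales: pd.Series,
                           threshold_1: int = 40,
                           threshold_2: int = 50) -> pd.Series:
    """Vectorized sales estimation over whole columns."""
    conditions = [
        sales == 0,
        sales < threshold_1,
        # Parentheses are required around each comparison when using &
        (sales >= threshold_1) & (employee <= threshold_2),
    ]
    values = [0, sales * 2, sales * 4]
    # np.select picks, per row, the value of the first condition that is True;
    # rows matching no condition get the default of 0.
    result = np.select(conditions, values, default=0)
    # Wrap the NumPy array so the result keeps the original index
    return pd.Series(result, index=sales.index)

# Pass the columns themselves, not a tuple key such as df['employee', 'sales']
df['new_column_1'] = my_function_vectorized(df['employee'], df['sales'])
```

Note that selecting several columns at once needs a list, as in `df[['employee', 'sales']]`; passing each column as its own argument sidesteps that here.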