I want to use FunctionTransformer to perform calculations between columns. For instance, I want to subtract two columns and add the new column to the dataset. So I have the function:

def diff(x, y):
    return x - y

My initial dataset is:

X = pd.DataFrame({"product":["a","b","c","d"], "ndp":[100,200,150,120], "discount":[5,10,15,30]})

  product  ndp  discount
0       a  100         5
1       b  200        10
2       c  150        15
3       d  120        30

and I need a new column price = ndp - discount, so I run:

from sklearn.preprocessing import FunctionTransformer

transf = FunctionTransformer(diff, kw_args={'x': 'ndp', 'y': 'discount'})
transf.transform(X)

but I get an error:

TypeError: diff() got multiple values for argument 'x'

How can I pass the arguments to the function diff, and how can I specify the name of the new column?

1 Answer

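FunctionTransformer calls func(X, **kw_args), so the whole input is always passed as the first positional argument. Supplying x again through kw_args is what raises the "multiple values for argument 'x'" error. Note also that FunctionTransformer, like most scikit-learn classes, is typically used with NumPy arrays rather than whole dataframes, so the simplest fix is a function that takes in and returns a NumPy array.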

For example:

import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def diff(X: np.ndarray) -> np.ndarray:
    # Subtract the second column from the first and append the result as a new column
    d = (X[:, 0] - X[:, 1]).reshape(-1, 1)
    return np.concatenate((X, d), axis=1)

X = pd.DataFrame({"product": ["a", "b", "c", "d"], 
                  "ndp": [100, 200, 150, 120], 
                  "discount": [5, 10, 15, 30]})

transf = FunctionTransformer(diff) 
X_new = transf.transform(X[['ndp', 'discount']].values)

print(X_new)
[[100   5  95]
 [200  10 190]
 [150  15 135]
 [120  30  90]]
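
If you would rather keep the DataFrame and give the new column a name, something like the following may work. It is only a sketch: it assumes the default validate=False, in which case FunctionTransformer hands the input to the function without converting it, and add_diff together with its x, y and name parameters is a hypothetical helper, not part of scikit-learn.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def add_diff(df: pd.DataFrame, x: str, y: str, name: str) -> pd.DataFrame:
    # Hypothetical helper: df is the input passed positionally by
    # FunctionTransformer; x, y and name arrive through kw_args.
    return df.assign(**{name: df[x] - df[y]})

X = pd.DataFrame({"product": ["a", "b", "c", "d"],
                  "ndp": [100, 200, 150, 120],
                  "discount": [5, 10, 15, 30]})

# The keyword names must not clash with the first positional parameter (df)
transf = FunctionTransformer(add_diff,
                             kw_args={"x": "ndp", "y": "discount", "name": "price"})
X_new = transf.fit_transform(X)

print(X_new)
```

The original columns are kept and price is appended as the last column. The keyword names in kw_args are chosen so that none of them collides with the first parameter, which avoids the TypeError from the question.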