I want to build a clustering model for my dataset with Python and the scikit-learn library. The dataset contains continuous and categorical values. I have encoded the categorical values, but when I try to scale the features I get this error:
ValueError: Cannot center sparse matrices: pass `with_mean=False` instead. See docstring for motivation and alternatives.
The error is raised on this line:
features = scaler.fit_transform(features)
What am I doing wrong?
This is my code:
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

features = df[['InvoiceNo', 'StockCode', 'Description', 'Quantity',
               'UnitPrice', 'CustomerID', 'Country', 'Total Price']]
columns_for_scaling = ['InvoiceNo', 'StockCode', 'Description', 'Quantity', 'UnitPrice', 'CustomerID', 'Country', 'Total Price']
# One-hot encode the categorical columns; the numeric columns pass through unchanged.
transformerVectoriser = ColumnTransformer(
    transformers=[
        ('Encoding Invoice number', OneHotEncoder(handle_unknown="ignore"), ['InvoiceNo']),
        ('Encoding StockCode', OneHotEncoder(handle_unknown="ignore"), ['StockCode']),
        ('Encoding Description', OneHotEncoder(handle_unknown="ignore"), ['Description']),
        ('Encoding Country', OneHotEncoder(handle_unknown="ignore"), ['Country']),
    ],
    remainder='passthrough')  # Default is to drop untransformed columns
features = transformerVectoriser.fit_transform(features)
print(features.shape)
scaler = StandardScaler()
features = scaler.fit_transform(features)  # the ValueError is raised here
sum_of_squared_distances = []
for k in range(1, 16):
    kmeans = KMeans(n_clusters=k)
    kmeans = kmeans.fit(features)
    # inertia_ is an attribute of the fitted KMeans model, not of the feature matrix
    sum_of_squared_distances.append(kmeans.inertia_)
Shape of my data before preprocessing: (401604, 8)
Shape of my data after preprocessing: (401604, 29800)
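Pass with_mean=False to the scaler, i.e. StandardScaler(with_mean=False). The one-hot encoded output is a sparse matrix, and centering it would turn it into a dense one, so StandardScaler refuses to subtract the mean and can only scale by the standard deviation.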
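A minimal sketch of that fix, assuming a small made-up DataFrame in place of the real invoice data (the column values, the reduced column set, and the sparse_threshold=1.0 setting are illustrative assumptions, not taken from the question). It reproduces the error with the default scaler, then scales the same sparse matrix with with_mean=False and fits KMeans on the result:

```python
import pandas as pd
from scipy import sparse
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the real invoice DataFrame.
df = pd.DataFrame({
    "Country": ["UK", "France", "UK", "Germany", "France", "UK"],
    "Quantity": [6, 12, 3, 8, 24, 1],
    "UnitPrice": [2.55, 3.39, 7.65, 1.25, 0.85, 4.95],
})

encoder = ColumnTransformer(
    transformers=[("country", OneHotEncoder(handle_unknown="ignore"), ["Country"])],
    remainder="passthrough",
    # Assumption for the toy data only: keep the stacked output sparse so the
    # example behaves like the large, mostly-zero matrix in the question.
    sparse_threshold=1.0,
)
features = encoder.fit_transform(df)

# The default scaler tries to center the data, which is rejected for sparse input.
error_message = None
try:
    StandardScaler().fit_transform(features)
except ValueError as exc:
    error_message = str(exc)

# Scaling without centering keeps the matrix sparse.
scaler = StandardScaler(with_mean=False)
scaled = scaler.fit_transform(features)

# KMeans accepts sparse input directly.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
inertia = kmeans.inertia_
print(scaled.shape, inertia)
```

Note that with with_mean=False the columns are only divided by their standard deviation, so they are not zero-centered. If centering matters for the numeric columns, one alternative worth considering is to apply StandardScaler only to those columns inside the ColumnTransformer instead of scaling the one-hot columns at all.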