sklearn fit_predict not accepting a 2-dimensional numpy array

Question

I am trying to perform some clustering analysis using three different clustering algorithms. I load the data from stdin as follows:

import sys

import numpy
import sklearn.cluster as cluster

X = []
for line in sys.stdin:
    x1, x2 = line.strip().split()
    X.append([float(x1), float(x2)])
X = numpy.array(X)

I then store the algorithm names and their parameters in a list:

clustering_configs = [
    ### K-Means
    ['KMeans', {'n_clusters' : 5}],
    ### Ward
    ['AgglomerativeClustering', {
                'n_clusters' : 5,
                'linkage' : 'ward'
                }],
    ### DBSCAN
    ['DBSCAN', {'eps' : 0.15}]
]

Then I try to instantiate and run each one in a for loop:

for alg_name, alg_params in clustering_configs:

    class_ = getattr(cluster, alg_name)
    instance_ = class_(alg_params)

    instance_.fit_predict(X)

Everything works except the instance_.fit_predict(X) call, which raises this error:

Traceback (most recent call last):
  File "meta_cluster.py", line 47, in <module>
    instance_.fit_predict(X)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/k_means_.py", line 830, in fit_predict
    return self.fit(X).labels_
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/k_means_.py", line 812, in fit
    X = self._check_fit_data(X)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/k_means_.py", line 789, in _check_fit_data
    X.shape[0], self.n_clusters))
TypeError: %d format: a number is required, not dict

Does anyone have a clue where I could be going wrong? The sklearn docs say fit_predict just needs an array-like or sparse matrix, shape=(n_samples, n_features), which I believe I have.

Any suggestions? Thanks!

Akshar · Accepted Answer · 2016-09-29 20:21:47Z

class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm='auto')

The way you'd call the KMeans class is:

KMeans(n_clusters=5)

With your current code you are calling

KMeans({'n_clusters': 5})

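which passes the whole dict as the first positional argument, so n_clusters ends up being {'n_clusters': 5} instead of 5. That is why the traceback complains that a number is required, not a dict. The same goes for the other algorithms.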
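To make the difference concrete, here is a minimal sketch using a made-up stand-in class. `FakeKMeans` is hypothetical and only mimics the shape of an estimator constructor; it is not scikit-learn code.

```python
class FakeKMeans:
    """Hypothetical stand-in for an estimator; not part of scikit-learn."""

    def __init__(self, n_clusters=8, max_iter=300):
        self.n_clusters = n_clusters
        self.max_iter = max_iter


params = {'n_clusters': 5}

# Positional call: the whole dict is bound to the first parameter, n_clusters.
wrong = FakeKMeans(params)
print(type(wrong.n_clusters).__name__)

# Keyword unpacking: each key is matched to the parameter with the same name.
right = FakeKMeans(**params)
print(right.n_clusters)

# Formatting the dict with %d fails with a TypeError, as in the traceback.
try:
    "%d samples, %d clusters" % (10, wrong.n_clusters)
except TypeError as exc:
    print("TypeError:", exc)
```

Nothing goes wrong at construction time in the positional case, which is why the error only shows up later inside fit_predict.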

4 Comments

wakey Over a year ago

Is there an easy way to bring those values out of the dictionary and into the necessary format?

Alex Riley Over a year ago

@wKavey: KMeans(**{'n_clusters': 5})

wakey Over a year ago

So in my case instance_ = class_(**alg_params)?

Alex Riley Over a year ago

Yep, should work (as long as the alg_params doesn't contain any keys that aren't kwargs for the function/class).
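Putting the suggestion from these comments together, the loop could look roughly like the sketch below. The random `X` is a placeholder assumption standing in for the data read from stdin, and the script assumes a scikit-learn version in which these three estimators accept the keyword arguments listed in the question.

```python
import numpy as np
import sklearn.cluster as cluster

# Placeholder data (an assumption): 60 random 2-D points instead of stdin input.
rng = np.random.RandomState(0)
X = rng.rand(60, 2)

clustering_configs = [
    ['KMeans', {'n_clusters': 5}],
    ['AgglomerativeClustering', {'n_clusters': 5, 'linkage': 'ward'}],
    ['DBSCAN', {'eps': 0.15}],
]

results = {}
for alg_name, alg_params in clustering_configs:
    class_ = getattr(cluster, alg_name)
    # ** unpacks the dict into keyword arguments, e.g. KMeans(n_clusters=5).
    instance_ = class_(**alg_params)
    # fit_predict returns one cluster label per row of X.
    results[alg_name] = instance_.fit_predict(X)

for alg_name, labels in results.items():
    print(alg_name, labels.shape)
```

If a dict contains a key that the constructor does not accept, the `class_(**alg_params)` call itself raises a TypeError, so a typo in a parameter name is caught before any fitting happens.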
