I am trying to run a clustering analysis with three different clustering algorithms. I load the data from stdin as follows:

import sys

import numpy
import sklearn.cluster as cluster

X = []
for line in sys.stdin:
    x1, x2 = line.strip().split()
    X.append([float(x1), float(x2)])
X = numpy.array(X)

I then store the algorithm names and their parameters in a list:

clustering_configs = [
    ### K-Means
    ['KMeans', {'n_clusters' : 5}],
    ### Ward
    ['AgglomerativeClustering', {
                'n_clusters' : 5,
                'linkage' : 'ward'
                }],
    ### DBSCAN
    ['DBSCAN', {'eps' : 0.15}]
]

Then I try to instantiate and run each one in a for loop:

for alg_name, alg_params in clustering_configs:

    class_ = getattr(cluster, alg_name)
    instance_ = class_(alg_params)

    instance_.fit_predict(X)

Everything works except the call to instance_.fit_predict(X), which raises this error:

Traceback (most recent call last):
  File "meta_cluster.py", line 47, in <module>
    instance_.fit_predict(X)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/k_means_.py", line 830, in fit_predict
    return self.fit(X).labels_
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/k_means_.py", line 812, in fit
    X = self._check_fit_data(X)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/k_means_.py", line 789, in _check_fit_data
    X.shape[0], self.n_clusters))
TypeError: %d format: a number is required, not dict

Does anyone have a clue where I am going wrong? I read the sklearn docs, which say fit_predict just needs an array-like or sparse matrix of shape (n_samples, n_features), and I believe that is what I have.

Any suggestions? Thanks!

1 Answer

class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1, algorithm='auto')

The way you would call the KMeans class is:

KMeans(n_clusters=5)

With your current code you are calling

KMeans({'n_clusters': 5})

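which passes the whole alg_params dict as the first positional argument, so n_clusters ends up holding a dict instead of the number 5. That is why the %d format in the traceback complains that it received a dict. The same goes for the other algorithms.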
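To make the difference concrete, here is a minimal sketch of the two call styles. StandInKMeans is a hypothetical class, not part of scikit-learn; it only mimics the first parameters of the KMeans signature so the snippet runs without scikit-learn installed.

```python
class StandInKMeans:
    """Hypothetical stand-in that mimics the start of the KMeans signature."""

    def __init__(self, n_clusters=8, max_iter=300):
        self.n_clusters = n_clusters
        self.max_iter = max_iter


alg_params = {'n_clusters': 5}

# Positional call: the whole dict is bound to the first parameter, n_clusters.
wrong = StandInKMeans(alg_params)

# Unpacked call: each key/value pair becomes a keyword argument.
right = StandInKMeans(**alg_params)

print(type(wrong.n_clusters).__name__)  # dict
print(right.n_clusters)                 # 5

# Formatting a dict with %d fails, which is what the traceback reports.
try:
    "n_clusters=%d" % (wrong.n_clusters,)
except TypeError as exc:
    print("TypeError:", exc)
```

Nothing fails at construction time, because the constructor simply stores whatever it is given. The error only surfaces later, when the stored value is used as a number.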

4 Comments

Is there an easy way to bring those values out of the dictionary and into the necessary format?
@wKavey: KMeans(**{'n_clusters': 5})
So in my case instance_ = class_(**alg_params)?
Yep, should work (as long as the alg_params doesn't contain any keys that aren't kwargs for the function/class).
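The caveat in the last comment can be handled by dropping keys the constructor does not accept before unpacking. The sketch below is one possible approach, assuming Python 3 (for inspect.signature). filter_kwargs and StandInDBSCAN are hypothetical names introduced for illustration; neither is part of scikit-learn.

```python
import inspect


def filter_kwargs(cls, params):
    """Keep only the entries of params that cls accepts as named parameters."""
    accepted = inspect.signature(cls).parameters
    return {key: value for key, value in params.items() if key in accepted}


class StandInDBSCAN:
    """Hypothetical stand-in for an estimator that takes keyword parameters."""

    def __init__(self, eps=0.5, min_samples=5):
        self.eps = eps
        self.min_samples = min_samples


# 'n_clusters' is not a parameter of StandInDBSCAN.
params = {'eps': 0.15, 'n_clusters': 5}

# Unpacking an unknown key raises TypeError at construction time.
try:
    StandInDBSCAN(**params)
except TypeError as exc:
    print("rejected:", exc)

# Filtering first lets the call go through with the accepted keys only.
model = StandInDBSCAN(**filter_kwargs(StandInDBSCAN, params))
print(model.eps, model.min_samples)  # 0.15 5
```

Silently dropping keys can hide typos in parameter names, so letting the TypeError propagate is often the safer default. Filtering is mainly useful when one shared dict is deliberately reused across several classes.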
