I am running into a problem when reading a variable from a different file. I have two files, train_dataset.py and test_dataset.py. I run train_dataset.py from my IDE and note the value of the array variable array_val, shown below.

    array([[ 0.08695652,  0.66459627,  0.08695652,  0.07453416,  0.07453416,
            ... 0.15217391]])

Now I switch to test_dataset.py, import train_dataset, and print the value of array_val via train_dataset.array_val. I see a very different output, shown below.

    array([[  8.11594203e-01,   1.15942029e-01,   4.05797101e-01,
            ... 1.30434783e-01,   5.65217391e-01,   2.02898551e-01]])

Please suggest how I can fix this, and explain the reason for the discrepancy.

Here is the code in my train_dataset.py:

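    from sklearn.cluster import KMeans

    no_of_clusters = 9
    cluster_centroids = []
    k_means = KMeans(n_clusters=no_of_clusters, n_init=14, max_iter=400)

    # Fit the model on the data
    k_means.fit(matrix_for_cluster)

    # Cluster label of each sample, and the fitted centroid positions
    labels = k_means.labels_
    array_val = k_means.cluster_centers_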

Here matrix_for_cluster is a NumPy n-dimensional array.

In my test_dataset.py, all I do is:

    import train_dataset
    print train_dataset.array_val
  • I have updated the code. Please have a look. Commented Apr 20, 2015 at 12:52

1 Answer

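This is probably due to the random initialization of the k-means algorithm: each run starts from different initial centroids, so the fitted cluster centers can differ between runs.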
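If reproducible centroids are all that is needed, one option is to fix the seed. The following is a minimal sketch, assuming scikit-learn's random_state parameter of KMeans; the random data and the helper fit_centroids are hypothetical stand-ins, since the real matrix_for_cluster is not shown in the question.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for matrix_for_cluster (the real data is not shown).
rng = np.random.RandomState(0)
matrix_for_cluster = rng.rand(200, 5)


def fit_centroids(data, seed):
    """Fit k-means with a fixed seed and return the cluster centers."""
    k_means = KMeans(n_clusters=9, n_init=14, max_iter=400, random_state=seed)
    k_means.fit(data)
    return k_means.cluster_centers_


# Two independent fits with the same seed and the same data.
first = fit_centroids(matrix_for_cluster, seed=42)
second = fit_centroids(matrix_for_cluster, seed=42)

# Compare the two sets of centroids.
print(np.allclose(first, second))
```

Fixing the seed only makes repeated runs agree with each other. It does not avoid re-running the clustering on every import.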

As @ali_m explains nicely in the comments, the line import train_dataset re-runs the clustering, so the cluster centers from the previous run are not actually saved. To keep them, you can serialise the data to a file, for example with np.save as suggested in the comments.
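A minimal sketch of saving the centroids once and loading them elsewhere, using np.save and np.load. The file name centroids.npy and the small example array are assumptions for illustration only; in the real script the array would be k_means.cluster_centers_.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for k_means.cluster_centers_ from train_dataset.py.
array_val = np.array([[0.1, 0.2], [0.3, 0.4]])

# "Training" side: write the centroids to disk once, after fitting.
# The file name is an assumption; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "centroids.npy")
np.save(path, array_val)

# "Test" side: read the stored centroids back instead of importing
# train_dataset, so the clustering is not re-run.
loaded = np.load(path)

# Check that the round trip preserved the values.
print(np.array_equal(array_val, loaded))
```

With this split, test_dataset.py would call np.load on the saved file rather than import train_dataset, so it sees exactly the centroids that were computed in the training run.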


5 Comments

Yes, but I run it only once. Then, from a different shell, I just read the variable that holds the array of centroid positions. Do you mean that the centroids are re-initialized when I access the variable?
Run it the same way twice. Do you get the same results?
No, I am aware that the results differ between runs. What I am unable to understand is this: when I run the model once, save the centroid points to a variable, and read it from a different shell (train_dataset.array_val), it shows a different output, but when I read array_val in the same shell, it gives the same output.
Try adding the line print array_val after array_val=k_means.cluster_centers_ in train_dataset.py.
@user2404193 You are not really "saving" the centroids by assigning them to some variable in your script, since the clustering will still be re-run every time you import or reload your train_dataset script. To truly "save" the results, you should write them to some external file (e.g. using np.save).
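The point made in the comments, that a module's top-level code is executed again whenever a fresh interpreter imports it, can be checked with a small standard-library sketch. The module name fake_train and the helper import_in_new_process are hypothetical, and random.random() stands in for the randomly initialized clustering.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Write a tiny hypothetical module whose top-level code produces a random value,
# standing in for the k-means fit in train_dataset.py.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "fake_train.py"), "w") as f:
    f.write(textwrap.dedent("""
        import random
        # Top-level code: executed when the module is imported.
        value = random.random()
    """))


def import_in_new_process():
    """Import fake_train in a fresh interpreter and return its value."""
    env = dict(os.environ, PYTHONPATH=module_dir)
    out = subprocess.check_output(
        [sys.executable, "-c", "import fake_train; print(fake_train.value)"],
        env=env,
    )
    return float(out.decode().strip())


# Each call is a separate "shell": the module body runs again each time.
first = import_in_new_process()
second = import_in_new_process()
print(first, second)
```

The two printed values are drawn independently, which mirrors what happens when train_dataset is imported from a new shell: nothing is carried over from the earlier run in the IDE.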
