Discrepancy in array value when calling a variable from different file - python

Question

I am having a problem reading a variable from a different file. I have two files, train_dataset.py and test_dataset.py. I run train_dataset.py from my IDE and note the value of the array variable array_val, shown below.

    array([[ 0.08695652,  0.66459627,  0.08695652,  0.07453416,  0.07453416,
            ... 0.15217391]])

Now I switch to test_dataset.py, import train_dataset, and print array_val via train_dataset.array_val. I see a very different output, shown below.

    array([[  8.11594203e-01,   1.15942029e-01,   4.05797101e-01,
            ... 1.30434783e-01,   5.65217391e-01,   2.02898551e-01]])

Please suggest how I can get rid of this, and explain the reason for the discrepancy.

Here is the code in my train_dataset.py:

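from sklearn.cluster import KMeans

no_of_clusters = 9
cluster_centroids = []
k_means = KMeans(n_clusters=no_of_clusters, n_init=14, max_iter=400)

# Fit k-means on the data (matrix_for_cluster is defined earlier in the file)
k_means.fit(matrix_for_cluster)

# Cluster label of each row, and the centroid positions
labels = k_means.labels_
array_val = k_means.cluster_centers_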

Here, matrix_for_cluster is a NumPy n-dimensional array.

In test_dataset.py, all I do is:

import train_dataset
print train_dataset.array_val

I have updated the code; please have a look.

– Sam, Commented Apr 20, 2015 at 12:52

YXD · Accepted Answer · 2015-04-20 14:06:46Z

3

This is probably due to the random initialization of the k-means algorithm.

http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
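To illustrate the effect of the random initialization, here is a minimal sketch, assuming scikit-learn and NumPy are installed and written for Python 3. The data is a random stand-in, since the question does not show matrix_for_cluster, and fit_centroids is a hypothetical helper. Passing random_state to KMeans fixes the seed, so repeated fits give the same centroids. Note that this only makes the re-run reproducible; it does not stop the clustering from running again.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in data: the question's matrix_for_cluster is not shown.
rng = np.random.RandomState(0)
matrix_for_cluster = rng.rand(200, 5)


def fit_centroids(seed=None):
    """Fit k-means with the question's settings and return the centroids."""
    k_means = KMeans(n_clusters=9, n_init=14, max_iter=400, random_state=seed)
    k_means.fit(matrix_for_cluster)
    return k_means.cluster_centers_


# With the same seed, the initialization (and so the result) is repeated.
first = fit_centroids(seed=42)
second = fit_centroids(seed=42)
print(np.allclose(first, second))
```

With seed=None, as in the question's code, each fit draws a fresh initialization, so the centroids can differ from run to run.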

As @ali_m explains nicely in the comments, the line import train_dataset re-runs the clustering, so the cluster centers are not actually saved from the previous time you ran the code. To keep them, you need to serialise the data to a file.
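One way to keep the centroids, following ali_m's suggestion of np.save in the comments, might look like the sketch below. It assumes NumPy and Python 3. The file name centroids.npy and the temporary directory are hypothetical choices for illustration, and the random array is only a stand-in for k_means.cluster_centers_.

```python
import os
import tempfile

import numpy as np

# Stand-in for k_means.cluster_centers_ (9 clusters, 5 features assumed).
array_val = np.random.rand(9, 5)

# In train_dataset.py: write the centroids to disk once, right after fitting.
# The path is a hypothetical choice; any writable location would do.
centroid_path = os.path.join(tempfile.mkdtemp(), "centroids.npy")
np.save(centroid_path, array_val)

# In test_dataset.py: load the saved file instead of importing train_dataset,
# so the clustering is not run again.
loaded = np.load(centroid_path)
print(np.array_equal(loaded, array_val))
```

In a real project both scripts would need to agree on a fixed path rather than a temporary directory.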

edited Apr 20, 2015 at 14:06

answered Apr 20, 2015 at 12:54


5 Comments

Sam Over a year ago

Yes, but I run it only once. Then, from a different shell, I just read the variable that holds the array of centroid positions. Do you mean that the centroids are re-initialized when I access the variable?

YXD Over a year ago

Run it the same way twice. Do you get the same results?

Sam Over a year ago

No, the results are different for different runs; I am aware of that. What I am unable to understand is this: when I run the model once, save the centroid points to a variable, and access it from a different shell (train_dataset.array_val), it shows different output, but when I run just array_val in the same shell it gives the same output.

YXD Over a year ago

Try adding the line print array_val after array_val=k_means.cluster_centers_ in train_dataset.py

ali_m Over a year ago

@user2404193 You are not really "saving" the centroids by assigning them to some variable in your script, since the clustering will still be re-run every time you import or reload your train_dataset script. To truly "save" the results, you should write them to some external file (e.g. using np.save).
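The point about importing and reloading can be checked with the standard library alone. The sketch below, for Python 3, writes a throwaway module named fake_train (a hypothetical stand-in for train_dataset.py) whose top-level code draws a random number, much as the k-means fit draws a random initialization. Reloading the module runs that top-level code again and produces a new value.

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module whose top-level code draws a random number,
# standing in for the k-means fit at the top level of train_dataset.py.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "fake_train.py"), "w") as f:
    f.write("import random\nvalue = random.random()\n")

# Make the new directory importable and import the module once.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
fake_train = importlib.import_module("fake_train")
first = fake_train.value

# Reloading executes the module's top-level code a second time.
importlib.reload(fake_train)
second = fake_train.value

print(first, second)
```

The same thing happens when a fresh interpreter or shell imports the module for the first time: the module's code runs from the top, and nothing carries over from an earlier run.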
