2

I am working on a clustering algorithm and need all points in my scatter plot that belong to the same cluster to be drawn in the same color. I have a list that gives, for each point, the cluster it belongs to, as an integer 0...k where k is the number of clusters. I would like to know how to map this list to colors (preferably as many colors as there are clusters, which is known beforehand). I am working with matplotlib in Python and am completely lost as to how to solve this problem.

plt.scatter([item[0] for item in dataset],[item[1] for item in dataset],color='b')
plt.scatter([item[0] for item in centroids_list],[item[1] for item in centroids_list],color='r')

plt.show()

Right now this is all I have: the data points are drawn in blue and the centroids in red. I would like to leave the centroids red and only change the color of the points in the dataset so that points of the same cluster have the same color. I am lost in the sea that is the matplotlib library and would really appreciate any help.

Thanks in advance!

3 Answers

1

See the color parameter at the pyplot.scatter documentation.

Basically, you need to split your data into clusters and then call pyplot.scatter in a loop, passing a different value for the color parameter on each call.

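You can use vq from scipy.cluster.vq to assign your data to clusters using your centroids, like so:

    from scipy.cluster.vq import vq

    # vq returns (codes, distances); each code is the index of the nearest centroid
    assignments = vq( dataset, centroids_list )[0]
    # one empty list per centroid
    clusters = [[] for i in range( len( centroids_list ) )]
    for item, clustNum in zip( dataset, assignments ):
        clusters[clustNum].append( item )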

At least this is how I've done it before if I'm remembering correctly. From there it's just defining a function to return a random color, and then:

    for cluster in clusters:
        plt.scatter([item[0] for item in cluster],[item[1] for item in cluster],color=randomColor() ) 
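
As an alternative to a random-color helper, one color per cluster can be sampled from a colormap. The following is only a minimal sketch: it assumes the cluster assignments are already available as integers (as described in the question), and the toy data, the variable names, and the choice of the viridis colormap are illustrative assumptions rather than part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data (assumption, for illustration only): six 2-D points in k = 3 clusters.
dataset = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0],
                    [5.2, 4.9], [9.0, 1.0], [8.8, 1.3]])
assignments = np.array([0, 0, 1, 1, 2, 2])  # cluster index of each point
k = 3

# Centroids computed here as the mean of each cluster's points.
centroids = np.array([dataset[assignments == i].mean(axis=0) for i in range(k)])

# Sample k evenly spaced colors from a colormap, one RGBA row per cluster.
cluster_colors = plt.get_cmap("viridis")(np.linspace(0.0, 1.0, k))

fig, ax = plt.subplots()
for i in range(k):
    members = dataset[assignments == i]  # points assigned to cluster i
    ax.scatter(members[:, 0], members[:, 1],
               color=tuple(cluster_colors[i]), label=f"cluster {i}")

# Centroids stay red, as in the question.
ax.scatter(centroids[:, 0], centroids[:, 1], color="r", marker="x", s=80)
ax.legend()
```

Calling plt.show() afterwards displays the figure. Sampling from a colormap gives reproducible, evenly spread colors for any k, which random colors do not guarantee.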

1 Comment

If you have a lot of clusters, you may also want to look into using the marker parameter as well to make it easier to differentiate.
1

If you use numpy arrays, slicing becomes simpler, and if you pass the cluster labels to the c parameter (here clusters is the list of integer labels, one per point), each cluster gets its own color:

plt.scatter(dataset[:, 0], dataset[:, 1], c=clusters)
plt.scatter(centroids_list[:, 0], centroids_list[:, 1], s=70, c='r')

You can also use meshgrid together with plt.imshow to add a colorful background, as in the example here.
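
The meshgrid and imshow idea could look roughly like the sketch below, which shades every location by its nearest centroid. The toy centroids and points, the plotting range, and the helper nearest_centroid are hypothetical and only serve the illustration; a real clustering algorithm may assign regions differently.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy centroids and points (assumptions, for illustration only).
centroids_list = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 1.0]])
dataset = np.array([[1.2, 0.8], [4.8, 5.3], [8.7, 1.4], [0.5, 1.5]])

def nearest_centroid(points, centroids):
    """Return the index of the closest centroid for each row of points."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)

labels = nearest_centroid(dataset, centroids_list)

# Build a grid over the plotting area and label every grid point.
xs = np.linspace(0.0, 10.0, 200)
ys = np.linspace(0.0, 6.0, 120)
xx, yy = np.meshgrid(xs, ys)
grid_points = np.column_stack([xx.ravel(), yy.ravel()])
grid_labels = nearest_centroid(grid_points, centroids_list).reshape(xx.shape)

fig, ax = plt.subplots()
# origin="lower" puts the first grid row at the bottom so the image matches the axes.
ax.imshow(grid_labels, extent=(xs[0], xs[-1], ys[0], ys[-1]),
          origin="lower", cmap="viridis", alpha=0.3, aspect="auto")
# Same colormap and value range, so points match the background of their region.
ax.scatter(dataset[:, 0], dataset[:, 1], c=labels, cmap="viridis",
           vmin=0, vmax=len(centroids_list) - 1)
ax.scatter(centroids_list[:, 0], centroids_list[:, 1], s=70, c="r")
```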

0

If you have numpy arrays, you should be able to use dataset[:,0] to access the first column much more efficiently.
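
For example, a list of (x, y) pairs can be converted once with np.asarray and then sliced by column. This is a minimal sketch with made-up points; the names points, x, and y are only illustrative.

```python
import numpy as np

# Made-up points for illustration; in the question this would be the dataset list.
dataset = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]

points = np.asarray(dataset)  # shape (3, 2): one row per point
x = points[:, 0]              # first column: all x coordinates
y = points[:, 1]              # second column: all y coordinates

# Same values as the list comprehension used in the question.
same_as_comprehension = list(x) == [item[0] for item in dataset]
```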

I found scatter to behave oddly sometimes (at least in the IPython notebook), but the plot function can do this, too.

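import matplotlib.lines

# keep only visible one-character markers so they can be joined with a color letter
markers = [m for m in matplotlib.lines.Line2D.markers
           if isinstance(m, str) and len(m) == 1 and m.strip()]
colors = list("bgrcmyk")
# each cluster is expected to be a numpy array of shape (n_points, 2)
for i, cluster in enumerate(clusters):
  marker, color = markers[i % len(markers)], colors[i % len(colors)]
  # a format string such as "ob" means marker "o" in blue, with no connecting line
  plt.plot(cluster[:,0],cluster[:,1],marker+color)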
