I am carrying out clustering and trying to plot the result. A dummy data set is:

data

import numpy as np

X = np.random.randn(10)
Y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])    # Labels of cluster 0 to 3

cluster center

centers = np.random.randn(4, 2)    # 4 centers, each center is a 2D point

Question

I want to make a scatter plot to show the points in data and color the points based on the cluster labels.

Then I want to superimpose the center points on the same scatter plot, in another shape (e.g. 'X') and a fifth color (as there are 4 clusters).


Comment

  • I turned to seaborn 0.6.0 but found no API to accomplish the task.
  • ggplot by yhat could make the scatter plot look nice, but the second plot would replace the first one.
  • I got confused by color and cmap in matplotlib, so I wonder whether I could use seaborn or ggplot to do it.
  • Could you be more specific about "Then I want to superimpose the center points on the same scatter plot, in another shape (e.g. 'X') and a fifth color (as there are 4 clusters)"? Commented Jun 30, 2015 at 11:39

2 Answers

The first part of your question can be done by passing the Cluster array as the c argument of scatter, so that each point is coloured by its label, and adding a colorbar. I only vaguely understood the second part of your question, but I believe this is what you are looking for.

import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(10)
y = np.random.randn(10)
Cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])    # Labels of cluster 0 to 3
centers = np.random.randn(4, 2) 

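fig = plt.figure()
ax = fig.add_subplot(111)
# Colour each point by its cluster label
scatter = ax.scatter(x, y, c=Cluster, s=50)
# Overlay all centers in one call: column 0 holds x, column 1 holds y
ax.scatter(centers[:, 0], centers[:, 1], s=50, c='red', marker='+')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.colorbar(scatter)

plt.show()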

In the resulting plot, your "centres" are shown using the + marker. You can give them any colour you want, in the same way as was done for x and y.
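For the 'X' shape mentioned in the question, a minimal sketch might look like the following. It assumes a matplotlib version that supports the filled "X" marker (the lowercase "x" marker is a fallback), and the fixed seed, the "viridis" colormap, the marker size and the variable names are illustrative choices, not part of the original answer.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is reproducible
x = rng.standard_normal(10)
y = rng.standard_normal(10)
cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])  # labels of cluster 0 to 3
centers = rng.standard_normal((4, 2))               # 4 centers, each a 2D point

fig, ax = plt.subplots()

# Data points: colour is looked up from the colormap using the cluster label
points = ax.scatter(x, y, c=cluster, cmap="viridis", s=50)

# Centers: a single named colour and a different marker, drawn on the same axes
center_marks = ax.scatter(
    centers[:, 0], centers[:, 1],
    c="red", marker="X", s=120, label="centers",
)

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.colorbar(points, ax=ax, label="cluster")
plt.show()
```

Because both calls draw on the same `ax`, the second scatter is superimposed on the first rather than replacing it.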

1 Comment

Thank you very much. I think I should read through the documentation of matplotlib.

Part of this has been answered here. The outline is:

plt.scatter(x, y, c=color)

Quoting the documentation of matplotlib:

c : color or sequence of color, optional, default [...] Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however.

So in your case, you need a color for each cluster and then fill the color array according to the cluster assignment of each point.

red = [1, 0, 0]
green = [0, 1, 0]
blue = [0, 0, 1]
# One entry per point, chosen according to that point's cluster
colors = [red, red, green, blue, green]
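One way to fill the color array automatically is to index a palette with the label array. The sketch below assumes the labels are integers 0 to 3, as in the question; the palette values, the seed and the names `palette` and `point_colors` are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

cluster = np.array([0, 1, 1, 1, 3, 2, 2, 3, 0, 2])  # labels of cluster 0 to 3

# One RGB row per cluster (hypothetical palette, pick any colours you like)
palette = np.array([
    [1.0, 0.0, 0.0],  # cluster 0: red
    [0.0, 0.6, 0.0],  # cluster 1: green
    [0.0, 0.0, 1.0],  # cluster 2: blue
    [1.0, 0.6, 0.0],  # cluster 3: orange
])

# Integer-array indexing picks the palette row for each point's label,
# giving a 2-D array with one RGB row per point
point_colors = palette[cluster]

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is reproducible
x = rng.standard_normal(10)
y = rng.standard_normal(10)

fig, ax = plt.subplots()
ax.scatter(x, y, c=point_colors, s=50)
plt.show()
```

Since `point_colors` is a 2-D array of RGB rows, it matches the case the quoted documentation allows for `c`.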
