I have a DataFrame that contains only non-numeric data. It is built from two example sequences:
seq_1 = 'AATGMAM'
seq_2 = 'TATAMTM'
One of these sequences is used as the columns and the other as the index. Where the row letter and the column letter match, the cell holds a '*' sign; otherwise it holds an empty string ''.
import pandas as pd
# Rows follow seq_2 (the index), columns follow seq_1.
data = [["*" if p1 == p2 else "" for p1 in seq_1] for p2 in seq_2]
df = pd.DataFrame(data, columns=list(seq_1), index=list(seq_2))
   A  A  T  G  M  A  M
T        *
A  *  *           *
T        *
A  *  *           *
M              *     *
T        *
M              *     *
Now I want to create a scatter plot that would depict this DataFrame. The x-axis should be the index and y-axis the columns. How can I do that?
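A minimal sketch of one way to draw such a plot, assuming the '*' positions are extracted with `np.nonzero` and plotted by position (the labels repeat, so they cannot serve as coordinates directly). The names `rows`, `cols`, `fig` and `ax` are illustrative, and the DataFrame is rebuilt so the snippet runs on its own:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

seq_1 = 'AATGMAM'   # column labels
seq_2 = 'TATAMTM'   # index labels

# Rows follow seq_2 (the index), columns follow seq_1.
data = [["*" if p1 == p2 else "" for p1 in seq_1] for p2 in seq_2]
df = pd.DataFrame(data, columns=list(seq_1), index=list(seq_2))

# Positions of the '*' cells: row positions go on the x-axis (index),
# column positions go on the y-axis (columns).
rows, cols = np.nonzero(df.to_numpy() == "*")

fig, ax = plt.subplots()
ax.scatter(rows, cols, s=100)

# Ticks are set by position and then relabelled with the letters.
ax.set_xticks(np.arange(df.shape[0]))
ax.set_xticklabels(list(df.index))
ax.set_yticks(np.arange(df.shape[1]))
ax.set_yticklabels(list(df.columns))
```

Calling `plt.show()` afterwards displays the figure.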
EDIT: Thanks to @Shaido, I was able to plot it. However, I am having trouble assigning a separate color to each label.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# x, y are assumed to hold the positions of the '*' cells (from the earlier plotting step, not shown)
color_keys = max([np.unique(df.index), np.unique(df.columns)], key=len)
rgb_values = sns.color_palette("Set2", len(color_keys))
colors = dict(zip(color_keys, rgb_values))
for g in np.unique(df.index):
    ix = np.where(df.columns == g)
    plt.scatter(x[ix], y[ix], c=colors[g], label=g, s=100)
plt.xticks(np.arange(df.shape[0]), df.index)
plt.yticks(np.arange(df.shape[1]), df.columns)
plt.legend()
plt.show()
How can I map each unique value to a separate color?
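One possible approach, as a sketch under a few assumptions: the points are selected with a boolean mask over the plotted points themselves (rather than by indexing with column positions), the letter of each point is looked up from the index, and matplotlib's built-in "Set2" colormap stands in for `sns.color_palette` to keep the dependencies down. The names `rows`, `cols`, `letters` and `mask` are illustrative:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

seq_1 = 'AATGMAM'   # column labels
seq_2 = 'TATAMTM'   # index labels
data = [["*" if p1 == p2 else "" for p1 in seq_1] for p2 in seq_2]
df = pd.DataFrame(data, columns=list(seq_1), index=list(seq_2))

# Coordinates of every '*' cell (row position, column position).
rows, cols = np.nonzero(df.to_numpy() == "*")

# The letter behind each point. Row and column labels agree wherever
# there is a '*', so looking it up from the index is enough.
letters = df.index.to_numpy()[rows]

# One color per distinct letter found on either axis.
color_keys = sorted(set(df.index) | set(df.columns))
cmap = plt.get_cmap("Set2")  # qualitative colormap with 8 entries
colors = {key: cmap(i) for i, key in enumerate(color_keys)}

fig, ax = plt.subplots()
for letter in np.unique(letters):
    mask = letters == letter  # selects points, not columns
    ax.scatter(rows[mask], cols[mask], color=colors[letter],
               label=letter, s=100)

ax.set_xticks(np.arange(df.shape[0]))
ax.set_xticklabels(list(df.index))
ax.set_yticks(np.arange(df.shape[1]))
ax.set_yticklabels(list(df.columns))
ax.legend()
```

Each `scatter` call then receives only the points for one letter, so the legend gets one entry per letter that actually appears in the plot. Calling `plt.show()` afterwards displays the figure.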
