I have three-dimensional data, where one dimension is categorical: length, width, target. For simplicity, say that target can take values in {0, 1, 2}. I would like to plot length vs width "by" target, so that the points have different colours and shapes depending on the target value.
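For concreteness, a minimal sketch of such a DataFrame might look like the following. The values are made up purely for illustration; only the column names length, width and target come from the description above.

```python
import pandas as pd

# Hypothetical example data: the numbers are invented for illustration only.
df = pd.DataFrame(
    {
        "length": [5.1, 4.9, 6.3, 5.8, 7.1, 6.5],
        "width": [3.5, 3.0, 2.9, 2.7, 3.0, 3.2],
        # The categorical dimension, stored as floats taking values 0, 1, 2.
        "target": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
    }
)
print(df)
```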

I am able to do this in matplotlib.pyplot, imported as plt, using the following syntax. I assume that a pandas DataFrame df has the structure described above.

import matplotlib.pyplot as plt

# Split the data into one DataFrame per target value.
X0 = df.query("target == 0.0").drop("target", axis = 1)
X1 = df.query("target == 1.0").drop("target", axis = 1)
X2 = df.query("target == 2.0").drop("target", axis = 1)

# Draw all three groups on the same axes, one colour per target value.
ax = plt.axes()
X0.plot(x = "length", y = "width", kind = "scatter", ax = ax, color = "red")
X1.plot(x = "length", y = "width", kind = "scatter", ax = ax, color = "blue")
X2.plot(x = "length", y = "width", kind = "scatter", ax = ax, color = "green")
plt.show()

I'm sure that we can all agree that this is bad: the same call is repeated for every target value, and the colours are hard-coded.

A few years ago, I used to do some programming in R. The ggplot2 package allowed a syntax of the form

ggplot(df, aes(x = length, y = width, shape = target)) + geom_point()

One could replace shape = target with colour = target to get different colours depending on the value of target.

I would like something similar in pyplot. Try as I might, I have not been able to find such information in documentation or online sources. I'm sure it must be out there somewhere. I just have not been able to find it...


Edit. This question was marked as a duplicate. The duplicates helped with some of the issues, but they do not resolve all the questions raised above. In particular, they do not discuss marker shapes. The closest that I have found is the question "How to change the shape of the marker depending on a column variable?". There are other similar questions, but the approach there is pretty ugly compared with a simple shape = "target" call.
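For reference, the per-group approach in plain matplotlib looks roughly like the sketch below. The example data and the styles mapping (which colour and marker go with which target value) are my own assumptions for illustration; they are not taken from the linked questions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data with the assumed columns length, width, target.
df = pd.DataFrame(
    {
        "length": [5.1, 4.9, 6.3, 5.8, 7.1, 6.5],
        "width": [3.5, 3.0, 2.9, 2.7, 3.0, 3.2],
        "target": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
    }
)

# Hypothetical mapping from target value to (colour, marker).
styles = {0.0: ("red", "o"), 1.0: ("blue", "s"), 2.0: ("green", "^")}

fig, ax = plt.subplots()
# One scatter call per group, all drawn on the same axes.
for target, group in df.groupby("target"):
    color, marker = styles[target]
    ax.scatter(
        group["length"],
        group["width"],
        color=color,
        marker=marker,
        label=f"target = {target}",
    )
ax.set_xlabel("length")
ax.set_ylabel("width")
ax.legend()
```

Calling plt.show() afterwards displays the figure. It works, but the colour and marker for each value still have to be listed by hand, which is exactly the boilerplate I was hoping to avoid.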

There is a "ggplot for python" package, called plotnine, but it doesn't seem to have been updated for 5 years. You also seem to need to do things like from plotnine import *, which I'm certainly not excited by.

Maybe the functionality I'm after just doesn't exist in pyplot. If so, such is life! :)


Edit. @Trenton McKinney suggests using seaborn, imported as sns. Its scatterplot function has a hue option, which colours the points according to the value of a column.

sns.scatterplot(data = df, x = "length", y = "width", hue = "target")

This still doesn't answer my question about shapes, and neither did the (partial) "duplicates". However, sns.scatterplot also has a style option, whose description is the same as that of hue except that "different colours" is replaced by "different markers".

sns.scatterplot(data = df, x = "length", y = "width", style = "target")

Why not go crazy and use both hue and style?
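A minimal sketch of combining the two, using made-up data. Converting target to a categorical dtype is my own assumption: as far as I can tell, seaborn treats a numeric hue column as continuous, and the conversion asks it to pick distinct colours instead.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data with the assumed columns length, width, target.
df = pd.DataFrame(
    {
        "length": [5.1, 4.9, 6.3, 5.8, 7.1, 6.5],
        "width": [3.5, 3.0, 2.9, 2.7, 3.0, 3.2],
        "target": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0],
    }
)

# Assumption: make target categorical so that hue uses distinct colours.
df["target"] = df["target"].astype("category")

# hue varies the colour and style varies the marker, both driven by target.
ax = sns.scatterplot(data=df, x="length", y="width", hue="target", style="target")
```

Calling plt.show() afterwards displays the figure. This is about as close to the ggplot2 one-liner as I have managed to get.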

I guess that the correct answer is "don't do it in matplotlib; do it in seaborn". Hopefully the incorrect marking as duplicate will be resolved, so that I can add an answer with the full details.

1 Answer

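How about this, which loops over the target values instead of repeating the call:

colors = {0.0: "red", 1.0: "blue", 2.0: "green"}
ax = plt.axes()
for target, color in colors.items():
    # Passing the same ax draws every group on one set of axes.
    group = df.query("target == " + str(target)).drop("target", axis = 1)
    group.plot(x = "length", y = "width", kind = "scatter", ax = ax, color = color)
plt.show()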