I'm trying to get the Seaborn kdeplot example to work on my data. For some reason, one of my datasets isn't plotting at all, while the other plots fine. To get a minimal working example, I have sampled only 10 rows from each of my very large datasets.

My input data looks like this:

# Dataframe dfA
    index   x       y     category
0   595700  5   1.000000    14.0
1   293559  4   1.000000    14.0
2   562295  3   0.000000    14.0
3   219426  4   1.000000    14.0
4   592731  2   1.000000    14.0
5   178573  3   1.000000    14.0
6   553156  4   0.500000    14.0
7   385031  1   1.000000    14.0
8   391681  3   0.999998    14.0
9   492771  2   1.000000    14.0

# Dataframe dfB
    index   x      y      category
0   56345   3   1.000000    6.0
1   383741  4   1.000000    6.0
2   103044  2   1.000000    6.0
3   297357  5   1.000000    6.0
4   257508  3   1.000000    6.0
5   223600  2   0.999938    6.0
6   44530   2   1.000000    6.0
7   82925   3   1.000000    6.0
8   169592  3   0.500000    6.0
9   229482  4   0.285714    6.0

My code snippet looks like this:

import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="darkgrid")

# Set up the figure
f, ax = plt.subplots(figsize=(8, 8))

# Draw the two density plots
ax = sns.kdeplot(dfA.x, dfA.y,
             cmap="Reds", shade=True, shade_lowest=False)
ax = sns.kdeplot(dfB.x, dfB.y,
             cmap="Blues", shade=True, shade_lowest=False)

Why isn't the data from dataframe dfA actually plotting?

  • Are you creating only one axes object and plotting both into it (or even plotting at figure level without any axes)? What about f, axarr = plt.subplots(2), then sns.kdeplot(dfA.x, dfA.y, cmap="Reds", shade=True, shade_lowest=False, ax=axarr[0]) and sns.kdeplot(dfB.x, dfB.y, cmap="Blues", shade=True, shade_lowest=False, ax=axarr[1])? Commented Aug 24, 2016 at 1:56
  • I'm trying to plot both on the same axis. But dfA doesn't plot even if I comment out the second plot command. Commented Aug 24, 2016 at 4:57

1 Answer

I don't think Gaussian KDE is a good fit for either of your datasets. You have one variable with discrete values and one variable where the large majority of values seem to be a constant. This is not well modeled by a bivariate Gaussian distribution.

As for what exactly is happening, I can't say for sure without the full dataset, but I expect that the KDE bandwidth (particularly on the y axis) ends up so narrow that the regions with non-negligible density are tiny. You could try setting a wider bandwidth, but my advice would be to use a different kind of plot for this data.
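
To illustrate the narrow-bandwidth point, here is a minimal sketch, not seaborn's actual code path. It assumes Silverman's rule of thumb for a per-axis bandwidth, h = 0.9 * min(std, IQR / 1.34) * n^(-1/5); which rule your seaborn version really applies depends on the version and the KDE backend, so treat that choice as an assumption. The function name and the sample values are made up for illustration and are not your full dataset.

```python
import statistics


def rule_of_thumb_bandwidth(values):
    """Silverman's rule of thumb: 0.9 * min(std, IQR / 1.34) * n ** (-1/5).

    Assumption: used here only as an illustration of a data-driven
    bandwidth; it is not necessarily the rule seaborn applies.
    """
    n = len(values)
    std = statistics.stdev(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = min(std, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)


# Hypothetical data (NOT the asker's full dataset):
# x takes a handful of discrete values, y is almost always exactly 1.0.
x_discrete = [1, 2, 3, 4, 5] * 20
y_mostly_constant = [1.0] * 95 + [0.0, 0.285714, 0.5, 0.999938, 0.999998]

print(rule_of_thumb_bandwidth(x_discrete))         # a usable, positive bandwidth
print(rule_of_thumb_bandwidth(y_mostly_constant))  # 0.0
```

With most y values identical, the interquartile range is zero, so this rule gives a zero bandwidth for y while x gets a usable one. That would be consistent with a plot that shows nothing, but I can't confirm it is what happens with your data.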
