10

I have a DataFrame with multiple columns and many rows. Many rows have no value for some columns, so those entries appear as NaN in the DataFrame. An example DataFrame is as follows:

df.head()
GEN Sample_1    Sample_2    Sample_3    Sample_4    Sample_5    Sample_6    Sample_7    Sample_8    Sample_9    Sample_10   Sample_11   Sample_12   Sample_13   Sample_14
A123    9.4697  3.19689 4.8946  8.54594 13.2568 4.93848 3.16809 NAN NAN NAN NAN NAN NAN NAN
A124    6.02592 4.0663  3.9218  2.66058 4.38232         NAN NAN NAN NAN NAN NAN NAN
A125    7.88999 2.51576 4.97483 5.8901  21.1346 5.06414 15.3094 2.68169 8.12449 NAN NAN NAN NAN NAN
A126    5.99825 10.2186 15.2986 7.53729 4.34196 8.75048 16.9358 5.52708 NAN NAN NAN NAN NAN NAN
A127    28.5014 4.86702 NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN NAN

I wanted to plot a histogram for this DataFrame using seaborn, so I tried the following lines:

sns.set(color_codes=True)
sns.set(style="white", palette="muted")
sns.distplot(df)

But it throws the following error:

    ValueError                                Traceback (most recent call last)
    <ipython-input-80-896d7fe85ef3> in <module>()
          1 sns.set(color_codes=True)
          2 sns.set(style="white", palette="muted")
    ----> 3 sns.distplot(df)

    /anaconda3/lib/python3.4/site-packages/seaborn/distributions.py in distplot(a, bins, hist, kde, rug, fit, hist_kws, kde_kws, rug_kws, fit_kws, color, vertical, norm_hist, axlabel, label, ax)
        210         hist_color = hist_kws.pop("color", color)
        211         ax.hist(a, bins, orientation=orientation,
    --> 212                 color=hist_color, **hist_kws)
        213         if hist_color != color:
        214             hist_kws["color"] = hist_color

   /anaconda3/lib/python3.4/site-packages/matplotlib/axes/_axes.py in hist(self, x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, **kwargs)
       5627             color = mcolors.colorConverter.to_rgba_array(color)
       5628             if len(color) != nx:
    -> 5629                 raise ValueError("color kwarg must have one color per dataset")
       5630 
       5631         # We need to do to 'weights' what was done to 'x'

    ValueError: color kwarg must have one color per dataset

Any help or suggestions for getting rid of this error would be greatly appreciated.

5
  • Well, a histogram of a 2D array isn't defined in the general case. As you can see, distplot takes a 1D array, Series, or list. You might try passing color=X where X is a dictionary of color mappings, e.g. {'Sample_1': 'Red', ...}, but I seriously doubt it will work. Commented Oct 3, 2015 at 14:15
  • OK, can we use it with seaborn? It would be nice if you could share it here. I am a beginner in seaborn plotting. Commented Oct 3, 2015 at 14:57
  • I'd suggest you avoid searching for a one-line solution to your problems. Start with matplotlib (seaborn is just a set of advanced tools built on top of matplotlib). For your task, allocate an array of subplots (plt.subplots(nrows=?, ncols=?)), iterate over the df columns, and call matplotlib's hist for each subplot and column pair. Commented Oct 3, 2015 at 15:04
  • It's not clear what you're asking. Do you want a single histogram for all values in the dataframe? A separate histogram for each column, or for each row? What you're asking is currently undefined, which is why you are seeing an error. Commented Oct 3, 2015 at 16:11
  • @user1017373 can you please edit the question? I understand the question only once I see the accepted answer, but in its current form your question is very unclear. Commented Nov 28, 2015 at 21:12
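The subplot approach suggested in the comments could look roughly like the sketch below. It is only an illustration: the small DataFrame is a hypothetical stand-in built from a few values in the question's df.head() output, and the bin count, figure size, and Agg backend are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data (a few values from df.head())
df = pd.DataFrame({
    "GEN": ["A123", "A125", "A126", "A127"],
    "Sample_1": [9.4697, 7.88999, 5.99825, 28.5014],
    "Sample_2": [3.19689, 2.51576, 10.2186, 4.86702],
    "Sample_8": [np.nan, 2.68169, 5.52708, np.nan],
})

# Every column except the gene identifier holds numeric sample values
sample_cols = [c for c in df.columns if c != "GEN"]

# One subplot per sample column; squeeze=False keeps axes 2D even for one row
fig, axes = plt.subplots(nrows=1, ncols=len(sample_cols),
                         figsize=(4 * len(sample_cols), 3), squeeze=False)

for ax, col in zip(axes.ravel(), sample_cols):
    # Drop NaN entries so each histogram only sees the values that exist
    ax.hist(df[col].dropna(), bins=5)
    ax.set_title(col)

fig.tight_layout()
```

Dropping NaN per column means each histogram is built from a different number of observations, which matches how the question's data is laid out.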

3 Answers

5

I had also thought the seaborn documentation mentioned that multiple columns could be plotted simultaneously, and highlighted by color by default.

But upon re-reading, I did not see anything. Instead, I think I inferred it from this tutorial, where part of the way through, the tutorial plots a data frame with multiple columns.


However, the "solution" is trivial, and hopefully exactly what you're looking for:

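import seaborn as sns

sns.set(color_codes=True)
sns.set(style="white", palette="muted")

# distplot expects 1D data, so plot each column separately on the same axes
# instead of passing the whole DataFrame (assumes every column is numeric)
for col_id in df.columns:
    sns.distplot(df[col_id].dropna())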

By default, each call moves on to the next color in the palette, so every column gets its own color without you having to track which ones have already been used.

Generated image from code above (using different data set)

Note: I used a different data set, since I wasn't sure how to re-create yours.


5

I had a similar problem because the column I wanted to plot (my_column) in my pandas.DataFrame had dtype object. The command:

print(df[my_column])

gave me:

Length: 150, dtype: object

The solution was

sns.distplot(df[my_column].astype(float))

after which the dtype of my_column became:

Length: 150, dtype: float64
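A related option, sketched here under assumptions, is pd.to_numeric with errors="coerce", which converts an object column to float and turns unparseable entries into NaN instead of raising. The series below and its "n/a" placeholder are hypothetical, not taken from the question.

```python
import pandas as pd

# Hypothetical object-dtype column: numbers stored as text plus missing-value markers
raw = pd.Series(["9.4697", "3.19689", "NAN", "n/a", "4.8946"], name="my_column")

# Convert to float64; anything that cannot be parsed as a number becomes NaN
values = pd.to_numeric(raw, errors="coerce")

# Drop the missing entries before handing the column to a plotting function
clean = values.dropna()
```

The cleaned series can then be passed to the plotting call in place of df[my_column].astype(float).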


4

Let's assume I have the excerpt of the data you showed above (the only difference being that on my machine NAN is NaN).

Then the best graphical representation I can think of is a grouped bar plot: one group for every sample, with a bar for each gene within every group (some people occasionally call this a histogram).

To do that, you first need to "melt" your data, in R parlance, i.e. make it "long". Then you can proceed with plotting.

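import seaborn as sns

# Reshape to long form: one row per (sample, GEN) pair
data = df.set_index('GEN').unstack().reset_index()
data.columns = ['sample', 'GEN', 'value']

sns.set(style="white")
# catplot is the current name for factorplot (renamed in seaborn 0.9)
g = sns.catplot(x='sample', y='value', hue='GEN', data=data,
                kind='bar', aspect=2)
g.set_xticklabels(rotation=30)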
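As a sketch of the reshaping step on its own, the same long format can also be produced with pandas' melt. The tiny DataFrame here is a hypothetical stand-in using a few values from the question, and the column names 'sample' and 'value' simply mirror the ones used above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's wide table
df = pd.DataFrame({
    "GEN": ["A123", "A127"],
    "Sample_1": [9.4697, 28.5014],
    "Sample_2": [3.19689, 4.86702],
    "Sample_3": [4.8946, np.nan],
})

# Wide -> long: keep GEN as the identifier, stack the sample columns into rows
data = pd.melt(df, id_vars="GEN", var_name="sample", value_name="value")

# All observed values pooled into one series, e.g. for a single histogram
pooled = data["value"].dropna()
```

The long table has one row per gene and sample, so missing measurements show up as individual NaN rows that are easy to drop.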


Please, let us know if this is the type of plot you were after.

1 Comment

I was looking for a histogram to plot the distribution, but thank you.
