I have a list of case and control samples, along with information about which characteristics are present or absent in each of them. A dataframe holding this information can be built with Pandas:

import pandas as pd
df={'Patient':[True,True,False],'Control':[False,True,False]} # Presence/absence data for three genes for each sample 
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']

I need to visualize this data as a dot plot/scatter plot in which both the x and y axes are categorical and presence/absence is encoded by different marker shapes. Something like the following:

Patient|  x      x     -
Control|  -      x     -  
       __________________
        GeneA  GeneB  GeneC

I am new to Matplotlib/seaborn and can only plot simple line plots and scatter plots. Searching online, I could not find any instructions or an example plot similar to what I need here.

  • you might want to change d to df in line 3 and 4 ? Otherwise I think this is an interesting question. I do not know why people downvoted it. Commented Jun 28, 2018 at 20:47
  • @Moritz. Thanks for comment. I am also wondering what is wrong with this question!!! Commented Jun 28, 2018 at 20:49
  • I wish people could explain what is wrong before downvoting!!! Commented Jun 28, 2018 at 20:56
  • @user3015703 It's likely gained down votes because it doesn't show what you've tried in order to come up with a solution on your own. Commented Jun 28, 2018 at 20:56
  • you might want to have a look at seaborn: seaborn.pydata.org/index.html It provides some neat features for plotting on data aware grids Commented Jun 28, 2018 at 21:03

3 Answers

A quick way would be:

import pandas as pd
import matplotlib.pyplot as plt

df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample 
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']

heatmap = plt.imshow(df)
plt.xticks(range(len(df.columns.values)), df.columns.values)
plt.yticks(range(len(df.index)), df.index)
cbar = plt.colorbar(mappable=heatmap, ticks=[0, 1], orientation='vertical')  
# vertically oriented colorbar
cbar.ax.set_yticklabels(['Absent', 'Present']) 

Thanks to @DEEPAK SURANA for adding labels to the colorbar.

I searched the pyplot documentation and could not find a scatter or dot plot exactly like you described. Here is my take on creating a plot that illustrates what you want. The True records are blue and the False records are red.

# creating dataframe and extra column because index is not numeric
import pandas as pd
df={'Patient':[True,True,False],
    'Control':[False,True,False]} 
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']
df['level'] = [i for i in range(0, len(df))]
print(df)

# plotting the data
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,6))
genes = df.columns[:-1]  # every column except the helper 'level' column
for idx, gene in enumerate(genes):
    # blue marks presence (True), red marks absence (False)
    cList = ['blue' if x else 'red' for x in df[gene]]
    for inr_idx, lv in enumerate(df['level']):
        ax.scatter(x=idx, y=lv, c=cList[inr_idx], s=20)
plt.yticks(range(len(df.index)), list(df.index))
plt.xticks(range(len(genes)), list(genes))
fig.tight_layout()
plt.show()

Figure 1
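
The question asked for presence/absence to be encoded by marker shape rather than color. A minimal sketch of the same scatter idea with shapes follows; the marker choice ('x' for present, '_' for absent), the marker size, and the inverted y axis are assumptions made to mimic the sketch in the question, not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same presence/absence table as in the question
df = pd.DataFrame(
    {'Patient': [True, True, False], 'Control': [False, True, False]}
).transpose()
df.columns = ['GeneA', 'GeneB', 'GeneC']

fig, ax = plt.subplots(figsize=(6, 3))

# Assumed encoding: 'x' for present, '_' (a short dash) for absent
markers = {True: 'x', False: '_'}

for row, sample in enumerate(df.index):
    for col, gene in enumerate(df.columns):
        present = bool(df.loc[sample, gene])
        ax.scatter(col, row, marker=markers[present], color='black', s=100)

# Put the category names on both axes
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(df.columns)
ax.set_yticks(range(len(df.index)))
ax.set_yticklabels(df.index)

ax.invert_yaxis()  # first row (Patient) on top, as in the question's sketch
ax.margins(0.3)    # leave some room around the outermost markers
fig.tight_layout()
```

Call plt.show() (or fig.savefig with a file name of your choice) afterwards to display or save the figure.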

Something like this might work

import pandas as pd
from matplotlib.ticker import FixedLocator

df={'Patient':[1,1,0],'Control':[0,1,0]} # Presence/absence data for three genes for each sample
df=pd.DataFrame(df)
df=df.transpose()
df.columns=['GeneA','GeneB','GeneC']

plot = df.T.plot()  # one line per sample, genes along the x axis
loc = FixedLocator([0,1,2])  # one tick per gene
plot.xaxis.set_major_locator(loc)
plot.xaxis.set_ticklabels(df.columns)

For more on tick locators, look at https://matplotlib.org/examples/pylab_examples/major_minor_demo1.html and https://matplotlib.org/api/ticker_api.html

I think you have to convert the boolean values to zeros and ones to make it work. Something like df.astype(int)
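
As a small illustration of that conversion, assuming the boolean dataframe from the question (the name df_int is only for illustration):

```python
import pandas as pd

# Boolean presence/absence table from the question
df = pd.DataFrame(
    {'Patient': [True, True, False], 'Control': [False, True, False]}
).transpose()
df.columns = ['GeneA', 'GeneB', 'GeneC']

# astype(int) maps True -> 1 and False -> 0 for every cell
df_int = df.astype(int)
print(df_int)
```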

2 Comments

I need the values for each sample to be in one row of the plot, and presence/absence to be coded by different shapes. Isn't it possible for both the x and y axes to hold categorical data?
you could try a heatmap
