I have a pandas dataframe and I want to plot slices of it, in a function using multiprocessing. Even though the function "process_expression" works when I call it independently, when I use the "multiprocessing" option it is not giving any plots.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import scipy
import seaborn as sns
import sys
from multiprocessing import Pool
import os
os.system("taskset -p 0xff %d" % os.getpid())
pool = Pool()
gn = pool.map(process_expression, gene_ids)
pool.close()
pool.join()
def process_expression(gn_name, df_gn=df_coding):
df_part = df_gn.loc[df_gn['Gene_id'] == gn_name]
df_part = df_part.drop('Gene_id', 1)
df_part = df_part.drop('Transcript_biotype', 1)
COUNT100 = df_part[df_part >100 ].count()
COUNT10 = (df_part[df_part >10 ].count()) - COUNT100
COUNT1 = (df_part[df_part >1].count())- COUNT100 - COUNT10
COUNT0 = (df_part[df_part >0].count())- COUNT100-COUNT10- COUNT1
result = pd.concat([COUNT0,COUNT1,COUNT10,COUNT100], axis=1)
result.columns = [ '0 TO 1', '1 TO 10','10 TO 100', '>100']
result.plot( kind='bar', figsize=(50, 20), fontsize=7, stacked=True)
plt.savefig('./expression_levels/all_genes/'+gn_name+'.png')#,bbox_inches='tight')
plt.close()
the df_coding table is something like (it has more columns, I erased some):
Isoform_name,heart,heart.1,lung.3,Gene_id,Transcript_biotype
ENST00000296782,0.14546900000000001,0.161245,0.09479889999999999,ENSG00000164327,protein_coding
ENST00000357387,6.53902,5.86969,7.057689999999999,ENSG00000164327,protein_coding
ENST00000514735,0.0,0.0,0.0,ENSG00000164327,protein_coding
The input dataframe df_coding is a dataframe with a column Gene_id. In this column I have a list of gn_name. What I want is to take each time only the parts of the dataframe which have the name gn_name[i] in the Gene_id column and plot a barplot based on this dataframe.
For example if I call the 'process_expression('ENSG00000164327')', which is a specific gn_name, the output is something like this:

What am I doing wrong? I know that the process stops at the plotting command when I run it with multiprocessing.