I have the following data frame in pandas:
>>> df1[1:15]
gene beta
1 PALMD NaN
2 PALMD NaN
3 FRRS1 1.966503
4 AGL NaN
5 AGL -4.082453
6 AGL 2.840288
7 AGL NaN
8 AGL -4.909043
9 AGL NaN
10 AGL 3.275433
11 SASS6 NaN
12 SASS6 -3.239315
13 TRMT13 3.434759
14 TRMT13 4.282222
I would like to create a variable that indicates, for each gene, whether its beta values are (1) all positive, (2) all negative, or (3) mixed. NaN values should be ignored unless they are the only values for a given gene, in which case the result should be NaN. This is the goal:
>>> df1[1:15]
gene Direction
1 PALMD NaN
2 FRRS1 Pos
3 AGL Mix
4 SASS6 Neg
5 TRMT13 Pos
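The rule described above can be sketched as follows. This is only a minimal sketch under stated assumptions: the helper name `direction` is hypothetical, the frame is rebuilt from the sample rows shown above, and a beta of exactly 0 (which does not occur in the sample) would fall into "Mix".

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the real df1 is larger.
df1 = pd.DataFrame(
    {
        "gene": ["PALMD", "PALMD", "FRRS1", "AGL", "AGL", "AGL", "AGL",
                 "AGL", "AGL", "AGL", "SASS6", "SASS6", "TRMT13", "TRMT13"],
        "beta": [np.nan, np.nan, 1.966503, np.nan, -4.082453, 2.840288,
                 np.nan, -4.909043, np.nan, 3.275433, np.nan, -3.239315,
                 3.434759, 4.282222],
    },
    index=range(1, 15),
)


def direction(betas):
    """Hypothetical helper: label one gene's betas as Pos, Neg, Mix or NaN."""
    vals = betas.dropna()          # ignore missing betas
    if vals.empty:                 # only NaN for this gene
        return np.nan
    if (vals > 0).all():
        return "Pos"
    if (vals < 0).all():
        return "Neg"
    return "Mix"                   # both signs present (or a zero)


# sort=False keeps genes in order of first appearance.
result = (
    df1.groupby("gene", sort=False)["beta"]
    .apply(direction)
    .rename("Direction")
    .reset_index()
)
print(result)
```

On the sample rows this labels PALMD as NaN, FRRS1 as Pos, AGL as Mix, SASS6 as Neg and TRMT13 as Pos. Because `reset_index()` returns an ordinary data frame with `gene` and `Direction` columns, it can be merged with another data frame on `gene` later.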
I tried to aggregate by gene, but I got an error, possibly because of the NaN values. If possible, I would like to keep the output as a pandas data frame, since I will have to merge it with another data frame later.
>>> import numpy as np
>>> df1g = df1.groupby("gene")
>>> df1ga = df1g.agg(np.concatenate)
KeyError: 0L
Thank you