Aggregating string field into a list with python pandas

Question

I have the following data frame in pandas:

>>> df1[1:15]
      gene      beta
1    PALMD       NaN
2    PALMD       NaN
3    FRRS1  1.966503
4      AGL       NaN
5      AGL -4.082453
6      AGL  2.840288
7      AGL       NaN
8      AGL -4.909043
9      AGL       NaN
10     AGL  3.275433
11   SASS6       NaN
12   SASS6 -3.239315
13  TRMT13  3.434759
14  TRMT13  4.282222

I would like to create a variable that indicates whether the beta values for each gene are (1) all positive, (2) all negative, or (3) mixed. NaN values should be ignored unless they are the only values for a given gene, in which case the direction should be NaN. This is the goal:

>>> df1[1:15]
      gene   Direction
1    PALMD         NaN
2    FRRS1         Pos
3      AGL         Mix
4    SASS6         Neg
5   TRMT13         Pos
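The three-way rule above can also be written as two per-gene boolean tests: does the gene have any non-NaN beta >= 0, and does it have any beta < 0? The sketch below is an alternative formulation, not the accepted answer's approach. It rebuilds the frame from the question and assumes, like the answer, that a beta of exactly zero counts as positive.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (rows 1-14).
df1 = pd.DataFrame(
    {
        "gene": ["PALMD", "PALMD", "FRRS1", "AGL", "AGL", "AGL", "AGL",
                 "AGL", "AGL", "AGL", "SASS6", "SASS6", "TRMT13", "TRMT13"],
        "beta": [np.nan, np.nan, 1.966503, np.nan, -4.082453, 2.840288, np.nan,
                 -4.909043, np.nan, 3.275433, np.nan, -3.239315, 3.434759, 4.282222],
    },
    index=range(1, 15),
)

# Comparisons with NaN are False, so NaN betas count as neither positive nor negative.
flags = pd.DataFrame(
    {"gene": df1["gene"], "pos": df1["beta"] >= 0, "neg": df1["beta"] < 0}
)
per_gene = flags.groupby("gene")[["pos", "neg"]].any()

# Start from NaN (genes with only NaN betas), then overwrite; "Mix" is applied last.
direction = pd.Series(np.nan, index=per_gene.index, dtype=object, name="Direction")
direction[per_gene["pos"]] = "Pos"
direction[per_gene["neg"]] = "Neg"
direction[per_gene["pos"] & per_gene["neg"]] = "Mix"

result = direction.reset_index()  # columns: gene, Direction
print(result)
```

Because it only uses vectorized comparisons and the built-in any() reduction, this avoids calling a Python function once per group, which may matter on large frames.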

I tried to aggregate by gene, but I got an error, possibly caused by the NaN values. If possible, I would like to keep the output as a pandas DataFrame, since I will need to merge it with another DataFrame later.

>>> df1g = df1.groupby("gene")
>>> df1ga = df1g.agg(np.concatenate)
KeyError: 0L
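For the literal task in the title (collecting each gene's values into a list), np.concatenate is not a good fit: it joins a sequence of arrays, whereas each group here is a Series of scalar floats. A minimal sketch, assuming a current pandas version and using a trimmed copy of the question's data, is to select the column and aggregate with the built-in list:

```python
import numpy as np
import pandas as pd

# Trimmed copy of the question's frame.
df1 = pd.DataFrame(
    {
        "gene": ["PALMD", "PALMD", "FRRS1", "SASS6", "SASS6", "TRMT13", "TRMT13"],
        "beta": [np.nan, np.nan, 1.966503, np.nan, -3.239315, 3.434759, 4.282222],
    },
    index=range(1, 8),
)

# One Python list per gene; NaN values are kept inside the lists.
as_lists = df1.groupby("gene")["beta"].agg(list)

# Dropping NaN first removes it from the lists, and drops genes that had only NaN.
non_missing = df1.dropna(subset=["beta"]).groupby("gene")["beta"].agg(list)

print(as_lists)
print(non_missing)
```

Both results are Series indexed by gene; calling .reset_index() on either turns it back into a DataFrame.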

Thank you

DSM · Accepted Answer · 2014-10-01 23:26:52Z


I'd write a little label function:

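import numpy as np

def label(ser):
    ser = ser.dropna()          # ignore missing betas
    if ser.empty:               # every beta for this gene was NaN
        return np.nan
    if (ser >= 0).all():        # zero is counted as positive
        return "Pos"
    if (ser < 0).all():
        return "Neg"
    return "Mix"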

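and then pass it to groupby.agg with a dict, which makes it easy to specify the name of the output column (note: renaming via a dict in SeriesGroupBy.agg was deprecated in pandas 0.20 and removed in 1.0, so this form only runs on older versions):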

>>> labelled = df1.groupby("gene")["beta"].agg({"Direction": label}).reset_index()
>>> labelled
     gene Direction
0     AGL       Mix
1   FRRS1       Pos
2   PALMD       NaN
3   SASS6       Neg
4  TRMT13       Pos
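On pandas versions that reject the dict form (1.0 and later), the same table can be produced with named aggregation, or with the reset_index(name=...) variant DSM mentions in the comments. This sketch assumes pandas 0.25 or newer (where SeriesGroupBy.agg accepts keyword names), uses a trimmed copy of the question's data, and merges the result into a hypothetical frame called other, whose genes and scores are made up purely to illustrate the merge the asker plans.

```python
import numpy as np
import pandas as pd


def label(ser):
    # Same logic as the answer: ignore NaN, zero counts as positive.
    ser = ser.dropna()
    if ser.empty:
        return np.nan
    if (ser >= 0).all():
        return "Pos"
    if (ser < 0).all():
        return "Neg"
    return "Mix"


# Trimmed copy of the question's frame.
df1 = pd.DataFrame(
    {
        "gene": ["PALMD", "PALMD", "FRRS1", "AGL", "AGL", "SASS6", "SASS6", "TRMT13"],
        "beta": [np.nan, np.nan, 1.966503, -4.082453, 2.840288, np.nan, -3.239315, 3.434759],
    }
)

# Named aggregation: the keyword becomes the output column name.
labelled = df1.groupby("gene")["beta"].agg(Direction=label).reset_index()

# Equivalent: aggregate to a Series, then name the column while resetting the index.
labelled_alt = df1.groupby("gene")["beta"].agg(label).reset_index(name="Direction")

# Hypothetical second frame (made-up data) standing in for the one to merge with.
other = pd.DataFrame({"gene": ["AGL", "SASS6", "BRCA1"], "score": [0.1, 0.2, 0.3]})
merged = other.merge(labelled, on="gene", how="left")  # unmatched genes get NaN

print(labelled)
print(merged)
```

A left merge keeps every row of other and leaves Direction as NaN for genes that have no beta values at all.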


4 Comments

b10n Over a year ago

DSM, great answer. What's the purpose of the dictionary? Is it to set the name of the generated column?

DSM Over a year ago

Yep. df.groupby("gene")["beta"].agg(label).reset_index(name="Direction") would have worked too, but that always feels a little magical to me.

alexhli Over a year ago

I am seeing this error when I implement your code: AttributeError: 'Series' object has no attribute 'empty'. Could this be a version thing?

alexhli Over a year ago

Got it to work by changing "if ser.empty:" to "if len(ser) == 0:". Thanks!
