I have the following data frame in pandas:

>>> df1[1:15]
      gene      beta
1    PALMD       NaN
2    PALMD       NaN
3    FRRS1  1.966503
4      AGL       NaN
5      AGL -4.082453
6      AGL  2.840288
7      AGL       NaN
8      AGL -4.909043
9      AGL       NaN
10     AGL  3.275433
11   SASS6       NaN
12   SASS6 -3.239315
13  TRMT13  3.434759
14  TRMT13  4.282222

I would like to create a variable indicating whether the beta values for each gene are (1) all positive, (2) all negative, or (3) mixed. I will discard NaN values unless they are the only values for a given gene. This is the goal:

>>> df1[1:15]
      gene   Direction
1    PALMD         NaN
2    FRRS1         Pos
3      AGL         Mix
4    SASS6         Neg
5   TRMT13         Pos

I tried to aggregate by gene but got an error, possibly due to the NaN values. If possible, I would like to keep the output as a pandas data frame, since I will have to merge it with another df in the future.

>>> df1g = df1.groupby("gene")
>>> df1ga = df1g.agg(np.concatenate)
KeyError: 0L

Thank you

1 Answer

I'd write a little label function:

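import numpy as np

def label(ser):
    # Ignore missing betas; they only matter if nothing else is left.
    ser = ser.dropna()
    if ser.empty:
        return np.nan
    # Note: a beta of exactly 0 counts as "Pos" here.
    if (ser >= 0).all():
        return "Pos"
    if (ser < 0).all():
        return "Neg"
    return "Mix"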

and then pass it to groupby.agg, which makes it easy to specify the name of the output column:

>>> labelled = df.groupby("gene")["beta"].agg({"Direction": label}).reset_index()
>>> labelled
     gene Direction
0     AGL       Mix
1   FRRS1       Pos
2   PALMD       NaN
3   SASS6       Neg
4  TRMT13       Pos
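
A note for newer pandas: as far as I know, passing a renaming dict to agg on a single selected column was deprecated and then removed (pandas 1.0 and later raise a SpecificationError for it). The sketch below assumes a recent pandas and uses keyword-style named aggregation instead. The frame is rebuilt by hand from the rows shown in the question, so the name df1 and its default 0-based index are only stand-ins for the real data.

```python
import numpy as np
import pandas as pd

# Rows copied from the question's df1[1:15]; the index here is an assumption.
df1 = pd.DataFrame({
    "gene": ["PALMD", "PALMD", "FRRS1", "AGL", "AGL", "AGL", "AGL",
             "AGL", "AGL", "AGL", "SASS6", "SASS6", "TRMT13", "TRMT13"],
    "beta": [np.nan, np.nan, 1.966503, np.nan, -4.082453, 2.840288, np.nan,
             -4.909043, np.nan, 3.275433, np.nan, -3.239315, 3.434759, 4.282222],
})

def label(ser):
    # Same logic as the label function above.
    ser = ser.dropna()
    if ser.empty:
        return np.nan
    if (ser >= 0).all():
        return "Pos"
    if (ser < 0).all():
        return "Neg"
    return "Mix"

# Named aggregation: the keyword becomes the output column name.
labelled = df1.groupby("gene")["beta"].agg(Direction=label).reset_index()
print(labelled)
```

This should give the same five rows as the table above, still as a DataFrame, so it can be merged with another frame on "gene" later.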

4 Comments

DSM, great answer. What's the purpose of the dictionary? Is it to set the label on the generated column?
Yep. df.groupby("gene")["beta"].agg(label).reset_index(name="Direction") would have worked too, but that always feels a little magical to me.
I am seeing this error when I implement your code: AttributeError: 'Series' object has no attribute 'empty'. Could this be a version thing?
Got it to work by changing "if ser.empty:" to "if len(ser) == 0:". Thanks!
