       A   B  C  
0    Bob  10  2
1    Bob  11  8
2  Sarah  23 -2
3  Sarah  24  4
4   Jack  19 -4
5   Jack  21 -1

I want to get a new df["Point"] as follows:

  • For the "Bob" group: df["Point"] is the group's first B value multiplied by each C value: 10*2=20; 10*8=80.
  • For the "Sarah" group: likewise, 23*(-2)=(-46); 23*4=92.
  • For the "Jack" group: 19*(-4)=(-76); 19*(-1)=(-19).

I mean, I want to get:

       A   B  C  Point
0    Bob  10  2     20
1    Bob  11  8     80
2  Sarah  23 -2    -46
3  Sarah  24  4     92
4   Jack  19 -4    -76
5   Jack  21 -1    -19
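
For reference, a frame matching the table above can be built like this (a sketch; column order as shown):

```python
import pandas as pd

# Sample data from the tables above
df = pd.DataFrame({
    "A": ["Bob", "Bob", "Sarah", "Sarah", "Jack", "Jack"],
    "B": [10, 11, 23, 24, 19, 21],
    "C": [2, 8, -2, 4, -4, -1],
})
print(df)
```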

After that, I want to do the following iteration:

results = {}

grouped = df.groupby("A")

for idx, group in grouped:
    if (group["Point"] > 50).any():
        # keep the first row whose Point exceeds 50
        results[idx] = group[group["Point"] > 50].head(1)
    else:
        # otherwise keep the last row of the group
        results[idx] = group.tail(1)
    print()
    print(results[idx])

And get these results:

      A   B  C  Point
1   Bob  11  8     80

      A   B  C  Point
3  Sarah  24  4     92

      A   B  C  Point
5  Jack  21 -1    -19

I guess I have to do a double iteration, but I don't know how, or whether it's possible to do this a different way.

  • Why is the second Sarah group multiplied by the second value? Commented Mar 3, 2018 at 11:35
  • OK, I've edited it. I think it would be right for me with the first Sarah value. Commented Mar 3, 2018 at 11:46

1 Answer

First create the new column: take each group's first B value with groupby transform('first') and multiply it by column C:

df['point'] = df.groupby('A')['B'].transform('first').mul(df['C'])
print (df)
       A   B  C  point
0    Bob  10  2     20
1    Bob  11  8     80
2  Sarah  23 -2    -46
3  Sarah  24  4     92
4   Jack  19 -4    -76
5   Jack  21 -1    -19
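
To see what the intermediate looks like, transform('first') broadcasts each group's first B value back to every row of that group (a sketch using the same sample data):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Bob", "Bob", "Sarah", "Sarah", "Jack", "Jack"],
    "B": [10, 11, 23, 24, 19, 21],
    "C": [2, 8, -2, 4, -4, -1],
})

# Each row gets its group's first B value, aligned to the original index
first_b = df.groupby("A")["B"].transform("first")
print(first_b.tolist())  # [10, 10, 23, 23, 19, 19]
```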

Then filter all rows matching the condition and keep only the first row per group with drop_duplicates (keep='first' is the default):

df1 = df[df['point'] > 50].drop_duplicates('A')
print (df1)
       A   B  C  point
1    Bob  11  8     80
3  Sarah  24  4     92

Then filter the rows whose A value is not in df1['A'] using isin with the condition inverted by ~, and drop_duplicates again, this time keeping only the last row per group:

df2 = df[~df['A'].isin(df1['A'])].drop_duplicates('A', keep='last')
print (df2)
      A   B  C  point
5  Jack  21 -1    -19

Last, use concat with a dict comprehension to build the final dictionary:

d = {k: v for k, v in pd.concat([df1, df2]).groupby('A')}
print (d)
{'Bob':      A   B  C  point
1  Bob  11  8     80, 'Jack':       A   B  C  point
5  Jack  21 -1    -19, 'Sarah':        A   B  C  point
3  Sarah  24  4     92}
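
Putting the steps above together, the whole pipeline is only a few lines (a sketch on the same sample data):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Bob", "Bob", "Sarah", "Sarah", "Jack", "Jack"],
    "B": [10, 11, 23, 24, 19, 21],
    "C": [2, 8, -2, 4, -4, -1],
})

# 1. group's first B value times C
df['point'] = df.groupby('A')['B'].transform('first').mul(df['C'])
# 2. first row per group with point > 50
df1 = df[df['point'] > 50].drop_duplicates('A')
# 3. last row of every group not already covered by df1
df2 = df[~df['A'].isin(df1['A'])].drop_duplicates('A', keep='last')
# 4. dictionary keyed by group name
d = {k: v for k, v in pd.concat([df1, df2]).groupby('A')}

for k in sorted(d):
    print(d[k])
```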
