       A   B  C  
0    Bob  10  2
1    Bob  11  8
2  Sarah  23 -2
3  Sarah  24  4
4   Jack  19 -4
5   Jack  21 -1

I want to get a new df["Point"] as follows:

  • For the "Bob" group: df["Point"] is the group's first B value multiplied by each C value: 10*2=20; 10*8=80.
  • For the "Sarah" group: likewise, 23*(-2)=(-46); 23*4=92.
  • For the "Jack" group: 19*(-4)=(-76); 19*(-1)=(-19).

I mean, I want to get:

       A   B  C  Point
0    Bob  10  2     20
1    Bob  11  8     80
2  Sarah  23 -2    -46
3  Sarah  24  4     92
4   Jack  19 -4    -76
5   Jack  21 -1    -19
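
For reference, a frame matching the table above can be built like this (a sketch; column order as shown):

```python
import pandas as pd

# Sample data from the tables above
df = pd.DataFrame({
    "A": ["Bob", "Bob", "Sarah", "Sarah", "Jack", "Jack"],
    "B": [10, 11, 23, 24, 19, 21],
    "C": [2, 8, -2, 4, -4, -1],
})
print(df)
```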

After that, I want to do the following iteration:

results = {}

grouped = df.groupby("A")

for idx, group in grouped:
    if (group["Point"] > 50).any():
        # keep the first row whose Point exceeds 50
        results[idx] = group[group["Point"] > 50].head(1)
    else:
        # otherwise keep the last row of the group
        results[idx] = group.tail(1)
    print()
    print(results[idx])

And get these results:

      A   B  C  Point
1   Bob  11  8     80

      A   B  C  Point
3  Sarah  24  4     92

      A   B  C  Point
5  Jack  21 -1    -19

I guess I have to do a double iteration, but I don't know how, or whether it's possible to do this a different way.

  • Why is the second Sarah group multiplied by the second value? Commented Mar 3, 2018 at 11:35
  • OK, I've edited it. I think it would be right for me with the first Sarah value. Commented Mar 3, 2018 at 11:46

1 Answer

First create the new column: take each group's first B value with groupby transform('first') and multiply it by column C:

df['point'] = df.groupby('A')['B'].transform('first').mul(df['C'])
print (df)
       A   B  C  point
0    Bob  10  2     20
1    Bob  11  8     80
2  Sarah  23 -2    -46
3  Sarah  24  4     92
4   Jack  19 -4    -76
5   Jack  21 -1    -19
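
To see what the intermediate looks like, transform('first') broadcasts each group's first B value back to every row of that group (a sketch using the same sample data):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Bob", "Bob", "Sarah", "Sarah", "Jack", "Jack"],
    "B": [10, 11, 23, 24, 19, 21],
    "C": [2, 8, -2, 4, -4, -1],
})

# Each row gets its group's first B value, aligned to the original index
first_b = df.groupby("A")["B"].transform("first")
print(first_b.tolist())  # [10, 10, 23, 23, 19, 19]
```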

Then filter all rows matching the condition and keep only the first row per group with drop_duplicates (keep='first' is the default):

df1 = df[df['point'] > 50].drop_duplicates('A')
print (df1)
       A   B  C  point
1    Bob  11  8     80
3  Sarah  24  4     92

Then filter the rows whose A value is not in df1['A'] using isin with the condition inverted by ~, and drop_duplicates again, this time keeping only the last row per group:

df2 = df[~df['A'].isin(df1['A'])].drop_duplicates('A', keep='last')
print (df2)
      A   B  C  point
5  Jack  21 -1    -19

Last, use concat with a dict comprehension to build the final dictionary:

d = {k: v for k, v in pd.concat([df1, df2]).groupby('A')}
print (d)
{'Bob':      A   B  C  point
1  Bob  11  8     80, 'Jack':       A   B  C  point
5  Jack  21 -1    -19, 'Sarah':        A   B  C  point
3  Sarah  24  4     92}
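
Putting the steps above together, the whole pipeline is only a few lines (a sketch on the same sample data):

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Bob", "Bob", "Sarah", "Sarah", "Jack", "Jack"],
    "B": [10, 11, 23, 24, 19, 21],
    "C": [2, 8, -2, 4, -4, -1],
})

# 1. group's first B value times C
df['point'] = df.groupby('A')['B'].transform('first').mul(df['C'])
# 2. first row per group with point > 50
df1 = df[df['point'] > 50].drop_duplicates('A')
# 3. last row of every group not already covered by df1
df2 = df[~df['A'].isin(df1['A'])].drop_duplicates('A', keep='last')
# 4. dictionary keyed by group name
d = {k: v for k, v in pd.concat([df1, df2]).groupby('A')}

for k in sorted(d):
    print(d[k])
```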
