
I have a simple dataframe that records emails sent to different receivers:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
                   'Receiver': ['David', 'Eric', 'Frank', 'Ginger', 'Holly', 'Ingrid'],
                   'Emails': [9, 3, 5, 1, 6, 7]
                  })
df

That looks like this:

    Emails  Receiver    Sender
0   9       David       Alice
1   3       Eric        Alice
2   5       Frank       Bob
3   1       Ginger      Carl
4   6       Holly       Bob
5   7       Ingrid      Alice

For each sender, I can get a list of receivers by performing a groupby along with a custom aggregation:

grouped = df.groupby('Sender')
grouped.agg({'Receiver': lambda x: list(x),
             'Emails': np.sum
             })

Which produces this dataframe output:

        Emails  Receiver
Sender      
Alice   19      [David, Eric, Ingrid]
Bob     11      [Frank, Holly]
Carl    1       [Ginger]
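
As an aside, on newer pandas (assuming version 0.25 or later) the same aggregation can be spelled with named aggregation, which makes the output column names explicit; a minimal sketch equivalent to the agg call above:

grouped = df.groupby('Sender')
grouped.agg(Emails=('Emails', 'sum'),        # sum the email counts per sender
            Receiver=('Receiver', list))     # collect receivers into a list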

I want to write the dataframe to a file (not a CSV since it will be jagged) with spaces separating each element (including splitting out the list) so it would look like this:

Alice 19 David Eric Ingrid
Bob 11 Frank Holly
Carl 1 Ginger

I could iterate over each row and write the contents to a file, but I was wondering if there is a better approach to get the same output starting from the original dataframe?

2 Comments

  • Why don't you do grouped.agg({'Receiver': ' '.join, 'Emails': np.sum}).to_csv(sep=' ')? (but see the quoting caveat sketched below)
  • Then, you can do 'Receiver': lambda x: ' '.join(x.astype(str)) or something similar?
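
For what it's worth, the to_csv route from the first comment almost works, but pandas' default QUOTE_MINIMAL quoting wraps any field that contains the separator, so the joined receiver names come out quoted. A minimal sketch illustrating the caveat (writing to out.txt as in the question):

agg = df.groupby('Sender').agg({'Emails': 'sum', 'Receiver': ' '.join})
# Writes lines like: Alice 19 "David Eric Ingrid"
# -- the joined field is quoted because it contains the space separator.
agg.to_csv('out.txt', sep=' ', header=False)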

2 Answers


You can do that like so:

output_file = './out.txt'
with open(output_file, 'w') as fout:
    # Iterate over the groups: `sender` is the group key and
    # `group_df` holds that sender's rows (avoids shadowing `df`).
    for sender, group_df in grouped:
        fout.write('{} {} {}\n'.format(sender,
                                       group_df['Emails'].sum(),
                                       ' '.join(group_df['Receiver'])))

Now the out.txt file will contain:

Alice 19 David Eric Ingrid
Bob 11 Frank Holly
Carl 1 Ginger

1 Comment

This is not the output that I am looking for. Please see original question.

You are almost there; just use ' '.join as the aggregating function for the Receiver column:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
                   'Receiver': ['David', 'Eric', 'Frank', 'Ginger', 'Holly', 'Ingrid'],
                   'Emails': [9, 3, 5, 1, 6, 7]
                   })

grouped = df.groupby('Sender')
result = grouped.agg({'Receiver': ' '.join,
                      'Emails': np.sum
                      })

print(result)

Output

                 Receiver  Emails
Sender                           
Alice   David Eric Ingrid      19
Bob           Frank Holly      11
Carl               Ginger       1
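
From result, the space-separated file asked for in the question is then a short loop away; a minimal sketch (writing to out.txt as in the question):

with open('out.txt', 'w') as fout:
    for sender, row in result.iterrows():
        # Index (Sender) first, then the summed Emails,
        # then the already-joined Receiver string.
        fout.write('{} {} {}\n'.format(sender, row['Emails'], row['Receiver']))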

For the sake of completeness, if the Receiver column were ints instead of strings, you could convert them to str first and then join:

df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
                   'Receiver': [1, 2, 3, 4, 5, 6],
                   'Emails': [9, 3, 5, 1, 6, 7]
                   })

grouped = df.groupby('Sender')
result = grouped.agg({'Receiver': lambda x: ' '.join(map(str, x)),
                      'Emails': np.sum
                      })

print(result)

Output

       Receiver  Emails
Sender                 
Alice     1 2 6      19
Bob         3 5      11
Carl          4       1
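
Equivalently, following the second comment on the question, you can cast the Series to str before joining; a sketch that produces the same output as above:

result = grouped.agg({'Receiver': lambda x: ' '.join(x.astype(str)),
                      'Emails': np.sum
                      })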

1 Comment

For my own understanding, what would the aggregating function be if, instead of names in the receiver column, they were integers/floats? join would error out with expected str instance, int found. Knowing this would be super helpful! Thanks in advance
