
I have a simple dataframe that records emails sent to different receivers:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
                   'Receiver': ['David', 'Eric', 'Frank', 'Ginger', 'Holly', 'Ingrid'],
                   'Emails': [9, 3, 5, 1, 6, 7]
                  })
df

That looks like this:

    Emails  Receiver    Sender
0   9       David       Alice
1   3       Eric        Alice
2   5       Frank       Bob
3   1       Ginger      Carl
4   6       Holly       Bob
5   7       Ingrid      Alice

For each sender, I can get a list of receivers by performing a groupby along with a custom aggregation:

grouped = df.groupby('Sender')
grouped.agg({'Receiver': lambda x: list(x),
             'Emails': np.sum
             })

Which produces this dataframe output:

        Emails  Receiver
Sender      
Alice   19      [David, Eric, Ingrid]
Bob     11      [Frank, Holly]
Carl    1       [Ginger]
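
As an aside, on newer pandas (assuming version 0.25 or later) the same aggregation can be spelled with named aggregation, which makes the output column names explicit; a minimal sketch equivalent to the agg call above:

grouped = df.groupby('Sender')
grouped.agg(Emails=('Emails', 'sum'),        # sum the email counts per sender
            Receiver=('Receiver', list))     # collect receivers into a list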

I want to write the dataframe to a file (not a CSV since it will be jagged) with spaces separating each element (including splitting out the list) so it would look like this:

Alice 19 David Eric Ingrid
Bob 11 Frank Holly
Carl 1 Ginger

I could iterate over each row and write the contents to a file, but I was wondering if there is a better approach to get the same output starting from the original dataframe?

2 Comments

  • Why don't you do grouped.agg({'Receiver': ' '.join, 'Emails': np.sum}).to_csv(sep=' ')? (but see the quoting caveat sketched below)
  • Then, you can do 'Receiver': lambda x: ' '.join(x.astype(str)) or something similar?
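
For what it's worth, the to_csv route from the first comment almost works, but pandas' default QUOTE_MINIMAL quoting wraps any field that contains the separator, so the joined receiver names come out quoted. A minimal sketch illustrating the caveat (writing to out.txt as in the question):

agg = df.groupby('Sender').agg({'Emails': 'sum', 'Receiver': ' '.join})
# Writes lines like: Alice 19 "David Eric Ingrid"
# -- the joined field is quoted because it contains the space separator.
agg.to_csv('out.txt', sep=' ', header=False)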

2 Answers


You can do that like so:

output_file = './out.txt'
with open(output_file, 'w') as fout:
    # Iterate over the groups: `sender` is the group key and
    # `group_df` holds that sender's rows (avoids shadowing `df`).
    for sender, group_df in grouped:
        fout.write('{} {} {}\n'.format(sender,
                                       group_df['Emails'].sum(),
                                       ' '.join(group_df['Receiver'])))

Now the out.txt file will contain:

Alice 19 David Eric Ingrid
Bob 11 Frank Holly
Carl 1 Ginger

1 Comment

This is not the output that I am looking for. Please see original question.

You are almost there; just use ' '.join as the aggregating function for the Receiver column:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
                   'Receiver': ['David', 'Eric', 'Frank', 'Ginger', 'Holly', 'Ingrid'],
                   'Emails': [9, 3, 5, 1, 6, 7]
                   })

grouped = df.groupby('Sender')
result = grouped.agg({'Receiver': ' '.join,
                      'Emails': np.sum
                      })

print(result)

Output

                 Receiver  Emails
Sender                           
Alice   David Eric Ingrid      19
Bob           Frank Holly      11
Carl               Ginger       1
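
From result, the space-separated file asked for in the question is then a short loop away; a minimal sketch (writing to out.txt as in the question):

with open('out.txt', 'w') as fout:
    for sender, row in result.iterrows():
        # Index (Sender) first, then the summed Emails,
        # then the already-joined Receiver string.
        fout.write('{} {} {}\n'.format(sender, row['Emails'], row['Receiver']))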

For the sake of completeness, if the Receiver column were ints instead of strings, you could convert them to str first and then join:

df = pd.DataFrame({'Sender': ['Alice', 'Alice', 'Bob', 'Carl', 'Bob', 'Alice'],
                   'Receiver': [1, 2, 3, 4, 5, 6],
                   'Emails': [9, 3, 5, 1, 6, 7]
                   })

grouped = df.groupby('Sender')
result = grouped.agg({'Receiver': lambda x: ' '.join(map(str, x)),
                      'Emails': np.sum
                      })

print(result)

Output

       Receiver  Emails
Sender                 
Alice     1 2 6      19
Bob         3 5      11
Carl          4       1
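
Equivalently, following the second comment on the question, you can cast the Series to str before joining; a sketch that produces the same output as above:

result = grouped.agg({'Receiver': lambda x: ' '.join(x.astype(str)),
                      'Emails': np.sum
                      })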

1 Comment

For my own understanding, what would the aggregating function be if, instead of names in the receiver column, they were integers/floats? join would error out with expected str instance, int found. Knowing this would be super helpful! Thanks in advance
