.apply() function to dataframe and return new dataframe?

Question

What is the best way to create a new DataFrame from a function applied to each row of a DataFrame? The ultimate goal is to concatenate (rbind) all the resulting DataFrames.

Input:

   Name  Age
0   tom   10
1  nick   15
2  juli   14

Example:

import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

def foo(row):
    # turn one row (a Series) into a two-row DataFrame
    new_df = row.to_frame(name='Values')
    new_df.loc[new_df.index == 'Name', 'New_column'] = 'Surname'
    new_df.loc[new_df.index == 'Age', 'New_column'] = '+5 months'
    return new_df

df.apply(foo, axis=1)

Output:

data = {'Values': ['tom', '10', 'nick', '15', 'juli', '14'],
        'New_column': ['Surname', '+5 months', 'Surname', '+5 months',
                       'Surname', '+5 months']}
output = pd.DataFrame(data)

  Values New_column
0    tom    Surname
1     10  +5 months
2   nick    Surname
3     15  +5 months
4   juli    Surname
5     14  +5 months

If .apply() is not the best option, I would appreciate an alternative.

For R users: I am looking for the equivalent of do.call(rbind, sapply()).

Thanks.

I have added the input and the final output to the question. I hope that makes it easier.
– AlexSB, commented Nov 1, 2019 at 10:53

Valdi_Bo · Accepted Answer · 2019-11-01 10:50:41Z

2

Start with one improvement to your function:

def foo(row):
    new_df = row.to_frame(name='Values')
    new_df.loc['Name', 'New_column'] = 'Surname'
    new_df.loc['Age', 'New_column'] = '+5 months'
    return new_df

(The "new_df.index ==" comparison is not needed; .loc accepts the row label directly.)

To get your output, convert the Series of DataFrames (the result of apply) into an ordinary list of DataFrames and concatenate them.

The code to do it is:

pd.concat(df.apply(foo, axis=1).tolist())
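
As a rough sketch of the same idea, the list of per-row DataFrames can also be built with a plain list comprehension over iterrows(), and passing ignore_index=True to pd.concat gives the 0..5 index shown in the question's expected output. The helper name row_to_frame is made up for this illustration, and it uses assign instead of the two .loc assignments, so it is a variant of foo rather than the function from the question.

```python
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

def row_to_frame(row):
    # hypothetical helper: one input row becomes a two-row DataFrame
    # (assumes exactly the two columns 'Name' and 'Age', in that order)
    return row.to_frame(name='Values').assign(New_column=['Surname', '+5 months'])

# build one small DataFrame per row, then stack them vertically
frames = [row_to_frame(row) for _, row in df.iterrows()]

# ignore_index=True replaces the repeated Name/Age labels with 0..5
result = pd.concat(frames, ignore_index=True)
print(result)
```

Without ignore_index=True the result keeps 'Name' and 'Age' as repeated row labels, which may or may not be what you want.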

answered Nov 1, 2019 at 10:50

Valdi_Bo


Erfan · Accepted Answer · 2019-11-01 13:09:42Z

1

Without using apply, which is pretty slow, we can use pandas and numpy methods here: transpose (.T), melt and numpy.tile:

import numpy as np

df = df.T.melt().drop(columns='variable')
df['New_column'] = np.tile(['Surname', '+5 months'], len(df) // 2)

  value New_column
0   tom    Surname
1    10  +5 months
2  nick    Surname
3    15  +5 months
4  juli    Surname
5    14  +5 months
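
A similar vectorized result can be sketched with numpy alone, assuming every row should be flattened column by column: ravel() walks the values row by row, and np.tile repeats one label per column for each row. The labels dictionary is an illustrative name introduced here, not part of the answer above.

```python
import numpy as np
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# assumed mapping from source column to the text for New_column
labels = {'Name': 'Surname', 'Age': '+5 months'}

out = pd.DataFrame({
    # row-major flattening: tom, 10, nick, 15, juli, 14
    'Values': df.to_numpy().ravel(),
    # one label per column, repeated once for every input row
    'New_column': np.tile([labels[c] for c in df.columns], len(df)),
})
print(out)
```

Because the labels are looked up per column, this version does not hard-code the number of columns, and it leaves the original df unchanged.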

answered Nov 1, 2019 at 13:09

Erfan


Andrea Grioni · Accepted Answer · 2019-11-01 11:57:48Z

Here is a different approach that uses built-in pandas and numpy functions.

import pandas as pd
import numpy as np
import pdb

# create df
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# provide unique ids for each row
df['id']=df.index
# Unpivot DataFrame using unique id as reference
n = df.melt(id_vars=['id'], value_vars=['Name', 'Age'])
# add 'new_column' and updates its values with np.where
n['new_column'] = np.where(n['variable'] == 'Name', 'Surname', '+5 months')
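# sort by id to pair name and age (a stable sort keeps Name before Age)
n.sort_values('id', kind='mergesort', inplace=True)
# assign row names
n.index = n['variable']
# drop unnecessary columns (drop returns a new DataFrame, so reassign)
n = n.drop(['id', 'variable'], axis=1)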

output:

         value new_column
variable                 
Name       tom    Surname
Age         10  +5 months
Name      nick    Surname
Age         15  +5 months
Name      juli    Surname
Age         14  +5 months
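
A possible variant of the melt approach, offered only as a sketch: melt(ignore_index=False), which I believe is available from pandas 1.1 onward, keeps the original row labels, so no helper id column has to be added to df. The labels dictionary and the variable name long_df are illustrative names, not part of the answer above.

```python
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# assumed mapping from source column to the text for new_column
labels = {'Name': 'Surname', 'Age': '+5 months'}

# unpivot while keeping the original row labels (0, 1, 2, 0, 1, 2)
long_df = df.melt(ignore_index=False)
# a stable sort keeps Name before Age within each original row
long_df = long_df.sort_index(kind='mergesort')
# look up the label for each source column
long_df['new_column'] = long_df['variable'].map(labels)
# use the source column name as the row label
long_df = long_df.set_index('variable')
print(long_df)
```

Using map with a dictionary instead of np.where should extend more naturally if the input has more than two columns.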

Mark Rotteveel · Accepted Answer · 2019-11-01 12:18:52Z

0

Perhaps try:

df = df.apply(foo, axis=1)

edited Nov 1, 2019 at 12:18

Mark Rotteveel


answered Nov 1, 2019 at 10:41

Henry James
