How to generate multiple new columns through applying function by row in pandas.DataFrame?

Question

I am trying to use apply on a pandas.DataFrame so that a function runs over the whole table row by row, takes a few column fields as input, and generates several new fields at once. Once the scan is done, the new fields should form multiple new columns.

Conceptually, this is what I need: a function f that works on the fields of a single row, applied to the DataFrame so that it generates multiple new columns at the same time:

f :: field1, field2, field3, ... -> newfield1, newfield2,...

When I apply this function to the DataFrame, it should give me

f' :: column1, column2, column3, ... -> newcolumn1, newcolumn2, ...
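One literal reading of the f → f' mapping is a small helper that lifts a per-row function to the whole frame. The sketch below uses a hypothetical helper, `lift_rowwise` (not a pandas API), applied to the denominator/numerator example from this question. It iterates over the zipped input columns in plain Python, which avoids building a Series per row the way `df.apply(..., axis=1)` does.

```python
import pandas as pd

def lift_rowwise(df, func, in_cols, out_cols):
    """Hypothetical helper: turn func(field1, field2, ...) -> (new1, new2, ...)
    into a function from input columns to a DataFrame of new columns."""
    # One tuple of results per row, computed from that row's input fields
    rows = [func(*fields) for fields in zip(*(df[c] for c in in_cols))]
    # Name the result columns and keep the original row index for joining
    return pd.DataFrame(rows, columns=out_cols, index=df.index)

df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})
# divmod(n, d) returns (quotient, remainder) in a single call
new = lift_rowwise(df, divmod, ['numerator', 'denominator'], ['quotient', 'remainder'])
df = df.join(new)
```

The helper works for any function returning an m-tuple, as long as `out_cols` has m names.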

Here is an example:

>>> df
   denominator  numerator
0            3         10
1            5         12
2            7         14

I would like to create two more columns, quotient and remainder.

In this particular example I could run // and % separately because it is trivial, but that is not preferred, because technically I can get both the quotient and the remainder at the same time. In some real-world cases, computing them together is more efficient.

The following is what I came up with, but I don't know if it is the most pythonic way of doing it. It is also not clear to me how df.apply turns a sequence of row-based pd.Series into columns.

>>> def rundivmod(n, d):
...   q, r = divmod(n, d)
...   return {'quotient': q, 'remainder': r}
>>> pd.merge(df, df.apply(lambda row: pd.Series(rundivmod(row.numerator, row.denominator)), axis=1), left_index=True, right_index=True)
   denominator  numerator  quotient  remainder
0            3         10         3          1
1            5         12         2          2
2            7         14         2          0
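On the second point: when the function passed to `df.apply(..., axis=1)` returns a `pd.Series`, pandas collects one Series per row and combines them into a new DataFrame, using the Series index labels as column names and keeping the original row index. Because the row index is preserved, `df.join` is a shorter equivalent of the index-on-index `pd.merge` above. A minimal sketch, with the `pd.Series` wrapping moved into `rundivmod` itself:

```python
import pandas as pd

def rundivmod(n, d):
    q, r = divmod(n, d)
    # The index labels of this Series become column names after apply
    return pd.Series({'quotient': q, 'remainder': r})

df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})
new = df.apply(lambda row: rundivmod(row['numerator'], row['denominator']), axis=1)
print(list(new.columns))  # ['quotient', 'remainder']
# join aligns on the shared row index, like merge(left_index=True, right_index=True)
df = df.join(new)
```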

EDIT: removed my other method of generating the quotient and remainder separately, as it was misleading in this case.

df['quotient'], df['remainder'] = df['numerator']//df['denominator'], df['numerator'] % df['denominator']?
– Quang Hoang, Commented Mar 12, 2020 at 16:03
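The suggestion above is fully vectorized. If the goal is to get both results from one call rather than two, pandas also provides `Series.divmod`, which returns a `(quotient, remainder)` pair of Series, so the tuple unpacking maps directly onto the two new columns. A sketch on the same example data:

```python
import pandas as pd

df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})
# Series.divmod returns a 2-tuple of Series: (floor division, modulo)
df['quotient'], df['remainder'] = df['numerator'].divmod(df['denominator'])
```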

harpan · Accepted Answer · 2020-03-12 16:23:22Z

Function:

def rundivmod(n, d):
    return divmod(n, d)

Code:

import pandas as pd

out = df.apply(lambda x: rundivmod(x['numerator'], x['denominator']), axis=1).apply(pd.Series)
out.columns = ['quotient', 'remainder']
df = pd.concat([df, out], axis=1)

Output:

   denominator  numerator  quotient  remainder
0            3         10         3          1
1            5         12         2          2
2            7         14         2          0
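As a variant of this answer, `DataFrame.apply` can expand each returned tuple into columns itself when called with `result_type='expand'`, which removes the second `.apply(pd.Series)` pass and lets you assign straight into named columns. This sketch still calls the function once per row, so it is not vectorized:

```python
import pandas as pd

def rundivmod(n, d):
    return divmod(n, d)

df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})
# result_type='expand' turns each returned tuple into one row of a new DataFrame
# with columns 0 and 1; assigning to a list of names maps them positionally
df[['quotient', 'remainder']] = df.apply(
    lambda row: rundivmod(row['numerator'], row['denominator']),
    axis=1,
    result_type='expand',
)
```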

answered Mar 12, 2020 at 16:07

3 Comments

dhu Over a year ago

I could do this, but that's not the point. I will delete my option 1).

dhu Over a year ago

The point is to run a function that takes n columns and returns an m-tuple, and to add the m-tuple to the data frame as m columns.

kennyvh Over a year ago

If the motive for getting both the quotient and remainder at the same time is performance, this answer should be much faster than using any kind of .apply since it uses vectorized operations.

ALollz · Accepted Answer · 2020-03-12 16:26:17Z

In general you should avoid apply if possible; many operations can be done without iterating over the rows. But if for some reason you must, you can write a function that returns a Series for each row and then concat the result back onto the original DataFrame.

import numpy as np
import pandas as pd
df = pd.DataFrame({'data': [2,3,4,5]})

Raises 'data' to multiple powers¹

def apply_pow(row, N):
    return pd.Series(row['data']**np.array(range(N)),
                     index=[f'power_{i}' for i in range(N)],  # become col names
                     )

pd.concat([df, df.apply(apply_pow, N=3, axis=1)], axis=1)
#   data  power_0  power_1  power_2
#0     2        1        2        4
#1     3        1        3        9
#2     4        1        4       16
#3     5        1        5       25

¹ This should really be vectorized using np.vander(df['data'], N=3, increasing=True).
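Following the footnote, a vectorized sketch of the same table: `np.vander` with `increasing=True` returns a matrix whose column i holds `data**i`, so it can be wrapped in a DataFrame that reuses the original index and then concatenated:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': [2, 3, 4, 5]})
N = 3
# Column i of the increasing Vandermonde matrix is data**i, for i = 0..N-1
powers = pd.DataFrame(
    np.vander(df['data'], N=N, increasing=True),
    index=df.index,
    columns=[f'power_{i}' for i in range(N)],
)
df = pd.concat([df, powers], axis=1)
```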
