
I am trying to use apply on a pandas.DataFrame so that a function runs over the whole table, takes a few column fields as input, and generates multiple new fields at once. When the scan is done, the new fields should form multiple new columns.

Conceptually the following describes what I need: to apply a function f to the DataFrame column-wise to generate multiple new columns at the same time:

f :: field1, field2, field3, ... -> newfield1, newfield2,...

When I apply this function to the DataFrame, it should give me

f' :: column1, column2, column3, ... -> newcolumn1, newcolumn2, ...

Here is an example:

>>> df
   denominator  numerator
0            3         10
1            5         12
2            7         14

I would like to create two more columns, quotient and remainder.

In this particular example I could run // and % separately because the computation is trivial, but that is not the preferred approach because I can get both the quotient and the remainder in a single call. In some real-world cases, computing them together is more efficient.

The following is what I came up with but I don't know if it is the most pythonic way of doing it. How df.apply turns a sequence of row-based pd.Series into columns is also not clear to me.

>>> def rundivmod(n, d):
...   q, r = divmod(n, d)
...   return {'quotient': q, 'remainder': r}
>>> pd.merge(df, df.apply(lambda row: pd.Series(rundivmod(row.numerator, row.denominator)), axis=1), left_index=True, right_index=True)
   denominator  numerator  quotient  remainder
0            3         10         3          1
1            5         12         2          2
2            7         14         2          0

EDIT: removed my other method for generating the quotient and remainder separately, as it is misleading in this case.

    df['quotient'], df['remainder'] = df['numerator']//df['denominator'], df['numerator'] % df['denominator']? Commented Mar 12, 2020 at 16:03

2 Answers


Function:

def rundivmod(n, d):
    return divmod(n, d)

Code:

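# Apply row-wise (axis=1); each row yields a (quotient, remainder) tuple,
# and .apply(pd.Series) expands those tuples into two columns.
out = df.apply(lambda x: rundivmod(x['numerator'], x['denominator']), axis=1).apply(pd.Series)
out.columns = ['quotient', 'remainder']
# axis must be passed by keyword; pandas 2.x rejects a positional axis in pd.concat.
df = pd.concat([df, out], axis=1)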

Output:

   denominator  numerator  quotient  remainder
0            3         10         3          1
1            5         12         2          2
2            7         14         2          0
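
A variant of the same idea, as a minimal sketch: DataFrame.apply accepts result_type='expand', which turns the tuple returned for each row into columns directly, so the second .apply(pd.Series) pass is not needed. The sample frame is rebuilt from the question, and the names out and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})

# Each row returns a (quotient, remainder) tuple;
# result_type='expand' spreads the tuple across columns 0 and 1.
out = df.apply(
    lambda row: divmod(row['numerator'], row['denominator']),
    axis=1,
    result_type='expand',
)
out.columns = ['quotient', 'remainder']

# Join on the shared index to attach the new columns.
result = df.join(out)
print(result)
```

This still calls the function once per row, so it keeps the "one call, several outputs" property but not the speed of a vectorized operation.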

3 Comments

I could do this but that's not the point. I will delete my option 1)
The point is to run a function that takes n columns and returns an m-tuple, and to add that m-tuple to the data frame as m columns.
If the motive for getting both the quotient and remainder at the same time is performance, this answer should be much faster than using any kind of .apply since it uses vectorized operations.
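
A minimal sketch of such a vectorized approach, assuming NumPy is available: np.divmod computes the quotient and the remainder in one call over whole arrays, so no row-wise apply is involved. The sample frame is rebuilt from the question, and the names q, r, and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'denominator': [3, 5, 7], 'numerator': [10, 12, 14]})

# np.divmod returns two arrays (floor quotient, remainder) in a single pass.
q, r = np.divmod(df['numerator'].to_numpy(), df['denominator'].to_numpy())

# assign returns a new frame with both columns added.
result = df.assign(quotient=q, remainder=r)
print(result)
```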

In general you should avoid apply if possible; many operations can be done without iterating over the rows. But if for some reason you must, you can write a function that acts on a row and returns a Series, then concat the result back onto the original DataFrame.

import numpy as np
import pandas as pd
df = pd.DataFrame({'data': [2,3,4,5]})

Raises 'data' to multiple powers¹

def apply_pow(row, N):
    return pd.Series(row['data']**np.array(range(N)),
                     index=[f'power_{i}' for i in range(N)],  # become col names
                     )

pd.concat([df, df.apply(apply_pow, N=3, axis=1)], axis=1)
#   data  power_0  power_1  power_2
#0     2        1        2        4
#1     3        1        3        9
#2     4        1        4       16
#3     5        1        5       25

¹ This should be vectorized using np.vander(df['data'], N=3, increasing=True)
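
The vectorized form mentioned in the footnote can be sketched as follows. np.vander with increasing=True returns one column per power, starting at power 0; wrapping it in a DataFrame with the same index lets it be concatenated like the apply result above. The names powers and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': [2, 3, 4, 5]})
N = 3

# Column i of the Vandermonde matrix holds data**i when increasing=True.
powers = pd.DataFrame(
    np.vander(df['data'].to_numpy(), N=N, increasing=True),
    columns=[f'power_{i}' for i in range(N)],
    index=df.index,  # keep the original index so concat aligns rows
)

result = pd.concat([df, powers], axis=1)
print(result)
```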
