Pandas: Assigning multiple new columns simultaneously

Question

I have a DataFrame df with a column containing labels for each row (in addition to some relevant data for each row). I have a dictionary labeldict with keys equal to the possible labels and values equal to 2-tuples of information related to that label. I'd like to tack two new columns onto my frame, one for each part of the 2-tuple corresponding to the label for each row.

Here is the setup:

import pandas as pd
import numpy as np

np.random.seed(1)
n = 10

labels = list('abcdef')
colors = ['red', 'green', 'blue']
sizes = ['small', 'medium', 'large']

labeldict = {c: (np.random.choice(colors), np.random.choice(sizes)) for c in labels}

df = pd.DataFrame({'label': np.random.choice(labels, n), 
                   'somedata': np.random.randn(n)})

I can get what I want by running:

df['color'], df['size'] = zip(*df['label'].map(labeldict))
print(df)

  label  somedata  color    size
0     b  0.196643    red  medium
1     c -1.545214  green   small
2     a -0.088104  green   small
3     c  0.852239  green   small
4     b  0.677234    red  medium
5     c -0.106878  green   small
6     a  0.725274  green   small
7     d  0.934889    red  medium
8     a  1.118297  green   small
9     c  0.055613  green   small

But how can I do this if I don't want to manually type out the two columns on the left side of the assignment? That is, how can I create multiple new columns on the fly? For example, if I had 10-tuples in labeldict instead of 2-tuples, this would be a real pain as currently written. Here are a couple of things that don't work:

# set up attrlist for later use
attrlist = ['color', 'size']

# non-working idea 1)
df[attrlist] = zip(*df['label'].map(labeldict))

# non-working idea 2)
df.loc[:, attrlist] = zip(*df['label'].map(labeldict))

This does work, but seems like a hack:

for a in attrlist:
    df[a] = 0
df[attrlist] = zip(*df['label'].map(labeldict))

Better solutions?

FYI your last method will work in 0.13 (without initially creating the column).
– Jeff, Commented Dec 29, 2013 at 21:55
Thanks Jeff. I have a similar question inspired by your comment here. I am having a similar problem to the one stated here.
– Amelio Vazquez-Reina, Commented Sep 17, 2014 at 16:58
Seems like the answer is pythonically simple: stackoverflow.com/a/34074894/3494126
– Ufos, Commented Oct 8, 2019 at 9:57

Markus Dutschke · Accepted Answer · 2019-07-23 17:11:28Z

35

Just use result_type='expand' in pandas apply

df
Out[78]: 
   a  b
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

df[['mean', 'std', 'max']]=df[['a','b']].apply(mathOperationsTuple, axis=1, result_type='expand')

df
Out[80]: 
   a  b  mean  std  max
0  0  1   0.5  0.5  1.0
1  2  3   2.5  0.5  3.0
2  4  5   4.5  0.5  5.0
3  6  7   6.5  0.5  7.0
4  8  9   8.5  0.5  9.0

And here is some copy-paste code:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(10).reshape(5,2), columns=['a','b'])
print('df',df, sep='\n')
print()
def mathOperationsTuple(arr):
    return np.mean(arr), np.std(arr), np.amax(arr)

df[['mean', 'std', 'max']]=df[['a','b']].apply(mathOperationsTuple, axis=1, result_type='expand')
print('df',df, sep='\n')

answered Jul 23, 2019 at 17:11

Markus Dutschke


2 Comments

deaks Over a year ago

Using apply was what I wanted. This answer was helpful, but it makes the assignment and the function interdependent, because they must agree on the order of the columns used as input and output. I ended up using the approach from @EricNess instead, to capture this mapping entirely within the helper function.

Markus Dutschke Over a year ago

Better use this answer: stackoverflow.com/a/76098907/7128154

alko · Accepted Answer · 2013-12-29 21:02:11Z

13

You can use merge instead:

>>> ld = pd.DataFrame(labeldict).T
>>> ld.columns = ['color', 'size']
>>> ld.index.name = 'label'
>>> df.merge(ld.reset_index(), on='label')
  label  somedata  color    size
0     b  1.462108    red  medium
1     c -2.060141  green   small
2     c  1.133769  green   small
3     c  0.042214  green   small
4     e -0.322417    red  medium
5     e -1.099891    red  medium
6     e -0.877858    red  medium
7     e  0.582815    red  medium
8     f -0.384054    red   large
9     d -0.172428    red  medium

answered Dec 29, 2013 at 21:02

alko


2 Comments

Andy Hayden Over a year ago

you can get away without the reset_index using merge's left_index and right_on.

Louis Yang Over a year ago

Note that merge generates a new dataframe instead of adding columns to an existing one. Also if there is no index column in the dataframe of your problem, you can't use merge.
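
A sketch of the merge without reset_index, as suggested in the first comment. It assumes the intended arguments are left_on='label' together with right_index=True, since the labels are a column in df and the index of ld; the data are hand-written stand-ins for the question's random setup. how='left' is added here to keep the rows of df in their original order.

```python
import pandas as pd

# Hand-written stand-ins for the question's random setup (assumed values).
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium'), 'c': ('blue', 'large')}
df = pd.DataFrame({'label': list('bcab'), 'somedata': [0.1, 0.2, 0.3, 0.4]})

# One row per label, indexed by the label itself.
ld = pd.DataFrame(labeldict).T
ld.columns = ['color', 'size']

# Match the 'label' column of df against the index of ld.
merged = df.merge(ld, left_on='label', right_index=True, how='left')
print(merged)
```

As the second comment points out, merge returns a new frame, so the result has to be assigned back if df itself should change.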

BrenBarn · Accepted Answer · 2013-12-29 21:00:44Z

8

Instead of doing what you're doing with labeldict, you could make that information into a DataFrame and then join it with your original one:

>>> labeldf = pd.DataFrame([(np.random.choice(colors), np.random.choice(sizes)) for c in labels], columns=['color', 'size'], index=labels)
>>> df.join(labeldf, on='label')
  label  somedata  color    size
0     a -1.709973    red  medium
1     b  0.099109   blue  medium
2     a -0.427323    red  medium
3     b  0.474995   blue  medium
4     b -2.819208   blue  medium
5     d -0.998888    red   small
6     b  0.713357   blue  medium
7     d  0.331989    red   small
8     e -0.906240  green   large
9     c -0.501916   blue   large
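
If labeldict already exists, one way to get such a lookup frame without regenerating the data is DataFrame.from_dict with orient='index'. This is a minimal sketch; the labeldict and frame below are hand-written stand-ins for the question's random setup, and attrlist is the name used in the question.

```python
import pandas as pd

# Hand-written stand-ins for the question's random setup (assumed values).
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium'), 'c': ('blue', 'large')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': list('bcab'), 'somedata': [0.1, 0.2, 0.3, 0.4]})

# Keys become the index, tuple elements become the named columns.
labeldf = pd.DataFrame.from_dict(labeldict, orient='index', columns=attrlist)

# Look up each row's label in the index of labeldf.
joined = df.join(labeldf, on='label')
print(joined)
```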

answered Dec 29, 2013 at 21:00

BrenBarn


Eric Ness · Accepted Answer · 2019-03-15 20:22:17Z

3

If you want to add multiple columns to a DataFrame as part of a method chain, you can use apply. The first step is to create a function that will transform a row represented as a Series into the form you want. Then you can call apply to use this function on each row.

def append_label_attributes(row: pd.Series, labelmap: dict) -> pd.Series:
    result = row.copy()
    result['color'] = labelmap[result['label']][0]
    result['size'] = labelmap[result['label']][1]
    return result

df = (
    pd.DataFrame(
    {
        'label': np.random.choice(labels, n),
        'somedata': np.random.randn(n)}
    )
    .apply(append_label_attributes, axis='columns', labelmap=labeldict)
)
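
For tuples of arbitrary length, the helper could take the column names as a parameter instead of hard-coding them. The sketch below is a variation on the function above, not part of the original answer; append_attributes and its names parameter are hypothetical, and the data are hand-written stand-ins for the question's setup.

```python
import pandas as pd

# Hand-written stand-ins for the question's random setup (assumed values).
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium'), 'c': ('blue', 'large')}
attrlist = ['color', 'size']

def append_attributes(row: pd.Series, labelmap: dict, names: list) -> pd.Series:
    # Hypothetical generalization: pair each name with the matching tuple element.
    result = row.copy()
    for name, value in zip(names, labelmap[row['label']]):
        result[name] = value
    return result

out = (
    pd.DataFrame({'label': list('bcab'), 'somedata': [0.1, 0.2, 0.3, 0.4]})
    # Extra keyword arguments are forwarded to the helper by apply.
    .apply(append_attributes, axis='columns', labelmap=labeldict, names=attrlist)
)
print(out)
```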

answered Mar 15, 2019 at 20:22

Eric Ness


AfterFray · Accepted Answer · 2021-11-18 11:29:01Z

2

This should work:

df[['color', 'size']] = list(df['label'].map(labeldict))
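
A variant that makes the row alignment explicit is to build an intermediate DataFrame from the looked-up tuples and reuse the index of df. This is only a sketch under the assumption that every label occurs in labeldict; the data are hand-written stand-ins for the question's setup, and expanded is a name introduced here for illustration.

```python
import pandas as pd

# Hand-written stand-ins for the question's random setup (assumed values).
labeldict = {'a': ('green', 'small'), 'b': ('red', 'medium'), 'c': ('blue', 'large')}
attrlist = ['color', 'size']
df = pd.DataFrame({'label': list('bcab'), 'somedata': [0.1, 0.2, 0.3, 0.4]},
                  index=[10, 11, 12, 13])

# Look up one tuple per row, then expand the tuples into named columns
# that share the index of df.
expanded = pd.DataFrame(df['label'].map(labeldict).tolist(),
                        index=df.index, columns=attrlist)
df[attrlist] = expanded
print(df)
```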

edited Nov 18, 2021 at 11:29

AfterFray


answered Nov 17, 2021 at 18:38

Margalit


Markus Dutschke · Accepted Answer · 2023-04-25 08:06:12Z

The `pandas` `apply` method supports this out of the box

code

demo.py

import pandas as pd
import numpy as np

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=['a', 'b'])
df2 = df[['a', 'b']].apply(lambda ser: pd.Series({
    'mean': ser.mean(),
    'std': ser.std(ddof=0),
    'max': np.amax(ser.to_numpy())
}), axis=1)
df2 = pd.concat((df, df2), axis=1)

output

>>> from demo import df, df2
>>> df
   a  b
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9
>>> df2
   a  b  mean  std  max
0  0  1   0.5  0.5  1.0
1  2  3   2.5  0.5  3.0
2  4  5   4.5  0.5  5.0
3  6  7   6.5  0.5  7.0
4  8  9   8.5  0.5  9.0

background

A few years after writing my first answer to this question, I have gained a somewhat better overview of the mechanics of pandas.

The func argument of apply takes a function pd.Series -> Any, and pandas deduces from the type of the returned value how that value is processed into the resulting DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function.

If you pass a lambda function: pd.Series -> pd.Series, you get exactly the required behavior.
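
The inference described above can be checked with a small sketch. It reuses the frame from demo.py; the variable names as_scalar, as_tuple and as_series are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=['a', 'b'])

# Scalar return value: one value per row, collected into a Series.
as_scalar = df.apply(lambda ser: ser.sum(), axis=1)

# Tuple return value with the default result_type=None:
# the tuples are kept as they are, giving a Series of tuples.
as_tuple = df.apply(lambda ser: (ser.min(), ser.max()), axis=1)

# Series return value: the index of the returned Series becomes the columns.
as_series = df.apply(lambda ser: pd.Series({'min': ser.min(), 'max': ser.max()}), axis=1)

print(type(as_scalar).__name__, type(as_tuple).__name__, type(as_series).__name__)
```

Only the last form produces named columns directly; the tuple form needs result_type='expand' to be spread over columns.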

Note

apply can also be used on groupby objects (the result of DataFrame.groupby), which gives a more flexible alternative to .aggregate. You can find an example in the question: Python pandas groupby aggregate on multiple columns, then pivot
