Manipulating MultiIndex columns in pandas

Question

I have a simple multiindex dataframe in pandas. I'm trying to add additional subcolumns, but I'm being warned off with

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I cannot manage to get the right indexing incantation to make this work.

I'm attaching a code fragment with a simple, non-hierarchical example of the sort of columns I want to add. It is followed by a hierarchical example in which I show that I can add new top-level columns but cannot properly manipulate individual sub-columns.

import pandas as pd
import numpy as np


#simple example that works: add two columns to a non-hierarchical frame
sdf = pd.DataFrame(np.random.randn(6,4),columns=list('ABCD'))
sdf['E'] = 7
sdf['F'] = sdf['A'].diff(-1)


#hierarchical example
df = pd.DataFrame({('co1', 'price'): {0: 1, 1: 2, 2: 12, 3: 14, 4: 15},
                   ('co1', 'size'): {0: 1, 1: 5, 2: 9, 3: 13, 4: 17},
                   ('co2', 'price'): {0: 2, 1: 6, 2: 10, 3: 14, 4: 18},
                   ('co2', 'size'): {0: 3, 1: 7, 2: 11, 3: 15, 4: 19}})

df.index.names = ['run']
df.columns.names = ['security', 'characteristic']

#I can add a new top level column
df['newTopLevel?'] = "yes"


#I cannot manipulate values of existing sub-level columns
"""A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead"""

df['co1']['size'] = "gross"
df['co1']['price'] = df['co1']['price']*2


#I cannot add a new sub-level column
df['co1']['new_sub_col'] = "fails"

I seem to be missing some fundamental understanding of this issue, which is frustrating as I've read pretty closely the O'Reilly "Python for Data Analysis" book written by the pandas author! ugh.

Andy Hayden · Accepted Answer · 2015-10-10 07:44:17Z

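To avoid the warning, use .loc with a (top-level, sub-level) tuple as the column key, so that each update is a single assignment rather than a chained lookup such as df['co1']['size']: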

In [11]: df.loc[:, ('co1', 'size')] = "gross"

In [12]: df.loc[:, ('co1', 'price')] *= 2

In [13]: df.loc[:, ('co1', 'new_sub_col')] = "fails"  # not anymore

In [14]: df
Out[14]:
security         co1          co2      newTopLevel?         co1
characteristic price   size price size              new_sub_col
run
0                  2  gross     2    3          yes       fails
1                  4  gross     6    7          yes       fails
2                 24  gross    10   11          yes       fails
3                 28  gross    14   15          yes       fails
4                 30  gross    18   19          yes       fails
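
As the output above shows, the new sub-column is appended at the far right instead of next to the other co1 columns. A minimal sketch of one way to regroup it, assuming a lexicographic column order is acceptable (the trimmed-down frame and the "added" value are illustrative, not from the question):

```python
import pandas as pd

# Smaller stand-in for the frame in the question.
df = pd.DataFrame({
    ('co1', 'price'): [1, 2, 12, 14, 15],
    ('co1', 'size'): [1, 5, 9, 13, 17],
    ('co2', 'price'): [2, 6, 10, 14, 18],
    ('co2', 'size'): [3, 7, 11, 15, 19],
})
df.columns.names = ['security', 'characteristic']

# One indexing operation per assignment: the tuple is (top level, sub level).
df.loc[:, ('co1', 'price')] *= 2
df.loc[:, ('co1', 'new_sub_col')] = "added"

# The new column lands at the end; sorting the column index
# puts it back with the other 'co1' sub-columns.
df = df.sort_index(axis=1)
print(df.columns.tolist())
```

Sorting reorders every column alphabetically within each level, so this only fits if that order is acceptable; otherwise the columns can be reindexed explicitly.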

1 Comment

user3556757 Over a year ago

Subsequent to filing this question, I discovered that I could write: df['co1', 'price'] = "gross". Is there any reason to prefer this or the answer you suggested?
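
A small sketch comparing the two spellings from this comment and the answer. For whole-column assignment they should behave the same, since each is a single set operation with a tuple key; .loc can additionally restrict which rows are written. The frame and the names a, b and same are illustrative assumptions, not part of the original post:

```python
import pandas as pd

df = pd.DataFrame({
    ('co1', 'price'): [1, 2, 12],
    ('co1', 'size'): [1, 5, 9],
})

a = df.copy()
b = df.copy()

# Both forms assign through one indexing call with a tuple key.
a['co1', 'price'] = a['co1', 'price'] * 2
b.loc[:, ('co1', 'price')] = b.loc[:, ('co1', 'price')] * 2

same = a.equals(b)  # whole-column assignment gives matching frames

# .loc also accepts a row selector, which plain [] assignment does not.
b.loc[b[('co1', 'size')] > 1, ('co1', 'price')] = 0
```

Either form avoids the chained df['co1']['price'] pattern that triggers the warning.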
