I already asked a related question earlier, but I didn't want to start a comment-and-edit discussion. So here is, boiled down, what the answer to my earlier question led me to ask. Consider

import pandas as pd
from numpy import arange, random  # scipy.random was just an alias for numpy.random

index = pd.MultiIndex.from_product([arange(0,3), arange(10,15)], names=['A', 'B'])
df = pd.DataFrame(columns=['test'], index=index)
someValues = random.randint(0, 10, size=5)

df.loc[0, 'test'], df.loc[0,:] and df.ix[0] all create a representation of a part of the data frame, the first one being a Series and the other two being df slices. However

  • df.ix[0] = df.loc[0,'test'] = someValues sets the value for the df
  • df.loc[0,'test'] = someValues gives an error ValueError: total size of new array must be unchanged
  • df.loc[0,:] = someValues is being ignored. No error, but the df does not contain the numpy array.

I skimmed the docs, but found no clear, systematic explanation of what is going on with MultiIndexes in general. So far, my guess is that "if the view is a Series, you can set values", and "otherwise, god knows what happens".

Could someone shed some light on the logic? Moreover, is there some deep meaning behind this or are these just constraints due to how it is set up?

1 Answer

These were all run with pandas 0.13.1.

These are not all 'slice' representations.

This is a Series.

In [50]: df.loc[0,'test']
Out[50]: 
B
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
Name: test, dtype: object

These are DataFrames (and identical to each other).

In [51]: df.loc[0,:]
Out[51]: 
   test
B      
10  NaN
11  NaN
12  NaN
13  NaN
14  NaN

[5 rows x 1 columns]

In [52]: df.ix[0]
Out[52]: 
   test
B      
10  NaN
11  NaN
12  NaN
13  NaN
14  NaN

[5 rows x 1 columns]
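
For readers on a recent pandas, here is a minimal sketch of the same Series-versus-DataFrame distinction. It assumes a version where `.ix` is no longer available (it was removed in pandas 1.0), so only the two `.loc` forms are shown. The names `as_series` and `as_frame` are illustrative and not part of the original session.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: two index levels (A, B), one empty column.
index = pd.MultiIndex.from_product(
    [np.arange(0, 3), np.arange(10, 15)], names=["A", "B"]
)
df = pd.DataFrame(columns=["test"], index=index)

# Row label 0 on level A plus a single column label: one-dimensional result.
as_series = df.loc[0, "test"]

# Row label 0 on level A plus "all columns": two-dimensional result.
as_frame = df.loc[0, :]

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

Selecting a single column label is what drops the result to one dimension; selecting all columns keeps a column axis, even when there is only one column.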

This tries to assign values of the wrong shape. It looks like it should work, but it would not work with multiple columns, which is why it is not allowed.

In [54]: df.ix[0] = someValues
ValueError: could not broadcast input array from shape (5) into shape (5,1)
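
The shape mismatch named in this error can be reproduced with plain NumPy, leaving pandas aside. The following is only a sketch of the broadcasting rule, not of what pandas does internally; `target`, `values`, and `failed` are made-up names for illustration.

```python
import numpy as np

target = np.zeros((5, 1))   # five rows, one column, like the selected block
values = np.arange(5)       # one-dimensional, shape (5,)

try:
    # A (5,) array lines up with the LAST axis of the target, which has
    # length 1, so NumPy cannot broadcast it into shape (5, 1).
    target[:] = values
    failed = False
except ValueError as exc:
    failed = True
    print(exc)

# Giving the values an explicit column axis makes the shapes agree.
target[:] = values.reshape(-1, 1)
print(target.ravel())
```

With several columns the one-dimensional input would be ambiguous (one value per row, or one per column?), which fits the explanation that the assignment is rejected rather than guessed at.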

This works because it knows how to broadcast

In [56]: df.loc[0,:] = someValues

In [57]: df
Out[57]: 
     test
A B      
0 10    4
  11    3
  12    4
  13    2
  14    8
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]

This works fine

In [63]: df.loc[0,'test'] = someValues+1

In [64]: df
Out[64]: 
     test
A B      
0 10    5
  11    4
  12    5
  13    3
  14    9
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]

As does this

In [66]: df.loc[0,:] = someValues+1

In [67]: df
Out[67]: 
     test
A B      
0 10    5
  11    4
  12    5
  13    3
  14    9
1 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN
2 10  NaN
  11  NaN
  12  NaN
  13  NaN
  14  NaN

[15 rows x 1 columns]
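
As a rough check against a current pandas (an assumption, since the transcripts here come from 0.13.1), the Series-style assignment can be verified programmatically instead of by reading the printed frame. The name `some_values` and the fixed seed are illustrative choices, not taken from the question.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [np.arange(0, 3), np.arange(10, 15)], names=["A", "B"]
)
df = pd.DataFrame(columns=["test"], index=index)

rng = np.random.default_rng(0)            # seeded only for repeatability
some_values = rng.integers(0, 10, size=5)

# Target: rows where level A == 0, column 'test' (five cells, five values).
df.loc[0, "test"] = some_values

print(df.loc[0, "test"].tolist())         # the five assigned values
print(df.loc[1, "test"].isna().all())     # other groups stay untouched
```

Naming the column explicitly keeps both sides one-dimensional, so no broadcasting decision is needed.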

It is not clear how you generated the cases in your question. I think the logic is pretty straightforward and consistent (there were several inconsistencies in prior versions, however).
