How to select second level in multiindex when using columns?

Question

I have a dataframe with this index:

index = pd.MultiIndex.from_product([['stock1','stock2'...],['price','volume'...]])

It's a useful structure for being able to do df['stock1'], but how do I select all the price data? I can't make any sense of the documentation.

I've tried the following with no luck: df[:,'price'], df[:]['price'], df.loc(axis=1)[:,'close'], and df['price'].

If this index style is generally agreed to be a bad idea for whatever reason, then what would be a better choice? Should I go for a multi-indexed index for the stocks as labels on the time series instead of at the column level?

EDIT - I am using the multiindex for the columns, not the index (the wording got the better of me). The examples in the documentation focus on multi-level indexes rather than column structures.

If you are interested in learning more about slicing and filtering MultiIndex DataFrames, please take a look at my post "How do I slice or filter MultiIndex DataFrame levels?"
– cs95, Commented Jan 5, 2019 at 7:10
df.loc(axis=1)[:,'price'] works fine for me in pandas 1.5; perhaps it was enhanced more recently.
– fantabolous, Commented Apr 28, 2023 at 6:46

Andrew L · Accepted Answer · 2017-07-16 14:05:00Z

147

Also using John's data sample:

Using xs() is another way to slice a MultiIndex:

df
               0
stock1 price   1
       volume  2
stock2 price   3
       volume  4
stock3 price   5
       volume  6

df.xs('price', level=1, drop_level=False)
              0
stock1 price  1
stock2 price  3
stock3 price  5

Alternatively if you have a MultiIndex in place of columns:

df
  stock1        stock2        stock3       
   price volume  price volume  price volume
0      1      2      3      4      5      6

df.xs('price', axis=1, level=1, drop_level=False)
  stock1 stock2 stock3
   price  price  price
0      1      3      5
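For anyone who wants to run the column case end to end, here is a minimal sketch. It assumes the frame is built with MultiIndex.from_product as in the question; the names prices_flat and prices_nested are only illustrative. It also shows what drop_level changes: with the default (True) the 'price' level is removed from the result, while drop_level=False keeps both levels.

```python
import pandas as pd

# Columns are a two-level MultiIndex: (stock, attribute).
columns = pd.MultiIndex.from_product(
    [["stock1", "stock2", "stock3"], ["price", "volume"]]
)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=columns)

# Default drop_level=True: the selected level is dropped, leaving stock names.
prices_flat = df.xs("price", axis=1, level=1)

# drop_level=False: both column levels are kept.
prices_nested = df.xs("price", axis=1, level=1, drop_level=False)

print(prices_flat)
print(prices_nested)
```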

answered Jul 16, 2017 at 14:05


2 Comments

AndyMoore Over a year ago

Perfect, thanks. Because I am new to MultiIndexing, my question was poorly written. I was using the MultiIndex for the columns, not the index. df.xs('price',axis=1,level=1) does the job perfectly.

Itamar Mushkin Over a year ago

Just here to say that .xs is still in use in pandas 1.1.3: pandas.pydata.org/pandas-docs/stable/user_guide/advanced.html

MaxU - stand with Ukraine · Accepted Answer · 2020-06-23 23:23:56Z

64

Using @JohnZwinck's data sample:

In [132]: df
Out[132]:
               0
stock1 price   1
       volume  2
stock2 price   3
       volume  4
stock3 price   5
       volume  6

Option 1:

In [133]: df.loc[(slice(None), slice('price')), :]
Out[133]:
              0
stock1 price  1
stock2 price  3
stock3 price  5

Option 2:

In [134]: df.loc[pd.IndexSlice[:, 'price'], :]
Out[134]:
              0
stock1 price  1
stock2 price  3
stock3 price  5

UPDATE:

A commenter asked: "But what if, for the 2nd index, I want to select everything but price, and there are multiple values so that enumeration is not an option? Is there something like slice(~'price')?"

First, let's name the index levels:

df = df.rename_axis(["lvl0", "lvl1"])

Now we can use the df.query() method:

In [18]: df.query("lvl1 != 'price'")
Out[18]:
               0
lvl0   lvl1
stock1 volume  2
stock2 volume  4
stock3 volume  6
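The query above filters a row MultiIndex. If the MultiIndex is on the columns, as in the original question, one possible sketch for "everything but price" is a boolean mask over the second level, or DataFrame.drop with level=1. The extra 'open' attribute and the variable names here are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with three attributes per stock on the column axis.
columns = pd.MultiIndex.from_product(
    [["stock1", "stock2"], ["price", "volume", "open"]]
)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=columns)

# Boolean mask: True for every column whose second-level label is not 'price'.
not_price = df.columns.get_level_values(1) != "price"
others = df.loc[:, not_price]

# Same selection expressed as dropping the 'price' label from level 1.
dropped = df.drop(columns="price", level=1)

print(others)
```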

edited Jun 23, 2020 at 23:23

answered Jul 16, 2017 at 13:33


7 Comments

muuh Over a year ago

Works, but what is the slice() function doing? The python website wasn't helpful for me. Said slice() returns indices. Can I do something like list(slice(...))? Apparently not.

MaxU - stand with Ukraine Over a year ago

@muuh, please check this question and answers - I hope that helps...

germ Over a year ago

For Option 1, this also works: df.loc[(slice(None),'price'), :]. In other words, to select a specific value for that index level, just use the value.
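To illustrate the two comments above: slice(None) is the object form of a bare ":", and slice('price') is slice(None, 'price'), that is, a label range ending at 'price' rather than an exact match. In the answer's data the two coincide only because 'price' sorts first. The sketch below adds a hypothetical 'open' attribute (not in the original data) to show the difference.

```python
import pandas as pd

# slice(None) is what ":" means; slice('price') is a range with no start.
assert slice(None) == slice(None, None, None)
assert slice("price") == slice(None, "price", None)

# Hypothetical data: 'open' sorts before 'price', so a range ending at
# 'price' also picks up 'open'.
index = pd.MultiIndex.from_product(
    [["stock1", "stock2"], ["open", "price", "volume"]]
)
df = pd.DataFrame([1, 2, 3, 4, 5, 6], index=index)

# Label range on level 1: everything up to and including 'price'.
range_up_to_price = df.loc[(slice(None), slice("price")), :]

# Exact label on level 1: only the 'price' rows.
exact_price = df.loc[(slice(None), "price"), :]

print(range_up_to_price)
print(exact_price)
```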

Bowen Liu Over a year ago

Great answer. But what if for the 2nd Index, I want to select everything but price and there are multiple values so that enumeration is not an option. Is there something like slice(~'price')

loco.loop Over a year ago

In df.loc[pd.IndexSlice[:, 'price'], :] what does the last : mean? Also, apparently you can do df.loc[:, 'price', :]...


YPOC · Accepted Answer · 2020-10-20 07:21:13Z

I have found the most intuitive solution for accessing a second-level column in a DataFrame with MultiIndex columns is using .loc together with slice().

For your DataFrame

df
  stock1        stock2        stock3       
   price volume  price volume  price volume
0      1      2      3      4      5      6
1      2      3      4      5      6      7

using df.loc[:, (slice(None), "price")]

returns every column whose second-level label is "price":

  stock1  stock2  stock3       
   price   price   price 
0      1       3       5
1      2       4       6

In df.loc[:, (slice(None), "price")], the first argument to loc, :, selects all rows. The second argument, (slice(None), "price"), is a tuple: slice(None) selects every first-level column, and "price" selects the second-level column with that name under each of them.
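A runnable sketch of the same selection, assuming the columns are built with MultiIndex.from_product. It also shows the pd.IndexSlice spelling, which produces the same tuple and so should return the same result; the variable names are only illustrative.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["stock1", "stock2", "stock3"], ["price", "volume"]]
)
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]],
    columns=columns,
)

# All rows; every first-level column; second-level label 'price'.
by_tuple = df.loc[:, (slice(None), "price")]

# pd.IndexSlice lets you write the same key with ":" instead of slice(None).
by_index_slice = df.loc[:, pd.IndexSlice[:, "price"]]

print(by_tuple)
```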

John Zwinck · Accepted Answer · 2017-07-16 12:29:33Z

9

df.unstack() will "tear off" the last level of your MultiIndex and make your DataFrame a lot more conventional, with one column per type of data. For example:

import pandas as pd

index = pd.MultiIndex.from_product([['stock1','stock2','stock3'],['price','volume']])
df = pd.DataFrame([1,2,3,4,5,6], index)
print(df.unstack())

Gives you:

           0       
       price volume
stock1     1      2
stock2     3      4
stock3     5      6
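Once the data is unstacked, picking out the prices is an ordinary column lookup. A small sketch, assuming the same sample frame; unstacking the single column 0 as a Series avoids the extra 0 level in the result. The names wide and prices are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["stock1", "stock2", "stock3"], ["price", "volume"]]
)
df = pd.DataFrame([1, 2, 3, 4, 5, 6], index)

# Unstack the single data column: rows are stocks, columns are attributes.
wide = df[0].unstack()

# Plain column selection now returns one price per stock.
prices = wide["price"]

print(prices)
```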

answered Jul 16, 2017 at 12:29


Ilya V. Schurov · Accepted Answer · 2022-12-21 15:07:25Z

4

You can also swap levels first, then select by the first level (based on @ntg's sample data):

df = pd.DataFrame({
    'value': range(6),
    'stocks': [f'stock{i // 2}' for i in range(6)],
    'attr': ['price', 'volume'] * 3
}).set_index(['stocks', 'attr'])

df.swaplevel().loc["price"]

        value
stocks       
stock0      0
stock1      2
stock2      4

Works on columns with axis=1 as well.
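A sketch of the column variant, assuming a frame with (stock, attribute) columns like the one in the question: swap the two column levels with axis=1, then select 'price' as a first-level key.

```python
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["stock1", "stock2", "stock3"], ["price", "volume"]]
)
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=columns)

# After the swap the column tuples are (attribute, stock).
swapped = df.swaplevel(axis=1)

# Selecting a first-level key drops that level, leaving the stock names.
prices = swapped["price"]

print(prices)
```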

answered Dec 21, 2022 at 15:07


ntg · Accepted Answer · 2022-07-08 07:19:15Z

1

While @MaxU's is the better answer, I want to point out here that we can also separately reset_index any part of a MultiIndex, e.g., suppose:

df = pd.DataFrame({
    'price':range(6),
    'stocks': [f'stock{i//2}' for i in range(6)],
    'attr':['price','volume']*3
}).set_index(['stocks','attr'])

leading to df:

               price
stocks attr         
stock0 price       0
       volume      1
stock1 price       2
       volume      3
stock2 price       4
       volume      5

Then e.g.:

df_rst = df.reset_index('attr')
df_rst[df_rst['attr']=='price']

will lead to:
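The printed result is missing from this copy of the answer. The following self-contained sketch reruns the two steps so the output can be reproduced; the name result is added here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "price": range(6),
    "stocks": [f"stock{i // 2}" for i in range(6)],
    "attr": ["price", "volume"] * 3,
}).set_index(["stocks", "attr"])

# Move only the 'attr' level back into a regular column.
df_rst = df.reset_index("attr")

# Ordinary boolean filtering on that column keeps the 'price' rows.
result = df_rst[df_rst["attr"] == "price"]

print(result)
```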

answered Jul 8, 2022 at 7:19


bmc · Accepted Answer · 2017-07-16 12:43:45Z

-5

I also noticed you missed this option:

df.loc[:,"price"]

As for best practice with your time data, keep it in a column aligned with the rows, preferably as datetime objects (pandas has built-in support for them). You can use the mask syntax to get only the times you are interested in.

That is how you access a single column of your data frame. However for multiple columns we can pass a list, or a colon to get all:

df.loc[:,["price","volume"]] 
#or
df.loc[:,:]

A useful way to query (and quickly) is to use masks to specify which rows/columns meet what condition you want:

Mask=df.loc[:,"price"]>50.0
df.loc[Mask, "stock"] #should return the stock prices greater than 50bucks.

Hope this helps, and as always feel free to follow up on this answer if I completely misunderstood your question, I'd love to help further.

edited Jul 16, 2017 at 12:43

answered Jul 16, 2017 at 12:40


2 Comments

bmc Over a year ago

are you using data frame?

John Zwinck Over a year ago

Yes, look at my answer here for the precise DataFrame I am using.
