Pandas slicing data with MultiIndex

Question

I have some features that I want to write to some CSV files, and I would like to use pandas for this if possible.
I am following the instructions here and have created some dummy data to try it out. Basically, there are some activities, each with a random number of feature rows belonging to it.

import io
import pandas as pd
data = io.StringIO('''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run, 2, 0.8, 1.6, 1.6, 4. , 4.8, 3.2, 2.4, 1.6, 0.8
''')
df_unindexed = pd.read_csv(data)
df = df_unindexed.set_index(['Activity', 'id'])

When I run:

df.xs('Run')

I get

    value  value.1  value.2  value.3  value.4  value.5  value.6  value.7  \
id                                                                         
1     1.0      2.0      2.0      5.0      6.0      4.0      3.0      2.0   
1     2.0      4.0      4.0     10.0     12.0      8.0      6.0      4.0   
2     0.8      1.6      1.6      4.0      4.8      3.2      2.4      1.6   
    value.8  
id           
1       1.0  
1       2.0  
2       0.8

which is almost what I want, that is, all Run activities. I want to remove the first row and first column, i.e. the header and the id column. How do I achieve this?

A second question: when I want only one activity, how do I get it?
Using

idx = pd.IndexSlice
df.loc[idx['Run', 1], :]

gives

             value  value.1  value.2  value.3  value.4  value.5  value.6  \
Activity id                                                                
Run      1     1.0      2.0      2.0      5.0      6.0      4.0      3.0   
         1     2.0      4.0      4.0     10.0     12.0      8.0      6.0   
             value.7  value.8  
Activity id                    
Run      1       2.0      1.0  
         1       4.0      2.0

but slicing does not work as I would expect. For example trying

df.loc[idx['Run', 1], 2:11]

instead produces an error:

TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [2] of <class 'int'>

So, how do I get my features in this place?

P.S. In case it is not clear, I am new to pandas, so be gentle. Also, the id column can be changed to be unique per activity or across the whole dataset if that makes things easier.

You want columns 2 through 11? With loc you can only use labels: df.loc[idx['Run', 1], 'value.1':'value.5'].
– user2285236, Commented May 11, 2018 at 8:40
You display the slice with df.xs('Run') and want to remove the header row and id column. Do you understand that you can control what gets written out with df.to_csv()? The output can differ from what you see with df.loc/.iloc/.xs().
– smci, Commented May 11, 2018 at 8:43
Just do pd.read_csv(header=0, ...) to read in the header row as a header row, and ... index_col=['id'] or index_col=0 to pick the index column, at CSV read-time.
– smci, Commented May 11, 2018 at 8:49

jezrael · Accepted Answer · 2018-05-11 08:56:24Z

2

You can use a little hack: look up the column names by position, because loc accepts only labels and so cannot combine a MultiIndex label lookup on the rows with a positional slice on the columns:

print (df.columns[2:11])
Index(['value.2', 'value.3', 'value.4', 'value.5', 'value.6', 'value.7',
       'value.8'],
      dtype='object')

idx = pd.IndexSlice
print (df.loc[idx['Run', 1], df.columns[2:11]])
             value.2  value.3  value.4  value.5  value.6  value.7  value.8
Activity id                                                               
Run      1       2.0      5.0      6.0      4.0      3.0      2.0      1.0
         1       4.0     10.0     12.0      8.0      6.0      4.0      2.0
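As a rough sketch of the same idea, the block below rebuilds the question's frame and compares the label lookup above with an alternative that chains loc (rows by label) and iloc (columns by position). The variable names csv_text, by_label and by_position are made up for this illustration, and the stray spaces in the last data row have been removed; on an unsorted index like this one pandas may emit a PerformanceWarning, which is harmless here.

```python
import io
import pandas as pd

# The question's dummy data (stray spaces in the last row removed).
csv_text = '''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run,2,0.8,1.6,1.6,4.,4.8,3.2,2.4,1.6,0.8
'''
df = pd.read_csv(io.StringIO(csv_text)).set_index(['Activity', 'id'])

idx = pd.IndexSlice

# Approach from the answer: translate positions into column labels for loc.
by_label = df.loc[idx['Run', 1], df.columns[2:11]]

# Alternative: pick the rows by label first, then the columns by position.
by_position = df.loc[idx['Run', 1], :].iloc[:, 2:11]

print(by_position)
```

Both selections should contain the same values; the chained form avoids building the list of labels but performs two indexing steps.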

If you want to save the file to CSV without the index and the column names:

df.xs('Run').to_csv(file, index=False, header=False)
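To see what that call writes, here is a minimal sketch that sends the output to an in-memory buffer instead of a file. The small frame and the names buffer, csv_out and per_activity are assumptions made for this example; the loop over groupby(level='Activity') is one possible way to produce one CSV per activity, not something the answer itself proposes.

```python
import io
import pandas as pd

# A small stand-in for the question's frame (hypothetical values).
df = pd.DataFrame(
    {'value': [1.0, 2.0, 1.5], 'value.1': [2.0, 4.0, 3.0]},
    index=pd.MultiIndex.from_tuples(
        [('Run', 1), ('Run', 1), ('Stand', 2)], names=['Activity', 'id']),
)

# Write only the feature values: no index column, no header row.
buffer = io.StringIO()
df.xs('Run').to_csv(buffer, index=False, header=False)
csv_out = buffer.getvalue()
print(csv_out)

# One CSV payload per activity, keyed by the activity name.
per_activity = {}
for activity, group in df.groupby(level='Activity'):
    out = io.StringIO()
    group.to_csv(out, index=False, header=False)
    per_activity[activity] = out.getvalue()
```

Replacing the StringIO buffers with file paths would write the same content to disk.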

edited May 11, 2018 at 8:56

answered May 11, 2018 at 8:49

jezrael


2 Comments

zipa Over a year ago

I have latency in loading new answers, but I'm not surprised you answered first :) +1

jezrael Over a year ago

@zipa - Hmm, same problem at home for me :( Thank you.

LennBo · Accepted Answer · 2018-05-11 08:53:49Z

0

I mostly look at https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer when I'm stuck with these kinds of issues.

Without having tested it, I think you can remove rows and columns like this:

df = df.drop(['rowindex'], axis=0)
df = df.drop(['colname'], axis=1)
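With concrete labels in place of the 'rowindex' and 'colname' placeholders, this might look like the sketch below. The small frame and the names no_sit, no_col and no_id are assumptions for illustration; reset_index(level='id', drop=True) is shown as one way to discard the id level that the question asks about, and is not part of this answer.

```python
import pandas as pd

# A small stand-in for the question's frame (hypothetical values).
df = pd.DataFrame(
    {'value': [1.0, 1.5, 0.5], 'value.1': [2.0, 3.0, 1.0]},
    index=pd.MultiIndex.from_tuples(
        [('Run', 1), ('Stand', 2), ('Sit', 3)], names=['Activity', 'id']),
)

# Drop rows by a label on one level of the MultiIndex.
no_sit = df.drop('Sit', level='Activity')

# Drop a column by name (equivalent to drop(['value.1'], axis=1)).
no_col = df.drop(columns=['value.1'])

# Discard the id level entirely, keeping Activity as the only index.
no_id = df.reset_index(level='id', drop=True)

print(no_sit, no_col, no_id, sep='\n\n')
```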

edited May 11, 2018 at 8:53

answered May 11, 2018 at 8:47

LennBo


2 Comments

smci Over a year ago

Just do pd.read_csv(header=0) to read in the header row as a header row, and ... index_col=['id'] or index_col=0 to pick the index column. At CSV read-time.

LennBo Over a year ago

True. Solves the issue.

smci · Accepted Answer · 2018-05-11 09:14:11Z

0

Avoid the problem by recognizing the index columns at CSV read-time:

pd.read_csv(data,
            header=0,          # read the first row as the header row
            index_col=['id'])  # or index_col=0, to pick the index column
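A runnable sketch of this read-time approach, under the assumption of a cut-down CSV with made-up column names value_a and value_b: index_col accepts column names or positions, and passing two of them builds the MultiIndex directly, so the separate set_index step is not needed.

```python
import io
import pandas as pd

# Cut-down CSV with hypothetical column names.
csv_text = '''Activity,id,value_a,value_b
Run,1,1,2
Stand,2,1.5,3
Sit,3,0.5,1
'''

# Only id as the index, as in the answer; Activity stays a regular column.
only_id = pd.read_csv(io.StringIO(csv_text), header=0, index_col=['id'])

# Both key columns as a MultiIndex, selected by name or by position.
by_name = pd.read_csv(io.StringIO(csv_text), header=0,
                      index_col=['Activity', 'id'])
by_position = pd.read_csv(io.StringIO(csv_text), header=0, index_col=[0, 1])

print(by_name)
```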

answered May 11, 2018 at 9:14

smci
