1

I have some features that I want to write to CSV files, and I would like to use pandas for this if possible.
I am following the instructions here and have created some dummy data to try it out. Basically, there are some activities, each with a random number of features belonging to it.

import io
import pandas as pd
data = io.StringIO('''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run, 2, 0.8, 1.6, 1.6, 4. , 4.8, 3.2, 2.4, 1.6, 0.8
''')
df_unindexed = pd.read_csv(data)
df = df_unindexed.set_index(['Activity', 'id'])

When I run:

df.xs('Run')

I get

    value  value.1  value.2  value.3  value.4  value.5  value.6  value.7  \
id                                                                         
1     1.0      2.0      2.0      5.0      6.0      4.0      3.0      2.0   
1     2.0      4.0      4.0     10.0     12.0      8.0      6.0      4.0   
2     0.8      1.6      1.6      4.0      4.8      3.2      2.4      1.6   
    value.8  
id           
1       1.0  
1       2.0  
2       0.8 

which is almost what I want, namely all the Run activities. I also want to remove the first row and the first column, i.e. the header and the id column. How do I achieve this?

My second question: when I want only one activity, how do I get it?
Using

idx = pd.IndexSlice
df.loc[idx['Run', 1], :]

gives

             value  value.1  value.2  value.3  value.4  value.5  value.6  \
Activity id                                                                
Run      1     1.0      2.0      2.0      5.0      6.0      4.0      3.0   
         1     2.0      4.0      4.0     10.0     12.0      8.0      6.0   
             value.7  value.8  
Activity id                    
Run      1       2.0      1.0  
         1       4.0      2.0  

but slicing does not work as I would expect. For example trying

df.loc[idx['Run', 1], 2:11]

instead produces an error:

TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [2] of <class 'int'>

So, how do I select my features by position in this case?

P.S. In case it is not clear, I am new to pandas, so please be gentle. Also, the id column can be changed to be unique within each activity or across the whole dataset, if that makes things easier.

3 Comments
  • You want columns 2 through 11? With loc you can only use labels: df.loc[idx['Run', 1], 'value.1':'value.5']. Commented May 11, 2018 at 8:40
  • You display a slice with df.xs('Run') and want to remove the header row and id column. Do you understand that you can control what gets written out with DataFrame.to_csv()? The output can differ from what you see with df.loc/.iloc/.xs(). Commented May 11, 2018 at 8:43
  • Just do pd.read_csv(header=0, ...) to read in the header row as a header row, and ... index_col=['id'] or index_col=0 to pick the index column. At CSV read-time. Commented May 11, 2018 at 8:49

3 Answers

2

You can use a small workaround: look up the column names by position, because loc accepts only labels and cannot be mixed with positional (iloc-style) column selection in the same call on a MultiIndex:

print (df.columns[2:11])
Index(['value.2', 'value.3', 'value.4', 'value.5', 'value.6', 'value.7',
       'value.8'],
      dtype='object')

idx = pd.IndexSlice
print (df.loc[idx['Run', 1], df.columns[2:11]])
             value.2  value.3  value.4  value.5  value.6  value.7  value.8
Activity id                                                               
Run      1       2.0      5.0      6.0      4.0      3.0      2.0      1.0
         1       4.0     10.0     12.0      8.0      6.0      4.0      2.0
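
As an alternative to looking up column names, the same selection can probably be written as two chained steps: loc for the MultiIndex row labels, then iloc for the column positions. The sketch below is not from the answer above; it rebuilds the question's dummy data (with the stray spaces in the last row removed) and sorts the index first, which is an extra step added here to keep MultiIndex lookups efficient.

```python
import io
import pandas as pd

# Same dummy data as in the question (spaces in the last row removed).
data = io.StringIO('''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run,2,0.8,1.6,1.6,4.,4.8,3.2,2.4,1.6,0.8
''')
# Sorting the MultiIndex is an assumption added for this sketch.
df = pd.read_csv(data).set_index(['Activity', 'id']).sort_index()

idx = pd.IndexSlice
# Step 1: select rows by label. Step 2: select columns by position.
subset = df.loc[idx['Run', 1], :].iloc[:, 2:11]

# Bare feature values as a NumPy array, without index or header.
features = subset.to_numpy()
print(features)
```

Chaining keeps each indexer purely label-based or purely positional, at the cost of creating an intermediate DataFrame.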

If you want to save to a CSV file without the index and the column names:

df.xs('Run').to_csv(file, index=False, header=False)
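
To check what such a call writes, one option is to send the output to an in-memory buffer instead of a file. This is a minimal sketch using the question's dummy data (spaces in the last row removed); the buffer variable is a stand-in introduced here for a real file path.

```python
import io
import pandas as pd

# Same dummy data as in the question (spaces in the last row removed).
data = io.StringIO('''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run,2,0.8,1.6,1.6,4.,4.8,3.2,2.4,1.6,0.8
''')
df = pd.read_csv(data).set_index(['Activity', 'id'])

# Write only the feature values of the Run rows: no id index, no header row.
buffer = io.StringIO()  # stand-in for a file path such as 'run.csv'
df.xs('Run').to_csv(buffer, index=False, header=False)

csv_text = buffer.getvalue()
print(csv_text)
```

Each output line should then hold only the nine feature values of one Run row.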

2 Comments

I have latency in loading new answers, but I'm not surprised you answered first :) +1
@zipa - Hmm, same problem in home for me :( Thank you.
0

I mostly look at https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer when I'm stuck on this kind of issue.

I have not tested this, but I think you can remove rows and columns like this:

df = df.drop(['rowindex'], axis=0)
df = df.drop(['colname'], axis=1)
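
Applied to the question's dummy data, the drop calls might look like the sketch below. The labels chosen here ('Sit', 'value', 'value.1') are only illustrative, and the reset_index(drop=True) step is an addition not mentioned in the answer: it is one way to discard the id index that remains after df.xs('Run').

```python
import io
import pandas as pd

# Same dummy data as in the question (spaces in the last row removed).
data = io.StringIO('''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run,2,0.8,1.6,1.6,4.,4.8,3.2,2.4,1.6,0.8
''')
df = pd.read_csv(data).set_index(['Activity', 'id'])

# Drop rows by label on one level of the MultiIndex.
df_no_sit = df.drop('Sit', axis=0, level='Activity')

# Drop columns by name.
df_fewer_cols = df.drop(['value', 'value.1'], axis=1)

# Discard the id index left over after selecting one activity.
run_rows = df.xs('Run').reset_index(drop=True)
print(run_rows)
```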

2 Comments

Just do pd.read_csv(header=0) to read in the header row as a header row, and ... index_col=['id'] or index_col=0 to pick the index column. At CSV read-time.
True. Solves the issue.
0

Avoid the problem by recognizing the index columns at CSV read-time:

pd.read_csv(data,
            header=0,          # read in the header row as a header row
            index_col=['id'])  # or index_col=0, to pick the index column
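
A runnable version of this idea, using the question's dummy data (spaces in the last row removed), might look like the following. Passing both 'Activity' and 'id' to index_col is an assumption made here so that the result matches the MultiIndex built with set_index in the question; the answer itself only mentions index_col=['id'] or index_col=0.

```python
import io
import pandas as pd

# Same dummy data as in the question (spaces in the last row removed).
data = io.StringIO('''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run,2,0.8,1.6,1.6,4.,4.8,3.2,2.4,1.6,0.8
''')

# Declare the header row and the index columns while reading,
# so no separate set_index call is needed afterwards.
df = pd.read_csv(data, header=0, index_col=['Activity', 'id'])
print(df.index.names)
print(df.shape)
```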
