I have some features that I want to write to CSV files, and I would like to use pandas for this if possible.
I am following the instructions here and have created some dummy data to try it out. Basically, there are some activities, each with a number of feature values belonging to it.
import io
import pandas as pd
data = io.StringIO('''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run, 2, 0.8, 1.6, 1.6, 4. , 4.8, 3.2, 2.4, 1.6, 0.8
''')
df_unindexed = pd.read_csv(data)
df = df_unindexed.set_index(['Activity', 'id'])
When I run:
df.xs('Run')
I get
value value.1 value.2 value.3 value.4 value.5 value.6 value.7 \
id
1 1.0 2.0 2.0 5.0 6.0 4.0 3.0 2.0
1 2.0 4.0 4.0 10.0 12.0 8.0 6.0 4.0
2 0.8 1.6 1.6 4.0 4.8 3.2 2.4 1.6
value.8
id
1 1.0
1 2.0
2 0.8
which is almost what I want, that is, all Run activities. When writing this out, I want to remove the first row and the first column, i.e. the header and the id column. How do I achieve this?
My second question: when I want only one activity, how do I get it?
Using
idx = pd.IndexSlice
df.loc[idx['Run', 1], :]
gives
value value.1 value.2 value.3 value.4 value.5 value.6 \
Activity id
Run 1 1.0 2.0 2.0 5.0 6.0 4.0 3.0
1 2.0 4.0 4.0 10.0 12.0 8.0 6.0
value.7 value.8
Activity id
Run 1 2.0 1.0
1 4.0 2.0
But slicing the columns does not work as I would expect. For example, trying
df.loc[idx['Run', 1], 2:11]
instead produces an error:
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [2] of <class 'int'>
So, how do I select just my feature columns here?
P.S. In case it is not clear, I am new to pandas, so please be gentle. Also, the id column can be changed to be unique per activity or across the whole dataset if that makes things easier.
With loc you can only use labels: df.loc[idx['Run', 1], 'value.1':'value.5'].
df.xs('Run') ... want to remove the header row and id column. Do you understand that you can control what gets written out with df.to_csv()? You can make it different from what you see with df.loc / .iloc / .xs().
Use pd.read_csv(header=0, ...) to read in the header row as a header row, and ... index_col=['id'] or index_col=0 to pick the index column. At CSV read-time.
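The suggestions in the comments can be put together in a short sketch. It is only an illustration under a few assumptions: the dummy data is the one from the question (with the stray spaces in the last row removed), the choice of value.1 through value.5 as "the features" is arbitrary, and the output goes to an in-memory buffer rather than a real file. The names subset, by_position, run, and buf are made up for this example.

```python
import io
import pandas as pd

data = io.StringIO('''Activity,id,value,value,value,value,value,value,value,value,value
Run,1,1,2,2,5,6,4,3,2,1
Run,1,2,4,4,10,12,8,6,4,2
Stand,2,1.5,3.,3.,7.5,9.,6.,4.5,3.,1.5
Sit,3,0.5,1.,1.,2.5,3.,2.,1.5,1.,0.5
Sit,3,0.6,1.2,1.2,3.,3.6,2.4,1.8,1.2,0.6
Run,2,0.8,1.6,1.6,4.,4.8,3.2,2.4,1.6,0.8
''')

# Duplicate "value" headers are renamed to value, value.1, ..., value.8.
df = pd.read_csv(data).set_index(['Activity', 'id'])

# .loc is label-based, so the column slice must use column labels.
# Label slices include both endpoints: value.1 through value.5.
subset = df.loc[('Run', 1), 'value.1':'value.5']

# .iloc is position-based; the end position is exclusive here.
by_position = df.loc[('Run', 1)].iloc[:, 1:6]

# All "Run" rows; the display still shows the header and the id index,
# but to_csv() decides what is actually written out.
run = df.xs('Run')
buf = io.StringIO()  # stand-in for a file path
run.to_csv(buf, header=False, index=False)
csv_lines = buf.getvalue().splitlines()
```

Passing header=False and index=False to to_csv() leaves only the feature values in the output, regardless of how the frame looks when printed.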