Convert pandas concat of dataframes to multiindex

Question

Context

I'm iterating through a bunch of files where each file is one subject. Each file has 3 columns holding the x, y and z axis readings at a given point in time (the files do not all have the same length). I want to put all of them into a single multi-index pandas DataFrame.

What I've tried

I found this post and when I do it, it seems to work

import os
import numpy as np
import pandas as pd

d_ = dict()
DATA_ROOT = "../sample_data/chest_mounted/"
cutoff_min = 0
for fileName in os.listdir(DATA_ROOT):
    if ".csv" in fileName and '.swp' not in fileName:
        with open(DATA_ROOT + fileName) as f:
            # keep the middle fields of each line (drop the first and last)
            # dtype=int because the np.int alias was removed from recent NumPy releases
            data = np.asarray(list(map(lambda x: x.strip().split(",")[1:-1], f.readlines())), dtype=int)
            subj_key = "Subject_" + str(fileName.split(".")[0])
            d_[subj_key] = pd.DataFrame(data, columns=['x_acc', 'y_acc', 'z_acc'])
df = pd.concat(d_.values(), keys=d_.keys())

When I do df.head() it looks exactly like what I want (I think?)

                x_acc   y_acc   z_acc
Subject_1   0   1502    2215    2153
            1   1667    2072    2047
            2   1611    1957    1906
            3   1601    1939    1831
            4   1643    1965    1879

The Problem

However, when I try to index by Subject_x I get an error. Instead, I have to first do something like

df["x_acc"]["Subject_1"]

where I access the x_acc first then the Subject_1.

Questions

1) I had the impression that I was creating a multi-index, but since I have to write df["x_acc"]["Subject_1"], that does not seem to be the case. How do I turn it into one?

2) Is there any way to change the index so that I access by Subject first?

jezrael · Accepted Answer · 2017-11-15 15:22:24Z


Use loc for selecting: first by the level of the MultiIndex and then by column name. For simple selections you can also use xs:

s = df.loc['Subject_1', 'x_acc']
print (s)
0    1502
1    1667
2    1611
3    1601
4    1643
Name: x_acc, dtype: int64

df1 = df.xs('Subject_1')
print (df1)
   x_acc  y_acc  z_acc
0   1502   2215   2153
1   1667   2072   2047
2   1611   1957   1906
3   1601   1939   1831
4   1643   1965   1879
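
To see why indexing by subject with plain square brackets fails, here is a minimal sketch. The two small frames are made-up stand-ins for the subject files (only the Subject_1 numbers come from the question). It checks that concat really produced a MultiIndex and shows that df[...] looks up column labels, while loc and xs look up rows:

```python
import pandas as pd

# Hypothetical miniature data standing in for two subject files.
frames = {
    "Subject_1": pd.DataFrame(
        {"x_acc": [1502, 1667], "y_acc": [2215, 2072], "z_acc": [2153, 2047]}
    ),
    "Subject_2": pd.DataFrame(
        {"x_acc": [1400, 1410, 1420], "y_acc": [2000, 2010, 2020], "z_acc": [1900, 1910, 1920]}
    ),
}
df = pd.concat(frames)  # the dict keys become the outer index level

# The row index is a two-level MultiIndex: (subject, row number).
print(isinstance(df.index, pd.MultiIndex))
print(df.index.nlevels)

# df["Subject_1"] fails because [] on a DataFrame looks up column labels.
try:
    df["Subject_1"]
except KeyError:
    print("no column named Subject_1")

subject_1 = df.loc["Subject_1"]        # all rows of one subject
x_only = df.loc["Subject_1", "x_acc"]  # one subject, one column
same = df.xs("Subject_1")              # equivalent to df.loc["Subject_1"] here
```

So the frame already has a MultiIndex on the rows; accessing "by subject first" is a matter of using loc or xs rather than [].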

And for more complicated selections use slicers:

idx = pd.IndexSlice

df2 = df.loc['Subject_1', idx['x_acc','y_acc']]
print (df2)
   x_acc  y_acc
0   1502   2215
1   1667   2072
2   1611   1957
3   1601   1939
4   1643   1965
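
Slicers become more useful when you slice on the row levels as well. A small sketch, again with made-up sample frames; it sorts the index first, since label slicing on a MultiIndex expects a sorted index:

```python
import pandas as pd

# Hypothetical miniature data standing in for two subject files.
frames = {
    "Subject_1": pd.DataFrame(
        {"x_acc": [1502, 1667], "y_acc": [2215, 2072], "z_acc": [2153, 2047]}
    ),
    "Subject_2": pd.DataFrame(
        {"x_acc": [1400, 1410, 1420], "y_acc": [2000, 2010, 2020], "z_acc": [1900, 1910, 1920]}
    ),
}
df = pd.concat(frames).sort_index()
idx = pd.IndexSlice

# Rows labelled 0 through 1 (label slices are inclusive) of Subject_2, two columns.
part = df.loc[idx["Subject_2", 0:1], ["x_acc", "y_acc"]]

# The row labelled 0 of every subject, all columns.
first_rows = df.loc[idx[:, 0], :]

print(part)
print(first_rows)
```

The first element of idx[...] addresses the subject level and the second the per-file row number; the column selection is passed separately as the second argument to loc.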

Also, it seems your code could be simplified with read_csv:

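import os
import pandas as pd

d_ = dict()
DATA_ROOT = "../sample_data/chest_mounted/"
cutoff_min = 0
for fileName in os.listdir(DATA_ROOT):
    if ".csv" in fileName and '.swp' not in fileName:
        subj_key = "Subject_" + str(fileName.split(".")[0])
        # os.listdir returns bare file names, so prepend the directory
        d_[subj_key] = pd.read_csv(os.path.join(DATA_ROOT, fileName), names=['x_acc', 'y_acc', 'z_acc'])

# passing the dict directly uses its keys as the outer index level
df = pd.concat(d_)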
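
Note that the code in the question keeps only the middle fields of each line (the [1:-1] slice after split). If the files really have five comma-separated fields per line and no header row, which is an assumption on my part, one way to mirror that with read_csv is usecols. The read_subject helper and the in-memory sample data below are hypothetical, used only to keep the sketch self-contained:

```python
import io
import pandas as pd

COLUMNS = ["x_acc", "y_acc", "z_acc"]

def read_subject(source):
    """Read one subject, keeping fields 1..3 and dropping the first and last field."""
    # Assumes five fields per line and no header row.
    frame = pd.read_csv(source, header=None, usecols=[1, 2, 3])
    frame.columns = COLUMNS
    return frame

# In-memory stand-ins for the CSV files; a real run would pass file paths instead.
sample_files = {
    "Subject_1": io.StringIO("0,1502,2215,2153,1\n1,1667,2072,2047,1\n"),
    "Subject_2": io.StringIO("0,1400,2000,1900,2\n"),
}

df = pd.concat({key: read_subject(buf) for key, buf in sample_files.items()})
print(df)
```

With real files you would call read_subject(os.path.join(DATA_ROOT, fileName)) inside the loop above instead of using StringIO buffers.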

edited Nov 15, 2017 at 15:22

answered Nov 15, 2017 at 15:12

jezrael


3 Comments

IanQ Over a year ago

awesome! Thank you so much it worked. Also nice one re: read_csv it was way faster than what I was doing

StayFoolish Over a year ago

I noticed that OP doesn't use all items in the rows of the file, only slicing [1:-1], so you need to modify the pd.read_csv a little bit.

jezrael Over a year ago

@StayFoolish - Yes, if you need to remove the first and last row, d_[subj_key] = pd.read_csv(fileName, names=['x_acc', 'y_acc', 'z_acc']).iloc[1:-1] should work.
