Context
So I'm iterating through a bunch of files, where each file is one subject. Each file has 3 columns holding the x, y, and z axis readings at a given point (the files do not all have the same length). I want to combine all of them into a single pandas DataFrame with a MultiIndex.
What I've tried
I found this post, and following its approach seems to work:
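import os

import numpy as np
import pandas as pd

d_ = dict()
DATA_ROOT = "../sample_data/chest_mounted/"
cutoff_min = 0
for fileName in os.listdir(DATA_ROOT):
    # only read the .csv files, skipping editor swap files
    if ".csv" in fileName and '.swp' not in fileName:
        with open(os.path.join(DATA_ROOT, fileName)) as f:
            # keep the middle fields of each row (x, y, z); np.int was removed in NumPy 1.24, so use int
            data = np.asarray([line.strip().split(",")[1:-1] for line in f], dtype=int)
        subj_key = "Subject_" + str(fileName.split(".")[0])
        d_[subj_key] = pd.DataFrame(data, columns=['x_acc', 'y_acc', 'z_acc'])
# keys= turns the subject names into the outer level of the row index
df = pd.concat(d_.values(), keys=d_.keys())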
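To try the concatenation without the files on disk, here is a self-contained sketch of the same idea. It swaps the manual split(",") parsing for pd.read_csv with usecols, and it assumes (from the [1:-1] slice producing three columns) that every row has exactly five fields. The FAKE_FILES contents, the level names "subject"/"sample", and every value not shown in the df.head() output are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two subject files. The x/y/z values of Subject_1
# come from the df.head() output; the first/last fields and all of Subject_2
# are made up. Each row is assumed to have five comma-separated fields.
FAKE_FILES = {
    "1.csv": "0,1502,2215,2153,0\n1,1667,2072,2047,0\n2,1611,1957,1906,0\n",
    "2.csv": "0,1800,2100,2000,0\n1,1790,2110,1990,0\n",
}

frames = {}
for file_name, text in FAKE_FILES.items():
    # header=None: no header row; usecols=[1, 2, 3]: drop the first and last field
    subj = pd.read_csv(io.StringIO(text), header=None, usecols=[1, 2, 3])
    subj.columns = ["x_acc", "y_acc", "z_acc"]
    frames["Subject_" + file_name.split(".")[0]] = subj

# keys= becomes the outer row level; each file's own 0..n-1 row numbers
# become the inner level. names= (optional) labels the two levels.
df = pd.concat(list(frames.values()), keys=list(frames.keys()),
               names=["subject", "sample"])
print(df)
```

Passing names= is optional; it only makes the levels addressable by name later (for example with df.xs).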
When I run df.head(), the output looks exactly like what I want (I think?):
               x_acc  y_acc  z_acc
Subject_1 0     1502   2215   2153
          1     1667   2072   2047
          2     1611   1957   1906
          3     1601   1939   1831
          4     1643   1965   1879
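One way to confirm that the concatenation really built a MultiIndex, and on which axis, is to inspect df.index and df.columns directly. A minimal sketch that rebuilds only the Subject_1 rows from the output above:

```python
import pandas as pd

# Rebuild the rows printed by df.head() (values copied from that output).
subject_1 = pd.DataFrame(
    [[1502, 2215, 2153], [1667, 2072, 2047], [1611, 1957, 1906],
     [1601, 1939, 1831], [1643, 1965, 1879]],
    columns=["x_acc", "y_acc", "z_acc"],
)
df = pd.concat([subject_1], keys=["Subject_1"])

# The MultiIndex is on the rows: subject key outside, per-file row number inside.
print(type(df.index).__name__)                          # MultiIndex
print(df.index.nlevels)                                 # 2
print(df.index.get_level_values(0).unique().tolist())   # ['Subject_1']

# The columns stay a plain, single-level Index.
print(type(df.columns).__name__)                        # Index
```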
The Problem
However, when I try to index by Subject_x directly, I get an error. Instead, I have to write something like
df["x_acc"]["Subject_1"]
where I access x_acc first and Subject_1 second.
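A small sketch of why the order matters, assuming a two-subject frame built the same way (the Subject_2 values are invented for illustration): plain [] on a DataFrame looks labels up among the columns, while [] on the resulting Series looks them up in its (row) index.

```python
import pandas as pd

df = pd.concat(
    [pd.DataFrame({"x_acc": [1502, 1667], "y_acc": [2215, 2072], "z_acc": [2153, 2047]}),
     # hypothetical second subject
     pd.DataFrame({"x_acc": [1800, 1790], "y_acc": [2100, 2110], "z_acc": [2000, 1990]})],
    keys=["Subject_1", "Subject_2"],
)

# "Subject_1" is a row label, not a column, so DataFrame [] raises KeyError.
try:
    df["Subject_1"]
except KeyError:
    print("KeyError: 'Subject_1' is not a column")

# df["x_acc"] returns a Series that keeps the two-level row index;
# [] on that Series then matches the outer (subject) level.
x_subject_1 = df["x_acc"]["Subject_1"]
print(x_subject_1.tolist())  # [1502, 1667]
```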
Questions
1) I had the impression that I was creating a MultiIndex, but since I have to use df["x_acc"]["Subject_1"], that does not seem to be the case. How do I turn it into one?
2) Is there any way to change the index so that I can access the data by subject first?
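A minimal sketch of subject-first access, assuming the same kind of frame (Subject_2 values invented). .loc and .xs are standard pandas indexers; the level names passed via names= are an illustrative choice that the original code does not set.

```python
import pandas as pd

df = pd.concat(
    [pd.DataFrame({"x_acc": [1502, 1667], "y_acc": [2215, 2072], "z_acc": [2153, 2047]}),
     pd.DataFrame({"x_acc": [1800, 1790], "y_acc": [2100, 2110], "z_acc": [2000, 1990]})],
    keys=["Subject_1", "Subject_2"],
    names=["subject", "sample"],  # hypothetical level names
)

# Row selection goes through .loc; a single label matches the outer level.
subject_1 = df.loc["Subject_1"]               # all columns for Subject_1
x_subject_1 = df.loc["Subject_1", "x_acc"]    # one column for Subject_1
one_row = df.loc[("Subject_1", 1)]            # a single sample, as a Series
same_subject = df.xs("Subject_1", level="subject")  # cross-section by level name

print(subject_1)
print(x_subject_1.tolist())  # [1502, 1667]
```

In this layout the subject is already the outer row level, so no reordering of levels is needed; it just has to be reached through .loc (or .xs) rather than plain [].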