Restructuring CSV into Pandas DataFrame

Question

I've got a CSV with a rather messy format:

t, 01_x, 01_y, 02_x, 02_y
0,     0,    1,    ,
1,     1,    1,   0,    0

Here "01_" and "02_" are entity numbers (1, 2). The number of entities can vary from file to file, and there might be additional columns too (but every entity has the same set of columns). Note also that entity 2 enters the scene at t=1 (no entries at t=0).

I have already imported the CSV into a pandas DataFrame, but I don't see how to transform it into the following form:

t, entity, x, y
0,      1, 0, 1
1,      1, 1, 1
1,      2, 0, 0

Is there a simple (pythonic) way to transform that?

Thanks! René

Please do not edit questions that already have an answer in a way that invalidates said answer. Please consider undoing your edit. If a given answer makes you realise that you actually wanted to ask a different question, then please create that new question by using the "Ask Question" button again.
– Yunnosch, Commented Mar 10, 2020 at 16:27
@Yunnosch I disagree. There's no harm, and typically only benefit, in editing questions until an answer is accepted. In this case the sample data and expected output remained unchanged and only clarifying details were added. If someone jumped in to add an answer (in this case without fully understanding the question), that is the fault of the answerer, not the asker, and it is no reason to suggest that people shouldn't edit.
– ALollz, Commented Mar 10, 2020 at 16:30
@ALollz Opinions divide. Mine is that an answer which gets downvotes because it no longer answers the question after the edit is a reason for its author to be frustrated, i.e. it is unfair towards those who spend effort on the attempt to help. By the way, when do you expect an answer to be accepted which then gets invalidated by an edit to the question? Usually such an answer makes the OP aware of what they later change, by not answering the question they actually mean... I hope you do not suspect askers here to think "Good, that solved problem A. Now let's ask here about problem B.". The horror!
– Yunnosch, Commented Mar 10, 2020 at 16:33
Sorry @HaydenEastwood, I tried to defend you by enlightening OP...
– Yunnosch, Commented Mar 10, 2020 at 16:38
Admittedly, rolling back now would probably invalidate the only answer that now exists... Sigh.
– Yunnosch, Commented Mar 10, 2020 at 16:39

ALollz · Accepted Answer · 2020-03-10 16:14:32Z

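This is a job for pd.wide_to_long, but it expects the stub name first and the numeric suffix last (x_01, not 01_x), so we first need to swap the two parts of each column name around the '_'. This assumes the column names carry no stray spaces, for example because the CSV was read with skipinitialspace=True.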

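import pandas as pd  # df below is the DataFrame already loaded from the CSV

# Swap the two halves of each name so the stub comes first: '01_x' -> 'x_01'
df.columns = ['_'.join(x.split('_')[::-1]) for x in df.columns]
# Index(['t', 'x_01', 'y_01', 'x_02', 'y_02'], dtype='object')

# Reshape wide to long, then drop the rows where an entity has no data yet
(pd.wide_to_long(df, i='t', j='entity', stubnames=['x', 'y'], sep='_')
   .dropna()
   .reset_index())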

   t  entity    x    y
0  0       1  0.0  1.0
1  1       1  1.0  1.0
2  1       2  0.0  0.0
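
For completeness, here is a minimal end-to-end sketch of the same approach that starts from the raw CSV text in the question. A few things in it are assumptions of this sketch rather than part of the answer above: the file is read with skipinitialspace=True to strip the blanks after the commas, the hypothetical helper swap_parts splits only on the first '_', the stub names are derived from the columns instead of being hard-coded (so additional per-entity columns are picked up), and dropna(how='all') is used so that only rows with no data at all for an entity are dropped.

```python
import io

import pandas as pd

# Sample data copied from the question, ragged spacing included.
raw = """t, 01_x, 01_y, 02_x, 02_y
0,     0,    1,    ,
1,     1,    1,   0,    0
"""

# skipinitialspace=True removes the blanks that follow each comma, so the
# headers become '01_x' rather than ' 01_x'.
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)


def swap_parts(name):
    """Hypothetical helper: turn '01_x' into 'x_01'; leave 't' untouched."""
    entity, sep, field = name.partition("_")
    return f"{field}_{entity}" if sep else name


df = df.rename(columns=swap_parts)

# Derive the stub names ('x', 'y', ...) from whatever columns are present.
stubs = sorted({c.rsplit("_", 1)[0] for c in df.columns if c != "t"})

long_df = (
    pd.wide_to_long(df, stubnames=stubs, i="t", j="entity", sep="_")
    .dropna(how="all")  # drop (t, entity) pairs that have no data at all
    .sort_index()       # order rows by t, then by entity
    .reset_index()
)
print(long_df)
```

If a row should be discarded as soon as any of its values is missing, a plain dropna() as in the answer above is the closer fit.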

