I've got a CSV with a rather messy format:

t, 01_x, 01_y, 02_x, 02_y
0,     0,    1,    ,
1,     1,    1,   0,    0  

Here "01_" and "02_" are entity numbers (1, 2); the set of entities can vary from file to file. There might be additional columns too, but every entity always has the same set of columns. Note also that entity 2 enters the scene at t=1, so it has no entries at t=0.

I can already import the CSV into a pandas DataFrame, but I don't see how to transform it into the following form:

t, entity, x, y
0,      1, 0, 1
1,      1, 1, 1
1,      2, 0, 0

Is there a simple (pythonic) way to transform that?

Thanks! René

  • Please do not edit questions which already have an answer in a way that invalidates said answer. Please consider undoing your edit. If a given answer makes you realise that you actually wanted to ask a different question, then please create that new question by using the "Ask Question" button again. Commented Mar 10, 2020 at 16:27
  • @Yunnosch I disagree. There's no harm, and typically only benefit, in editing questions until an answer is accepted. In this case the sample data and expected output remained unchanged and only clarifying details were added. Suggesting that people shouldn't edit because someone jumped to add an answer (in this case without fully understanding the question), is the fault of the answerer, not the asker. Commented Mar 10, 2020 at 16:30
  • @ALollz Opinions divide. Mine is that an answer which gets downvotes because it does not answer the question after the edit is a reason for the author to be frustrated, i.e. it is unfair towards those who spend effort on the attempt to help. By the way, when do you expect an answer to be accepted, which gets then invalidated by an edit to the question? Usually such an answer makes the OP aware of what they later change, by not answering the question they actually mean... I hope you do not suspect askers here to think "Good, that solved problem A. Now let's ask here about problem B.". The horror! Commented Mar 10, 2020 at 16:33
  • Sorry @HaydenEastwood, I tried to defend you by enlightening OP... Commented Mar 10, 2020 at 16:38
  • Admittedly, rolling back now would probably invalidate the now only existing answer... Sigh. Commented Mar 10, 2020 at 16:39

1 Answer

This is a job for pd.wide_to_long, but it expects the stub name to come before the suffix (x_01 rather than 01_x), so we first need to swap the parts of your column names around the '_':

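import pandas as pd

# Reverse the parts around '_': '01_x' becomes 'x_01'; 't' has no '_' and is unchanged.
df.columns = ['_'.join(x.split('_')[::-1]) for x in df.columns]
#Index(['t', 'x_01', 'y_01', 'x_02', 'y_02'], dtype='object')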

(pd.wide_to_long(df, i='t', j='entity', stubnames=['x', 'y'], sep='_')
   .dropna()
   .reset_index())

   t  entity    x    y
0  0       1  0.0  1.0
1  1       1  1.0  1.0
2  1       2  0.0  0.0
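
For completeness, here is a self-contained sketch of the same approach, starting from the raw CSV text in the question. Two things in it are assumptions on my part and not from the question: the file is read with skipinitialspace=True (so that the blanks after the commas do not end up in the column names), and the stub names are derived from the headers instead of being hard-coded, on the assumption that every entity column contains exactly one '_' and the time column 't' contains none. The names csv_text, stubnames and long_df are just illustrative.

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = """t, 01_x, 01_y, 02_x, 02_y
0,     0,    1,    ,
1,     1,    1,   0,    0
"""

# skipinitialspace=True drops the blanks after each comma, so the headers
# are read as '01_x' rather than ' 01_x' and blank cells become NaN.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Swap the parts around '_': '01_x' becomes 'x_01'; 't' is left unchanged.
df.columns = ['_'.join(c.split('_')[::-1]) for c in df.columns]

# Assumption: every column containing '_' is '<stub>_<entity>', so the
# stub names can be collected from the headers (here ['x', 'y']).
stubnames = sorted({c.split('_')[0] for c in df.columns if '_' in c})

long_df = (
    pd.wide_to_long(df, stubnames=stubnames, i='t', j='entity', sep='_')
      .dropna()        # drop rows for entities not yet present at that t
      .reset_index()
)
print(long_df)
```

Deriving stubnames this way should also pick up additional per-entity columns, as long as they follow the same naming pattern. Note that dropna() removes every row containing any NaN, so if an entity can legitimately have a missing value in just one column, something like dropna(how='all') may be closer to what you want.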