I have a dataframe with two columns:

x y
0 1
1 1
2 2
0 5
1 6
2 8
0 1
1 8
2 4
0 1
1 7
2 3

What I want is:

x val1 val2 val3 val4
0 1 5 1 1
1 1 6 8 7
2 2 8 4 3

I know that the values in column x repeat every N rows.

I think that a column header is missing in your expected output, since you have five num columns and four headers. Commented Jan 1, 2016 at 13:39

1 Answer

You could use groupby/cumcount to assign column numbers and then call pivot:

import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

df['columns'] = df.groupby('x')['y'].cumcount()
#     x  y  columns
# 0   0  1        0
# 1   1  1        0
# 2   2  2        0
# 3   0  5        1
# 4   1  6        1
# 5   2  8        1
# 6   0  1        2
# 7   1  8        2
# 8   2  4        2
# 9   0  1        3
# 10  1  7        3
# 11  2  3        3

result = df.pivot(index='x', columns='columns')
print(result)

yields

         y         
columns  0  1  2  3
x                  
0        1  5  1  1
1        1  6  8  7
2        2  8  4  3
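
To get closer to the flat layout asked for in the question (x, val1, ..., val4), one option is to pass values='y' to pivot and rename the columns afterwards. This is only a sketch: the name wide and the val prefix are chosen here for illustration, and the + 1 offset is an assumption made so the numbering starts at val1.

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

# Number the occurrences of each x value, starting at 1 (cumcount starts at 0).
df['columns'] = df.groupby('x').cumcount() + 1

# Passing values='y' yields single-level columns instead of a ('y', n) MultiIndex.
wide = df.pivot(index='x', columns='columns', values='y')

# Rename 1..4 to val1..val4, then turn the index x back into an ordinary column.
wide.columns = [f'val{i}' for i in wide.columns]
wide = wide.reset_index()
print(wide)
```

With the sample data this should give one row per x value and the columns x, val1, val2, val3, val4.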

Or, if you can really rely on the values in x repeating in the same order every N rows,

N = 3  # number of rows in one full cycle of x values
result = pd.DataFrame(df['y'].values.reshape(-1, N).T)

yields

   0  1  2  3
0  1  5  1  1
1  1  6  8  7
2  2  8  4  3

Using reshape is quicker than calling groupby/cumcount and pivot, but it is less robust: it ignores x entirely and relies on the values in y appearing in the right order.
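
If the ordering cannot be taken for granted, the assumption can be checked before using reshape. The following is a minimal sketch, not part of the original answer: the flag is_regular and the fallback to pivot are illustrative choices, and N is assumed to be the length of one full cycle of x values.

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

N = 3  # assumed length of one full cycle of x values
x = df['x'].to_numpy()

# reshape is only safe if x is the same block of N distinct values,
# repeated end to end with no partial block at the end.
is_regular = (
    len(x) % N == 0
    and len(set(x[:N])) == N
    and bool((x.reshape(-1, N) == x[:N]).all())
)

if is_regular:
    # Fast path: one column per cycle, labelled by the x values of the first cycle.
    result = pd.DataFrame(df['y'].to_numpy().reshape(-1, N).T, index=x[:N])
else:
    # Robust path: let groupby/cumcount work out the column numbers.
    df['columns'] = df.groupby('x').cumcount()
    result = df.pivot(index='x', columns='columns', values='y')

print(result)
```

The check costs one pass over x, so the fast path stays cheap while irregular input still falls back to the pivot approach.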
