I want to flatten a DataFrame in pandas into a single row. Each column name should be duplicated with a prefix or suffix indicating the row's order of occurrence, so the number of extra columns created depends on the number of rows.

For example:

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
})

print(df)

#    id   A   B    C
# 0   1  10  50   90
# 1   2  20  60  100
# 2   3  30  70  110
# 3   4  40  80  120

# I want something like the following:

print(result_df)

#    id1  A1   B1   C1  id2  A2   B2   C2  id3  A3   B3   C3  id4  A4   B4   C4
# 0    1  10   50   90    2  20   60  100    3  30   70  110    4  40   80  120

3 Answers

1

Try this:

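import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
})

# unstack() gives a Series indexed by (column name, row label);
# to_frame().T turns it into a single-row DataFrame.
# sort_index(level=0) then acts on that one-row index, so the column order is unchanged.
df_out = df.unstack().to_frame().T.sort_index(level=0)
# Join each (column name, row label) pair into a flat name such as 'id1'.
df_out.columns = [f'{i}{j+1}' for i, j in df_out.columns]
print(df_out)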

Output:

   id1  id2  id3  id4  A1  A2  A3  A4  B1  B2  B3  B4  C1   C2   C3   C4
0    1    2    3    4  10  20  30  40  50  60  70  80  90  100  110  120

1
df.unstack().to_frame().T
  id            A               B               C
   0  1  2  3   0   1   2   3   0   1   2   3   0    1    2    3
0  1  2  3  4  10  20  30  40  50  60  70  80  90  100  110  120

The .unstack() method returns a Series, .to_frame() converts it to a DataFrame with one column, and .T transposes that column into a single row.
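
To make the three steps concrete, here is a minimal sketch that inspects each intermediate object. The variable names `s`, `frame`, and `row` are only illustrative and do not come from the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
})

# Step 1: a Series whose MultiIndex is (column name, row label)
s = df.unstack()
print(type(s).__name__, s.shape, s.index.nlevels)

# Step 2: the same values as a one-column DataFrame
frame = s.to_frame()
print(frame.shape)

# Step 3: transposed, so the MultiIndex now labels the columns
row = frame.T
print(row.shape)
```

The values themselves never change between steps; only their arrangement and labels do.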

1 Comment

Thanks for the solution. I think you posted another solution earlier that seemed to do what I am looking for. I am still trying to get the columns ordered as id1 id2 id3 A1 A2 A3 B1 B2 B3 and so on.

0

The solution from @ScottBoston does the job, with a few modifications:

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120],
})

def key_fn(keys):
    """Map the `id` label to ` id` so that it sorts ahead of the other column names."""
    try:
        keys = keys.str.replace("^id$", " id", regex=True)
    except AttributeError:
        # Level 1 holds integer row labels, which have no .str accessor; leave them unchanged.
        pass
    return keys

# Sort while the MultiIndex is still on the rows, then transpose to a single row.
df_out = df.unstack().to_frame().sort_index(level=[1, 0], key=key_fn).T
df_out.columns = [f'{i}{j+1}' for i, j in df_out.columns]
print(df_out)

Output:

   id1  A1  B1  C1  id2  A2  B2   C2  id3  A3  B3   C3  id4  A4  B4   C4
0    1  10  50  90    2  20  60  100    3  30  70  110    4  40  80  120

Notes:

  • sort_index() sorts the row index (axis=0) by default, but after transposing with .T the MultiIndex lies along axis=1. One workaround is to sort first and transpose afterwards, as done here.
  • The sort order is set with level=[1, 0]: first by level 1 (the original row labels 0, 1, 2, 3), then by level 0 (the column names id, A, B, C).
  • To ensure id is listed first, the key function key_fn prepends a space to the id label, because a space sorts before any letter.
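
The effect of the key function can be checked with a small sketch that compares the sort order with and without it. The names `stacked`, `plain_order`, and `keyed_order` are illustrative only, and the comparison assumes the default string ordering, in which uppercase letters sort before lowercase ones.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120],
})

def key_fn(keys):
    """Same idea as above: make `id` sort first by prefixing a space."""
    try:
        return keys.str.replace("^id$", " id", regex=True)
    except AttributeError:
        # Integer row labels have no .str accessor; return them unchanged.
        return keys

stacked = df.unstack().to_frame()

# Without a key, 'id' sorts after the uppercase names within each row label.
plain = stacked.sort_index(level=[1, 0])
plain_order = list(plain.index.get_level_values(0)[:4])

# With the key, 'id' is moved to the front of each group.
keyed = stacked.sort_index(level=[1, 0], key=key_fn)
keyed_order = list(keyed.index.get_level_values(0)[:4])

print(plain_order)
print(keyed_order)
```

The key only affects the ordering; the index labels in the result are still the original `id`, `A`, `B`, `C`.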
