Flatten the dataframe in pandas

Question

I want to flatten a DataFrame in pandas into a single row. Each column name should be repeated with a suffix giving the order of the row it came from, so the number of resulting columns depends on the number of rows.

For example:

```
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
})

print(df)

#    id   A   B    C
# 0   1  10  50   90
# 1   2  20  60  100
# 2   3  30  70  110
# 3   4  40  80  120

# I want something like the following (result_df is the frame I am trying to build).

print(result_df)

#    id1  A1   B1   C1  id2  A2   B2   C2  id3  A3   B3   C3  id4  A4   B4   C4
# 0    1  10   50   90    2  20   60  100    3  30   70  110    4  40   80  120
```

Scott Boston · Accepted Answer · 2022-12-28 19:53:00Z

Try this:

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
})

df_out = df.unstack().to_frame().T.sort_index(level=0)
df_out.columns = [f'{i}{j+1}' for i, j in df_out.columns]
print(df_out)

Output:

   id1  id2  id3  id4  A1  A2  A3  A4  B1  B2  B3  B4  C1   C2   C3   C4
0    1    2    3    4  10  20  30  40  50  60  70  80  90  100  110  120
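The output above keeps each original column together. As a rough sketch of a different technique (not the method used in this answer), both that ordering and the row-by-row ordering from the question can be built directly from the underlying NumPy array. The helper name `flatten_to_row` and its `by` parameter are hypothetical, and the sketch assumes the rows should be numbered by position (1, 2, 3, ...) rather than by index label.

```python
import pandas as pd

def flatten_to_row(df, by="column"):
    """Hypothetical helper: flatten df into a one-row DataFrame.

    by="column" keeps each original column together (id1, id2, ..., A1, ...).
    by="row" keeps each original row together (id1, A1, B1, C1, id2, ...).
    """
    n_rows = len(df)
    if by == "column":
        names = [f"{c}{r + 1}" for c in df.columns for r in range(n_rows)]
        values = df.to_numpy().ravel(order="F")  # column-major flattening
    elif by == "row":
        names = [f"{c}{r + 1}" for r in range(n_rows) for c in df.columns]
        values = df.to_numpy().ravel(order="C")  # row-major flattening
    else:
        raise ValueError("by must be 'column' or 'row'")
    # Wrapping the flat array in a list makes it a single row.
    return pd.DataFrame([values], columns=names)

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
})

by_column = flatten_to_row(df, by="column")
by_row = flatten_to_row(df, by="row")
print(by_column)
print(by_row)
```

One caveat: `to_numpy()` converts the whole frame to a single common dtype, so a frame that mixes numbers and strings would come out as `object` values.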

answered Dec 28, 2022 at 19:53

Scott Boston


MarianD · Accepted Answer · 2022-12-28 18:41 (edited by marc_s, 2024-07-01 03:54:24Z)

df.unstack().to_frame().T

  id            A               B               C
   0  1  2  3   0   1   2   3   0   1   2   3   0    1    2    3
0  1  2  3  4  10  20  30  40  50  60  70  80  90  100  110  120

The .unstack() method returns a Series indexed by (column name, row label), then .to_frame() converts it to a DataFrame with one column, and finally .T transposes this column into a single row.
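To make the three steps easier to follow, here is a small sketch that runs them one at a time on the question's data and prints the shape after each. The intermediate variable names (`s`, `col`, `row`) are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120]
})

# Step 1: with flat (non-hierarchical) columns, unstack() gives a Series
# whose MultiIndex is (column name, row label), ordered column by column.
s = df.unstack()
print(type(s).__name__, s.shape)  # Series (16,)
print(list(s.index[:5]))

# Step 2: to_frame() wraps the Series as a DataFrame with a single column.
col = s.to_frame()
print(col.shape)  # (16, 1)

# Step 3: .T turns that single column into a single row; the MultiIndex
# moves from the rows to the columns.
row = col.T
print(row.shape)  # (1, 16)
```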

edited Jul 1, 2024 at 3:54

marc_s

answered Dec 28, 2022 at 18:41

MarianD

1 Comment

snl_lns Over a year ago

Thanks for the solution. I think you posted another solution before that seemed to do what I am looking for. Still I am trying to make the columns as id1 id2 id3 A1 A2 A3 B1 B2 B3 and so on..

Andreas · Accepted Answer · 2022-12-28 21:40:32Z

The solution from @ScottBoston does the job, with a few modifications:

import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'A': [10, 20, 30, 40],
    'B': [50, 60, 70, 80],
    'C': [90, 100, 110, 120],
})

def key_fn(keys):
    """Replace the `id` label with ` id` so it sorts ahead of the other column names."""
    try:
        keys = keys.str.replace("^id$", " id", regex=True)
    except AttributeError:
        # Level 1 holds the integer row labels, which have no .str accessor;
        # leave those keys unchanged.
        pass
    return keys

# Sort by row label first, then by column name, and only then transpose.
df_out = df.unstack().to_frame().sort_index(level=[1, 0], key=key_fn).T
df_out.columns = [f'{i}{j+1}' for i, j in df_out.columns]
print(df_out)

Output:

   id1  A1  B1  C1  id2  A2  B2   C2  id3  A3  B3   C3  id4  A4  B4   C4
0    1  10  50  90    2  20  60  100    3  30  70  110    4  40  80  120

Notes:

sort_index() sorts along axis=0 (the row index) by default, but after transposing with .T the MultiIndex lies along axis=1. One workaround is to transpose after sorting, as I did here.
The sort order is set with level=[1, 0]: the DataFrame is sorted first by level 1 (the original row labels 0, 1, 2, 3), then by level 0 (the original column names id, A, B, C).
To ensure id is listed first, the key function key_fn replaces the label id with " id"; the leading space sorts ahead of every letter.
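The last note can be checked in isolation. The sketch below applies the same string replacement to a standalone Index of the column names, and shows why the try/except is needed for the integer level. The variable names are only for illustration.

```python
import pandas as pd

# Level 0 of the unstacked index holds the column names (strings).
names = pd.Index(["id", "A", "B", "C"])

# Plain string ordering puts uppercase letters before lowercase ones,
# so "id" would land last.
plain_order = sorted(names)

# With the leading space, " id" sorts ahead of the other names.
keyed = names.str.replace("^id$", " id", regex=True)
keyed_order = sorted(keyed)

# Level 1 holds the integer row labels. An integer Index has no .str
# accessor, which is the AttributeError that key_fn catches.
rows = pd.Index([0, 1, 2, 3])
try:
    rows.str
    has_str_accessor = True
except AttributeError:
    has_str_accessor = False

print(plain_order)
print(keyed_order)
print(has_str_accessor)
```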
