I need to unpivot a pandas DataFrame. I am using pd.melt() for this, and it works as expected. Now I need to add an additional column, "column_number", to the output. Example below:

name age gender  id
a   18  m       1   
b   20  f       2 

Current Output:

   id   variable   value
    1    name        a
    1    age         18
    1    gender      m
    2    name        b
    2    age         20
    2    gender      f

Expected Output:

id  column_number  variable   value
1    1             name        a
1    2             age         18
1    3             gender      m
2    1             name        b
2    2             age         20
2    3             gender      f

Since the structure of my DataFrame can change, I will not know in advance whether it has 3 columns or more. How can I generate this column_number column in the melt results?

5 Answers

2

One possible solution is to use .groupby with .cumcount():

out = df.set_index("id").stack().to_frame(name="value")
out["column_number"] = out.groupby(level=0).cumcount() + 1

print(out.reset_index().rename(columns={"level_1": "variable"}))

Prints:

   id variable value  column_number
0   1     name     a              1
1   1      age    18              2
2   1   gender     m              3
3   2     name     b              1
4   2      age    20              2
5   2   gender     f              3

Or if you have already melted df:

df["column_number"] = df.groupby("id").cumcount() + 1
print(df)

If the position of the new column matters:

df.insert(1, 'column_number', df.groupby("id").cumcount() + 1)
print(df)
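
For reference, here is a minimal end-to-end sketch of the "already melted" variant. It assumes the example frame from the question; the names `melted` and `result` are only illustrative, and the final sort on both keys is added here just to get the rows into the order shown in the expected output.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame(
    {"name": ["a", "b"], "age": [18, 20], "gender": ["m", "f"], "id": [1, 2]}
)

# Unpivot everything except "id"
melted = df.melt(id_vars="id")

# Number the rows within each id, starting at 1, and place the
# new column directly after "id"
melted.insert(1, "column_number", melted.groupby("id").cumcount() + 1)

# Sort on both keys so the row order is fully determined
result = melted.sort_values(["id", "column_number"], ignore_index=True)
print(result)
```

Because the count is taken per id, this keeps working if the frame later gains more value columns.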

2

One way to achieve this is to turn the columns into a MultiIndex that carries the column number, and then melt that result:

out = df.set_index('id')
out.columns = pd.MultiIndex.from_tuples(enumerate(out, 1), names=['column_number', 'variable'])
out = out.melt(ignore_index=False).sort_index().reset_index()

Output:

   id  column_number variable value
0   1              1     name     a
1   1              2      age    18
2   1              3   gender     m
3   2              1     name     b
4   2              2      age    20
5   2              3   gender     f
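
To make the intermediate step visible, the sketch below prints the MultiIndex columns before melting. It assumes the example frame from the question; `wide` and `long` are illustrative names, and the explicit sort on both keys replaces the `sort_index()` call only so that the row order does not depend on the sort algorithm.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["a", "b"], "age": [18, 20], "gender": ["m", "f"], "id": [1, 2]}
)

wide = df.set_index("id")

# Pair every column name with its 1-based position
wide.columns = pd.MultiIndex.from_tuples(
    list(enumerate(wide.columns, start=1)),
    names=["column_number", "variable"],
)
print(wide.columns.tolist())

# melt turns each column level into its own column; the level names
# become the column names
long = (
    wide.melt(ignore_index=False)
        .reset_index()
        .sort_values(["id", "column_number"], ignore_index=True)
)
print(long)
```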

2 Comments

pd.MultiIndex.from_tuples(enumerate(out.columns,1), names=['column_number', 'variable']) should suffice
@Onyambu I seem to be having a bit of a slow brain day. Thanks for pointing that out; I've edited
1

You can use melt as you already did and chain it with assign:

out = (df.melt(id_vars='id', ignore_index=False).sort_index()
         .assign(column_number=lambda x: x.groupby('id').cumcount()+1))
print(out)

# Output
   id variable value  column_number
0   1     name     a              1
0   1      age    18              2
0   1   gender     m              3
1   2     name     b              1
1   2      age    20              2
1   2   gender     f              3
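
Note that with ignore_index=False the result keeps the original row labels, which is why the index above repeats (0, 0, 0, 1, 1, 1). The variant below is a sketch under two assumptions of mine, not part of the answer above: it passes kind="stable" to sort_index, since the pandas documentation lists only "mergesort" and "stable" as stable sort kinds, and it resets the index at the end to get unique labels.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["a", "b"], "age": [18, 20], "gender": ["m", "f"], "id": [1, 2]}
)

out = (
    df.melt(id_vars="id", ignore_index=False)
      # a stable sort keeps the melted column order within each original row
      .sort_index(kind="stable")
      .assign(column_number=lambda x: x.groupby("id").cumcount() + 1)
      # replace the repeated row labels with a fresh 0..n-1 index
      .reset_index(drop=True)
)
print(out)
```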

1

With siuba, reshape the data to long format with pivot_longer, then use row_number inside mutate on the grouped result:

from siuba import _, mutate, ungroup, group_by
from siuba.dply.vector import row_number
from siuba.experimental.pivot import pivot_longer

(pivot_longer(df, ~_.id, names_to = 'variable') >>
   group_by(_.id) >>
   mutate(column_number = row_number(_.id))>>
   ungroup())

   id variable value  column_number
0   1     name     a              1
0   1      age    18              2
0   1   gender     m              3
1   2     name     b              1
1   2      age    20              2
1   2   gender     f              3

1

Since melt preserves the original order of the columns, you don't need a groupby.cumcount. A simple factorize on the variable column is sufficient (and more efficient):

out = (df.melt('id')
         .assign(column_number=lambda d: pd.factorize(d['variable'])[0]+1)
         .sort_values(by='id', ignore_index=True)
      )

If what you want is the original position of each column in the DataFrame (counting the non-melted columns as well), then a simple map is enough:

cols = {k: i for i,k in enumerate(df, start=1)}
# {'name': 1, 'age': 2, 'gender': 3, 'id': 4}

out = (df.melt('id')
         .assign(column_number=lambda d: d['variable'].map(cols))
         .sort_values(by='id', ignore_index=True)
      )

Output:

   id variable value  column_number
0   1     name     a              1
1   1      age    18              2
2   1   gender     m              3
3   2     name     b              1
4   2      age    20              2
5   2   gender     f              3
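
The two variants give the same numbers for the frame in the question only because id is the last column. The sketch below uses a hypothetical reordered frame with id first (my assumption, not from the question) to show where they differ; `melt_order`, `original_position` and `position` are illustrative names.

```python
import pandas as pd

# Hypothetical variant of the question's frame with "id" as the first column
df = pd.DataFrame(
    {"id": [1, 2], "name": ["a", "b"], "age": [18, 20], "gender": ["m", "f"]}
)

# 1-based position of every column, including the non-melted "id"
position = {col: i for i, col in enumerate(df.columns, start=1)}

out = (
    df.melt("id")
      .assign(
          # rank among the melted columns only
          melt_order=lambda d: pd.factorize(d["variable"])[0] + 1,
          # position in the original frame
          original_position=lambda d: d["variable"].map(position),
      )
      .sort_values(["id", "melt_order"], ignore_index=True)
)
print(out)
```

Here melt_order runs 1 to 3 within each id, while original_position runs 2 to 4 because id occupies position 1.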
