Convert columns into rows with Pandas

Question

So my dataset has some information by location for n dates. The problem is each date is actually a different column header. For example the CSV looks like

location    name    Jan-2010    Feb-2010    March-2010
A           "test"  12          20          30
B           "foo"   18          20          25

What I would like is for it to look like

location    name    Date        Value
A           "test"  Jan-2010    12       
A           "test"  Feb-2010    20
A           "test"  March-2010  30
B           "foo"   Jan-2010    18       
B           "foo"   Feb-2010    20
B           "foo"   March-2010  25

My problem is that I don't know how many date columns there are (though I know they will always start after name).

Related canonical question: How do I melt a pandas dataframe?
– wjandrea, Commented Jul 11, 2024 at 23:09

DSM · Accepted Answer · last edited 2024-07-10 17:47:55Z by wjandrea

427

Use .melt:

df.melt(id_vars=["location", "name"],
        var_name="Date",
        value_name="Value")

  location    name        Date  Value
0        A  "test"    Jan-2010     12
1        B   "foo"    Jan-2010     18
2        A  "test"    Feb-2010     20
3        B   "foo"    Feb-2010     20
4        A  "test"  March-2010     30
5        B   "foo"  March-2010     25
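Since the question says the number of date columns is not known in advance, here is a minimal sketch of how that could be handled. It assumes a comma-separated file (the inline `csv_text` is a stand-in for the real file) and that the first two columns are always the identifiers; `id_cols` and `long_df` are illustrative names. Because `value_vars` is left out, `melt` unpivots every column not listed in `id_vars`, so the date columns never have to be named.

```python
import io

import pandas as pd

# Stand-in for the real CSV file (assumption: comma-separated).
csv_text = """location,name,Jan-2010,Feb-2010,March-2010
A,test,12,20,30
B,foo,18,20,25
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumption from the question: the identifier columns are the first two,
# and everything after them is a date column.
id_cols = list(df.columns[:2])

# With value_vars omitted, all remaining columns are melted.
long_df = df.melt(id_vars=id_cols, var_name="Date", value_name="Value")
print(long_df)
```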

Old(er) versions: <0.20

You can use pd.melt to get most of the way there, and then sort:

>>> df
  location  name  Jan-2010  Feb-2010  March-2010
0        A  test        12        20          30
1        B   foo        18        20          25
>>> df2 = pd.melt(df,
                  id_vars=["location", "name"], 
                  var_name="Date",
                  value_name="Value")
>>> df2
  location  name        Date  Value
0        A  test    Jan-2010     12
1        B   foo    Jan-2010     18
2        A  test    Feb-2010     20
3        B   foo    Feb-2010     20
4        A  test  March-2010     30
5        B   foo  March-2010     25
>>> df2 = df2.sort(["location", "name"])
>>> df2
  location  name        Date  Value
0        A  test    Jan-2010     12
2        A  test    Feb-2010     20
4        A  test  March-2010     30
1        B   foo    Jan-2010     18
3        B   foo    Feb-2010     20
5        B   foo  March-2010     25

(Might want to throw in a .reset_index(drop=True), just to keep the output clean.)

Note: pd.DataFrame.sort was deprecated in favour of pd.DataFrame.sort_values and is no longer available in current pandas, so use df2.sort_values(["location", "name"]) there.
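On a current pandas, the melt-then-sort steps above could be sketched as follows. The ordered categorical for `Date` is an addition of this sketch, not part of the answer: it makes the within-group order follow the original column order explicitly, rather than depending on how the sort treats ties. `id_cols` and `date_order` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

id_cols = ["location", "name"]
# Remember the original left-to-right order of the date columns.
date_order = [c for c in df.columns if c not in id_cols]

df2 = df.melt(id_vars=id_cols, var_name="Date", value_name="Value")

# Make "Date" an ordered categorical so sorting follows column order,
# not alphabetical order of the labels.
df2["Date"] = pd.Categorical(df2["Date"], categories=date_order, ordered=True)

# sort_values replaces the old DataFrame.sort; reset_index tidies the index.
df2 = df2.sort_values(id_cols + ["Date"]).reset_index(drop=True)
print(df2)
```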

edited Jul 10, 2024 at 17:47

wjandrea


answered Feb 22, 2015 at 3:21

DSM


4 Comments

3kstc Over a year ago

@DSM what would be the inverse of this function. i.e. how would one convert df2 [back] to df

Teepeemm Over a year ago

@3kstc Try here or here. You're wanting to look into pivots. Possibly pandas.pivot_table(df2,values='Value',index=['location','name'],columns='Date').reset_index().

Adrian Over a year ago

@DSM is there any way to go backwards? Meaning that I have a lot of rows with the same name and I would want all the dates to be on different columns

Orhan Solak Over a year ago

@Adrian you can unmelt / reverse melt (a.k.a pivoting) on df operations. For more details check this stackoverflow.com/questions/28337117/…
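For the reverse direction asked about in these comments, a minimal sketch based on the pivot_table call suggested above. The `aggfunc="first"` argument and the final column reordering are assumptions of this sketch: the first keeps the original values instead of averaging them, and the second is needed because pivot_table returns the date columns in sorted label order rather than the original order.

```python
import pandas as pd

# Long-format frame, as produced by the melt in the answer.
df2 = pd.DataFrame({
    "location": ["A", "B", "A", "B", "A", "B"],
    "name": ["test", "foo", "test", "foo", "test", "foo"],
    "Date": ["Jan-2010", "Jan-2010", "Feb-2010", "Feb-2010",
             "March-2010", "March-2010"],
    "Value": [12, 18, 20, 20, 30, 25],
})

wide = pd.pivot_table(
    df2,
    values="Value",
    index=["location", "name"],
    columns="Date",
    aggfunc="first",  # assumption: one value per (location, name, Date)
).reset_index()

# Drop the leftover "Date" label on the column axis.
wide.columns.name = None

# Restore the original left-to-right order of the date columns.
wide = wide[["location", "name", "Jan-2010", "Feb-2010", "March-2010"]]
print(wide)
```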

jezrael · Accepted Answer · 2019-02-20 09:54:44Z

30

Use set_index with stack to get a Series with a MultiIndex, then turn it back into a DataFrame with reset_index and rename the new column:

df1 = (df.set_index(["location", "name"])
         .stack()
         .reset_index(name='Value')
         .rename(columns={'level_2':'Date'}))
print(df1)
  location  name        Date  Value
0        A  test    Jan-2010     12
1        A  test    Feb-2010     20
2        A  test  March-2010     30
3        B   foo    Jan-2010     18
4        B   foo    Feb-2010     20
5        B   foo  March-2010     25
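A self-contained variant of the same idea, as a sketch. Instead of renaming the auto-generated 'level_2' column afterwards, it names the column axis before stacking (via rename_axis, which is a swap made for this sketch and not part of the answer), so the new index level is already called "Date".

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

df1 = (
    df.set_index(["location", "name"])
      .rename_axis(columns="Date")   # name the column axis before stacking
      .stack()                       # date columns become an index level
      .reset_index(name="Value")     # back to a flat DataFrame
)
print(df1)
```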

answered Feb 20, 2019 at 9:54

jezrael


jpp · Accepted Answer · 2018-11-13 17:00:01Z

10

`pd.wide_to_long`

You can add a prefix to your year columns and then feed directly to pd.wide_to_long. I won't pretend this is efficient, but it may in certain situations be more convenient than pd.melt, e.g. when your columns already have an appropriate prefix.

df.columns = np.hstack((df.columns[:2], df.columns[2:].map(lambda x: f'Value{x}')))

res = pd.wide_to_long(df, stubnames=['Value'], i='name', j='Date').reset_index()\
        .sort_values(['location', 'name'])

print(res)

   name        Date location  Value
0  test    Jan-2010        A     12
2  test    Feb-2010        A     20
4  test  March-2010        A     30
1   foo    Jan-2010        B     18
3   foo    Feb-2010        B     20
5   foo  March-2010        B     25

answered Nov 13, 2018 at 17:00

jpp


3 Comments

Rabinzel Over a year ago

I know this is a few years old now, but while learning the differences between pd.stack(), pd.melt() and pd.wide_to_long() I came across this answer, tested it myself, and could not get the same result (I just got an empty df for res). In the end I figured out that I need to add suffix=r".+" to get the same result. Was it different back then, or did it never work and nobody noticed or cared? Or did I miss something? I'm not trying to correct anything here, I just want to be sure I understand what is going on with these commands.

jpp Over a year ago

@Rabinzel, I'm not sure what has changed in the functionality. But what I can say is that I tested the code and it worked at the time I wrote this answer. It would be interesting, if it's true, to know why the suffix parameter is required.

Rabinzel Over a year ago

Thanks for the reply. I just wanted to verify whether the problem is on my side or I am misunderstanding something. After googling a bit, I read several times that wide_to_long expects a numerical suffix or it will fail, but all the documentation says is that suffix="\d+" is the default.
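The suffix point raised in this thread can be sketched as follows. The default suffix pattern `\d+` matches digits only, so a column such as `ValueJan-2010` is not recognised as belonging to the `Value` stub; passing `suffix=r".+"` accepts any non-empty suffix. Using both identifier columns for `i` and renaming with a dict (instead of np.hstack) are choices made for this sketch, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "location": ["A", "B"],
    "name": ["test", "foo"],
    "Jan-2010": [12, 18],
    "Feb-2010": [20, 20],
    "March-2010": [30, 25],
})

id_cols = ["location", "name"]

# Prefix every date column with the stub name "Value".
prefixed = df.rename(
    columns={c: f"Value{c}" for c in df.columns if c not in id_cols}
)

res = pd.wide_to_long(
    prefixed,
    stubnames="Value",
    i=id_cols,
    j="Date",
    suffix=r".+",  # default r"\d+" would not match "Jan-2010"
).reset_index()
print(res)
```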

Lohith Arcot · Accepted Answer · 2018-06-25 06:49:06Z

8

I guess I found a simpler solution

temp1 = pd.melt(df1, id_vars=["location"], var_name='Date', value_name='Value')
temp2 = pd.melt(df1, id_vars=["name"], var_name='Date', value_name='Value')

Then add temp2's name column to temp1:

temp1['new_column'] = temp2['name']

You now have what you asked for.

answered Jun 25, 2018 at 6:49

Lohith Arcot


1 Comment

wjandrea Over a year ago

How is that what they asked for? This melts more columns than they want and the result has two more rows than it should. Am I missing something?

jjurm · Accepted Answer · 2021-03-24 13:50:51Z

5

Adding a link to a notebook which you can duplicate, demonstrating @DSM's answer using pandas.melt:

df.melt(id_vars=["location", "name"], 
    var_name="date", 
    value_name="value")

https://deepnote.com/@DataScience/Unpivot-a-DataFrame-from-wide-to-long-format-lN7WlqOdSlqroI_7DGAkoA

answered Mar 24, 2021 at 13:50

jjurm


Muhammad Talha · Accepted Answer · 2022-08-10 08:03:18Z

4

If you want to swap your rows with columns & columns with rows then try the transpose method of pandas:

df.T

Check the reference link: https://note.nkmk.me/en/python-pandas-t-transpose/

answered Aug 10, 2022 at 8:03

Muhammad Talha


1 Comment

mins Over a year ago

Transposing means swapping the row and column indexes while keeping the same labels. The OP wants to move some column labels into the rows and keep other columns as they are, so .T is of no help here.
