281

So my dataset has some information by location for n dates. The problem is each date is actually a different column header. For example the CSV looks like

location    name    Jan-2010    Feb-2010    March-2010
A           "test"  12          20          30
B           "foo"   18          20          25

What I would like is for it to look like

location    name    Date        Value
A           "test"  Jan-2010    12       
A           "test"  Feb-2010    20
A           "test"  March-2010  30
B           "foo"   Jan-2010    18       
B           "foo"   Feb-2010    20
B           "foo"   March-2010  25

My problem is I don't know how many dates are in the column (though I know they will always start after name)

2

6 Answers 6

427

Use .melt:

df.melt(id_vars=["location", "name"],
        var_name="Date",
        value_name="Value")

  location    name        Date  Value
0        A  "test"    Jan-2010     12
1        B   "foo"    Jan-2010     18
2        A  "test"    Feb-2010     20
3        B   "foo"    Feb-2010     20
4        A  "test"  March-2010     30
5        B   "foo"  March-2010     25

Old(er) versions: <0.20

You can use pd.melt to get most of the way there, and then sort:

>>> df
  location  name  Jan-2010  Feb-2010  March-2010
0        A  test        12        20          30
1        B   foo        18        20          25
>>> df2 = pd.melt(df,
                  id_vars=["location", "name"], 
                  var_name="Date",
                  value_name="Value")
>>> df2
  location  name        Date  Value
0        A  test    Jan-2010     12
1        B   foo    Jan-2010     18
2        A  test    Feb-2010     20
3        B   foo    Feb-2010     20
4        A  test  March-2010     30
5        B   foo  March-2010     25
>>> df2 = df2.sort(["location", "name"])
>>> df2
  location  name        Date  Value
0        A  test    Jan-2010     12
2        A  test    Feb-2010     20
4        A  test  March-2010     30
1        B   foo    Jan-2010     18
3        B   foo    Feb-2010     20
5        B   foo  March-2010     25

(Might want to throw in a .reset_index(drop=True), just to keep the output clean.)

Note: pd.DataFrame.sort has been deprecated in favour of pd.DataFrame.sort_values.

Sign up to request clarification or add additional context in comments.

4 Comments

@DSM what would be the inverse of this function. i.e. how would one convert df2 [back] to df
@3kstc Try here or here. You're wanting to look into pivots. Possibly pandas.pivot_table(df2,values='Value',index=['location','name'],columns='Date').reset_index().
@DSM is there any way to go backwards? Meaning that I have a lot of rows with the same name and I would want all the dates to be on different columns
@Adrian you can unmelt / reverse melt (a.k.a pivoting) on df operations. For more details check this stackoverflow.com/questions/28337117/…
30

Use set_index with stack for MultiIndex Series, then for DataFrame add reset_index with rename:

df1 = (df.set_index(["location", "name"])
         .stack()
         .reset_index(name='Value')
         .rename(columns={'level_2':'Date'}))
print (df1)
  location  name        Date  Value
0        A  test    Jan-2010     12
1        A  test    Feb-2010     20
2        A  test  March-2010     30
3        B   foo    Jan-2010     18
4        B   foo    Feb-2010     20
5        B   foo  March-2010     25

Comments

10

pd.wide_to_long

You can add a prefix to your year columns and then feed directly to pd.wide_to_long. I won't pretend this is efficient, but it may in certain situations be more convenient than pd.melt, e.g. when your columns already have an appropriate prefix.

df.columns = np.hstack((df.columns[:2], df.columns[2:].map(lambda x: f'Value{x}')))

res = pd.wide_to_long(df, stubnames=['Value'], i='name', j='Date').reset_index()\
        .sort_values(['location', 'name'])

print(res)

   name        Date location  Value
0  test    Jan-2010        A     12
2  test    Feb-2010        A     20
4  test  March-2010        A     30
1   foo    Jan-2010        B     18
3   foo    Feb-2010        B     20
5   foo  March-2010        B     25

3 Comments

I know this is few years old now, but while learning the differences on how to use pd.stack() pd.melt() and pd.wide_to_long() i came across to this answer, tested it myself and it just didn't want to get me the same result (I just got an empty df for res). In the end I figured out I need to add suffix=r".+" to come to the same result. Was it different back then or did it never worked but nobody noticed or cared? Or did I miss something ? It is not that I want to correct here, I just want to be sure I understand what is going on with these commands.
@Rabinzel, I'm not sure what has changed in the functionality. But what I can say is that I tested the code and it worked at the time I wrote this answer. It would be interesting, if it's true, to know why the suffix parameter is required.
thanks for the reply. Just wanted to verify if the problem is on my side or I am missunderstanding something. After googling a bit, I read several times that wide_to_long expects numerical suffix or it will fail but in the documentation all it says is that suffix="\d+" is the default.
8

I guess I found a simpler solution

temp1 = pd.melt(df1, id_vars=["location"], var_name='Date', value_name='Value')
temp2 = pd.melt(df1, id_vars=["name"], var_name='Date', value_name='Value')

Concat whole temp1 with temp2's column name

temp1['new_column'] = temp2['name']

You now have what you asked for.

1 Comment

How is that what they asked for? This melts more columns than they want and the result has two more rows than it should. Am I missing something?
5

Adding a link to a notebook which you can duplicate, demonstrating @DMS's answer using pandas.melt:

df.melt(id_vars=["location", "name"], 
    var_name="date", 
    value_name="value")

https://deepnote.com/@DataScience/Unpivot-a-DataFrame-from-wide-to-long-format-lN7WlqOdSlqroI_7DGAkoA

Comments

4

If you want to swap your rows with columns & columns with rows then try the transpose method of pandas:

df.T

Check the reference link: https://note.nkmk.me/en/python-pandas-t-transpose/

1 Comment

Transposing mean inverting row/column indexes, keeping the same labels. The OP wants to move some column labels to row index as levels and some column labels out of both indexes. .T is of no help here.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.