Merge multiple DataFrame columns into one

Question

I'm trying to transform a DataFrame with a dynamic number of a_P columns looking like this

             a1_P       a2_P     weight  
0        33297.81   17407.93   14733.23  
1        58895.18   43013.57   86954.04

into a new DataFrame, looking like this (sorted by P)

                P     weight  
0        17407.93   14733.23
1        33297.81   14733.23  
2        43013.57   86954.04
3        58895.18   86954.04

So what I'm trying so far is

names = ["a1", "a2"]
p = pd.DataFrame(columns=["P", "weight"])
for i in range(0, len(names)):
  p += df[["{}_P".format(names[i]), "weight"]]

and then sort it afterwards, but this does not work, I guess because the column names are not identical.

firelynx · Accepted Answer · 2015-08-10 14:42:59Z

3

The pandas.melt function does something like what you want:

pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P')
     weight variable         P
0  14733.23     a1_P  33297.81
1  86954.04     a1_P  58895.18
2  14733.23     a2_P  17407.93
3  86954.04     a2_P  43013.57

And of course, sorting by P is easily done by appending .sort_values('P') to the end of the melt statement (older pandas versions spelled this .sort('P'), which has since been removed).

pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P').sort_values('P')
     weight variable         P
2  14733.23     a2_P  17407.93
0  14733.23     a1_P  33297.81
3  86954.04     a2_P  43013.57
1  86954.04     a1_P  58895.18
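For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's two-row frame by hand (the values are copied from the question) and assumes a recent pandas, where sorting is spelled sort_values.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "a1_P": [33297.81, 58895.18],
    "a2_P": [17407.93, 43013.57],
    "weight": [14733.23, 86954.04],
})

# Unpivot the two P columns into a single "P" column, keeping "weight"
# alongside each value, then order the rows by P
long_df = pd.melt(
    df,
    id_vars=["weight"],
    value_vars=["a1_P", "a2_P"],
    value_name="P",
).sort_values("P")

print(long_df)
```

The "variable" column records which source column each value came from; drop it if you do not need it.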

And if you want to be super dynamic, you could generate the value_vars like this:

n_values = 2
value_vars = ["a{}_P".format(i+1) for i in range(0, n_values)]
pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P').sort_values('P')
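If the number of columns is not known up front, one option is to read the names off the frame itself instead of counting. This is only a sketch: it assumes every column you want ends in "_P", and the extra "other" column is a made-up stand-in for the unrelated columns a real frame might contain.

```python
import pandas as pd

df = pd.DataFrame({
    "a1_P": [33297.81, 58895.18],
    "a2_P": [17407.93, 43013.57],
    "weight": [14733.23, 86954.04],
    "other": ["x", "y"],  # hypothetical unrelated column
})

# Pick up every column whose name ends in "_P", however many there are
value_vars = [col for col in df.columns if col.endswith("_P")]

# Columns that are neither id_vars nor value_vars (here "other") are dropped
long_df = pd.melt(df, id_vars=["weight"], value_vars=value_vars, value_name="P")

print(value_vars)
print(long_df)
```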

To get the index to be [0, 1, 2, 3 ...], just use .reset_index(drop=True), either chained onto the end of the statement or like this:

df = pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P')
df.sort_values('P', inplace=True)
df.reset_index(drop=True, inplace=True)

I personally prefer inplace operations, because they can be more memory efficient.
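For comparison, the same steps can also be written as a single chained expression that ends with exactly the two columns asked for in the question. Treat it as a sketch, assuming a recent pandas (sort_values) and the question's sample data.

```python
import pandas as pd

df = pd.DataFrame({
    "a1_P": [33297.81, 58895.18],
    "a2_P": [17407.93, 43013.57],
    "weight": [14733.23, 86954.04],
})

result = (
    pd.melt(df, id_vars=["weight"], value_vars=["a1_P", "a2_P"], value_name="P")
    .sort_values("P")            # order rows by P
    .reset_index(drop=True)      # renumber the index 0..n-1
    .loc[:, ["P", "weight"]]     # keep only the requested columns, P first
)

print(result)
```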

edited Aug 10, 2015 at 14:42

answered Aug 10, 2015 at 12:33

firelynx


5 Comments

LondonRob Over a year ago

Perfect! This is exactly what melt is designed for. In this particular example, you don't even need to specify value_vars. Nor does id_vars have to be a list. This is enough: pd.melt(df, id_vars='weight')

firelynx Over a year ago

@LondonRob Good point! Though "Explicit is better than implicit". I also assumed the example in the question to be simplified. There may be tonnes of other unwanted columns in there.

Peter Klauke Over a year ago

This is very short and works well, thanks for that! And yes, there are a lot of other columns in there. For completeness: a line such as df = df.reset_index(drop=True) is missing to get the final result I wanted.

firelynx Over a year ago

@Peter I added some more code for the .reset_index()

Peter Klauke Over a year ago

Thanks for the advice about memory efficiency, I didn't think about it.

chris-sc · Accepted Answer · 2015-08-10 11:57:45Z

A possible solution using Pandas concat (http://pandas.pydata.org/pandas-docs/stable/merging.html):

import pandas as pd

df = pd.DataFrame.from_dict({'a1_P': [123.123, 342.123],
                             'a2_P': [232.12, 32.23],
                             'weight': [12312.23, 16232.3]})

# Collect every column holding P values
cols = [x for x in df.columns if '_P' in x]

# Stack the P columns; the original row labels repeat once per column
new = pd.concat([df[col] for col in cols])
# Look up the weight belonging to each stacked value via its row label
weights = df.loc[new.index, 'weight'].tolist()

new_df = pd.DataFrame.from_dict({'P': new,
                                 'weight': weights})
new_df.sort_values(by='P', inplace=True)
new_df.reset_index(drop=True, inplace=True)

print(new_df)

         P    weight                                                                          
0   32.230  16232.30
1  123.123  12312.23
2  232.120  12312.23
3  342.123  16232.30

There is room for performance optimizations, but it should be faster than a solution with explicit loops.
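A variation on the same concat idea, closer to what the question's loop was attempting: take each P column together with weight, rename it to a common name so the pieces line up, and concatenate. This is a sketch rather than a tuned solution, and it assumes the relevant columns all end in "_P" and that a recent pandas (sort_values) is installed.

```python
import pandas as pd

df = pd.DataFrame({'a1_P': [123.123, 342.123],
                   'a2_P': [232.12, 32.23],
                   'weight': [12312.23, 16232.3]})

cols = [c for c in df.columns if c.endswith('_P')]

# One two-column frame per P column, each renamed to the shared name 'P'
pieces = [df[[c, 'weight']].rename(columns={c: 'P'}) for c in cols]

# Stack the pieces, sort by P, and renumber the index
new_df = pd.concat(pieces).sort_values('P').reset_index(drop=True)

print(new_df)
```

Because every piece has identical column names, concat stacks them row-wise, which is what the += in the question could not do.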
