3

I'm trying to transform a DataFrame with a dynamic number of a_P columns looking like this

             a1_P       a2_P     weight  
0        33297.81   17407.93   14733.23  
1        58895.18   43013.57   86954.04  

into a new DataFrame, looking like this (sorted by P)

                P     weight  
0        17407.93   14733.23
1        33297.81   14733.23  
2        43013.57   86954.04
3        58895.18   86954.04    

So what I'm trying so far is

names = ["a1", "a2"]
p = pd.DataFrame(columns=["P", "weight"])
for i in range(0, len(names)):
  p += df[["{}_P".format(names[i]), "weight"]]

and to sort it afterwards but this does not work because columnnames are not identical I guess.

2 Answers 2

3

The pandas.melt function does something like what you want:

pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P')
     weight variable         P
0  14733.23     a1_P  33297.81
1  86954.04     a1_P  58895.18
2  14733.23     a2_P  17407.93
3  86954.04     a2_P  43013.57

And of course, soring by P is easily done by appending a .sort('P') to the end of the melt statement.

pd.melt(df, id_vars=['weight'], value_vars=['a1_P', 'a2_P'], value_name='P').sort('P')
     weight variable         P
2  14733.23     a2_P  17407.93
0  14733.23     a1_P  33297.81
3  86954.04     a2_P  43013.57
1  86954.04     a1_P  58895.18

And if you want to be super dynamic, maybe generating the value_vars in this fancy way:

n_values = 2
value_vars = ["a{}_P".format(i+1) for i in range(0, n_values)]
pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P').sort('P')

To get the index to be [0, 1, 2, 3 ...], just use .reset_index(drop=True) either as a chained event, or like this:

df = pd.melt(df, id_vars=['weight'], value_vars=value_vars, value_name='P')
df.sort(inplace=True)
df.reset_index(drop=True, inplace=True)

I personally prefer inplace operations, because they are much much more memory efficient.

Sign up to request clarification or add additional context in comments.

5 Comments

Perfect! This is exactly what melt is designed for. In this particular example. you don't even need to specify value_vars. Nor does id_vars have to be a list. This is enough: pd.melt(df, id_vars='weight')
@LondonRob Good point! Though "Explicit is better than implicit". I also assumed the example in the question to be simplified. There may be a tonnes of other unwanted columns in there.
This very short and works well, thanks for that! And yes, there are a lot of other columns in there. For completion: there a line such like df = df.reset_index(drop=True) missing to get the final result I wanted.
@Peter I added some more code for the .reset_index()
Thanks for the advice about memory efficiency, didnt think about it.
1

A possible solution using Pandas concat (http://pandas.pydata.org/pandas-docs/stable/merging.html):

import pandas as pd                                                                           

df = pd.DataFrame.from_dict({'a1_P': [123.123, 342.123],
                             'a2_P': [232.12, 32.23],
                             'weight': [12312.23, 16232.3]})                        

cols = [x for x in df.columns if '_P' in x]                                         

new = pd.concat([df[col] for col in cols])                                          
oldidx = new.index                                                                  
weights = df.loc[new.index, 'weight'].tolist()                                      

new_df = pd.DataFrame.from_dict({'P': new,                                          
                                 'weight': weights})                                
new_df.sort(columns='P', inplace=True)                                           
new_df.reset_index(drop=True, inplace=True)   

print(new_df)

         P    weight                                                                          
0   32.230  16232.30
1  123.123  12312.23
2  232.120  12312.23
3  342.123  16232.30   

There is room for performance optimizations, but it should faster than a solution with explicit loops.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.