4

I have the foll. dataframe:

AA          AB          AC                  AD              Col_1       Col_2     Col_3    
Northeast   Argentina   Northeast Argentina South America   Corrientes  Misiones        
Northern    Argentina   Northern  Argentina South America   Chaco       Formosa   Santiago Del 

I want to convert it to:

AA          AB          AC                  AD              Col
Northeast   Argentina   Northeast Argentina South America   Corrientes
Northeast   Argentina   Northeast Argentina South America   Misiones        
Northern    Argentina   Northern  Argentina South America   Chaco
Northern    Argentina   Northern  Argentina South America   Formosa
Northern    Argentina   Northern  Argentina South America   Santiago Del 

i.e. I want to preserve the first 4 columns but assign each of the remaining column values into a separate row. Is there a way to accomplish this without using a for loop?

0

2 Answers 2

3

You can try this:

df = df.melt(id_vars=['AA','AB','AC','AD']) 
df.dropna(inplace=True)
df.drop(columns='variable', inplace=True) 
df = df.sort_values('AA').reset_index(drop=True)
df.rename(columns={'value':'Col'}, inplace=True)

          AA         AB                   AC             AD           Col
0  Northeast  Argentina  Northeast Argentina  South America    Corrientes
1  Northeast  Argentina  Northeast Argentina  South America      Misiones
2   Northern  Argentina   Northern Argentina  South America         Chaco
3   Northern  Argentina   Northern Argentina  South America       Formosa
4   Northern  Argentina   Northern Argentina  South America  Santiago Del

Sign up to request clarification or add additional context in comments.

3 Comments

You can improve answer with not use inplace=True - check this
No, I am not pandas dev, but more times see this opinion and agree with it. I think it is more recommendation, check also this
Thanks, this helps me. I appreciate you taking the time to help me become a better pandas expert.
1

Try using:

df['Col'] = df[['Col_1', 'Col_2', 'Col_3']].values.tolist()
df = df.set_index(df.columns.drop('Col').tolist())['Col'].apply(pd.Series).stack().reset_index().rename(columns={0: 'Col'}).drop(['level_7', 'Col_1', 'Col_2', 'Col_3'], axis=1)
print(df)

Output:

          AA         AB                   AC             AD           Col
0  Northeast  Argentina  Northeast Argentina  South America    Corrientes
1  Northeast  Argentina  Northeast Argentina  South America      Misiones
2   Northern  Argentina   Northern Argentina  South America         Chaco
3   Northern  Argentina   Northern Argentina  South America       Formosa
4   Northern  Argentina   Northern Argentina  South America  Santiago Del

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.