I have unformatted data like the following in a pandas DataFrame, which I read from an Excel file. The 3rd row is supposed to be split across 3 columns, but all of it ended up in Col1.

 Col1                                            Col2             Col3
 0.500L PET X12                                  1.0L PET X12
 0.250L RGB X12                                  0.330L RGB X12
 0.330L CAN X24, 0.330L CAN X10, 0.330L CAN X12, 

What is the best method to get this data formatted as below?

 Col1             Col2              Col3
 0.500L PET X12   1.0L PET X12
 0.250L RGB X12   0.330L RGB X12
 0.330L CAN X24   0.330L CAN X10    0.330L CAN X12

Right now I use str.split combined with fillna for each column. Since the DataFrame has around 20 columns to combine, I am looking for an alternative method, if there is one.

1 Answer

One idea is to append ', ' to each column, concatenate the columns with NaN replaced by empty strings, split the result with n=2 into 3 new columns, and finally strip any leftover ', ':

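# Append ', ' to every cell (NaN stays NaN), replace NaN with '' and concatenate the columns
df['Col1'] = (df['Col1'].add(', ').fillna('') +
              df['Col2'].add(', ').fillna('') +
              df['Col3'].add(', ').fillna(''))

# Split into at most 3 parts (n=2), then strip leftover ', ' from each part
f = lambda x: x.str.strip(', ')
df[['Col1','Col2','Col3']] = df['Col1'].str.split(', ', n=2, expand=True).apply(f)

print(df)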
             Col1            Col2            Col3
0  0.500L PET X12    1.0L PET X12                
1  0.250L RGB X12  0.330L RGB X12                
2  0.330L CAN X24  0.330L CAN X10  0.330L CAN X12
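The code above hardcodes three columns, while the question mentions around 20. Below is a minimal sketch of the same append/concatenate/split idea generalised to any list of columns. `respread` is a hypothetical helper name, and the sketch assumes items are separated by ', ' and that blank cells are NaN. Each column is cast to object before appending the separator, because an entirely blank column may be read as float64, and adding a string to a float64 column raises TypeError.

```python
import numpy as np
import pandas as pd


def respread(df, cols, sep=', '):
    """Hypothetical helper: re-spread sep-joined items across `cols`, left to right.

    Assumes blank cells are NaN and items are separated by `sep`.
    """
    # Concatenate all columns into one string per row: non-missing cells get
    # `sep` appended, missing cells contribute ''. astype(object) avoids a
    # TypeError when an all-blank column was read as float64.
    joined = pd.Series('', index=df.index)
    for c in cols:
        joined = joined + df[c].astype(object).add(sep).fillna('')

    # Split into at most len(cols) pieces; any overflow stays in the last piece.
    parts = joined.str.split(sep, n=len(cols) - 1, expand=True)
    # Strip leftover separator characters (str.strip treats `sep` as a set of
    # characters), e.g. the trailing ', ' on the last piece.
    parts = parts.apply(lambda s: s.str.strip(sep))
    # Pad with missing columns if no row has len(cols) items.
    parts = parts.reindex(columns=range(len(cols)))
    parts.columns = cols

    out = df.copy()
    out[cols] = parts
    return out


df = pd.DataFrame({
    'Col1': ['0.500L PET X12', '0.250L RGB X12',
             '0.330L CAN X24, 0.330L CAN X10, 0.330L CAN X12, '],
    'Col2': ['1.0L PET X12', '0.330L RGB X12', np.nan],
    'Col3': [np.nan, np.nan, np.nan],
})
print(respread(df, ['Col1', 'Col2', 'Col3']))
```

For the ~20-column frame you would pass `list(df.columns)` or whichever subset should be re-spread. As with `n=2` above, `n=len(cols) - 1` means items beyond the last column stay joined in the last column rather than being dropped.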