1

I have this data in a dataframe, The code column has several values and is of object datatype.

enter image description here

I want to split the rows in the following way

result enter image description here

I tried to change the datatype by using

df['Code'] = df['Code'].astype(str)

and then tried to split the commas and reset the index on the basis of ID (unique) but I only get two column values. I need the entire dataset.

df = (pd.DataFrame(df.Code.str.split(',').tolist(), index=df.ID).stack()).reset_index([0, 'ID'])
df.columns = ['ID', 'Code']

Can someone help me out? I don't understand how to twist this code.

Attaching the setup code:

import pandas as pd

x = {'ID': ['1','2','3','4','5','6','7'],
        'A': ['a','b','c','a','b','b','c'],
        'B': ['z','x','y','x','y','z','x'],
        'C': ['s','d','w','','s','s','s'],
        'D': ['m','j','j','h','m','h','h'],
        'Code': ['AB,BC,A','AD,KL','AD,KL','AB,BC','A','A','B']
        }

df = pd.DataFrame(x, columns = ['ID', 'A','B','C','D','Code'])

df

1 Answer 1

2

You can first split Code column on comma , then explode it to get the desired output.

df['Code']=df['Code'].str.split(',')
df=df.explode('Code')

OUTPUT:

  ID  A  B  C  D Code
0  1  a  z  s  m   AB
0  1  a  z  s  m   BC
0  1  a  z  s  m    A
1  2  b  x  d  j   AD
1  2  b  x  d  j   KL
2  3  c  y  w  j   AD
2  3  c  y  w  j   KL
3  4  a  x     h   AB
3  4  a  x     h   BC
4  5  b  y  s  m    A
5  6  b  z  s  h    A
6  7  c  x  s  h    B

If needed, you can replace empty string by NaN

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.