Splitting a column into multiple rows

Question

I have this data in a dataframe. The Code column holds several comma-separated values and is of object datatype.

I want to split the rows so that each value in Code gets its own row, with the other columns repeated.

I tried to change the datatype by using

df['Code'] = df['Code'].astype(str)

and then tried to split on the commas and reset the index on the basis of ID (which is unique), but the result only contains two columns. I need the entire dataset.

df = (pd.DataFrame(df.Code.str.split(',').tolist(), index=df.ID).stack()).reset_index([0, 'ID'])
df.columns = ['ID', 'Code']

Can someone help me out? I don't understand how to adapt this code.

Attaching the setup code:

import pandas as pd

x = {'ID': ['1','2','3','4','5','6','7'],
        'A': ['a','b','c','a','b','b','c'],
        'B': ['z','x','y','x','y','z','x'],
        'C': ['s','d','w','','s','s','s'],
        'D': ['m','j','j','h','m','h','h'],
        'Code': ['AB,BC,A','AD,KL','AD,KL','AB,BC','A','A','B']
        }

df = pd.DataFrame(x, columns = ['ID', 'A','B','C','D','Code'])

df

ThePyGuy · Accepted Answer · 2021-06-17 03:59:06Z

You can first split the Code column on commas, then explode it to get the desired output.

df['Code']=df['Code'].str.split(',')
df=df.explode('Code')

OUTPUT:

  ID  A  B  C  D Code
0  1  a  z  s  m   AB
0  1  a  z  s  m   BC
0  1  a  z  s  m    A
1  2  b  x  d  j   AD
1  2  b  x  d  j   KL
2  3  c  y  w  j   AD
2  3  c  y  w  j   KL
3  4  a  x     h   AB
3  4  a  x     h   BC
4  5  b  y  s  m    A
5  6  b  z  s  h    A
6  7  c  x  s  h    B
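As the output shows, explode repeats the original index label for every row that came from the same source row. If a fresh 0..n-1 index is wanted, something like the following should work. This is a minimal sketch on a cut-down frame (the three-row data and the name `out` are illustrative only), and it uses assign so the original Code column is not overwritten:

```python
import pandas as pd

# Cut-down, illustrative version of the question's data
df = pd.DataFrame({
    'ID': ['1', '2', '3'],
    'A': ['a', 'b', 'c'],
    'Code': ['AB,BC,A', 'AD,KL', 'B'],
})

out = (
    df.assign(Code=df['Code'].str.split(','))  # list of codes per row
      .explode('Code')                         # one row per code
      .reset_index(drop=True)                  # discard the repeated index labels
)
print(out)
```

Dropping the index loses nothing here because ID still identifies the source row.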

If needed, you can replace the empty strings with NaN.
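One way to do that replacement is DataFrame.replace. The sketch below assumes that only cells that are exactly the empty string should become NaN, and it uses a small illustrative frame rather than the full data:

```python
import numpy as np
import pandas as pd

# Illustrative frame: column C has one empty string, as in the question
df = pd.DataFrame({
    'ID': ['1', '2'],
    'C': ['s', ''],
    'Code': ['AB,BC', 'A'],
})

df['Code'] = df['Code'].str.split(',')
df = df.explode('Code')

# Replace cells that are exactly '' with NaN across the whole frame
df = df.replace('', np.nan)
print(df)
```

Strings that contain only whitespace are left untouched by this; they would need a separate cleanup step.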

edited Jun 17, 2021 at 3:59

answered Jun 17, 2021 at 1:17
