Creating columns based on previous column values with a condition / Python - Pandas

Question

I have a DataFrame like this:

D_1  D_2   D_3    D_4
Boy                 
Boy  play       
Boy  play  car      
Boy  play  chess    
Boy  play  online

Now I would like to add three more columns, L_2, L_3 and L_4, that concatenate the data from the preceding columns level by level: L_n joins D_1 through D_n with "|" and uses "emp" for empty values. The resulting DataFrame should look like this:

D_1  D_2   D_3  D_4   L_2       L_3           L_4
Boy                   Boy|emp   Boy|emp|emp   Boy|emp|emp|emp
Boy  play             Boy|play  Boy|play|emp  Boy|play|emp|emp
Boy  play  car        Boy|play  Boy|play|car  Boy|play|car|emp
Girl                  Girl|emp  Girl|emp|emp  Girl|emp|emp|emp

My SQL solution looks like this:

select *
    , concat(D_1,"|",ifnull(D_2, "emp")) as L_2  
    , concat(D_1,"|",ifnull(D_2, "emp"), "|", ifnull(D_3, "emp")) as L_3  
    , concat(D_1,"|",ifnull(D_2, "emp"), "|", ifnull(D_3, "emp"), "|", ifnull(D_4, "emp")) as L_4  
from abc

Can anyone guide me on how to convert this to Python? Thanks in advance!

Because I have a Python script that cleans the file and pushes it to BigQuery, I want to avoid SQL and get the updated data directly from the Python script.
– sdave, Commented Jun 11, 2021 at 13:53
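For comparison with the SQL, a literal column-by-column pandas transcription might look like this. It is a sketch that assumes missing levels are NaN (what pandas.read_csv produces for empty fields): ifnull(col, "emp") becomes col.fillna("emp"), and concat(...) becomes element-wise string addition with +. The sample data is made up to match the question's shape.

```python
import numpy as np
import pandas as pd

# Made-up sample shaped like the question's data; missing levels are NaN.
df = pd.DataFrame({
    "D_1": ["Boy", "Boy", "Boy", "Girl"],
    "D_2": [np.nan, "play", "play", np.nan],
    "D_3": [np.nan, np.nan, "car", np.nan],
    "D_4": [np.nan] * 4,
})

# ifnull(col, "emp")  ->  col.fillna("emp")  (returns new Series; df is unchanged)
d2 = df["D_2"].fillna("emp")
d3 = df["D_3"].fillna("emp")
d4 = df["D_4"].fillna("emp")

# concat(a, "|", b, ...)  ->  element-wise string addition
df["L_2"] = df["D_1"] + "|" + d2
df["L_3"] = df["D_1"] + "|" + d2 + "|" + d3
df["L_4"] = df["D_1"] + "|" + d2 + "|" + d3 + "|" + d4

print(df[["L_2", "L_3", "L_4"]])
```

Spelling out every expression mirrors the SQL one-to-one, but it has to be edited by hand whenever a level is added.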

99_m4n · Accepted Answer · 2021-06-11 15:40:10Z

2

You can generalize the code to any number of columns like this:

# For each i, join D_1 ... D_(i+1) with '|', using 'emp' for missing values
for i in range(1, len(df.columns)):
    df['L_' + str(i+1)] = df[df.columns[:i+1]].fillna('emp').agg('|'.join, axis=1)

Output:

>>> print(df)
   D_1   D_2     D_3 D_4       L_2              L_3                  L_4
0  Boy                     Boy|emp      Boy|emp|emp      Boy|emp|emp|emp
1  Boy  play              Boy|play     Boy|play|emp     Boy|play|emp|emp
2  Boy  play     car      Boy|play     Boy|play|car     Boy|play|car|emp
3  Boy  play   chess      Boy|play   Boy|play|chess   Boy|play|chess|emp
4  Boy  play  online      Boy|play  Boy|play|online  Boy|play|online|emp

The whole code:

import pandas as pd
from io import StringIO

txt = '''D_1  D_2   D_3    D_4
Boy                 
Boy  play       
Boy  play  car      
Boy  play  chess    
Boy  play  online
'''

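# Whitespace-separated input; rows with fewer fields get NaN in the missing columns
df = pd.read_csv(StringIO(txt), header=0, skipinitialspace=True, sep=r'\s+')

# L_(i+1) joins D_1 ... D_(i+1) with '|'; fillna works on a copy of the selection, so df keeps its NaN values
for i in range(1, len(df.columns)):
    df['L_' + str(i+1)] = df[df.columns[:i+1]].fillna('emp').agg('|'.join, axis=1)

# Show the remaining missing values as blanks
df = df.fillna('')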

print(df)
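Because each level only extends the previous one (L_k is L_(k-1) + "|" + D_k), the columns can also be built incrementally with column-wise string addition instead of a row-wise agg(..., axis=1), which is typically faster on large frames. A sketch, assuming the level columns are D_1 ... D_4 in that order and that D_1 is never missing:

```python
import numpy as np
import pandas as pd

# Same rows as the question's input, built directly instead of parsed.
df = pd.DataFrame({
    "D_1": ["Boy"] * 5,
    "D_2": [np.nan, "play", "play", "play", "play"],
    "D_3": [np.nan, np.nan, "car", "chess", "online"],
    "D_4": [np.nan] * 5,
})

level_cols = ["D_1", "D_2", "D_3", "D_4"]  # assumed order of the levels

# Start from the first level, then append one level per iteration.
prev = df[level_cols[0]].astype(str)
for k, col in enumerate(level_cols[1:], start=2):
    # fillna on a single column returns a new Series, so df's D_ columns keep NaN
    prev = prev + "|" + df[col].fillna("emp").astype(str)
    df[f"L_{k}"] = prev

print(df)
```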

edited Jun 11, 2021 at 15:40

answered Jun 11, 2021 at 13:21

99_m4n


2 Comments

sdave Over a year ago

Thanks, quick fix, but if you look at the DataFrame now, the original columns contain extra 'emp' values as well.

99_m4n Over a year ago

@sdave I've edited the answer so that the 'emp' values no longer end up in the original DataFrame.
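The edited loop avoids the problem because fillna('emp') is applied to a column selection and, with the default inplace=False, returns a new DataFrame, so df itself keeps its NaN values until the final fillna(''). A minimal check on a small, made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"D_1": ["Boy", "Boy"], "D_2": [np.nan, "play"]})

# fillna() returns a new DataFrame by default, so 'emp' is not written back into df
filled = df[["D_1", "D_2"]].fillna("emp")

print(filled["D_2"].tolist())     # ['emp', 'play']
print(df["D_2"].isna().tolist())  # [True, False]
```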

Hamza usman ghani · Accepted Answer · 2021-06-11 13:26:13Z

2

Replace "" with "emp" using DataFrame.replace(), then loop over the columns and join each row's values from the first column up to the current one with join():

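import pandas as pd

df = pd.DataFrame({"D_1":["Boy","Boy","Boy","Girl"],"D_2":["","play","play",""],"D_3":["","","car",""],"D_4":[""]*4})
# Work on a copy where empty strings become 'emp'; df itself stays unchanged
temp = df.replace([''],'emp')
# L_(c+1) joins the first c+1 columns of each row with '|'
for c in range(1,len(temp.columns)):
    df[f'L_{c+1}'] = temp[temp.columns[:c+1]].astype(str).apply(lambda x: '|'.join(x), axis=1)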

print(df)

    D_1  D_2    D_3   D_4     L_2           L_3              L_4
0   Boy                     Boy|emp     Boy|emp|emp     Boy|emp|emp|emp
1   Boy  play               Boy|play    Boy|play|emp    Boy|play|emp|emp
2   Boy  play   car         Boy|play    Boy|play|car    Boy|play|car|emp
3   Girl                    Girl|emp    Girl|emp|emp    Girl|emp|emp|emp
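This approach assumes missing levels are empty strings, as in the hand-built DataFrame above. If the data comes from pandas.read_csv, empty fields are usually NaN instead; replace([''], 'emp') leaves those untouched, and astype(str) then turns them into the string 'nan'. A sketch that normalizes both forms first (the mixed sample data is made up):

```python
import numpy as np
import pandas as pd

# Mixed input: some missing levels are "", others NaN.
df = pd.DataFrame({
    "D_1": ["Boy", "Boy", "Girl"],
    "D_2": ["", "play", np.nan],
    "D_3": [np.nan, "car", ""],
})

# Map "" to NaN, then fill every NaN with 'emp'; both calls return copies, so df is unchanged.
temp = df.replace("", np.nan).fillna("emp")

for c in range(1, len(temp.columns)):
    df[f"L_{c + 1}"] = temp[temp.columns[:c + 1]].astype(str).apply("|".join, axis=1)

print(df)
```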

edited Jun 11, 2021 at 13:26

answered Jun 11, 2021 at 13:13

Hamza usman ghani


1 Comment

sdave Over a year ago

Thanks, I used your previous solution, the one from before your edit, because I wanted to define which columns to use. My real DataFrame has many more columns that I don't want to include here, so your previous solution worked well for me :)
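When only some columns should feed the levels, the column list can be passed in explicitly. A sketch of such a helper; the name add_level_columns and its fill/prefix parameters are hypothetical, not part of either answer, and it treats both "" and NaN as missing:

```python
import numpy as np
import pandas as pd


def add_level_columns(df, level_cols, fill="emp", prefix="L_"):
    """Hypothetical helper: add prefix+'2' ... columns joining level_cols cumulatively.

    Only the columns in level_cols are used; other columns in df are ignored.
    Returns a new DataFrame and leaves the input unchanged.
    """
    # Normalize "" and NaN to the fill value, on a copy of the selected columns
    levels = df[level_cols].replace("", np.nan).fillna(fill).astype(str)
    out = df.copy()
    for n in range(2, len(level_cols) + 1):
        # Join the first n level columns of each row with '|'
        out[f"{prefix}{n}"] = levels.iloc[:, :n].apply("|".join, axis=1)
    return out


df = pd.DataFrame({
    "id": [101, 102],        # unrelated column that should be left alone
    "D_1": ["Boy", "Girl"],
    "D_2": ["play", ""],
    "D_3": ["car", np.nan],
    "score": [0.5, 0.7],     # another unrelated column
})

result = add_level_columns(df, ["D_1", "D_2", "D_3"])
print(result[["L_2", "L_3"]])
```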
