Populate a new pandas dataframe column with names of other columns based on their row value

Question

I want to add a new column to a dataframe whose values are the names of other columns, chosen based on a condition.

import pandas as pd
data = pd.DataFrame({
'customer': ['bob', 'jerry', 'alice', 'susan'],
'internet_bill': ['paid', 'past_due', 'due_soon', 'past_due'],
'electric_bill': ['past_due', 'due_soon', 'past_due', 'paid'],
'water_bill': ['paid', 'past_due', 'paid', 'paid']})

Here's the dataframe.

    customer    internet_bill   electric_bill   water_bill
0   bob         paid            past_due        paid
1   jerry       past_due        due_soon        past_due
2   alice       due_soon        past_due        paid
3   susan       past_due        paid            paid

I want to add a new column summarizing what is 'past_due'. Here's the desired result:

    customer    internet_bill   electric_bill   water_bill  past_due
0   bob         paid            past_due        paid        electric_bill
1   jerry       past_due        due_soon        past_due    internet_bill, water_bill
2   alice       due_soon        past_due        paid        electric_bill
3   susan       past_due        paid            paid        internet_bill

I was able to do this in Excel with the following formula:

=TEXTJOIN(","&CHAR(10),TRUE,
IF(B2=Values!$A$1,$K$1,""),
IF(C2=Values!$A$1,$L$1,""),
IF(D2=Values!$A$1,$M$1,""))
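For comparison, here is a minimal pandas sketch that mirrors the formula piece by piece. It assumes the three bill columns from the example above; `bill_cols`, `target` and `sep` are illustrative names, not part of the original. Each `np.where` call plays the role of one `IF(...)` (the column name itself stands in for the header cells `$K$1`, `$L$1`, `$M$1`), and joining only the non-empty strings imitates `TEXTJOIN` with its ignore-empty argument set to `TRUE`.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'customer': ['bob', 'jerry', 'alice', 'susan'],
    'internet_bill': ['paid', 'past_due', 'due_soon', 'past_due'],
    'electric_bill': ['past_due', 'due_soon', 'past_due', 'paid'],
    'water_bill': ['paid', 'past_due', 'paid', 'paid']})

bill_cols = ['internet_bill', 'electric_bill', 'water_bill']  # hypothetical: columns B, C, D
target = 'past_due'  # plays the role of Values!$A$1
sep = ', '           # the formula itself uses "," & CHAR(10), i.e. ",\n"

# One array per IF(...): the column name where the cell equals target, else ''
pieces = [np.where(data[col] == target, col, '') for col in bill_cols]

# TEXTJOIN(sep, TRUE, ...): join only the non-empty pieces, row by row
data['past_due'] = [sep.join(p for p in row if p) for row in zip(*pieces)]
```

Setting `sep = ",\n"` should reproduce the formula's comma plus line feed instead of the `", "` shown in the desired result.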

Ultimately, my output will be an Excel file for some nurses & hospital workers to follow up with patients (not bill collecting! Patient care stuff). I have thought about using an Excel-writer library to just create an .xlsx directly and insert formulas.

I was also able to do this for a single column, but my gut tells me there's a much better way. Here's what I used:

# Works for one column only: store the column name if this row is past due
data['past_due'] = [
    'internet_bill' if x == 'past_due'
    else None for x in data['internet_bill']]

The idea is to check each row of every targeted column for 'past_due'; if it matches, record that column's name, then move on to the next column, check it for 'past_due', and append that column's name too. The snippet above only does this for internet_bill.
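The column-by-column idea described above can be written as a plain loop. This is a minimal sketch, assuming the same `data` and a hypothetical `bill_cols` list naming the columns to check: each row collects its matching column names in a list, and the lists are joined into one string at the end.

```python
import pandas as pd

data = pd.DataFrame({
    'customer': ['bob', 'jerry', 'alice', 'susan'],
    'internet_bill': ['paid', 'past_due', 'due_soon', 'past_due'],
    'electric_bill': ['past_due', 'due_soon', 'past_due', 'paid'],
    'water_bill': ['paid', 'past_due', 'paid', 'paid']})

# Hypothetical list of the columns to inspect
bill_cols = ['internet_bill', 'electric_bill', 'water_bill']

# One (initially empty) list of matching column names per row
found = [[] for _ in range(len(data))]

for col in bill_cols:                        # move on to the next column
    for i, value in enumerate(data[col]):    # check each row of that column
        if value == 'past_due':
            found[i].append(col)             # add the column name

# Rows with nothing past due end up as an empty string
data['past_due'] = [', '.join(names) for names in found]
```

This is the single-column comprehension repeated per column. It is easy to follow, but it loops in pure Python, so it is not expected to be fast on large frames.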

I have had no success finding anything close to this by searching, probably because I struggled to phrase a good search query. I haven't found any questions where someone pulls other column names into a column as values based on a condition.

Thanks for any help!

Karthik V · Accepted Answer · 2019-07-03 22:43:57Z

  >>> # For each row x, keep the labels (column names) whose value is 'past_due'
  >>> data['past_due'] = data.apply(lambda x: tuple(x[x == 'past_due'].index),
  ...                               axis=1)
  >>> data
    customer             ...                                  past_due
  0      bob             ...                          (electric_bill,)
  1    jerry             ...               (internet_bill, water_bill)
  2    alice             ...                          (electric_bill,)
  3    susan             ...                          (internet_bill,)
  [4 rows x 5 columns]
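If you want a comma-separated string like the one in the question rather than a tuple, the same idea can join the matching labels. A minimal sketch, assuming the bill columns are listed in a hypothetical `bill_cols`; restricting `apply` to those columns also keeps `customer` (or a previously added `past_due` column) out of the comparison.

```python
import pandas as pd

data = pd.DataFrame({
    'customer': ['bob', 'jerry', 'alice', 'susan'],
    'internet_bill': ['paid', 'past_due', 'due_soon', 'past_due'],
    'electric_bill': ['past_due', 'due_soon', 'past_due', 'paid'],
    'water_bill': ['paid', 'past_due', 'paid', 'paid']})

# Hypothetical list of the columns to check
bill_cols = ['internet_bill', 'electric_bill', 'water_bill']

# For each row, keep the labels whose value is 'past_due' and join them
data['past_due'] = data[bill_cols].apply(
    lambda row: ', '.join(row[row == 'past_due'].index), axis=1)
```

Rows with nothing past due get an empty string. Passing `",\n"` instead of `", "` would mimic the line breaks from the Excel formula.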

2 Comments

dakro Over a year ago

This worked perfectly! Thank you! I need to learn more about lambda. This is miles better than the other approach I was taking in case I didn't get an answer. Much appreciated.

user553965 Over a year ago

This approach works great, but is very slow compared to similar operations like data['sum'] = data[numerical_column_names].sum(axis=1). Is there a faster variation?
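One possible faster variation, offered as a sketch rather than a benchmarked answer: do the comparison for all bill columns in a single vectorized `eq` call, then loop over plain NumPy boolean rows to pick out the names. This avoids building a pandas Series for every row, which is where much of the overhead of `apply(..., axis=1)` usually comes from. `bill_cols` is again a hypothetical list of the columns to check; timing both versions on your real data is the way to confirm any gain.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'customer': ['bob', 'jerry', 'alice', 'susan'],
    'internet_bill': ['paid', 'past_due', 'due_soon', 'past_due'],
    'electric_bill': ['past_due', 'due_soon', 'past_due', 'paid'],
    'water_bill': ['paid', 'past_due', 'paid', 'paid']})

# Hypothetical list of the columns to check
bill_cols = ['internet_bill', 'electric_bill', 'water_bill']
names = np.array(bill_cols)

# Boolean matrix: one row per customer, one column per bill (vectorized)
mask = data[bill_cols].eq('past_due').to_numpy()

# Boolean indexing picks the names of the True columns in each row
data['past_due'] = [', '.join(names[row]) for row in mask]
```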
