Pandas: Create a new column based on values from other columns (Row wise)

Question

I'm looking to create a custom function based on several columns:

`'TOTAL_HH_INCOME', 'HH_SIZE', 'Eligible Household Size', 'income_min1', 'income_max1', 'hh_size2', 'income_min2', 'income_max2', 'hh_size3', 'income_min3', 'income_max3', 'hh_size4', 'income_min4', 'income_max4', 'hh_size5', 'income_min5', 'income_max5', 'hh_size6', 'income_min6', 'income_max6'`

I'm looking to compare HH Size vs each HH size# variable and TOTAL_HH_INCOME vs every income_min & income_max variable for each row in my dataframe.

I've written this function as an attempt:

def eligibility (row):
    
    if df['HH_SIZE']== df['Eligible Household Size'] & df['TOTAL_HH_INCOME'] >= df['income_min1'] & df['TOTAL_HH_INCOME'] <=row['income_max1'] :
        return 'Eligible'
    
    if df['HH_SIZE']== df['hh_size2'] & df['TOTAL_HH_INCOME'] >= df['income_min2'] & df['TOTAL_HH_INCOME'] <=row['income_max2'] :
        return 'Eligible'
    
    if df['HH_SIZE']== df['hh_size3'] & df['TOTAL_HH_INCOME'] >= df['income_min3'] & df['TOTAL_HH_INCOME'] <=row['income_max3'] :
        return 'Eligible'

    if df['HH_SIZE']== df['hh_size4'] & df['TOTAL_HH_INCOME'] >= df['income_min4'] & df['TOTAL_HH_INCOME'] <=row['income_max4'] :
        return 'Eligible'

    if df['HH_SIZE']== df['hh_size5'] & df['TOTAL_HH_INCOME'] >= df['income_min5'] & df['TOTAL_HH_INCOME'] <=row['income_max5'] :
        return 'Eligible'

    if df['HH_SIZE']== df['hh_size6'] & df['TOTAL_HH_INCOME'] >= df['income_min6'] & df['TOTAL_HH_INCOME'] <=row['income_max6'] :
        return 'Eligible'
    
    return 'Ineligible'

As you can see, if the row meets any of the conditions I want it to be labeled 'Eligible'; otherwise it should be labeled 'Ineligible'.

I applied this function to my df with

df['Eligibility']= df.apply(eligibility, axis=1)

However, I receive an error:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

Why? Is my function off the mark?

EDIT:

====================== DATAFRAME ===========================

Can you provide a minimal reproducible sample dataframe? I believe the error is because each condition has to be enclosed in parentheses, but I can't verify without a sample dataset.
– bh00t, Commented Jun 30, 2020 at 15:16

Mateo Lara · Accepted Answer · 2020-06-30 15:49:12Z

1

The problem seems to be the comparison operators in the if statements: because you are comparing whole columns of a data frame, the result is not a single True or False value but a Series with one boolean per row, and an if statement cannot decide which of them to use.

Try using a.all() if you want to check that all of the elements match. Please refer to the example below:

import pandas as pd
dict1 = {'name1': ['tom', 'pedro'], 'name2': ['tom', 'pedro'],
         'name3': ['tome', 'maria'], 'name4': ['maria', 'marta']}
df1 = pd.DataFrame(dict1)

# This produces a ValueError like the one you have:
# if df1['name1'] == df1['name2']:
#     pass
# To see why this produces an error, try printing the following:
print('This is a Series of bool values and cannot be handled by an if statement: \n',
      df1['name1'] == df1['name2'])

# This checks whether all the elements in 'name1' are the same as in 'name2'
if (df1['name1'] == df1['name2']).all():
    print('\nEligible')

Output:

This is a Series of bool values and cannot be handled by an if statement: 
 0    True
 1    True
dtype: bool

Eligible
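For the row-wise case in the question, one possible fix is to read every value from `row` instead of `df`, so that each comparison yields a single boolean, and to combine the comparisons with `and`. The following is only a minimal sketch: the column names come from the question, but the sample values are made up and only two of the six size/income brackets are included to keep it short.

```python
import pandas as pd

# Hypothetical sample data (not from the question); two brackets only.
df = pd.DataFrame({
    'HH_SIZE': [1, 2, 3],
    'TOTAL_HH_INCOME': [15000, 90000, 20000],
    'Eligible Household Size': [1, 1, 1],
    'income_min1': [0, 0, 0],
    'income_max1': [20000, 20000, 20000],
    'hh_size2': [2, 2, 2],
    'income_min2': [0, 0, 0],
    'income_max2': [30000, 30000, 30000],
})

# (size column, min column, max column) for each bracket
BRACKETS = [
    ('Eligible Household Size', 'income_min1', 'income_max1'),
    ('hh_size2', 'income_min2', 'income_max2'),
]


def eligibility(row):
    # row[...] is a scalar here, so `and` and chained comparisons are unambiguous
    for size_col, min_col, max_col in BRACKETS:
        if (row['HH_SIZE'] == row[size_col]
                and row[min_col] <= row['TOTAL_HH_INCOME'] <= row[max_col]):
            return 'Eligible'
    return 'Ineligible'


df['Eligibility'] = df.apply(eligibility, axis=1)
print(df['Eligibility'].tolist())
```

Extending `BRACKETS` with the remaining `hh_size3` to `hh_size6` triples would cover the full set of columns from the question.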

edited Jun 30, 2020 at 15:49

answered Jun 30, 2020 at 15:19

Mateo Lara


1 Comment

ynnad Over a year ago

How would this work in my instance? I get that I want matching values for household sizes. But after that I also want to see if that same household meets the income criteria by comparing its household income against the income criteria range (income min and max).

MrNobody33 · Accepted Answer · 2020-06-30 16:26:47Z

0

You could try this, using df.to_records():

import re

# Column names as they appear in df.columns
s = ['TOTAL_HH_INCOME', 'HH_SIZE', 'Eligible Household Size', 'income_min1', 'income_max1', 'hh_size2', 'income_min2', 'income_max2', 'hh_size3', 'income_min3', 'income_max3', 'hh_size4', 'income_min4', 'income_max4', 'hh_size5', 'income_min5', 'income_max5', 'hh_size6', 'income_min6', 'income_max6']

# Collect the size, min and max columns in order so they can be paired up.
# 'Eligible Household Size' has no numeric suffix; it goes with income_min1/income_max1.
size_cols = ['Eligible Household Size'] + re.findall(r'hh_size\d+', ' '.join(s))
min_cols = re.findall(r'income_min\d+', ' '.join(s))
max_cols = re.findall(r'income_max\d+', ' '.join(s))


def func(row):
    # Records from df.to_records() can be indexed by column name
    totalincome = row['TOTAL_HH_INCOME']
    hhsize = row['HH_SIZE']

    # Eligible when a size matches AND the income falls inside that size's range
    if any(hhsize == row[size] and row[lo] <= totalincome <= row[hi]
           for size, lo, hi in zip(size_cols, min_cols, max_cols)):
        return 'Eligible'
    else:
        return 'Ineligible'


df['Eligibility'] = [func(row) for row in df.to_records()]
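A vectorized alternative is to build one boolean mask per bracket, each pairing a household-size column with its own income range, and to combine the masks with `|`. This is a sketch under assumptions rather than a tested solution for the asker's data: the sample values are hypothetical and only two brackets are shown. Note the parentheses around each comparison, which are needed because `&` binds more tightly than `==`, `>=` and `<=`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question); two brackets only.
df = pd.DataFrame({
    'HH_SIZE': [1, 2, 2],
    'TOTAL_HH_INCOME': [25000, 25000, 40000],
    'Eligible Household Size': [1, 1, 1],
    'income_min1': [0, 0, 0],
    'income_max1': [20000, 20000, 20000],
    'hh_size2': [2, 2, 2],
    'income_min2': [0, 0, 0],
    'income_max2': [30000, 30000, 30000],
})

brackets = [
    ('Eligible Household Size', 'income_min1', 'income_max1'),
    ('hh_size2', 'income_min2', 'income_max2'),
]

# Start with every row marked False, then OR in each bracket's mask.
eligible = pd.Series(False, index=df.index)
for size_col, min_col, max_col in brackets:
    mask = (
        (df['HH_SIZE'] == df[size_col])
        & (df['TOTAL_HH_INCOME'] >= df[min_col])
        & (df['TOTAL_HH_INCOME'] <= df[max_col])
    )
    eligible = eligible | mask

df['Eligibility'] = np.where(eligible, 'Eligible', 'Ineligible')
print(df['Eligibility'].tolist())
```

Comparisons involving NaN evaluate to False, so rows with missing bracket values simply do not match that bracket.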

edited Jun 30, 2020 at 16:26

answered Jun 30, 2020 at 15:52

MrNobody33


6 Comments

ynnad Over a year ago

Ran it but: TypeError: '<=' not supported between instances of 'numpy.ndarray' and 'str'

MrNobody33 Over a year ago

Could you add a sample of your dataframe so I can test, please?

ynnad Over a year ago

Hi, I'm having trouble adding the df as text, but I included an image from Excel of df.head(10) as an example.

MrNobody33 Over a year ago

Oh I see, you have multiple NaN values... maybe try this first: df = df.fillna(0)

ynnad Over a year ago

Thanks for the help, mate. Gave it a go but got the same error: TypeError: '<=' not supported between instances of 'numpy.ndarray' and 'str'
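The TypeError above suggests that at least one of the compared columns holds strings rather than numbers. That is an assumption, since the actual data is not shown in the thread. If it holds, one thing to try is converting the relevant columns with `pd.to_numeric` before comparing. The sketch below uses a made-up frame; `errors='coerce'` turns values that cannot be parsed into NaN.

```python
import pandas as pd

# Hypothetical frame where the income columns were read in as text
df = pd.DataFrame({
    'TOTAL_HH_INCOME': ['15000', '25000', 'n/a'],
    'income_max1': ['20000', '20000', '20000'],
})

for col in ['TOTAL_HH_INCOME', 'income_max1']:
    # Values that cannot be parsed as numbers become NaN
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)

# The comparison now works element-wise; NaN compares as False
within_max = df['TOTAL_HH_INCOME'] <= df['income_max1']
print(within_max.tolist())
```

Checking `df.dtypes` on the real data first would show which columns, if any, are stored as `object` and need this conversion.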
