I have a situation where I want to create a new column in a Pandas DataFrame and populate it according to conditions involving 2 other columns. In this example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([['value1','value2'],['value',np.nan],[np.nan,np.nan]]), columns=['col1','col2'])

I would like to create a new column, 'new col', containing 1) the value in 'col2' if it is not NaN; otherwise 2) the value in 'col1' if it is not NaN; otherwise 3) NaN.

I am trying this function with .apply(), but it is not returning the desired result:

def singleval(row):
    if row['col2'] != np.nan:
        val = row['col2']
    elif row['col1'] != np.nan:
        val = row['col1']
    else:
        val = np.nan
    return val

df['new col'] = df.apply(singleval,axis=1)

I want the values in 'new col' to be ['value2', 'value', NaN].

3 Answers

Method 1: fillna

In this case, we can simply use fillna on col2 with values from col1:

df['new col'] = df['col2'].fillna(df['col1'])

     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN

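Method 2: np.select

If you have multiple conditions, use np.select. You pass it a list of conditions and a matching list of choices, and for each row it takes the choice belonging to the first condition that is true, falling back to default when none is: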

conditions = [
    df['col2'].notnull(),
    df['col1'].notnull(),
]

choices=[df['col2'], df['col1']]

df['new col'] = np.select(conditions, choices, default=np.nan)

     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
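
As a rough sketch of how this scales to more than two columns, the conditions and choices can be built from a priority list. The third column 'col3' and the `priority` variable are hypothetical, added only for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a third column, 'col3', used only for illustration.
df = pd.DataFrame({
    'col1': ['a1', 'b1', 'c1', np.nan],
    'col2': ['a2', 'b2', np.nan, np.nan],
    'col3': ['a3', np.nan, np.nan, np.nan],
})

# Columns listed from highest to lowest priority.
priority = ['col3', 'col2', 'col1']

# np.select checks the conditions in order; the first True one wins.
conditions = [df[c].notna() for c in priority]
choices = [df[c] for c in priority]

# Rows where every column is missing fall back to the default.
df['new col'] = np.select(conditions, choices, default=np.nan)
print(df)
```

Because the order of the lists decides which column wins, changing the priority only requires reordering `priority`.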

Note

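Your dataframe did not contain real NaN values: np.array converts a mix of strings and floats to an array of strings, so every NaN became the string 'nan'. Use this one instead to test:

df = pd.DataFrame({'col1':['value1', 'value', np.nan],
                   'col2':['value2', np.nan, np.nan]})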
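
A small check, under the assumption that the two constructions above are the only difference, shows what each frame actually holds. The names `arr`, `df_strings`, and `df_missing` are made up for this sketch:

```python
import numpy as np
import pandas as pd

# Mixing str and float in one np.array yields a fixed-width string array,
# so each NaN is stored as the text 'nan'.
arr = np.array([['value1', 'value2'], ['value', np.nan], [np.nan, np.nan]])
df_strings = pd.DataFrame(arr, columns=['col1', 'col2'])

# Building from a dict of lists keeps NaN as a real missing value.
df_missing = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                           'col2': ['value2', np.nan, np.nan]})

print(arr.dtype.kind)                    # dtype kind of the NumPy array
print(repr(arr[1, 1]))                   # the stored element for the first NaN
print(df_strings['col2'].isna().sum())   # missing values pandas detects here
print(df_missing['col2'].isna().sum())   # missing values pandas detects here
```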

Edit: why was the function not working?

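np.nan == np.nan returns False, while np.nan is np.nan returns True. NaN never compares equal to anything, including itself, so row['col2'] != np.nan is always True and the first branch of your function is taken for every row.

See this question for the explanation of this.

So to fix your function you can use is not (pd.notna is the more robust check, because is not only works when the value is the np.nan object itself):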

def singleval(row):
    if row['col2'] is not np.nan:
        val = row['col2']
    elif row['col1'] is not np.nan:
        val = row['col1']
    else:
        val = np.nan
    return val

df['new col'] = df.apply(singleval, axis=1)

     col1    col2 new col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
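
A variant of the same function that does not depend on object identity might look like the sketch below. It uses pd.notna, which tests for missingness by value; `other_nan` is a made-up name used to show where an identity check can fail:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

def singleval(row):
    # pd.notna checks the value itself, so any NaN is recognised as missing.
    if pd.notna(row['col2']):
        return row['col2']
    if pd.notna(row['col1']):
        return row['col1']
    return np.nan

df['new col'] = df.apply(singleval, axis=1)
print(df)

# A NaN created elsewhere is a different object than np.nan,
# so an identity check misses it while pd.isna still catches it.
other_nan = float('nan')
print(other_nan is np.nan)   # False
print(pd.isna(other_nan))    # True
```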

4 Comments

struggling to see why your df is different from my df...nevermind: looks like it has to do with np.array()
Not sure either, would be a good question on SO as well :). @laszlopanaflex
Thank you for the 2 solutions. Is it possible to explain why my original approach didn't work? I'm not able to see where the if-elif-else approach breaks down...
Added explanation about your approach @laszlopanaflex, good question btw!

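Use df.ffill with axis=1, passed as a keyword because newer pandas versions no longer accept it positionally. Forward-filling across each row copies 'col1' into 'col2' wherever 'col2' is missing:

df['new_col'] = df[['col1', 'col2']].ffill(axis=1)['col2']

Output: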
     col1    col2 new_col
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
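
The same idea can be sketched in the other direction: order the columns by priority, back-fill across each row, and keep the first column. The `priority` list is an assumption introduced for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['value1', 'value', np.nan],
                   'col2': ['value2', np.nan, np.nan]})

# Highest-priority column first.
priority = ['col2', 'col1']

# bfill along axis=1 pulls the next non-missing value leftwards,
# so the first column ends up holding the first available value per row.
df['new_col'] = df[priority].bfill(axis=1).iloc[:, 0]
print(df)
```

Selecting the columns explicitly keeps unrelated columns in the frame from leaking into the result.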

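Try this, which stacks the two columns into one long Series and takes the last non-missing value for each row: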

df['col3'] = df[['col1','col2']].stack().groupby(level=0).last()

Output:

     col1    col2    col3
0  value1  value2  value2
1   value     NaN   value
2     NaN     NaN     NaN
