2

I have a Pandas dataframe like this:

  A      B        C         D
0 month   month+1 quarter+1 season+1
1 season  month+5 quarter+3 season+2
2 day     month+1 quarter+2 season+1
3 year    month+3 quarter+4 season+2
4 quarter month+2 quarter+1 season+1
5 month   month+4 quarter+1 season+2

I would like to insert a new column called 'E' based on several IF conditions. If column 'A' equals 'month' then return values in 'B', if column 'A' equals 'quarter' then return values in 'C', if column 'A' equals 'season' then return values in 'D', and if not then return values in column 'A'

  A      B        C         D        E
0 month   month+1 quarter+1 season+1 month+1
1 season  month+5 quarter+3 season+2 season+2
2 day     month+1 quarter+2 season+1 day
3 year    month+3 quarter+4 season+2 year
4 quarter month+2 quarter+1 season+1 quarter+1
5 month   month+4 quarter+1 season+2 month+4

I am having trouble doing this. I have tried playing around with a function but it did not work. See my attempt:

def f(row):
    if row['A'] == 'month':
        val = ['B']
    elif row['A'] == 'quarter':
        val = ['C']
    elif row['A'] == 'season':
        val = ['D']
    else:
        val = ['A']
    return val

df['E'] = df.apply(f, axis=1)

EDITED: to change the last else to column 'A'

3 Answers 3

4

Frist, I recommend you see: When should I want to use apply() in my code.

I would use Series.replace

df['E'] = df['A'].replace(['month','quarter','season'],
                          [df['B'], df['C'], df['D']]) 

or numpy.select

cond = [df['A'].eq('month'), df['A'].eq('quarter'), df['A'].eq('season')]
values= [df['B'], df['C'], df['D']]
df['E']=np.select(cond,values,default=df['A'])

  A      B        C         D        E
0 month   month+1 quarter+1 season+1 month+1
1 season  month+5 quarter+3 season+2 season+2
2 day     month+1 quarter+2 season+1 day
3 year    month+3 quarter+4 season+2 year
4 quarter month+2 quarter+1 season+1 quarter+1
5 month   month+4 quarter+1 season+2 month+4
Sign up to request clarification or add additional context in comments.

Comments

3

Just use np.select

c1 = df['A'] == 'month'
c2 = df['A'] == 'quarter'
c3 = df['A'] == 'season'

df['E'] = np.select([c1, c2, c3], [df['B'], df['C'], df['D']], df['A'])

Out[271]:
         A        B          C         D          E
0    month  month+1  quarter+1  season+1    month+1
1   season  month+5  quarter+3  season+2   season+2
2      day  month+1  quarter+2  season+1        day
3     year  month+3  quarter+4  season+2       year
4  quarter  month+2  quarter+1  season+1  quarter+1
5    month  month+4  quarter+1  season+2    month+4

4 Comments

@ansev: He mentioned that his code doesn't work. His desired ouput is what he wants. My output matches his desired output. You solution is actually wrong because you base on his incomplete codes
@ansev: I quote OP words : I have tried playing around with a function but it did not work. See my attempt. HIs description: ..... if not then return values in column 'A'.....
True, he wrote else ['D'] but he said that by default it should be 'A', I didn't see it, +1
I would use df['A'] instead df.A, because OP could try df.T
1

You probably need to fix your code like this:

def f(row):
    if row['A'] == 'month':
        val = row['B']
    elif row['A'] == 'quarter':
        val = row['C']
    elif row['A'] == 'season':
        val = row['D']
    else:
        val = row['D']
    return val

df['E'] = df.apply(f, axis=1)

note: you forgot to include row

val = ['B'] # before
val = row['B'] # after

Edit: This is just to point out the problem in the code, for better approaches check out the other answers related to the usage of numpy.select

1 Comment

I just showed what was wrong in the code, wasn't trying to optimize

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.