Python pandas - lambda function that accesses other columns

Question

I have the following table/DataFrame in pandas:

+-------------------------------+---------------+
|             Col_1             |     Col_2     |
+-------------------------------+---------------+
| ['Apple', 'Coffee', 'Banana'] | [Food]        |
| ['Apple']                     | [Drink]       |
| []                            | [Clothes]     |
| []                            | [Food]        |
| ['Apple', 'Orange']           | [Food]        |
| ['Apple', 'Orange']           | [Stuff, Food] |
+-------------------------------+---------------+

I want a way to copy the value from Col_2 into Col_1 (same row) if and only if the list in Col_1 is empty, i.e. len(x) == 0.

Thus the desired result is:

+-------------------------------+---------------+
|             Col_1             |     Col_2     |
+-------------------------------+---------------+
| ['Apple', 'Coffee', 'Banana'] | [Food]        |
| ['Apple']                     | [Drink]       |
| [Clothes]                     | [Clothes]     |
| [Food]                        | [Food]        |
| ['Apple', 'Orange']           | [Food]        |
| ['Apple', 'Orange']           | [Stuff, Food] |
+-------------------------------+---------------+

rafaelc · Accepted Answer · 2020-05-07 19:53:26Z

3

This looks like a simple loc assignment:

m = df['Col_1'].str.len().eq(0)
df.loc[m, 'Col_1'] = df.loc[m, 'Col_2']

This should work even without the loc on the right-hand side, since pandas aligns on the index by default when a pd.Series is assigned:

df.loc[m, 'Col_1'] = df['Col_2']
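For readers who want to run this end to end, here is a minimal sketch that rebuilds the question's table and applies the mask. The sample data is retyped from the question, and the variable name `result` is only there for illustration.

```python
import pandas as pd

# Sample data retyped from the question
df = pd.DataFrame({
    'Col_1': [['Apple', 'Coffee', 'Banana'], ['Apple'], [], [],
              ['Apple', 'Orange'], ['Apple', 'Orange']],
    'Col_2': [['Food'], ['Drink'], ['Clothes'], ['Food'],
              ['Food'], ['Stuff', 'Food']],
})

# Boolean mask: True where the list in Col_1 is empty
m = df['Col_1'].str.len().eq(0)

# Only the masked rows are written; the right-hand Series is aligned
# on the index, so each row receives the Col_2 value with the same label
df.loc[m, 'Col_1'] = df['Col_2']

result = df['Col_1'].tolist()
print(df)
```

Note that `.str.len()` is used here on a column of lists, where it returns the length of each list rather than of a string.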

edited May 7, 2020 at 19:53

answered May 7, 2020 at 19:47

rafaelc


Ankur · Accepted Answer · 2020-05-07 20:04:09Z

1

The above answers are great. Here is one more way to do this.

import pandas as pd

data = [
    [['Apple', 'Coffee', 'Banana'], ['Food']],
    [['Apple'], ['Drink']],
    [[], ['Clothes']],
    [[], ['Food']],
    [['Apple', 'Orange'], ['Food']],
    [['Apple', 'Orange'], ['Stuff', 'Food']]
]

df = pd.DataFrame.from_records(data, columns=['col1', 'col2'])


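def copy_if(row):
    # Return a copy of col2 when col1 is an empty list; otherwise keep col1.
    # Returning a value avoids mutating the list stored in the DataFrame.
    if len(row['col1']) == 0:
        return list(row['col2'])
    return row['col1']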


df['col1'] = df[['col1', 'col2']].apply(copy_if, axis=1)
print(df)
#                       col1           col2
# 0  [Apple, Coffee, Banana]         [Food]
# 1                  [Apple]        [Drink]
# 2                [Clothes]      [Clothes]
# 3                   [Food]         [Food]
# 4          [Apple, Orange]         [Food]
# 5          [Apple, Orange]  [Stuff, Food]
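Since the question title asks about a lambda, the same row-wise idea can also be written inline. This is a sketch on a reduced sample rather than a replacement for the function above; the three-row data and the name `filled` are made up for illustration.

```python
import pandas as pd

# Reduced, illustrative sample (not the full table from the question)
df = pd.DataFrame({
    'col1': [['Apple'], [], ['Apple', 'Orange']],
    'col2': [['Drink'], ['Clothes'], ['Food']],
})

# With axis=1 each row is passed to the lambda as a Series,
# so the lambda can read any column of that row by name
df['col1'] = df.apply(
    lambda row: row['col2'] if len(row['col1']) == 0 else row['col1'],
    axis=1,
)

filled = df['col1'].tolist()
print(df)
```

Row-wise apply calls the lambda once per row in Python, so it tends to be slower than the mask-based assignment on large frames, which may or may not matter for a given use case.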

answered May 7, 2020 at 20:04

Ankur


1 Comment

Akbar Hussein Over a year ago

Thanks; I like your approach using the apply function; it makes my life easier! Definitely a simple answer!

Quang Hoang · Accepted Answer · 2020-05-07 19:53:18Z

1

rafaelc's answer is really good and should be preferred in general.

However, in this case, a plain list comprehension works and might be faster:

df['Col_1'] = [a if a else b for a,b in zip(df['Col_1'], df['Col_2'])]
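The comprehension relies on an empty list being falsy. A small sketch with plain Python lists (no pandas needed; the names `col_1`, `col_2`, `filled` and `filled_or` are illustrative) shows the fallback in isolation:

```python
# Illustrative stand-ins for the two columns
col_1 = [['Apple'], [], ['Apple', 'Orange'], []]
col_2 = [['Drink'], ['Clothes'], ['Food'], ['Food']]

# An empty list is falsy, so `a if a else b` falls back to b
# only where a is empty
filled = [a if a else b for a, b in zip(col_1, col_2)]

# `a or b` is an equivalent, shorter spelling of the same fallback
filled_or = [a or b for a, b in zip(col_1, col_2)]

print(filled)
```

Because the test is truthiness rather than `len(a) == 0`, any other falsy value in Col_1 (for example `None`) would also be replaced. That matches the question's data, but it is worth keeping in mind for messier columns.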

Performance:

%%timeit -n 100
# rafaelc's answer
df = pd.DataFrame({
    'Col_1':[['Apple', 'Coffee', 'Banana'], ['Apple'], [], []],
    'Col_2':[['Food'],['Drink'],['Clothes'],['Food']]
})
m = df['Col_1'].str.len().eq(0)
df.loc[m, 'Col_1'] = df.loc[m, 'Col_2']    
# 1.4 ms ± 49.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit -n 100
# list comprehension
df = pd.DataFrame({
    'Col_1':[['Apple', 'Coffee', 'Banana'], ['Apple'], [], []],
    'Col_2':[['Food'],['Drink'],['Clothes'],['Food']]
})

df['Col_1'] = [a if a else b for a,b in zip(df['Col_1'], df['Col_2'])]
#485 µs ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
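The `%%timeit` cells above only run in IPython/Jupyter, and they also time the DataFrame construction. Outside a notebook, a comparable measurement could be sketched with the standard-library `timeit` module, as below. The helper names (`make_df`, `with_loc`, `with_comprehension`) and the row count are assumptions made for illustration, and no timings are claimed here since they depend on the machine and pandas version.

```python
import timeit

import pandas as pd


def make_df(n_repeats=1000):
    # Hypothetical helper: tile the four sample rows into a larger frame.
    # The tiled rows share list objects, which is fine here because
    # neither approach mutates the lists.
    return pd.DataFrame({
        'Col_1': [['Apple', 'Coffee', 'Banana'], ['Apple'], [], []] * n_repeats,
        'Col_2': [['Food'], ['Drink'], ['Clothes'], ['Food']] * n_repeats,
    })


def with_loc(df):
    m = df['Col_1'].str.len().eq(0)
    df.loc[m, 'Col_1'] = df.loc[m, 'Col_2']
    return df


def with_comprehension(df):
    df['Col_1'] = [a if a else b for a, b in zip(df['Col_1'], df['Col_2'])]
    return df


best = {}
for func in (with_loc, with_comprehension):
    # setup runs once per repeat and is excluded from the measured time
    times = timeit.repeat(
        stmt='func(df)',
        setup='df = make_df()',
        repeat=10,
        number=1,
        globals={'func': func, 'make_df': make_df},
    )
    best[func.__name__] = min(times)
    print(f'{func.__name__}: {min(times) * 1e3:.3f} ms (best of 10)')
```

Building a fresh frame in `setup` for every repeat ensures each run still has empty lists to fill, so later runs are not measuring a no-op.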

answered May 7, 2020 at 19:53

Quang Hoang


1 Comment

Akbar Hussein Over a year ago

Thanks for your comment and answer; actually performance in my case is not an issue. Thanks nonetheless for your input.
