3

I have the following table/DataFrame in pandas:

+-------------------------------+---------------+
|             Col_1             |     Col_2     |
+-------------------------------+---------------+
| ['Apple', 'Coffee', 'Banana'] | [Food]        |
| ['Apple']                     | [Drink]       |
| []                            | [Clothes]     |
| []                            | [Food]        |
| ['Apple', 'Orange']           | [Food]        |
| ['Apple', 'Orange']           | [Stuff, Food] |
+-------------------------------+---------------+

I want a way to copy the value from Col_2 into Col_1 (same row) if and only if the list in Col_1 is empty, i.e. len(x) == 0.

Thus the desired result is:

+-------------------------------+---------------+
|             Col_1             |     Col_2     |
+-------------------------------+---------------+
| ['Apple', 'Coffee', 'Banana'] | [Food]        |
| ['Apple']                     | [Drink]       |
| [Clothes]                     | [Clothes]     |
| [Food]                        | [Food]        |
| ['Apple', 'Orange']           | [Food]        |
| ['Apple', 'Orange']           | [Stuff, Food] |
+-------------------------------+---------------+

3 Answers

3

This looks like a simple loc assignment:

m = df['Col_1'].str.len().eq(0)
df.loc[m, 'Col_1'] = df.loc[m, 'Col_2']
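To make the mask concrete, here is a minimal sketch. The DataFrame construction is an assumption (the question only shows the rendered table). It relies on `.str.len()` returning the length of each element for list values, not only for strings.

```python
import pandas as pd

# Assumed reconstruction of the table from the question.
df = pd.DataFrame({
    'Col_1': [['Apple', 'Coffee', 'Banana'], ['Apple'], [], [],
              ['Apple', 'Orange'], ['Apple', 'Orange']],
    'Col_2': [['Food'], ['Drink'], ['Clothes'], ['Food'],
              ['Food'], ['Stuff', 'Food']],
})

# Length of each list in Col_1.
lengths = df['Col_1'].str.len().tolist()

# Boolean mask: True where the list in Col_1 is empty.
m = df['Col_1'].str.len().eq(0)
mask_values = m.tolist()

print(lengths)      # [3, 1, 0, 0, 2, 2]
print(mask_values)  # [False, False, True, True, False, False]
```

Only the rows where the mask is True (rows 2 and 3 here) are touched by the loc assignment.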

This should also work without the loc on the right-hand side, since pandas aligns on the index by default when assigning a pd.Series:

df.loc[m, 'Col_1'] = df['Col_2']
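As a rough check of the index-alignment claim, the sketch below applies both forms to two copies of a small frame and compares the results. The helper `make_df` and the four-row data are assumptions for illustration only.

```python
import pandas as pd


def make_df():
    # Hypothetical helper: a small frame with two empty lists in Col_1.
    return pd.DataFrame({
        'Col_1': [['Apple', 'Coffee', 'Banana'], ['Apple'], [], []],
        'Col_2': [['Food'], ['Drink'], ['Clothes'], ['Food']],
    })


# Form 1: both sides restricted to the masked rows.
a = make_df()
m = a['Col_1'].str.len().eq(0)
a.loc[m, 'Col_1'] = a.loc[m, 'Col_2']

# Form 2: the full column on the right; pandas aligns it on index labels,
# so only the rows selected on the left receive values.
b = make_df()
m = b['Col_1'].str.len().eq(0)
b.loc[m, 'Col_1'] = b['Col_2']

same = a['Col_1'].tolist() == b['Col_1'].tolist()
print(same)
```

Rows that are not selected by the mask keep their original Col_1 values in both forms.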

1

The above answers are great. Here is one more way to do this.

import pandas as pd

data = [
    [['Apple', 'Coffee', 'Banana'], ['Food']],
    [['Apple'], ['Drink']],
    [[], ['Clothes']],
    [[], ['Food']],
    [['Apple', 'Orange'], ['Food']],
    [['Apple', 'Orange'], ['Stuff', 'Food']]
]

df = pd.DataFrame.from_records(data, columns=['col1', 'col2'])


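def copy_if(row):
    # Use col2's value when col1 is an empty list; otherwise keep col1.
    # Returning col2 avoids mutating the original list in place with +=.
    if len(row['col1']) == 0:
        return row['col2']
    return row['col1']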


df['col1'] = df[['col1', 'col2']].apply(copy_if, axis=1)
print(df)
#                       col1           col2
# 0  [Apple, Coffee, Banana]         [Food]
# 1                  [Apple]        [Drink]
# 2                [Clothes]      [Clothes]
# 3                   [Food]         [Food]
# 4          [Apple, Orange]         [Food]
# 5          [Apple, Orange]  [Stuff, Food]

1 Comment

Thanks; I like your approach of using the apply function; it makes my life easier! Definitely a simple answer!

1

rafaelc's answer is really good and should be preferred in general.

However, in this case, a plain list comprehension works and might be faster:

df['Col_1'] = [a if a else b for a,b in zip(df['Col_1'], df['Col_2'])]
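One side note that is not part of the original answer: the comprehension places the very same list object from Col_2 into Col_1, so mutating one cell later would also change the other. The sketch below illustrates this and shows a hypothetical variant using `list(b)` to make a shallow copy. The two-row frames are assumed data.

```python
import pandas as pd

df = pd.DataFrame({
    'Col_1': [['Apple'], []],
    'Col_2': [['Drink'], ['Clothes']],
})

# Same comprehension as above: an empty list is falsy, so `b` is picked.
df['Col_1'] = [a if a else b for a, b in zip(df['Col_1'], df['Col_2'])]

# Both cells in row 1 now reference one list object.
shared = df['Col_1'].iloc[1] is df['Col_2'].iloc[1]

# Hypothetical variant: list(b) creates a new list, so the cells are independent.
df2 = pd.DataFrame({
    'Col_1': [['Apple'], []],
    'Col_2': [['Drink'], ['Clothes']],
})
df2['Col_1'] = [a if a else list(b) for a, b in zip(df2['Col_1'], df2['Col_2'])]
independent = df2['Col_1'].iloc[1] is not df2['Col_2'].iloc[1]

print(shared, independent)
```

If the lists are never modified afterwards, the shared reference is harmless and the original one-liner is fine.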

Performance:

%%timeit -n 100
# rafaelc's answer (the %%timeit cell magic has to be the first line of the cell)
df = pd.DataFrame({
    'Col_1': [['Apple', 'Coffee', 'Banana'], ['Apple'], [], []],
    'Col_2': [['Food'], ['Drink'], ['Clothes'], ['Food']]
})
m = df['Col_1'].str.len().eq(0)
df.loc[m, 'Col_1'] = df.loc[m, 'Col_2']
# 1.4 ms ± 49.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit -n 100
# list comprehension (the %%timeit cell magic has to be the first line of the cell)
df = pd.DataFrame({
    'Col_1': [['Apple', 'Coffee', 'Banana'], ['Apple'], [], []],
    'Col_2': [['Food'], ['Drink'], ['Clothes'], ['Food']]
})

df['Col_1'] = [a if a else b for a, b in zip(df['Col_1'], df['Col_2'])]
# 485 µs ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
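The cells above need IPython, and the measured time includes building the DataFrame. As a rough alternative, here is a sketch that runs as a plain script and times only the operation itself. The helper names (`make_df`, `with_loc`, `with_comprehension`, `mean_seconds`) are assumptions for illustration, and no particular timings are claimed; results will vary by machine and pandas version.

```python
import time

import pandas as pd


def make_df():
    # Same four-row data as in the timing cells above.
    return pd.DataFrame({
        'Col_1': [['Apple', 'Coffee', 'Banana'], ['Apple'], [], []],
        'Col_2': [['Food'], ['Drink'], ['Clothes'], ['Food']],
    })


def with_loc(df):
    m = df['Col_1'].str.len().eq(0)
    df.loc[m, 'Col_1'] = df.loc[m, 'Col_2']
    return df


def with_comprehension(df):
    df['Col_1'] = [a if a else b for a, b in zip(df['Col_1'], df['Col_2'])]
    return df


def mean_seconds(func, repeats=100):
    # Build a fresh frame for every run, outside the timed region.
    total = 0.0
    for _ in range(repeats):
        df = make_df()
        start = time.perf_counter()
        func(df)
        total += time.perf_counter() - start
    return total / repeats


loc_time = mean_seconds(with_loc)
comp_time = mean_seconds(with_comprehension)
print(f"loc: {loc_time * 1e6:.0f} µs, comprehension: {comp_time * 1e6:.0f} µs")
```

With only four rows, fixed per-call overhead dominates, so the ranking may differ on larger frames.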

1 Comment

Thanks for your comment and answer; actually performance in my case is not an issue. Thanks nonetheless for your input.
