
I would like to conditionally replace values in a DataFrame column whose entries are lists.

Example dataset below (my real dataset contains many more columns and rows):

index   lists                                  condition
0       ['5 apples', '2 pears']                B
1       ['3 apples', '3 pears', '1 pumpkin']   A
2       ['4 blueberries']                      A
3       ['5 kiwis']                            C
4       ['1 pumpkin']                          B
...     ...                                    ...

For example, if the condition is A and the list contains '1 pumpkin', I would like to replace that element with 'XXX'. But if the condition is B and the list contains '1 pumpkin', I would like to replace it with 'YYY'.

Desired output

index   lists                                  condition
0       ['5 apples', '2 pears']                B
1       ['3 apples', '3 pears', 'XXX']         A
2       ['4 blueberries']                      A
3       ['5 kiwis']                            C
4       ['YYY']                                B
...     ...                                    ...

In practice I want to replace many such values; '1 pumpkin' is just one example. Importantly, I would like to keep each row's list structure. Thanks!

2 Answers


Let us use explode to put each list element on its own row, apply np.select, then regroup the rows into lists:

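import numpy as np
import pandas as pd

# One row per list element; the original index is repeated for each element
s = df.explode('lists')
cond = s['lists'] == '1 pumpkin'
c1 = cond & s['condition'].eq('A')
c2 = cond & s['condition'].eq('B')
# Matching elements become 'XXX'/'YYY'; everything else keeps its value
s['lists'] = np.select([c1, c2], ['XXX', 'YYY'], default=s['lists'].values)
# Collapse back to one list per original row
df['lists'] = s.groupby(level=0)['lists'].agg(list)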
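As a sketch of extending this answer to many replacements, the condition and choice lists for np.select can be built in a loop instead of written out by hand. This assumes the rules can be expressed as a hypothetical dict `rules` keyed by (condition, item) pairs; `replace_with_select` is likewise a hypothetical helper name:

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "lists": [["5 apples", "2 pears"], ["3 apples", "3 pears", "1 pumpkin"],
              ["4 blueberries"], ["5 kiwis"], ["1 pumpkin"]],
    "condition": ["B", "A", "A", "C", "B"],
})

# Hypothetical rules table: (condition, original item) -> replacement
rules = {
    ("A", "1 pumpkin"): "XXX",
    ("B", "1 pumpkin"): "YYY",
}

def replace_with_select(df, rules):
    # One row per list element; the original index is repeated for each element
    s = df.explode("lists")
    # One boolean mask per rule, in the same order as rules.values()
    condlist = [
        s["condition"].eq(cond) & s["lists"].eq(item)
        for (cond, item) in rules
    ]
    choicelist = list(rules.values())
    # Elements matching no rule keep their original value
    s["lists"] = np.select(condlist, choicelist, default=s["lists"].to_numpy())
    # Reassemble one list per original row
    return s.groupby(level=0)["lists"].agg(list)

df["lists"] = replace_with_select(df, rules)
print(df)
```

Because explode keeps the original index, groupby(level=0) can rebuild the lists; this assumes df has a unique index. Note also that explode turns an empty list into NaN, so an empty list would come back as [nan] after regrouping.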

You can define a function with the logic you want to apply to each row and then call df.apply(function, axis=1) to run it over the DataFrame:

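def pumpkin(row):
    # Only rows whose list contains the target item need changing
    if '1 pumpkin' in row['lists']:
        data = row['lists'][:]  # copy so the original list is not mutated
        if row['condition'] == 'A':
            data[data.index('1 pumpkin')] = 'XXX'
        elif row['condition'] == 'B':
            data[data.index('1 pumpkin')] = 'YYY'
        # list.index finds only the first occurrence, so only that one is replaced
        return data
    return row['lists']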

df['lists'] = df.apply(pumpkin, axis=1)

Output

                      lists condition
0       [5 apples, 2 pears]         B
1  [3 apples, 3 pears, XXX]         A
2           [4 blueberries]         A
3                 [5 kiwis]         C
4                     [YYY]         B
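The apply approach can likewise be generalized. The sketch below assumes a hypothetical nested dict `rules` mapping each condition to {original item: replacement}, passed to a hypothetical function `replace_items` through the args parameter of DataFrame.apply. Unlike list.index, the list comprehension replaces every matching element, not just the first one:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "lists": [["5 apples", "2 pears"], ["3 apples", "3 pears", "1 pumpkin"],
              ["4 blueberries"], ["5 kiwis"], ["1 pumpkin"]],
    "condition": ["B", "A", "A", "C", "B"],
})

# Hypothetical mapping: condition -> {original item: replacement}
rules = {
    "A": {"1 pumpkin": "XXX"},
    "B": {"1 pumpkin": "YYY"},
}

def replace_items(row, rules):
    # Conditions without rules (e.g. 'C') get an empty mapping, so nothing changes
    mapping = rules.get(row["condition"], {})
    # Build a new list; each element is replaced if it has a mapping, else kept
    return [mapping.get(item, item) for item in row["lists"]]

df["lists"] = df.apply(replace_items, axis=1, args=(rules,))
print(df)
```

Adding more replacements only means adding entries to the dict; the function itself does not change.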
