Create new column based on values in LIST in other column

Question

This question seems so simple yet I'm having so much trouble, and haven't seen it asked anywhere. I have a column that contains a different list in each row, and all I want to do is create a new column based on if a specific value is in that list. Data looks like this:

Col1
[5,6,23,7,20,21]    
[0,7,20,21]
[3,4,5,23,7,20,21]
[2,3,23,7,20,21]
[3,4,5,23,7,20,21]

Each number corresponds to a specific value, so 0 = 'apple', 2 = 'grape', etc...

While there are multiple values in each list, I'm really only looking for certain values, specifically 0, 2, 4, 6, 16, 17

So what I want to do is add a new column, with the value that corresponds to the number that's found within Col1.

This is what the solution should be:

Col1               Col2
[5,6,23,7,20,21]   Pear
[0,7,20,21]        Apple
[3,4,5,23,7,20,21] Watermelon
[2,3,23,7,20,21]   Grape
[16,20,21]         Pineapple

I have tried:

df['Col2'] = np.where(0 in df['Col1'], 'Apple',
                np.where(2 in df['Col1'], 'Grape', 
                   np.where(4 in df['Col1'], 'Watermelon', )

And so on... But this defaults all values to Apple

Col1               Col2
[5,6,23,7,20,21]   Apple
[0,7,20,21]        Apple
[3,4,5,23,7,20,21] Apple
[2,3,23,7,20,21]   Apple
[16,20,21]         Apple
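
A likely reason every row ends up as Apple: on a pandas Series, the `in` operator tests the index labels rather than the values. So `0 in df['Col1']` is a single `True` (row label 0 exists), and `np.where` returns one answer that is then repeated for every row. A minimal sketch, assuming the sample data from the question and the default integer index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [[5, 6, 23, 7, 20, 21],
             [0, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21],
             [2, 3, 23, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21]]
})

# `in` on a Series checks the index labels (0..4 here), not the lists inside.
label_check = 0 in df['Col1']       # True, because row label 0 exists
missing_label = 23 in df['Col1']    # False, although 23 appears in every list

# With a scalar condition, np.where returns a single zero-dimensional value.
scalar_result = np.where(label_check, 'Apple', 'Other')

# A per-row membership test has to look inside each list instead.
has_zero = df['Col1'].apply(lambda lst: 0 in lst)
print(has_zero.tolist())
```

Only the second row contains 0, so `has_zero` is `True` for that row alone.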

I was able to successfully do it by putting the above in a for loop, but I am getting issues. Code:

df['Col2'] = ''
for i in range(0,df.shape[0]):
   df['Col2'][i] = np.where(0 in df['Col1'][i], 'Apple',
                   np.where(2 in df['Col1'][i], 'Grape', 
                      np.where(4 in df['Col1'][i], 'Watermelon', )

I get the result I am looking for, but I am being met with a warning:

<ipython-input-638-5dfd74b69688>:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

I assume the warning is because I have already created the blank column, but the only reason I did this is because I would get an error if I didn't create it. Furthermore, when I attempt to perform a simple df['Col2'].value_counts(), I get an error: TypeError: unhashable type: 'numpy.ndarray'. The result from value_counts() still shows up even though I get this error, which is odd.
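
The warning most likely comes from the chained assignment `df['Col2'][i] = ...`, which selects the column first and then writes into that intermediate object; pre-creating the blank column is probably not the cause. The `TypeError` is consistent with `np.where` returning a zero-dimensional `numpy.ndarray` when given a scalar condition, so each cell holds an array rather than a string. A hedged sketch of the same loop with plain Python conditionals and a single `.at` write (`pick_fruit` is a hypothetical helper name, and only the three fruits from the attempt above are covered):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [[5, 6, 23, 7, 20, 21],
             [0, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21],
             [2, 3, 23, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21]]
})

# np.where with a scalar condition yields a 0-d array, which is unhashable.
cell = np.where(True, 'Apple', 'Grape')
is_array = isinstance(cell, np.ndarray)

def pick_fruit(lst):
    """Return the first matching fruit name as a plain string, or None."""
    if 0 in lst:
        return 'Apple'
    if 2 in lst:
        return 'Grape'
    if 4 in lst:
        return 'Watermelon'
    return None

df['Col2'] = None  # object column to hold the strings
for i in df.index:
    # One indexing call per write avoids the chained df['Col2'][i] pattern.
    df.at[i, 'Col2'] = pick_fruit(df.at[i, 'Col1'])

# The cells are now plain strings, so value_counts can hash them.
counts = df['Col2'].value_counts()
print(counts)
```

The loop is kept only to mirror the original attempt; a row-wise `apply` or the `explode` approach in the answers avoids it entirely.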

I am not entirely sure how else to proceed, I've tried a bunch of other things to create this column but none have been able to work. Any advice appreciated!

Where do you get your fruit names based on the first item in col1? Is it another list or you've just defined them constantly? If they are defined by yourself and not too large, you can write with switch case, otherwise write that data table too.
– Reza Akraminejad, Commented Dec 14, 2021 at 18:37
It is just a small list of just 6 different fruit names. In total there are maybe ~40 but I only need to use 0,2,4,6,16,17
– coderX, Commented Dec 14, 2021 at 18:40
@coderX. Is it possible to have two fruits in the same list?
– Corralien, Commented Dec 14, 2021 at 18:42
It is possible, but extremely rare. Out of ~250 rows there was 1 where there were 2 fruits in the same list.
– coderX, Commented Dec 14, 2021 at 18:43

Corralien · Accepted Answer · 2021-12-14 18:42:57Z


Use explode:

d = {0: 'Apple', 2: 'Grape', 4: 'Watermelon', 6: 'Banana', 16: 'Pear', 17: 'Orange'}
df['Col2'] = df['Col1'].explode().map(d).dropna().groupby(level=0).apply(', '.join)
print(df)

# Output:
                       Col1        Col2
0     [5, 6, 23, 7, 20, 21]      Banana
1            [0, 7, 20, 21]       Apple
2  [3, 4, 5, 23, 7, 20, 21]  Watermelon
3     [2, 3, 23, 7, 20, 21]       Grape
4  [3, 4, 5, 23, 7, 20, 21]  Watermelon
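
The one-liner above can be read as a chain of small steps. The sketch below splits it into intermediate variables (the names `exploded`, `mapped`, `matched` and `joined` are only illustrative). The first two rows come from the question; the last two are hypothetical rows added to show a double match and a row with no match:

```python
import pandas as pd

d = {0: 'Apple', 2: 'Grape', 4: 'Watermelon', 6: 'Banana', 16: 'Pear', 17: 'Orange'}
df = pd.DataFrame({
    'Col1': [[5, 6, 23, 7, 20, 21],
             [0, 7, 20, 21],
             [0, 2, 21],     # hypothetical: two fruits in one list
             [23, 7]]        # hypothetical: no fruit of interest
})

# 1. One row per list element; the original row label is repeated.
exploded = df['Col1'].explode()

# 2. Look each number up in the dict; numbers not in d become NaN.
mapped = exploded.map(d)

# 3. Keep only the elements that matched a fruit.
matched = mapped.dropna()

# 4. Regroup by the original row label and join the matches.
joined = matched.groupby(level=0).apply(', '.join)

# 5. Assignment aligns on the index, so rows without a match get NaN.
df['Col2'] = joined
print(df)
```

Because the final assignment aligns on the index, rows that were dropped in step 3 simply stay empty in `Col2`.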

edited Dec 14, 2021 at 18:42

answered Dec 14, 2021 at 18:35

Corralien


4 Comments

user17242583 Over a year ago

This is nice, because if there are multiple matches in one list, it will join them with ', ', e.g. 'Apple, Orange' in one row.

Corralien Over a year ago

@coderX. If you have multiple fruits in the same list, they will be joined like 'Apple, Banana, Grape, ...'

coderX Over a year ago

Of course, thanks @Corralien ! Saved me an hour of pulling my hair out trying different variations of in and .isin

Corralien Over a year ago

Just a tip. When you have a list, think explode to get scalar values instead of a vector.

Rodalm · Accepted Answer · 2021-12-14 19:31:49Z

Loop through the list values and map them to the correct fruit, ignoring the unwanted ones. Set to NaN if there is no match. Use str.join to include the possibility of multiple matches.

To apply this logic row-wise use Series.apply

import numpy as np

mapping = {0: 'Apple', 2: 'Grape', 4: 'Watermelon'}

df['Col2'] = df['Col1'].apply(lambda lst: ', '.join(mapping[n] for n in lst if n in mapping) or np.nan)

Output:

>>> df

                       Col1        Col2
0     [5, 6, 23, 7, 20, 21]         NaN
1            [0, 7, 20, 21]       Apple
2  [3, 4, 5, 23, 7, 20, 21]  Watermelon
3     [2, 3, 23, 7, 20, 21]       Grape
4  [3, 4, 5, 23, 7, 20, 21]  Watermelon

Performance

Note that this should be faster than Corralien's solution.

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [[5, 6, 23, 7, 20, 21],
             [0, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21],
             [2, 3, 23, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21]]
})

mapping = {0: 'Apple', 2: 'Grape', 4: 'Watermelon'}

def number_to_fruit(lst):
    return ', '.join(mapping[n] for n in lst if n in mapping) or np.nan

# Simulate a large DataFrame
n = 20000
df = pd.concat([df]*n, ignore_index=False)

>>> df.shape

(100000, 1)

Timings:

# Using apply. (I've added dropna for a fairer comparison)
>>> %timeit -n 10 df['Col1'].apply(number_to_fruit).dropna()

116 ms ± 7.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Corralien's solution 
>>> %timeit -n 10 df['Col1'].explode().map(mapping).dropna().groupby(level=0).apply(', '.join)

710 ms ± 71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
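
For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard `timeit` module. It is not the setup used for the numbers above: it assumes `ignore_index=True` so that every row keeps its own label and `groupby(level=0)` regroups row by row, it uses a smaller hypothetical size (2000 copies), and `with_apply` / `with_explode` are illustrative names. Timings will vary by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

mapping = {0: 'Apple', 2: 'Grape', 4: 'Watermelon'}

base = pd.DataFrame({
    'Col1': [[5, 6, 23, 7, 20, 21],
             [0, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21],
             [2, 3, 23, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21]]
})

# Assumption: ignore_index=True gives each row a unique label.
big = pd.concat([base] * 2000, ignore_index=True)

def with_apply(frame):
    """Row-wise apply; rows without a match are dropped."""
    result = frame['Col1'].apply(
        lambda lst: ', '.join(mapping[n] for n in lst if n in mapping) or np.nan
    )
    return result.dropna()

def with_explode(frame):
    """explode/map/groupby pipeline on the same column."""
    return (frame['Col1'].explode().map(mapping).dropna()
            .groupby(level=0).apply(', '.join))

# Check that both approaches agree before timing them.
a, b = with_apply(big), with_explode(big)
same = a.tolist() == b.tolist() and a.index.tolist() == b.index.tolist()

t_apply = timeit.timeit(lambda: with_apply(big), number=3)
t_explode = timeit.timeit(lambda: with_explode(big), number=3)
print(f'same result: {same}, apply: {t_apply:.3f}s, explode: {t_explode:.3f}s')
```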

@coderX I updated my answer with a simple performance test. Have a look.
