This question seems so simple, yet I'm having a lot of trouble with it and haven't seen it asked anywhere. I have a column that contains a different list in each row, and all I want to do is create a new column based on whether a specific value is in that list. The data looks like this:

Col1
[5,6,23,7,20,21]    
[0,7,20,21]
[3,4,5,23,7,20,21]
[2,3,23,7,20,21]
[3,4,5,23,7,20,21]

Each number corresponds to a specific value, so 0 = 'apple', 2 = 'grape', etc...

While there are multiple values in each list, I'm really only looking for certain values, specifically 0, 2, 4, 6, 16, 17

So what I want to do is add a new column, with the value that corresponds to the number that's found within Col1.

This is what the solution should be:

Col1               Col2
[5,6,23,7,20,21]   Pear
[0,7,20,21]        Apple
[3,4,5,23,7,20,21] Watermelon
[2,3,23,7,20,21]   Grape
[16,20,21]         Pineapple

I have tried:

df['Col2'] = np.where(0 in df['Col1'], 'Apple',
                np.where(2 in df['Col1'], 'Grape', 
                   np.where(4 in df['Col1'], 'Watermelon', )

And so on... But this defaults all values to Apple

Col1               Col2
[5,6,23,7,20,21]   Apple
[0,7,20,21]        Apple
[3,4,5,23,7,20,21] Apple
[2,3,23,7,20,21]   Apple
[16,20,21]         Apple
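
A likely reason every row comes back as Apple: `in` applied to a Series tests the index labels, not the values, so `0 in df['Col1']` is a single `True` whenever the index contains the label 0. A minimal sketch, assuming a three-row sample of the data from the question (the `'Other'` fallback is a placeholder, not from the original code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [[5, 6, 23, 7, 20, 21],
             [0, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21]]
})

# `in` on a Series checks the index labels (0, 1, 2 here), not the lists.
index_hit = 0 in df['Col1']     # 0 is an index label
index_miss = 23 in df['Col1']   # 23 appears inside the lists but is not a label

# np.where receives one scalar condition, so it produces one value
# (a zero-dimensional array), not one value per row.
result = np.where(index_hit, 'Apple', 'Other')
print(index_hit, index_miss, result.ndim, result.item())
```

Because the condition is a single boolean rather than a per-row mask, there is nothing row-specific for `np.where` to choose between.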

I was able to successfully do it by putting the above in a for loop, but I am getting issues. Code:

df['Col2'] = ''
for i in range(0,df.shape[0]):
   df['Col2'][i] = np.where(0 in df['Col1'][i], 'Apple',
                   np.where(2 in df['Col1'][i], 'Grape', 
                      np.where(4 in df['Col1'][i], 'Watermelon', )

I get the result I am looking for, but I am being met with a warning:

<ipython-input-638-5dfd74b69688>:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

I assume the warning is because I have already created the blank column, but the only reason I did this is because I would get an error if I didn't create it. Furthermore, when I attempt to perform a simple df['Col2'].value_counts(), I get an error: TypeError: unhashable type: 'numpy.ndarray'. The result from value_counts() still shows up even though I get this error, which is odd.
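
Two details seem to explain these symptoms. `df['Col2'][i] = ...` is chained indexing, which is what usually triggers `SettingWithCopyWarning`, and `np.where` with a scalar condition returns a zero-dimensional `numpy.ndarray` rather than a string, and arrays are unhashable. Below is a sketch of the same loop using a plain function and `.at`; `pick_fruit` is a hypothetical helper name and the `''` fallback is an assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [[5, 6, 23, 7, 20, 21],
             [0, 7, 20, 21],
             [2, 3, 23, 7, 20, 21]]
})

# np.where with a scalar condition yields a 0-d array, not a str.
value = np.where(0 in [0, 7, 20, 21], 'Apple', '')
is_array = isinstance(value, np.ndarray)

def pick_fruit(lst):
    """Hypothetical helper: return the first matching fruit as a plain str."""
    if 0 in lst:
        return 'Apple'
    if 2 in lst:
        return 'Grape'
    if 4 in lst:
        return 'Watermelon'
    return ''  # assumed fallback when nothing matches

df['Col2'] = ''
for i in df.index:
    # .at sets a single cell on df itself, so there is no chained indexing.
    df.at[i, 'Col2'] = pick_fruit(df.at[i, 'Col1'])

# The column now holds plain strings, so value_counts can hash them.
counts = df['Col2'].value_counts()
print(df)
```

The loop is kept only to mirror the question; a row-wise `apply` or `explode` avoids it entirely.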

I am not entirely sure how else to proceed, I've tried a bunch of other things to create this column but none have been able to work. Any advice appreciated!

  • Where do you get your fruit names based on the first item in col1? Is it another list, or have you just defined them as constants? If you defined them yourself and there aren't too many, you can write a switch/case; otherwise, share that lookup table too. Commented Dec 14, 2021 at 18:37
  • It is just a small list of just 6 different fruit names. In total there are maybe ~40 but I only need to use 0,2,4,6,16,17 Commented Dec 14, 2021 at 18:40
  • @coderX. Is it possible to have two fruits in the same list? Commented Dec 14, 2021 at 18:42
  • It is possible, but extremely rare. Out of ~250 rows there was 1 where there were 2 fruits in the same list. Commented Dec 14, 2021 at 18:43

2 Answers

Use explode:

d = {0: 'Apple', 2: 'Grape', 4: 'Watermelon', 6: 'Banana', 16: 'Pear', 17: 'Orange'}
df['Col2'] = df['Col1'].explode().map(d).dropna().groupby(level=0).apply(', '.join)
print(df)

# Output:
                       Col1        Col2
0     [5, 6, 23, 7, 20, 21]      Banana
1            [0, 7, 20, 21]       Apple
2  [3, 4, 5, 23, 7, 20, 21]  Watermelon
3     [2, 3, 23, 7, 20, 21]       Grape
4  [3, 4, 5, 23, 7, 20, 21]  Watermelon
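
To see why the one-liner works, it can help to look at the intermediate results. The sketch below breaks the chain into named steps on a three-row sample; the lists are shortened, and the row `[7, 20, 21]` is an added assumption to show that a list with no match ends up as `NaN` after index alignment.

```python
import pandas as pd

d = {0: 'Apple', 2: 'Grape', 4: 'Watermelon', 6: 'Banana', 16: 'Pear', 17: 'Orange'}
df = pd.DataFrame({'Col1': [[5, 6, 23], [0, 2, 21], [7, 20, 21]]})

# One row per list element; the original row index is repeated.
exploded = df['Col1'].explode()

# Map numbers to fruit names; numbers missing from d become NaN and are dropped.
names = exploded.map(d).dropna()

# Collapse back to one row per original index, joining multiple matches.
joined = names.groupby(level=0).apply(', '.join)

# Assignment aligns on the index, so a row with no match is left as NaN.
df['Col2'] = joined
print(df)
```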

4 Comments

This is nice because, if there are multiple matches in one list, it will join them with a comma, e.g. Apple, Orange in one row.
@coderX. If you have multiple fruits in the same list, they will be joined like 'Apple, Banana, Grape, ...'
Of course, thanks @Corralien ! Saved me an hour of pulling my hair out trying different variations of in and .isin
Just a tip: when you have a list, think of explode to get scalar values instead of a vector.

Loop through the list values and map them to the correct fruit, ignoring the unwanted ones. Set to NaN if there is no match. Use str.join to include the possibility of multiple matches.

To apply this logic row-wise use Series.apply

import numpy as np

mapping = {0: 'Apple', 2: 'Grape', 4: 'Watermelon'}

df['Col2'] = df['Col1'].apply(lambda lst: ', '.join(mapping[n] for n in lst if n in mapping) or np.nan)

Output:

>>> df

                       Col1        Col2
0     [5, 6, 23, 7, 20, 21]         NaN
1            [0, 7, 20, 21]       Apple
2  [3, 4, 5, 23, 7, 20, 21]  Watermelon
3     [2, 3, 23, 7, 20, 21]       Grape
4  [3, 4, 5, 23, 7, 20, 21]  Watermelon
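
If only a single fruit per row is wanted, as in the question's chained conditions, one option is to return the first key of the mapping that appears in the list. This is a sketch of a variant, not the method above; it assumes the dictionary's insertion order expresses the priority, and `first_fruit` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

mapping = {0: 'Apple', 2: 'Grape', 4: 'Watermelon'}

def first_fruit(lst):
    """Hypothetical helper: first fruit in mapping order, or NaN if none match."""
    present = set(lst)
    return next((name for key, name in mapping.items() if key in present), np.nan)

# Shortened sample lists; the second row contains two candidate fruits.
df = pd.DataFrame({'Col1': [[5, 6, 23], [4, 2, 21], [0, 7]]})
df['Col2'] = df['Col1'].apply(first_fruit)
print(df)
```

Here the second row contains both 4 and 2, and Grape is chosen because 2 comes before 4 in `mapping`.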

Performance

Note that this should be faster than Corralien's solution.

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Col1': [[5, 6, 23, 7, 20, 21],
             [0, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21],
             [2, 3, 23, 7, 20, 21],
             [3, 4, 5, 23, 7, 20, 21]]
})

mapping = {0: 'Apple', 2: 'Grape', 4: 'Watermelon'}

def number_to_fruit(lst):
    return ', '.join(mapping[n] for n in lst if n in mapping) or np.nan

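# Simulate a large DataFrame.
# With ignore_index=False the index labels 0-4 repeat n times,
# so groupby(level=0) sees only 5 distinct labels.
n = 20000
df = pd.concat([df]*n, ignore_index=False)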

>>> df.shape

(100000, 1)

Timings:

# Using apply (I've added dropna for a fairer comparison)
>>> %timeit -n 10 df['Col1'].apply(number_to_fruit).dropna()

116 ms ± 7.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Corralien's solution 
>>> %timeit -n 10 df['Col1'].explode().map(mapping).dropna().groupby(level=0).apply(', '.join)

710 ms ± 71 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
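
Before reading too much into the timings, it may be worth confirming that both approaches return the same values. The check below is a sketch and not part of the original benchmark; it assumes a small frame built with `ignore_index=True`, so that every row has its own index label for `groupby(level=0)`, and the row `[0, 2, 4]` is added to exercise multiple matches.

```python
import numpy as np
import pandas as pd

mapping = {0: 'Apple', 2: 'Grape', 4: 'Watermelon'}

def number_to_fruit(lst):
    return ', '.join(mapping[n] for n in lst if n in mapping) or np.nan

base = pd.DataFrame({
    'Col1': [[5, 6, 23, 7, 20, 21],   # no match
             [0, 7, 20, 21],          # one match
             [0, 2, 4]]               # several matches
})
# Unique index labels: each row forms its own group after explode.
df = pd.concat([base] * 3, ignore_index=True)

via_apply = df['Col1'].apply(number_to_fruit).dropna()
via_explode = (df['Col1'].explode().map(mapping).dropna()
               .groupby(level=0).apply(', '.join))

# Compare both the surviving row labels and the joined strings.
same = (via_apply.index.tolist() == via_explode.index.tolist()
        and via_apply.tolist() == via_explode.tolist())
print(same)
```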

1 Comment

@coderX I updated my answer with a simple performance test. Have a look.
