
Although it isn't good coding practice, I've run into a special kind of problem in which I need to go through a column of lists and erase particular values. One solution might be to melt the 'neighbors' column, but I believe the code I have is close to the objective. I've prepared a reproducible example for better understanding:

import pandas as pd
import numpy as np


def removing_nan_neighboors(custom_df):
    nan_list = list(custom_df[custom_df['values'].notna()]['customer'])
    print(nan_list)
    custom_df['neighbors'] = [x for x in custom_df['neighbors'] if x not in nan_list]
    return custom_df


customer = [1, 2, 3, 4, 5, 6]
values = [np.nan, np.nan, 10, np.nan, 11, 12]
neighbors = [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]]
df = pd.DataFrame({'customer': customer, 'values': values, 'neighbors': neighbors})
df = removing_nan_neighboors(df)

print(df)

   customer values neighbors
0        1     NaN    [6, 2]
1        2     NaN    [1, 3]
2        3    10.0    [2, 4]
3        4     NaN    [3, 5]
4        5    11.0    [4, 6]
5        6    12.0    [5, 1]

The objective is to erase the customer numbers from the neighbors, if they have NaN values:

   customer values neighbors
0        1     NaN    [6]
1        2     NaN    [3]
2        3    10.0    []
3        4     NaN    [3, 5]
4        5    11.0    [6]
5        6    12.0    [5]

But I haven't gotten that far: my function doesn't work as intended yet. Help is appreciated.

3 Answers


In your case, explode the column, then use isin to keep only the neighbors whose values are notna:

s = df['neighbors'].explode()
df['new'] = s[s.isin(df.loc[df['values'].notna(),'customer'])].groupby(level=0).agg(list)
df
Out[36]: 
   customer  values neighbors     new
0         1     NaN    [6, 2]     [6]
1         2     NaN    [1, 3]     [3]
2         3    10.0    [2, 4]     NaN
3         4     NaN    [3, 5]  [3, 5]
4         5    11.0    [4, 6]     [6]
5         6    12.0    [5, 1]     [5]
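One detail worth noting: rows whose neighbors were all erased come back as NaN rather than an empty list (row 2 above), because they vanish from the groupby. If the [] from the question's expected output is wanted, the NaN can be replaced afterwards. A small sketch reproducing the full pipeline on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})

# One row per (customer, neighbor) pair, then keep neighbors with notna values.
s = df['neighbors'].explode()
df['new'] = s[s.isin(df.loc[df['values'].notna(), 'customer'])].groupby(level=0).agg(list)

# Rows with no surviving neighbors are missing after the groupby and become NaN;
# turn those into empty lists.
df['new'] = df['new'].apply(lambda x: x if isinstance(x, list) else [])
print(df['new'].tolist())  # [[6], [3], [], [3, 5], [6], [5]]
```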

1 Comment

Choosing this as the accepted answer because of the large difference in execution time compared to the others.

If I understood your objective correctly, you want to erase from every neighbors list the numbers belonging to customers whose values is NaN; that is, you want the result shown in your last cell.

I attempted to do that in a list comprehension approach:

nan_customers = set(df[df['values'].isna()]['customer'])
df['neighbors_new'] = [[n for n in neighbor if n not in nan_customers]
                       for neighbor in df.neighbors]

And got this:

   customer  values neighbors neighbors_new
0         1     NaN    [6, 2]           [6]
1         2     NaN    [1, 3]           [3]
2         3    10.0    [2, 4]            []
3         4     NaN    [3, 5]        [3, 5]
4         5    11.0    [4, 6]           [6]
5         6    12.0    [5, 1]           [5]

1 Comment

Thanks! I believe this answer is better in terms of robustness and the readable list comprehension, so thumbs up for that, but the execution time was considerably longer than with the explode() answer.
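The timing difference mentioned above can be checked with timeit. A minimal sketch on the question's toy frame; absolute numbers depend heavily on data size, and on a frame this small the ranking can even flip:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})

def explode_approach():
    # explode + isin + groupby, as in the accepted answer
    s = df['neighbors'].explode()
    return s[s.isin(df.loc[df['values'].notna(), 'customer'])].groupby(level=0).agg(list)

def listcomp_approach():
    # plain Python list comprehension over the column
    keep = set(df.loc[df['values'].notna(), 'customer'])
    return [[n for n in nb if n in keep] for nb in df['neighbors']]

print(timeit.timeit(explode_approach, number=100))
print(timeit.timeit(listcomp_approach, number=100))
```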

Try this (note it relies on each customer's neighbors being exactly the previous and next customers in the cycle, as in your example data):

df["cust_1"] = np.where(
    np.isnan(np.roll(df["values"], 1)),
    np.nan,
    np.roll(df["customer"], 1),
)

df["cust_2"] = np.where(
    np.isnan(np.roll(df["values"], -1)),
    np.nan,
    np.roll(df["customer"], -1),
)

df["neighbors"] = df[["cust_1", "cust_2"]].agg(
    lambda x: list(x[x.notna()].astype(int)), axis=1
)
df = df.drop(columns=["cust_1", "cust_2"])

print(df)

Prints:

   customer  values neighbors
0         1     NaN       [6]
1         2     NaN       [3]
2         3    10.0        []
3         4     NaN    [3, 5]
4         5    11.0       [6]
5         6    12.0       [5]
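This works because, in the question's data, each customer's neighbors are the immediately preceding and following customers in a cycle, which is exactly what np.roll shifts into alignment. A small demonstration (it would not generalize to arbitrary neighbor lists):

```python
import numpy as np

customers = np.array([1, 2, 3, 4, 5, 6])

# Shifting by +1 lines each customer up with its predecessor in the cycle,
# shifting by -1 with its successor.
print(np.roll(customers, 1))   # [6 1 2 3 4 5]
print(np.roll(customers, -1))  # [2 3 4 5 6 1]
```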
