
Although it isn't good coding practice, I've run into a special kind of problem in which I need to go through a column of lists and erase particular values. One solution might be to melt the 'neighbors' column, but I believe the code I have is close to the objective. I've prepared a reproducible example for better understanding:

import pandas as pd
import numpy as np


def removing_nan_neighboors(custom_df):
    nan_list = list(custom_df[custom_df['values'].notna()]['customer'])
    print(nan_list)
    custom_df['neighbors'] = [x for x in custom_df['neighbors'] if x not in nan_list]
    return custom_df


customer = [1, 2, 3, 4, 5, 6]
values = [np.nan, np.nan, 10, np.nan, 11, 12]
neighbors = [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]]
df = pd.DataFrame({'customer': customer, 'values': values, 'neighbors': neighbors})
df = removing_nan_neighboors(df)

print(df)

   customer values neighbors
0        1     NaN    [6, 2]
1        2     NaN    [1, 3]
2        3    10.0    [2, 4]
3        4     NaN    [3, 5]
4        5    11.0    [4, 6]
5        6    12.0    [5, 1]

The objective is to erase the customer numbers from the neighbors, if they have NaN values:

   customer values neighbors
0        1     NaN    [6]
1        2     NaN    [3]
2        3    10.0    []
3        4     NaN    [3, 5]
4        5    11.0    [6]
5        6    12.0    [5]

But I haven't gotten that far: my function doesn't work as intended yet. Help is appreciated.

3 Answers


In your case, explode the column, then use isin to keep only the neighbors whose values are notna:

s = df['neighbors'].explode()
df['new'] = s[s.isin(df.loc[df['values'].notna(),'customer'])].groupby(level=0).agg(list)
df
Out[36]: 
   customer  values neighbors     new
0         1     NaN    [6, 2]     [6]
1         2     NaN    [1, 3]     [3]
2         3    10.0    [2, 4]     NaN
3         4     NaN    [3, 5]  [3, 5]
4         5    11.0    [4, 6]     [6]
5         6    12.0    [5, 1]     [5]
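One detail worth noting: rows whose neighbors were all erased come back as NaN rather than an empty list (row 2 above), because they vanish from the groupby. If the [] from the question's expected output is wanted, the NaN can be replaced afterwards. A small sketch reproducing the full pipeline on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})

# One row per (customer, neighbor) pair, then keep neighbors with notna values.
s = df['neighbors'].explode()
df['new'] = s[s.isin(df.loc[df['values'].notna(), 'customer'])].groupby(level=0).agg(list)

# Rows with no surviving neighbors are missing after the groupby and become NaN;
# turn those into empty lists.
df['new'] = df['new'].apply(lambda x: x if isinstance(x, list) else [])
print(df['new'].tolist())  # [[6], [3], [], [3, 5], [6], [5]]
```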

1 Comment

Choosing this as the accepted answer because of the large difference in execution time compared to the others.

If I understood your objective correctly, you want to erase from every neighbors list the numbers belonging to customers whose values is NaN; that is, you want the result shown in your last cell.

I attempted to do that in a list comprehension approach:

nan_customers = set(df[df['values'].isna()]['customer'])
df['neighbors_new'] = [[n for n in neighbor if n not in nan_customers]
                       for neighbor in df.neighbors]

And got this:

   customer  values neighbors neighbors_new
0         1     NaN    [6, 2]           [6]
1         2     NaN    [1, 3]           [3]
2         3    10.0    [2, 4]            []
3         4     NaN    [3, 5]        [3, 5]
4         5    11.0    [4, 6]           [6]
5         6    12.0    [5, 1]           [5]

1 Comment

Thanks! I believe this answer is better in terms of robustness and the readable list comprehension, so thumbs up for that, but the execution time was considerably longer than with the explode() answer.
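The timing difference mentioned above can be checked with timeit. A minimal sketch on the question's toy frame; absolute numbers depend heavily on data size, and on a frame this small the ranking can even flip:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'customer': [1, 2, 3, 4, 5, 6],
    'values': [np.nan, np.nan, 10, np.nan, 11, 12],
    'neighbors': [[6, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 1]],
})

def explode_approach():
    # explode + isin + groupby, as in the accepted answer
    s = df['neighbors'].explode()
    return s[s.isin(df.loc[df['values'].notna(), 'customer'])].groupby(level=0).agg(list)

def listcomp_approach():
    # plain Python list comprehension over the column
    keep = set(df.loc[df['values'].notna(), 'customer'])
    return [[n for n in nb if n in keep] for nb in df['neighbors']]

print(timeit.timeit(explode_approach, number=100))
print(timeit.timeit(listcomp_approach, number=100))
```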

Try this (note it relies on each customer's neighbors being exactly the previous and next customers in the cycle, as in your example data):

df["cust_1"] = np.where(
    np.isnan(np.roll(df["values"], 1)),
    np.nan,
    np.roll(df["customer"], 1),
)

df["cust_2"] = np.where(
    np.isnan(np.roll(df["values"], -1)),
    np.nan,
    np.roll(df["customer"], -1),
)

df["neighbors"] = df[["cust_1", "cust_2"]].agg(
    lambda x: list(x[x.notna()].astype(int)), axis=1
)
df = df.drop(columns=["cust_1", "cust_2"])

print(df)

Prints:

   customer  values neighbors
0         1     NaN       [6]
1         2     NaN       [3]
2         3    10.0        []
3         4     NaN    [3, 5]
4         5    11.0       [6]
5         6    12.0       [5]
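This works because, in the question's data, each customer's neighbors are the immediately preceding and following customers in a cycle, which is exactly what np.roll shifts into alignment. A small demonstration (it would not generalize to arbitrary neighbor lists):

```python
import numpy as np

customers = np.array([1, 2, 3, 4, 5, 6])

# Shifting by +1 lines each customer up with its predecessor in the cycle,
# shifting by -1 with its successor.
print(np.roll(customers, 1))   # [6 1 2 3 4 5]
print(np.roll(customers, -1))  # [2 3 4 5 6 1]
```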
