
I have a large dataframe and I want to search 144 of the columns to check if there are any negative values in them. If there is even one negative value in a column, I want to replace the whole column with np.nan. I then want to use the new version of the dataframe for later analysis.

I've tried a variety of methods but can't find one that works. I think the code below is almost there, but I can't quite get it to do what I want.

clean_data_df.loc[clean_data_df.cols < 0, cols] = np.nan   #cols is a list of the column names I want to check  
null_columns=clean_data_df.columns[clean_data_df.isnull().any(axis=1)]
clean_data_df[null_columns] = np.nan

When I run the above code I get the following error: AttributeError: 'DataFrame' object has no attribute 'cols'

Thanks in advance!

  • If you are using a list of column names, you should use clean_data_df[cols]. Commented Feb 20, 2020 at 14:58
  • I've tried that too and I get a different error: "None of [Index([ ('p', 'r', 'e', 's', 's', 'u', 'r', 'e', '', '1'),\n ('p', 'r', 'e', 's', 's', 'u', 'r', 'e', '', '2'),\n [this carries on for all the column names],\n dtype='object', length=144)] are in the [index]" Commented Feb 20, 2020 at 15:05

2 Answers

2

You could use a loop to iterate over the columns:

for i in cols:
    if df[i].isna().any():   # column contains at least one NaN
        df[i] = np.nan       # replace the whole column

Minimal reproducible example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan], 'c': [1, 2, 3]})
for i in df:
    if df[i].isna().any():
        df[i] = np.nan
print(df)

Output:

    a   b  c
0 NaN NaN  1
1 NaN NaN  2
2 NaN NaN  3
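
Note that the question asks to blank out columns that contain any negative value (rather than any NaN), so the same loop pattern would need the comparison changed. A minimal sketch under that assumption, using the question's clean_data_df and its cols list of 144 column names:

import numpy as np

for c in cols:
    # a NaN compares as False here, so only genuine negative values trigger the replacement
    if (clean_data_df[c] < 0).any():
        clean_data_df[c] = np.nan   # replace the whole column with NaN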

3 Comments

For such use cases you can also take a look at loc, i.e. df.loc[:, df.isna().any()] = np.nan
How does the df.isna().any() work inside the loc? I seem to be getting AttributeError: 'float' object has no attribute 'loc'
An AttributeError stating what? df.loc[:, df.isna().any()] returns the first two columns since they are True; loc understands a boolean list when it is passed after the comma (i.e. for columns). Test with df.loc[:, [True, True, False]]
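
For illustration, a small self-contained sketch of the boolean-mask column selection described in these comments (toy data, not from the thread):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan], 'c': [1, 2, 3]})
print(df.isna().any())                 # boolean Series: True for 'a' and 'b'
df.loc[:, df.isna().any()] = np.nan    # loc accepts the boolean Series as a column selector
print(df)                              # 'a' and 'b' are now entirely NaN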
1

The idea is to filter only the selected columns with DataFrame.lt and DataFrame.any, then extend the resulting boolean mask to all the other columns (filled with False) via Series.reindex, and finally set the values with DataFrame.loc, where the first : means all rows:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('abc'), 'b': [-2, -1, -3], 'c': [1, 2, 3]})

cols = ['b','c']
df.loc[:, df[cols].lt(0).any().reindex(df.columns, fill_value=False)] = np.nan
print(df)
   a   b  c
0  a NaN  1
1  b NaN  2
2  c NaN  3

Detail:

print(df[cols].lt(0).any())
b     True
c    False
dtype: bool

print(df[cols].lt(0).any().reindex(df.columns, fill_value=False))
a    False
b     True
c    False
dtype: bool
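
Applied to the question's own names (a sketch, assuming clean_data_df is the DataFrame and cols is the list of 144 column names to check):

import numpy as np

# True for each checked column that contains at least one negative value
mask = clean_data_df[cols].lt(0).any()
# extend the mask to every column of the DataFrame; unchecked columns stay False
mask = mask.reindex(clean_data_df.columns, fill_value=False)
clean_data_df.loc[:, mask] = np.nan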

